1. Robles-Zazueta CA, Crespo-Herrera LA, Piñera-Chavez FJ, Rivera-Amado C, Aradottir GI. Climate change impacts on crop breeding: Targeting interacting biotic and abiotic stresses for wheat improvement. The Plant Genome 2024; 17:e20365. [PMID: 37415292 DOI: 10.1002/tpg2.20365] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/14/2023] [Revised: 05/23/2023] [Accepted: 05/30/2023] [Indexed: 07/08/2023]
Abstract
Wheat (Triticum aestivum L.) as a staple crop is closely interwoven into the development of modern society. Its influence on culture and economic development is global. Recent instability in wheat markets has demonstrated its importance in guaranteeing food security across national borders. Climate change threatens food security as it interacts with a multitude of factors impacting wheat production. The challenge needs to be addressed with a multidisciplinary perspective delivered across research, private, and government sectors. Many experimental studies have identified the major biotic and abiotic stresses impacting wheat production, but fewer have addressed the combinations of stresses that occur simultaneously or sequentially during the wheat growth cycle. Here, we argue that biotic and abiotic stress interactions, and the genetics and genomics underlying them, have been insufficiently addressed by the crop science community. We propose this as a reason for the limited transfer of practical and feasible climate adaptation knowledge from research projects into routine farming practice. To address this gap, we propose that novel methodology integration can align large volumes of data available from crop breeding programs with increasingly cheaper omics tools to predict wheat performance under different climate change scenarios. Underlying this is our proposal that breeders design and deliver future wheat ideotypes based on new or enhanced understanding of the genetic and physiological processes that are triggered when wheat is subjected to combinations of stresses. By defining this to a trait and/or genetic level, new insights can be made for yield improvement under future climate conditions.
Affiliation(s)
- Carlos A Robles-Zazueta
- Global Wheat Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, México
- Carolina Rivera-Amado
- Global Wheat Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, México
2. Ortiz R, Reslow F, Montesinos-López A, Huicho J, Pérez-Rodríguez P, Montesinos-López OA, Crossa J. Partial least squares enhance multi-trait genomic prediction of potato cultivars in new environments. Sci Rep 2023; 13:9947. [PMID: 37336933 DOI: 10.1038/s41598-023-37169-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Accepted: 06/17/2023] [Indexed: 06/21/2023] [Open access]
Abstract
Plant breeding needs methods that can handle large numbers of predictor variables with few sample observations, and that deal efficiently with high correlation among predictors and among measured traits. This paper explores the prediction performance of the partial least squares (PLS) method under single-trait (ST) and multi-trait (MT) prediction of potato traits. The first prediction scenario was for tested lines in tested environments under a five-fold cross-validation (5FCV) strategy; the second was for tested lines in untested environments (herein denoted leave-one-environment-out cross-validation, LOEO). Prediction performance was good (with accuracy mostly > 0.5 for Pearson's correlation), and the accuracy of 5FCV was better than that of LOEO. Hence, we have empirical evidence that the ST and MT PLS framework is a very valuable tool for prediction in the context of potato breeding data.
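The difference between the two validation schemes named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (`five_fold_splits`, `loeo_splits`, `pearson_accuracy`) and the toy environment labels are assumptions made for illustration, and only NumPy is used.

```python
import numpy as np

def five_fold_splits(n_obs, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for random k-fold cross-validation.

    Observations are shuffled, so every environment can appear in both the
    training and the testing part (tested lines in tested environments).
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_obs), n_folds)
    for k in range(n_folds):
        test = np.sort(folds[k])
        train = np.sort(np.concatenate([folds[j] for j in range(n_folds) if j != k]))
        yield train, test

def loeo_splits(env_labels):
    """Yield (env, train_idx, test_idx), holding out one whole environment at a time.

    The held-out environment contributes no rows to training, which mimics
    prediction in an untested environment.
    """
    env_labels = np.asarray(env_labels)
    for env in np.unique(env_labels):
        test = np.flatnonzero(env_labels == env)
        train = np.flatnonzero(env_labels != env)
        yield env, train, test

def pearson_accuracy(y_true, y_pred):
    """Prediction accuracy measured as Pearson's correlation."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Hypothetical layout: 3 environments with 4 observations each.
envs = np.repeat(["E1", "E2", "E3"], 4)
for env, train, test in loeo_splits(envs):
    print(env, "train rows:", train.tolist(), "test rows:", test.tolist())
```

Under LOEO the model never sees data from the environment it is asked to predict, which is one plausible reason for the lower accuracy reported relative to 5FCV.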
Affiliation(s)
- Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), P.O. Box 190, SE 23436, Lomma, Sweden.
- Fredrik Reslow
- Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), P.O. Box 190, SE 23436, Lomma, Sweden
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), 44430, Guadalajara, México
- José Huicho
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, 56237, Texcoco, Edo. de México, México
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km. 45, El Batán, 56237, Texcoco, Edo. de México, México.
- Colegio de Postgraduados (COLPOS), 56230, Montecillos, Edo. de México, México.
- Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, Australia.
3. Montesinos-López OA, Montesinos-López A. Two simple methods to improve the accuracy of the genomic selection methodology. BMC Genomics 2023; 24:220. [PMID: 37101112 PMCID: PMC10131336 DOI: 10.1186/s12864-023-09294-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 04/04/2023] [Indexed: 04/28/2023] [Open access]
Abstract
BACKGROUND Genomic selection (GS) is revolutionizing plant and animal breeding. However, its practical implementation remains challenging because it is affected by many factors that, when not controlled, make the methodology ineffective. In addition, because GS is generally formulated as a regression problem, it has low sensitivity for selecting the best candidate individuals, since a top percentage is selected according to a ranking of predicted breeding values.
RESULTS For this reason, in this paper we propose two methods to improve the prediction accuracy of this methodology. The first reformulates GS (currently formulated as a regression problem) as a binary classification problem. The second is only a postprocessing step that adjusts the threshold used to classify the lines predicted on their original (continuous) scale, so as to guarantee similar sensitivity and specificity. The postprocessing method is applied to the predictions obtained from the conventional regression model. Both methods assume that a threshold has been defined in advance to divide the training data into top lines and non-top lines; this threshold can be set as a quantile (for example 80%, 90%, etc.) or as the average (or maximum) of the performance of the checks. The reformulation method requires labeling as one those lines in the training set that are equal to or larger than the specified threshold, and as zero otherwise. A binary classification model is then trained with the conventional inputs, but using the binary response variable in place of the continuous response variable. The binary classifier should be trained to yield similar sensitivity and specificity, to guarantee a reasonable probability of correctly classifying the top lines.
CONCLUSIONS We evaluated the proposed models in seven data sets and found that the two proposed methods outperformed the conventional regression model by a large margin (by 402.9% in terms of sensitivity, by 110.04% in terms of F1 score and by 70.96% in terms of Kappa coefficient, with the postprocessing method). However, between the two proposed methods, the postprocessing method was better than the reformulation as a binary classification model. The simple postprocessing method avoids the need to reformulate the conventional regression models as binary classification models, with similar or better performance, and significantly improves the selection of the top candidate lines. In general, both proposed methods are simple and can easily be adopted in practical breeding programs, with the guarantee that they will significantly improve the selection of the top candidate lines.
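The postprocessing idea described in this abstract can be sketched as follows. This is an illustrative reading, not the authors' implementation: the search rule (pick the cutoff on the predicted scale that minimizes the gap between sensitivity and specificity), the function names `sens_spec` and `balanced_cutoff`, and the toy data are all assumptions.

```python
import numpy as np

def sens_spec(is_top, called_top):
    """Return (sensitivity, specificity) for boolean truth and boolean calls."""
    is_top = np.asarray(is_top, dtype=bool)
    called_top = np.asarray(called_top, dtype=bool)
    tp = np.sum(is_top & called_top)
    fn = np.sum(is_top & ~called_top)
    tn = np.sum(~is_top & ~called_top)
    fp = np.sum(~is_top & called_top)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return float(sens), float(spec)

def balanced_cutoff(y_obs, y_pred, quantile=0.8):
    """Pick a cutoff on the predicted scale with similar sensitivity and specificity.

    Top lines are defined on the observed training phenotypes by a quantile
    threshold. Candidate cutoffs are the distinct predicted values; the one
    with the smallest |sensitivity - specificity| is returned (assumed rule).
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    is_top = y_obs >= np.quantile(y_obs, quantile)
    best_cut, best_gap = None, np.inf
    for cut in np.unique(y_pred):
        sens, spec = sens_spec(is_top, y_pred >= cut)
        gap = abs(sens - spec)
        if gap < best_gap:
            best_cut, best_gap = float(cut), gap
    return best_cut

# Toy data: regression predictions shrunk toward the mean, as is typical.
y_obs = np.arange(1, 11, dtype=float)
y_pred = 5.5 + 0.3 * (y_obs - 5.5)
threshold = np.quantile(y_obs, 0.8)
is_top = y_obs >= threshold

print("original threshold:", sens_spec(is_top, y_pred >= threshold))
cut = balanced_cutoff(y_obs, y_pred, quantile=0.8)
print("adjusted cutoff   :", sens_spec(is_top, y_pred >= cut))
```

Because regression predictions tend to be shrunk toward the mean, few or no predicted values may exceed a threshold defined on the observed scale; moving the cutoff onto the predicted scale is one way to recover sensitivity without retraining the model.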
Affiliation(s)
- Abelardo Montesinos-López
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Jalisco, 44430, Guadalajara, México.
4. Jubair S, Domaratzki M. Crop genomic selection with deep learning and environmental data: A survey. Front Artif Intell 2023; 5:1040295. [PMID: 36703955 PMCID: PMC9871498 DOI: 10.3389/frai.2022.1040295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 12/22/2022] [Indexed: 01/12/2023] [Open access]
Abstract
Machine learning techniques for crop genomic selections, especially for single-environment plants, are well-developed. These machine learning models, which use dense genome-wide markers to predict phenotype, routinely perform well on single-environment datasets, especially for complex traits affected by multiple markers. On the other hand, machine learning models for predicting crop phenotype, especially deep learning models, using datasets that span different environmental conditions, have only recently emerged. Models that can accept heterogeneous data sources, such as temperature, soil conditions and precipitation, are natural choices for modeling GxE in multi-environment prediction. Here, we review emerging deep learning techniques that incorporate environmental data directly into genomic selection models.
Affiliation(s)
- Sheikh Jubair
- Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada
- *Correspondence: Sheikh Jubair
- Mike Domaratzki
- Department of Computer Science, University of Western Ontario, London, ON, Canada
5. Costa-Neto G, Crespo-Herrera L, Fradgley N, Gardner K, Bentley AR, Dreisigacker S, Fritsche-Neto R, Montesinos-López OA, Crossa J. Envirome-wide associations enhance multi-year genome-based prediction of historical wheat breeding data. G3 (Bethesda, Md.) 2022; 13:6861853. [PMID: 36454213 PMCID: PMC9911085 DOI: 10.1093/g3journal/jkac313] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/10/2022] [Revised: 11/02/2022] [Accepted: 11/03/2022] [Indexed: 12/03/2022]
Abstract
Linking high-throughput environmental data (enviromics) to genomic prediction (GP) is a cost-effective strategy for increasing selection intensity under genotype-by-environment interactions (G × E). This study developed a data-driven approach based on Environment-Phenotype Association (EPA) aimed at recycling important G × E information from historical breeding data. EPA was developed in two applications: (1) scanning a secondary source of genetic variation, weighted from the shared reaction-norms of past-evaluated genotypes and (2) pinpointing weights of the similarity among trial-sites (locations), given the historical impact of each envirotyping data variable for a given site. These results were then used as a dimensionality reduction strategy, integrating historical data to feed multi-environment GP models, which led to the development of four new G × E kernels considering genomics, enviromics, and EPA outcomes. The wheat trial data used included 36 locations, 8 years, and three target populations of environments (TPEs) in India. Four prediction scenarios and six kernel models within/across TPEs were tested. Our results suggest that the conventional GBLUP, without enviromic data or when omitting EPA, is inefficient in predicting the performance of wheat lines in future years. Nevertheless, when EPA was introduced as an intermediary learning step to reduce the dimensionality of the G × E kernels while connecting phenotypic and environmental-wide variation, a significant enhancement of G × E prediction accuracy was evident. EPA revealed that the effect of seasonality makes strategies such as "covariable selection" unfeasible because G × E is year-germplasm specific. We propose that the EPA effectively serves as a "reinforcement learner" algorithm capable of uncovering the effect of seasonality over the reaction-norms, with the benefits of better forecasting the similarities between past and future trialing sites. 
EPA combines the benefits of dimensionality reduction while reducing the uncertainty of genotype-by-year predictions and increasing the resolution of GP for the genotype-specific level.
Affiliation(s)
- Germano Costa-Neto
- Institute for Genomics Diversity, Cornell University, Ithaca, NY 14853, USA
- Leonardo Crespo-Herrera
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Nick Fradgley
- NIAB, 93 Lawrence Weaver Road, Cambridge CB3 0LE, UK
- Keith Gardner
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Alison R Bentley
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Susanne Dreisigacker
- International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Osval A Montesinos-López
- Corresponding authors: Facultad de Telematica, Universidad de Colima, Mexico; and International Maize and Wheat Improvement Center (CIMMYT) and Colegio de Post-Graduados, Mexico.
- Jose Crossa
- Corresponding authors: Facultad de Telematica, Universidad de Colima, Mexico; and International Maize and Wheat Improvement Center (CIMMYT) and Colegio de Post-Graduados, Mexico.
6. Montesinos-López OA, Montesinos-López A, Bernal Sandoval DA, Mosqueda-Gonzalez BA, Valenzo-Jiménez MA, Crossa J. Multi-trait genome prediction of new environments with partial least squares. Front Genet 2022; 13:966775. [PMID: 36134027 PMCID: PMC9483856 DOI: 10.3389/fgene.2022.966775] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/11/2022] [Accepted: 07/18/2022] [Indexed: 11/18/2022] [Open access]
Abstract
The genomic selection (GS) methodology proposed over 20 years ago by Meuwissen et al. (Genetics, 2001) has revolutionized plant breeding. A predictive methodology that trains statistical machine learning algorithms with phenotypic and genotypic data of a reference population and makes predictions for genotyped candidate lines, GS saves significant resources in the selection of candidate individuals. However, its practical implementation is still challenging when the plant breeder is interested in the prediction of future seasons or new locations and/or environments, which is called the “leave one environment out” issue. Furthermore, because the distributions of the training and testing sets do not match, most statistical machine learning methods struggle to produce moderate or reasonable prediction accuracies. For this reason, the main objective of this study was to explore the use of the multi-trait partial least squares (MT-PLS) regression methodology for this specific task, benchmarking its performance against the Bayesian Multi-trait Genomic Best Linear Unbiased Predictor (MT-GBLUP) method. The benchmarking process was performed with five actual data sets. We found that in all data sets the MT-PLS method outperformed the popular MT-GBLUP method by 349.8% (under predictor E + G), 484.4% (under predictor E + G + GE; where E denotes environments, G genotypes and GE the genotype by environment interaction) and 15.9% (under predictor G + GE) across traits. Our results provide empirical evidence of the power of the MT-PLS methodology for the prediction of future seasons or new environments. Furthermore, the comparison of univariate-trait (UT) versus MT models for GBLUP and PLS showed an increase in prediction accuracy for MT-GBLUP versus UT-GBLUP, but not for MT-PLS versus UT-PLS.
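The three predictors compared in this abstract (E + G, E + G + GE and G + GE) can be pictured as different column blocks of a design matrix. The sketch below is only one plausible encoding and is not taken from the paper: environments are one-hot encoded, genotypes are represented directly by a marker matrix, and GE is formed as the row-wise product of the two. The names `one_hot` and `build_predictor` and the toy inputs are hypothetical.

```python
import numpy as np

def one_hot(labels):
    """One-hot encode labels; columns follow the sorted unique levels."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, :]).astype(float)

def build_predictor(env_labels, markers, terms=("E", "G", "GE")):
    """Stack the requested blocks column-wise into one design matrix.

    env_labels : environment of each observation, length n_obs
    markers    : marker scores of each observation's line, shape (n_obs, n_markers)
    terms      : any subset of ("E", "G", "GE")
    """
    Z_E = one_hot(env_labels)
    X_G = np.asarray(markers, dtype=float)
    blocks = []
    if "E" in terms:
        blocks.append(Z_E)
    if "G" in terms:
        blocks.append(X_G)
    if "GE" in terms:
        # Each marker column is repeated once per environment and zeroed
        # outside that environment (assumed form of the interaction block).
        GE = (Z_E[:, :, None] * X_G[:, None, :]).reshape(len(X_G), -1)
        blocks.append(GE)
    return np.hstack(blocks)

# Toy data: 3 observations, 2 environments, 2 markers.
env = ["A", "A", "B"]
M = [[1, 0], [0, 1], [1, 1]]
print(build_predictor(env, M, terms=("E", "G")).shape)
print(build_predictor(env, M, terms=("E", "G", "GE")).shape)
print(build_predictor(env, M, terms=("G", "GE")).shape)
```

Any multi-trait learner (PLS, GBLUP or otherwise) would then be fitted on such a matrix with one response column per trait; which encoding the authors actually used is not stated in the abstract.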
Affiliation(s)
- Abelardo Montesinos-López
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Mexico
- *Correspondence: Abelardo Montesinos-López, José Crossa
- Marco Alberto Valenzo-Jiménez
- Universidad Michoacana de San Nicolas de Hidalgo (UMSNH), Avenida Francisco J. Mujica S/N Ciudad Universitaria, Morelia, MC, Mexico
- José Crossa
- International Maize and Wheat Improvement Center, Texcoco, Edo. de Mexico, Mexico
- Colegio de Postgraduados, Montecillos, Edo. de Mexico, Mexico
- *Correspondence: Abelardo Montesinos-López, José Crossa
7. Montesinos-López OA, Montesinos-López A, Cano-Paez B, Hernández-Suárez CM, Santana-Mancilla PC, Crossa J. A Comparison of Three Machine Learning Methods for Multivariate Genomic Prediction Using the Sparse Kernels Method (SKM) Library. Genes (Basel) 2022; 13:1494. [PMID: 36011405 PMCID: PMC9407886 DOI: 10.3390/genes13081494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 08/10/2022] [Accepted: 08/19/2022] [Indexed: 11/30/2022] [Open access]
Abstract
Genomic selection (GS) changed the way plant breeders select genotypes. GS takes advantage of phenotypic and genotypic information to train a statistical machine learning model, which is used to predict phenotypic (or breeding) values of new lines for which only genotypic information is available. Therefore, many statistical machine learning methods have been proposed for this task. Multi-trait (MT) genomic prediction models take advantage of correlated traits to improve prediction accuracy. Therefore, some multivariate statistical machine learning methods are popular for GS. In this paper, we compare the prediction performance of three MT methods: the MT genomic best linear unbiased predictor (GBLUP), the MT partial least squares (PLS) and the multi-trait random forest (RF) methods. Benchmarking was performed with six real datasets. We found that the three investigated methods produce similar results, but under predictors with genotype (G) and environment (E), that is, E + G, the MT GBLUP achieved superior performance, whereas under predictors E + G + genotype × environment (GE) and G + GE, random forest achieved the best results. We also found that the best predictions were achieved under the predictors E + G and E + G + GE. Here, we also provide the R code for the implementation of these three statistical machine learning methods in the sparse kernel method (SKM) library, which offers not only options for single-trait prediction with various statistical machine learning methods but also some options for MT predictions that can help to better capture the complex patterns that are common in genomic selection datasets.
Affiliation(s)
- Abelardo Montesinos-López
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44100, Mexico
- Correspondence: A.M.-L.; J.C.
- Bernabe Cano-Paez
- Facultad de Ciencias, Universidad Nacional Autónoma de México (UNAM), México City 04510, Mexico
- Carlos Moisés Hernández-Suárez
- Instituto de Ciencias Tecnología e Innovación, Universidad Francisco Gavidia, El Progreso St., No. 2748, Colonia Flor Blanca, San Salvador CP 1101, El Salvador
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco 56237, Mexico
- Colegio de Postgraduados, Montecillo 56230, Mexico
- Correspondence: A.M.-L.; J.C.