1
Crossa J, Martini JWR, Vitale P, Pérez-Rodríguez P, Costa-Neto G, Fritsche-Neto R, Runcie D, Cuevas J, Toledo F, Li H, De Vita P, Gerard G, Dreisigacker S, Crespo-Herrera L, Saint Pierre C, Bentley A, Lillemo M, Ortiz R, Montesinos-López OA, Montesinos-López A. Expanding genomic prediction in plant breeding: harnessing big data, machine learning, and advanced software. Trends in Plant Science 2025:S1360-1385(24)00345-5. [PMID: 39890501 DOI: 10.1016/j.tplants.2024.12.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 12/05/2024] [Accepted: 12/12/2024] [Indexed: 02/03/2025]
Abstract
With growing evidence that genomic selection (GS) improves genetic gains in plant breeding, it is timely to review the key factors that improve its efficiency. In this feature review, we focus on the statistical machine learning (ML) methods and software that are democratizing GS methodology. We outline the principles of genomic-enabled prediction and discuss how statistical ML tools enhance GS efficiency with big data. Additionally, we examine various statistical ML tools developed in recent years for predicting traits across continuous, binary, categorical, and count phenotypes. We highlight the unique advantages of deep learning (DL) models used in genomic prediction (GP). Finally, we review software developed to democratize the use of GP models and recent data management tools that support the adoption of GS methodology.
Affiliation(s)
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico; Colegio de Postgraduados, Montecillos, Edo. de México CP 56230, Mexico
- Paolo Vitale
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Daniel Runcie
  - Department of Plant Sciences at the University of California, Davis, CA, USA
- Jaime Cuevas
  - Universidad de Quintana Roo, Chetumal, Quintana Roo, 77019, Mexico
- Fernando Toledo
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- H Li
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Pasquale De Vita
  - Research Center for Cereal and Industrial Crops (CREA-CI), CREA - Council for Agricultural Research and Economics, Foggia, Italy
- Guillermo Gerard
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Susanne Dreisigacker
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Leonardo Crespo-Herrera
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Carolina Saint Pierre
  - International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz Km. 45, El Batán, CP 56237, Texcoco, Edo. de México, Mexico
- Alison Bentley
  - Australian National University, Research School of Biology, Canberra, Australia
- Morten Lillemo
  - Norwegian University of Life Sciences (NMBU), Department of Plant Science, Ås, Norway
- Rodomiro Ortiz
  - Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), P.O. Box 190 Sundsvagen 10, SE 23422 Lomma, Sweden
- Abelardo Montesinos-López
  - Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, Jalisco, Mexico.
2
Bankin M, Tyrykin Y, Duk M, Samsonova M, Kozlov K. Modeling Chickpea Productivity with Artificial Image Objects and Convolutional Neural Network. Plants (Basel, Switzerland) 2024; 13:2444. [PMID: 39273927 PMCID: PMC11397516 DOI: 10.3390/plants13172444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 08/23/2024] [Accepted: 08/28/2024] [Indexed: 09/15/2024]
Abstract
The chickpea plays a significant role in global agriculture and occupies an increasing share in the human diet. The main aim of the research was to develop a model for the prediction of two chickpea productivity traits in the available dataset. Genomic data for accessions were encoded in Artificial Image Objects, and a model for the thousand-seed weight (TSW) and number of seeds per plant (SNpP) prediction was constructed using a Convolutional Neural Network, dictionary learning and sparse coding for feature extraction, and extreme gradient boosting for regression. The model was capable of predicting both traits with an acceptable accuracy of 84-85%. The most important factors for model solution were identified using the dense regression attention maps method. The SNPs important for the SNpP and TSW traits were found in 34 and 49 genes, respectively. Genomic prediction with a constructed model can help breeding programs harness genotypic and phenotypic diversity to more effectively produce varieties with a desired phenotype.
Affiliation(s)
- Mikhail Bankin
  - Mathematical Biology and Bioinformatics Lab, PhysMech Institute, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
- Yaroslav Tyrykin
  - Mathematical Biology and Bioinformatics Lab, PhysMech Institute, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
- Maria Duk
  - Mathematical Biology and Bioinformatics Lab, PhysMech Institute, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
- Maria Samsonova
  - Mathematical Biology and Bioinformatics Lab, PhysMech Institute, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
- Konstantin Kozlov
  - Mathematical Biology and Bioinformatics Lab, PhysMech Institute, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
3
Heilmann PG, Frisch M, Abbadi A, Kox T, Herzog E. Stacked ensembles on basis of parentage information can predict hybrid performance with an accuracy comparable to marker-based GBLUP. Frontiers in Plant Science 2023; 14:1178902. [PMID: 37546247 PMCID: PMC10401275 DOI: 10.3389/fpls.2023.1178902] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/03/2023] [Accepted: 06/26/2023] [Indexed: 08/08/2023]
Abstract
Testcross factorials in newly established hybrid breeding programs are often highly unbalanced, incomplete, and characterized by predominance of special combining ability (SCA) over general combining ability (GCA). This results in a low efficiency of GCA-based selection. Machine learning algorithms might improve prediction of hybrid performance in such testcross factorials, as they have been successfully applied to find complex underlying patterns in sparse data. Our objective was to compare the prediction accuracy of machine learning algorithms to that of GCA-based prediction and genomic best linear unbiased prediction (GBLUP) in six unbalanced incomplete factorials from hybrid breeding programs of rapeseed, wheat, and corn. We investigated a range of machine learning algorithms with three different types of predictor variables: (a) information on parentage of hybrids, (b) in addition hybrid performance of crosses of the parental lines with other crossing partners, and (c) genotypic marker data. In two highly incomplete and unbalanced factorials from rapeseed, in which the SCA variance contributed considerably to the genetic variance, stacked ensembles of gradient boosting machines based on parentage information outperformed GCA prediction. The stacked ensembles increased prediction accuracy from 0.39 to 0.45, and from 0.48 to 0.54 compared to GCA prediction. The prediction accuracy reached by stacked ensembles without marker data reached values comparable to those of GBLUP that requires marker data. We conclude that hybrid prediction with stacked ensembles of gradient boosting machines based on parentage information is a promising approach that is worth further investigations with other data sets in which SCA variance is high.
Affiliation(s)
- Matthias Frisch
  - Institute of Agronomy and Plant Breeding II, Justus Liebig University, Gießen, Germany
- Eva Herzog
  - Institute of Agronomy and Plant Breeding II, Justus Liebig University, Gießen, Germany
4
Guo T, Li X. Machine learning for predicting phenotype from genotype and environment. Curr Opin Biotechnol 2023; 79:102853. [PMID: 36463837 DOI: 10.1016/j.copbio.2022.102853] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 10/03/2022] [Revised: 11/01/2022] [Accepted: 11/07/2022] [Indexed: 12/03/2022]
Abstract
Predicting phenotype with genomic and environmental information is critically needed and challenging. Machine learning methods have emerged as powerful tools to make accurate predictions from large and complex biological data. Here, we review the progress of phenotype prediction models enabled or improved by machine learning methods. We categorized the applications into three scenarios: prediction with genotypic information, with environmental information, and with both. In each scenario, we illustrate the practicality of prediction models, the advantages of machine learning, and the challenges of modeling complex relationships. We discuss the promising potential of leveraging machine learning and genetics theories to develop models that can predict phenotype and also interpret the biological consequences of changes in genotype and environment.
Affiliation(s)
- Tingting Guo
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China.
- Xianran Li
  - USDA, Agricultural Research Service, Wheat Health, Genetics, and Quality Research Unit, Pullman, WA 99164, USA; Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA.