1
Ribeiro PCO, Howard R, Jarquin D, Oliveira ICM, Chaves S, Carneiro PCS, Souza VF, Schaffert RE, Damasceno CMB, Parrella RAC, Dias KOG, Pastina MM. Prediction of biomass sorghum hybrids using environmental feature-enriched genomic combining ability models in tropical environments. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2025; 138:113. [PMID: 40343517 DOI: 10.1007/s00122-025-04895-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Accepted: 04/02/2025] [Indexed: 05/11/2025]
Abstract
KEY MESSAGE: Incorporating environmental features improved the predictive ability of genomic prediction models under multi-environment trials in tropical conditions. Gathering environmental and genomic information can benefit the breeding of sorghum hybrids by overcoming complications imposed by the genotype-by-environment interaction (GEI). In this study, we explored the value of combining environmental features (EFs) and genomic data to enhance predictions for biomass sorghum hybrid breeding, addressing GEI complexities. We also investigated whether considering specific time windows for EFs improves the prediction. We used a historical dataset from a tropical biomass sorghum breeding program featuring 253 genotypes across 64 trials. Initially, a first-stage analysis was performed to obtain the adjusted means (EBLUEs) and scrutinize the impact of 29 EFs (geographic, climatic, and soil-related EFs) on GEI. Subsequently, in the second-stage analysis, we used data from 221 hybrids that had both parents genotyped to evaluate the predictive ability and assertiveness of 12 models with different effects. The most relevant EFs included soil organic carbon, insolation on a horizontal surface, longitude, temperature at dew point, and nitrogen content. Across three cross-validation scenarios (CV1, CV0, and CV00), the most effective model encompassed main combining ability effects, GEI, and GωI (genotype-by-specific environmental effects interaction), utilizing an environmental kinship matrix (Ω) derived from mean EF values. Only in CV2 did a model with a similar structure, but utilizing Ω from specific time windows, outperform the others. Our findings highlight the potential of integrating environmental and genomic data to refine predictive models for optimizing biomass sorghum hybrid breeding strategies.
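The environmental kinship matrix (Ω) mentioned in the abstract can be illustrated with a minimal sketch. It assumes one common construction, in which the mean EF values are standardized per feature and Ω is the scaled cross-product of the resulting matrix; the abstract does not state the exact formula, so the function name, the scaling, and the handling of constant features are assumptions made here for illustration only.

```python
import numpy as np

def environmental_kinship(ef):
    """Build an environment-by-environment similarity matrix.

    ef: array of shape (n_env, n_features) holding the mean value of each
        environmental feature in each environment (hypothetical input layout).
    Returns an (n_env, n_env) symmetric matrix W W' / q, where W is the
    column-standardized EF matrix and q is the number of features.
    """
    ef = np.asarray(ef, dtype=float)
    centered = ef - ef.mean(axis=0)
    scale = ef.std(axis=0, ddof=1)
    scale[scale == 0] = 1.0  # assumption: leave constant features unscaled
    w = centered / scale
    return w @ w.T / w.shape[1]
```

Calling `environmental_kinship` on a table with one row per trial and one column per EF returns a matrix that can play the role of Ω in a kernel-based model. A time-window variant would pass EF means computed over the chosen window instead of over the whole season.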
Affiliation(s)
- Pedro C O Ribeiro
  - Department of General Biology, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Reka Howard
  - Department of Statistics, University of Nebraska - Lincoln (UNL), Lincoln, NE, USA
- Diego Jarquin
  - Department of Agronomy, University of Florida, Gainesville, FL, USA
- Isadora C M Oliveira
  - Embrapa Milho e Sorgo, Brazilian Agricultural Research Corporation (Embrapa), Sete Lagoas, Minas Gerais, Brazil
- Saulo Chaves
  - Department of General Biology, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
  - Department of Genetics, "Luiz de Queiroz" College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil
- Pedro C S Carneiro
  - Department of General Biology, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Vander F Souza
  - Embrapa Milho e Sorgo, Brazilian Agricultural Research Corporation (Embrapa), Sete Lagoas, Minas Gerais, Brazil
- Robert E Schaffert
  - Embrapa Milho e Sorgo, Brazilian Agricultural Research Corporation (Embrapa), Sete Lagoas, Minas Gerais, Brazil
- Cynthia M B Damasceno
  - Embrapa Milho e Sorgo, Brazilian Agricultural Research Corporation (Embrapa), Sete Lagoas, Minas Gerais, Brazil
- Rafael A C Parrella
  - Embrapa Milho e Sorgo, Brazilian Agricultural Research Corporation (Embrapa), Sete Lagoas, Minas Gerais, Brazil
- Kaio Olimpio G Dias
  - Department of General Biology, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
  - Institute of Artificial and Computational Intelligence (IDATA), Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Maria M Pastina
  - Embrapa Milho e Sorgo, Brazilian Agricultural Research Corporation (Embrapa), Sete Lagoas, Minas Gerais, Brazil
2
He K, Yu T, Gao S, Chen S, Li L, Zhang X, Huang C, Xu Y, Wang J, Prasanna BM, Hearne S, Li X, Li H. Leveraging Automated Machine Learning for Environmental Data-Driven Genetic Analysis and Genomic Prediction in Maize Hybrids. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2025; 12:e2412423. [PMID: 40047344 PMCID: PMC12061318 DOI: 10.1002/advs.202412423] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/06/2024] [Revised: 02/14/2025] [Indexed: 05/10/2025]
Abstract
Genotype, environment, and genotype-by-environment (G×E) interactions play a critical role in shaping crop phenotypes. Here, a large-scale, multi-environment hybrid maize dataset is used to construct and validate an automated machine learning framework that integrates environmental and genomic data for improved accuracy and efficiency in genetic analyses and genomic predictions. Dimensionality-reduced environmental parameters (RD_EPs) aligned with developmental stages are applied to establish linear relationships between RD_EPs and traits to assess the influence of environment on phenotype. A genome-wide association study identifies 539 phenotypic plasticity trait-associated markers (PP-TAMs), 223 environmental stability TAMs (Main-TAMs), and 92 G×E-TAMs, revealing distinct genetic bases for PP and G×E interactions. Training genomic prediction models with both TAMs and RD_EPs increases prediction accuracy by 14.02% to 28.42% over that of genome-wide marker approaches. These results demonstrate the potential of utilizing environmental data for improving genetic analysis and genomic selection, offering a scalable approach for developing climate-adaptive maize varieties.
Affiliation(s)
- Kunhui He
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT‐China Office, Beijing 100081, China
  - Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
- Tingxi Yu
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT‐China Office, Beijing 100081, China
  - Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
- Shang Gao
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT‐China Office, Beijing 100081, China
  - Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
- Shoukun Chen
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT‐China Office, Beijing 100081, China
  - Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
- Liang Li
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT‐China Office, Beijing 100081, China
- Xuecai Zhang
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT‐China Office, Beijing 100081, China
  - Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
  - International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6‐641, Texcoco D.F. 06600, Mexico
- Changling Huang
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT‐China Office, Beijing 100081, China
  - Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
- Yunbi Xu
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT‐China Office, Beijing 100081, China
- Jiankang Wang
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT‐China Office, Beijing 100081, China
  - Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
- Sarah Hearne
  - International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6‐641, Texcoco D.F. 06600, Mexico
- Xinhai Li
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT‐China Office, Beijing 100081, China
- Huihui Li
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT‐China Office, Beijing 100081, China
  - Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
3
Wang J, Liu L, He K, Gebrewahid TW, Gao S, Tian Q, Li Z, Song Y, Guo Y, Li Y, Cui Q, Zhang L, Wang J, Huang C, Li L, Guo T, Li H. Accurate genomic prediction for grain yield and grain moisture content of maize hybrids using multi-environment data. JOURNAL OF INTEGRATIVE PLANT BIOLOGY 2025; 67:1379-1394. [PMID: 39960172 DOI: 10.1111/jipb.13857] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/07/2025] [Accepted: 01/14/2025] [Indexed: 05/10/2025]
Abstract
Incorporating genotype-by-environment (GE) interaction effects into genomic prediction (GP) models with multi-environment climate data can improve selection accuracy to accelerate crop breeding but has received little research attention. Here, we conducted a cross-region GP study of grain moisture content (GMC) and grain yield (GY) in maize hybrids in two major Chinese growing regions using data for 19 climatic factors across 34 environments in 2020 and 2021. Predictions were conducted in 2,126 hybrids generated from 475 maize inbred lines, using 9,355 single nucleotide polymorphism markers for genotyping. Models based on genomic best linear unbiased prediction (GBLUP) incorporating GE interaction effects of 19 climatic factors associated with day length, transpiration, temperature, and radiation (GBLUP-GE19CF), trained on the whole data set, outperformed the traditional GBLUP or BayesB models in predicting GMC or GY by 10-fold cross-validation, achieving prediction accuracies of 0.731 and 0.331, respectively. To refine the climate data, we examined 84 statistical features associated with these climatic factors and identified nine factors most correlated with GMC or GY. Principal component analysis of climate data yielded nine principal components responsible for 97% of the variability in the data. Incorporating these nine factors or principal components into the GBLUP-GE framework with a similarity matrix of environments (GBLUP-GE9CF and GBLUP-GEPCA) provided similar prediction accuracies but could reduce the computational burden. In addition, increasing the number of test set environments in the training set from 8 to 14 increased the prediction accuracy of GBLUP-GE19CF trained with monthly average climate data for 2020-2021. Examining prediction accuracy based on concordance, the proportion of overlapping hybrids between the top 50% of predicted and observed values for GMC and GY, indicated that concordance exceeded 50% for the GBLUP-GE19CF model, confirming the reliability of our predictions. This study can provide practical guidance for optimizing GPs for maize breeding programs in multi-environment selection.
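The concordance measure described in the abstract, the share of hybrids that appear in both the top fraction of predicted values and the top fraction of observed values, can be sketched as follows. The function name, the rounding rule for the cut-off, and the convention that larger values rank higher are assumptions for illustration; for a trait where lower values are preferred, the inputs would need to be negated first.

```python
import numpy as np

def top_fraction_concordance(predicted, observed, fraction=0.5):
    """Proportion of entries shared by the top `fraction` of predicted
    and observed values (larger values are treated as better)."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if predicted.shape != observed.shape:
        raise ValueError("predicted and observed must have the same shape")
    # Number of entries in the selected top group (at least one).
    k = max(1, int(round(fraction * predicted.size)))
    top_pred = set(np.argsort(-predicted)[:k].tolist())
    top_obs = set(np.argsort(-observed)[:k].tolist())
    return len(top_pred & top_obs) / k
```

With `fraction=0.5` this corresponds to the top-50% comparison in the abstract. The same function with another `fraction` gives an overlap measure for a stricter selection intensity.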
Affiliation(s)
- Jingxin Wang
  - State Key Laboratory of Crop Gene Resources and Breeding, National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
  - Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024, China
- Liwei Liu
  - Key Laboratory of Maize Engineering Breeding, Ministry of Agriculture and Rural Affairs, Zhangye, 734000, China
  - Jinxiang Seed Co. Ltd, Zhangye, 734000, China
- Kunhui He
  - State Key Laboratory of Crop Gene Resources and Breeding, National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
  - Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024, China
- Takele Weldu Gebrewahid
  - State Key Laboratory of Crop Gene Resources and Breeding, National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
  - Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024, China
  - College of Agriculture, Aksum University-Shire Campus, Shire, 314, Ethiopia
- Shang Gao
  - State Key Laboratory of Crop Gene Resources and Breeding, National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
  - Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024, China
- Qingzhen Tian
  - Key Laboratory of Maize Engineering Breeding, Ministry of Agriculture and Rural Affairs, Zhangye, 734000, China
  - Jinxiang Seed Co. Ltd, Zhangye, 734000, China
- Zhanyi Li
  - Key Laboratory of Maize Engineering Breeding, Ministry of Agriculture and Rural Affairs, Zhangye, 734000, China
  - Jinxiang Seed Co. Ltd, Zhangye, 734000, China
- Yiqun Song
  - Key Laboratory of Maize Engineering Breeding, Ministry of Agriculture and Rural Affairs, Zhangye, 734000, China
  - Jinxiang Seed Co. Ltd, Zhangye, 734000, China
- Yiliang Guo
  - Key Laboratory of Maize Engineering Breeding, Ministry of Agriculture and Rural Affairs, Zhangye, 734000, China
  - Jinxiang Seed Co. Ltd, Zhangye, 734000, China
- Yanwei Li
  - Key Laboratory of Maize Engineering Breeding, Ministry of Agriculture and Rural Affairs, Zhangye, 734000, China
  - Jinxiang Seed Co. Ltd, Zhangye, 734000, China
- Qinxin Cui
  - Key Laboratory of Maize Engineering Breeding, Ministry of Agriculture and Rural Affairs, Zhangye, 734000, China
  - Jinxiang Seed Co. Ltd, Zhangye, 734000, China
- Luyan Zhang
  - State Key Laboratory of Crop Gene Resources and Breeding, National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Jiankang Wang
  - State Key Laboratory of Crop Gene Resources and Breeding, National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
  - Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024, China
- Changling Huang
  - State Key Laboratory of Crop Gene Resources and Breeding, National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
  - Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024, China
- Liang Li
  - State Key Laboratory of Crop Gene Resources and Breeding, National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Tingting Guo
  - Key Laboratory of Maize Engineering Breeding, Ministry of Agriculture and Rural Affairs, Zhangye, 734000, China
  - Jinxiang Seed Co. Ltd, Zhangye, 734000, China
- Huihui Li
  - State Key Laboratory of Crop Gene Resources and Breeding, National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
  - Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024, China
4
Lell M, Gogna A, Kloesgen V, Avenhaus U, Dörnte J, Eckhoff WM, Eschholz T, Gils M, Kirchhoff M, Koch M, Kollers S, Pfeiffer N, Rapp M, Wimmer V, Wolf M, Reif J, Zhao Y. Breaking down data silos across companies to train genome-wide predictions: A feasibility study in wheat. PLANT BIOTECHNOLOGY JOURNAL 2025. [PMID: 40253615 DOI: 10.1111/pbi.70095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Revised: 03/07/2025] [Accepted: 04/07/2025] [Indexed: 04/22/2025]
Abstract
Big data, combined with artificial intelligence (AI) techniques, holds the potential to significantly enhance the accuracy of genome-wide predictions. Motivated by the success reported for wheat hybrids, we extended the scope to inbred lines by integrating phenotypic and genotypic data from four commercial wheat breeding programs. Acting as an academic data trustee, we merged these data with historical experimental series from previous public-private partnerships. The integrated data spanned 12 years, 168 environments, and provided a genomic prediction training set of up to ~9500 genotypes for grain yield, plant height and heading date. Despite the heterogeneous phenotypic and genotypic data, we were able to obtain high-quality data by implementing rigorous data curation, including SNP imputation. We utilized the data to compare genomic best linear unbiased predictions with convolutional neural network-based genomic prediction. Our analysis revealed that we could flexibly combine experimental series for genomic prediction, with prediction ability steadily improving as the training set sizes increased, peaking at around 4000 genotypes. As training set sizes were further increased, the gains in prediction ability decreased, approaching a plateau well below the theoretical limit defined by the square root of the heritability. Potential avenues, such as designed training sets or novel non-linear prediction approaches, could overcome this plateau and help to more fully exploit the high-value big data generated by breaking down data silos across companies.
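The "theoretical limit defined by the square root of the heritability" can be written out with a standard quantitative-genetics identity. The symbols below are introduced here for illustration and are not taken from the paper: $\hat{g}$ is the predicted genetic value, $g$ the true genetic value, $y$ the phenotype, and $h^{2}$ the heritability. The sketch assumes that prediction errors are uncorrelated with the non-genetic part of the phenotype.

```latex
\[
  r_{\hat{g},y} \;=\; r_{\hat{g},g}\, h
  \qquad\Longrightarrow\qquad
  r_{\hat{g},y} \;\le\; h \;=\; \sqrt{h^{2}},
  \quad \text{since } r_{\hat{g},g} \le 1 .
\]
```

Under these assumptions, even a perfectly accurate predictor ($r_{\hat{g},g} = 1$) cannot correlate with observed phenotypes more strongly than $h$, which is why a plateau below $\sqrt{h^{2}}$ leaves room for improvement.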
Affiliation(s)
- Moritz Lell
  - Leibniz Institute for Plant Genetics and Crop Plant Research, Seeland, Germany
- Abhishek Gogna
  - Leibniz Institute for Plant Genetics and Crop Plant Research, Seeland, Germany
- Vincent Kloesgen
  - Leibniz Institute for Plant Genetics and Crop Plant Research, Seeland, Germany
- Ulrike Avenhaus
  - W. von Borries-Eckendorf GmbH & Co. KG, Leopoldshöhe, Germany
- Jost Dörnte
  - Deutsche Saatveredelung AG, Lippstadt, Germany
- Mario Gils
  - Nordsaat Saatzucht GmbH, Langenstein, Germany
- Matthias Rapp
  - W. von Borries-Eckendorf GmbH & Co. KG, Leopoldshöhe, Germany
- Jochen Reif
  - Leibniz Institute for Plant Genetics and Crop Plant Research, Seeland, Germany
- Yusheng Zhao
  - Leibniz Institute for Plant Genetics and Crop Plant Research, Seeland, Germany
5
Xavier A, Runcie D, Habier D. Megavariate methods capture complex genotype-by-environment interactions. Genetics 2025; 229:iyae179. [PMID: 39495661 PMCID: PMC12005252 DOI: 10.1093/genetics/iyae179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Accepted: 10/26/2024] [Indexed: 11/06/2024] Open
Abstract
Genomic prediction models that capture genotype-by-environment (GxE) interaction are useful for predicting site-specific performance by leveraging information among related individuals and correlated environments, but implementing such models is computationally challenging. This study describes the algorithms behind several scalable approaches, including 2 models with latent representations of GxE interactions, namely MegaLMM and MegaSEM, and an efficient multivariate mixed-model solver, namely Pseudo-expectation Gauss-Seidel (PEGS), fitting different covariance structures [unstructured, extended factor analytic (XFA), heteroskedastic compound symmetry (HCS)]. Accuracy and runtime are benchmarked on simulated scenarios with varying numbers of genotypes and environments. MegaLMM and PEGS-based XFA and HCS models provided the highest accuracy under sparse testing with 100 testing environments. The PEGS-based unstructured model was orders of magnitude faster than restricted maximum likelihood (REML) based multivariate genomic best linear unbiased predictions (GBLUP) while providing the same accuracy. MegaSEM provided the lowest runtime, fitting a model with 200 traits and 20,000 individuals in ∼5 min, and a model with 2,000 traits and 2,000 individuals in less than 3 min. With the genomes-to-fields data, the most accurate predictions were attained with the univariate model fitted across environments and by averaging environment-level genomic estimated breeding values (GEBVs) from models with HCS and XFA covariance structures.
Affiliation(s)
- Alencar Xavier
  - Corteva Agrisciences, Seed Product Development, 8305 NW 62nd Ave, Johnston, IA 50131, USA
  - Purdue University, Department of Agronomy, 915 Mitch Daniels Blvd, West Lafayette, IN 47907, USA
- Daniel Runcie
  - University of California Davis, Department of Plant Sciences, One Shield Ave, Davis, CA 95616, USA
- David Habier
  - Corteva Agrisciences, Seed Product Development, 8305 NW 62nd Ave, Johnston, IA 50131, USA
6
Vitale P, Montesinos-López O, Gerard G, Velu G, Tadesse Z, Montesinos-López A, Dreisigacker S, Pacheco A, Toledo F, Saint Pierre C, Pérez-Rodríguez P, Gardner K, Crespo-Herrera L, Crossa J. Improving wheat grain yield genomic prediction accuracy using historical data. G3 (BETHESDA, MD.) 2025; 15:jkaf038. [PMID: 40056458 PMCID: PMC12005153 DOI: 10.1093/g3journal/jkaf038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2024] [Revised: 02/08/2025] [Accepted: 02/12/2025] [Indexed: 03/10/2025]
Abstract
Genomic selection is an essential tool to improve genetic gain in wheat breeding. This study aimed to enhance prediction accuracy for grain yield across various selection environments using CIMMYT's (International Maize and Wheat Improvement Center) historical dataset. Ten years of grain yield data from 6 selection environments were analyzed, with the populations of 5 years (2018-2023) as the validation population and earlier years (back to 2013-2014) as the training population. Generally, we observed that as the number of training years increased, the prediction accuracy tended to improve or stabilize. For instance, in the late heat stress selection environment (beds late heat stress), prediction accuracy increased from 0.11 (1 training year) to 0.23 (5 years), stabilizing at 0.26. Similar trends were observed in the intermediate drought selection environment (beds with 2 irrigations), with prediction accuracy rising from 0.12 (1 year) to 0.21 (4 years) but showing minimal improvement beyond that. Conversely, in some selection environments, such as flat 5 irrigations (flat optimal environment), prediction accuracy did not increase significantly, fluctuating around 0.09-0.14 regardless of the number of training years used. Additionally, average genetic diversity within the training population and the validation population influenced prediction accuracy. Indeed, a negative correlation between prediction accuracy and the genetic distance was observed. This highlights the need to balance genetic diversity to enhance the predictive power of genomic selection models. These findings demonstrate the benefits of using an extended historical dataset while considering genetic diversity to maximize prediction accuracy in genomic selection strategies for wheat breeding, ultimately supporting the development of high-yielding varieties.
Affiliation(s)
- Paolo Vitale
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Guillermo Gerard
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Govindan Velu
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Zerihun Tadesse
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Abelardo Montesinos-López
  - Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Jalisco, Mexico
- Susanne Dreisigacker
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Angela Pacheco
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Fernando Toledo
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Carolina Saint Pierre
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Keith Gardner
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- Leonardo Crespo-Herrera
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera México-Veracruz, El Batan, Edo. de México 5623, Mexico
  - Colegio de Postgraduados, Montecillo, Edo. de México 56231, Mexico
7
Avagyan V, Boer MP, Solin J, van Dijk ADJ, Bustos-Korts D, van Rossum BJ, Ramakers JJC, van Eeuwijk F, Kruijer W. Penalized factorial regression as a flexible and computationally attractive reaction norm model for prediction in the presence of GxE. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2025; 138:88. [PMID: 40155554 PMCID: PMC11953130 DOI: 10.1007/s00122-025-04865-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2024] [Accepted: 02/24/2025] [Indexed: 04/01/2025]
Abstract
KEY MESSAGE: Penalized factorial regression offers a computationally attractive alternative to kernel and deep learning methods for prediction of genotype by environment interactions. For two representative data sets on wheat and maize, prediction accuracies were comparable, while computing requirements and time were clearly lower. A longstanding challenge in plant breeding and genetics is the prediction of yield for new environments in the presence of genotype by environment interaction (G × E). The genotypes in this case are promising candidate varieties at an advanced stage of breeding programs or are part of statutory variety trials or post-registration trials. The genotypes have been tested in a limited set of trials and the question is how these genotypes will perform in future growing conditions. A reaction norm approach seems adequate to address this challenge. Reaction norms are functions with genotype-specific parameters that express the phenotype as a function of environmental inputs. G × E follows from differences in genotype-specific slope or rate parameters. Prediction of yield for new environments requires the identification of suitable reaction norm functions and the estimation of genotype-specific parameters together with knowledge about the environmental conditions. Here, we present penalized factorial regression with simple linear reaction norms for individual genotypes whose slopes are regularized by imposing a penalty upon them. Different types of penalization provide shrinkage, automatic selection of environmental covariates (ECs) and protection against overfitting for prediction of yield with medium to large numbers of ECs. Illustrations of our approach are given for a maize and a wheat data set. For these data, our approach compares well to alternative methods based on Bayesian regression and deep learning with respect to prediction accuracy, while computational demands are clearly lower.
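The idea of linear reaction norms with penalized genotype-specific slopes can be sketched with a simplified ridge variant. This is not the authors' implementation: it assumes a complete genotype-by-environment table, a single shared ridge penalty `lam`, unpenalized genotype intercepts, and no separate environmental main effect. All function and variable names are hypothetical.

```python
import numpy as np

def fit_ridge_reaction_norms(y, x, lam):
    """Fit one linear reaction norm per genotype with ridge-penalized slopes.

    y:   (n_geno, n_env) observed yields, complete table assumed.
    x:   (n_env, n_cov) environmental covariates for the training environments.
    lam: ridge penalty applied to the slopes only (intercepts are unpenalized).
    Returns (intercepts, slopes) with shapes (n_geno,) and (n_geno, n_cov).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    x_mean = x.mean(axis=0)
    xc = x - x_mean                      # center covariates
    y_mean = y.mean(axis=1)
    yc = y - y_mean[:, None]             # center yields within genotype
    gram = xc.T @ xc + lam * np.eye(x.shape[1])
    slopes = np.linalg.solve(gram, xc.T @ yc.T).T
    intercepts = y_mean - slopes @ x_mean
    return intercepts, slopes

def predict_new_environment(intercepts, slopes, x_new):
    """Predict every genotype's yield in an environment with covariates x_new."""
    return intercepts + slopes @ np.asarray(x_new, dtype=float)
```

Setting `lam` to zero recovers ordinary least squares per genotype, while larger values shrink all slopes toward zero and therefore shrink predicted G × E. A lasso-type penalty, which would also select covariates as the abstract describes, has no closed form and would need an iterative solver.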
Affiliation(s)
- Vahe Avagyan
  - Biometris, Wageningen University and Research, Wageningen, The Netherlands
- Martin P Boer
  - Biometris, Wageningen University and Research, Wageningen, The Netherlands
- Junita Solin
  - Biometris, Wageningen University and Research, Wageningen, The Netherlands
- Aalt D J van Dijk
  - Biometris, Wageningen University and Research, Wageningen, The Netherlands
  - Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
  - Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Daniela Bustos-Korts
  - Biometris, Wageningen University and Research, Wageningen, The Netherlands
  - Faculty of Agricultural Sciences, Universidad Austral de Chile, Valdivia, Chile
- Jip J C Ramakers
  - Biometris, Wageningen University and Research, Wageningen, The Netherlands
- Fred van Eeuwijk
  - Biometris, Wageningen University and Research, Wageningen, The Netherlands
- Willem Kruijer
  - Biometris, Wageningen University and Research, Wageningen, The Netherlands
8
McBreen J, Babar MA, Jarquin D, Ampatzidis Y, Khan N, Kunwar S, Acharya JP, Adewale S, Brown‐Guedira G. Enhancing genomic-based forward prediction accuracy in wheat by integrating UAV-derived hyperspectral and environmental data with machine learning under heat-stressed environments. THE PLANT GENOME 2025; 18:e20554. [PMID: 39779660 PMCID: PMC11711122 DOI: 10.1002/tpg2.20554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2024] [Revised: 12/10/2024] [Accepted: 12/13/2024] [Indexed: 01/11/2025]
Abstract
Integrating genomic, hyperspectral imaging (HSI), and environmental data enhances wheat yield predictions, with HSI providing detailed spectral insights for predicting complex grain yield (GY) traits. Incorporating HSI data with single nucleotide polymorphism markers (SNPs) resulted in a substantial improvement in predictive ability compared to the conventional genomic prediction models. Over the course of several years, the prediction ability varied due to diverse weather conditions. The most comprehensive parametric model tested, which included SNPs, HSI, and environmental covariates data, consistently achieved the best results, closely followed by machine learning (ML) approaches when considering the same omics data. For example, the most comprehensive model (M9), under the forward prediction cross-validation scheme, predicted the GY of the 2023 growing season using data from 2021 and 2022, yielding a correlation between predicted and observed values of 0.53. This model demonstrated superior performance compared to less complex models, emphasizing the advantage of integrating numerous data sources and their interactive effects. Furthermore, when comparing the top 25% of the predicted lines versus the corresponding observed lines with the highest GY, the M9 model returned a coincidence index (CI) of 55% (i.e., in both sets, 55% of the top 25% values were common), whereas for the highest performing ML model (gradient boosting regression), the CI was 46%. This study highlights the potential of multi-data source approaches to accelerate the selection of heat-tolerant wheat genotypes.
Affiliation(s)
- Jordan McBreen
- Department of AgronomyUniversity of FloridaGainesvilleFloridaUSA
| | - Md Ali Babar
- Department of AgronomyUniversity of FloridaGainesvilleFloridaUSA
| | - Diego Jarquin
- Department of AgronomyUniversity of FloridaGainesvilleFloridaUSA
| | - Yiannis Ampatzidis
- Agricultural and Biological Engineering Department, Southwest Florida Research and Education CenterUniversity of Florida, IFASImmokaleeFloridaUSA
| | - Naeem Khan
- Department of AgronomyUniversity of FloridaGainesvilleFloridaUSA
| | - Sudip Kunwar
- Department of AgronomyUniversity of FloridaGainesvilleFloridaUSA
| | | | - Samuel Adewale
- Department of AgronomyUniversity of FloridaGainesvilleFloridaUSA
| | | |
Collapse
|
9
|
Hudson O, Brawner J. Using genome-wide associations and host-by-pathogen predictions to identify allelic interactions that control disease resistance. THE PLANT GENOME 2025; 18:e70006. [PMID: 39994874 PMCID: PMC11850958 DOI: 10.1002/tpg2.70006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 01/08/2025] [Accepted: 01/15/2025] [Indexed: 02/26/2025]
Abstract
Characterizing the molecular mechanisms underlying disease symptom expression has been used to improve human health and disease resistance in crops and animal breeds. Quantitative trait loci and genome-wide association studies (GWAS) are widely used to identify genomic regions that are involved in disease progression. This study extends traditional GWAS significance tests of host and pathogen marker main effects by utilizing dual-genome reaction norm models to evaluate the importance of host-single nucleotide polymorphism (SNP) by pathogen-SNP interactions. Disease symptom severity data from Fusarium ear rot (FER) on maize (Zea mays L.) are used to demonstrate the use of both genomes in genomic selection models for breeding and the identification of loci that interact across organisms to impact FER disease development. Dual genome prediction models improved heritability estimates, error variances, and model accuracy while providing predictions for host-by-pathogen interactions that may be used to test the significance of SNP-SNP interactions. Independent GWAS for maize and Fusarium populations identified significantly associated loci and predictions that were used to evaluate the importance of interactions using two different association tests. Predictions from dual genome models were used to evaluate the significance of the SNP-SNP interactions that may be associated with population structure or polygenic effects. In addition, association tests incorporating host and pathogen markers in models that also included genomic relationship matrices were used to account for population structure. Subsequent evaluation of protein-protein interactions from candidate genes near the interacting SNPs provides a further in silico evaluation method to expedite the identification of interacting genes.
Affiliation(s)
- Owen Hudson
- Department of Plant Pathology, University of Florida, Gainesville, Florida, USA
- Jeremy Brawner
- Department of Plant Pathology, University of Florida, Gainesville, Florida, USA
- Genics Ltd., Saint Lucia, Queensland, Australia
10
Mothukuri SR, Beyene Y, Gültas M, Burgueño J, Griebel S. Optimization of sparse phenotyping strategy in multi-environmental trials in maize. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2025; 138:62. [PMID: 40016556 PMCID: PMC11868319 DOI: 10.1007/s00122-025-04825-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2024] [Accepted: 01/15/2025] [Indexed: 03/01/2025]
Abstract
KEY MESSAGE In sparse phenotyping experiments, the relatedness between the genotypes of the training and testing sets helps optimize line allocation, using relationship measurements to reduce cost without compromising genetic gain. Because selection decisions are mainly based on multi-environmental trials, phenotyping needs to be optimized to achieve the desired precision at low cost. Optimization of sparse phenotyping is possible in plant breeding by applying relationship measurements and genomic prediction. Our research utilized genomic data and relationship measurements between the training set (fully tested genotypes) and the testing set (sparsely tested genotypes) to optimize the allocation of genotypes to subsets in sparse testing. Different sparse phenotyping designs were mimicked based on the percentage of lines in the full set, the number of partially tested lines, the number of tested environments, and balanced and unbalanced methods for allocating the lines among the environments. Eight relationship measurements were used to calculate the relatedness between full-set and sparse-set genotypes. The results show that balanced designs and designs allocating 50% of lines to the full set achieved a higher Pearson correlation, in terms of accuracy measurements, than designs assigning 30% of lines to the full set and balanced sparse methods. Reducing the number of untested environments per sparse set improved the accuracy measurements. The relationship measurements exhibited a low but significant Pearson correlation, ranging from 0.20 to 0.31, with the accuracy measurements in sparse phenotyping experiments. This positive Pearson correlation suggests that maximizing the accuracy measurements can help optimize line allocation in sparse phenotyping designs.
Affiliation(s)
- S R Mothukuri
- Faculty of Agriculture, University of Göttingen, Büsgenweg 5, 37077, Göttingen, Germany.
- Y Beyene
- Global Maize Program, International Maize and Wheat Improvement Center, ICRAF House, United Nations Avenue, 41Village Market Gigiri, Nairobi, 00621, Kenya
- M Gültas
- Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494, Soest, Germany
- J Burgueño
- Biometrics and Statistics Unit, International Maize and Wheat Improvement Center, Carretera México-Veracruz, Km. 45, El Batán, 56237, Texcoco, México.
- S Griebel
- Department of Crop Sciences, Faculty of Agricultural Sciences, Georg-August-University Göttingen, 37075, Göttingen, Germany
11
Washburn JD, Varela JI, Xavier A, Chen Q, Ertl D, Gage JL, Holland JB, Lima DC, Romay MC, Lopez-Cruz M, de los Campos G, Barber W, Zimmer C, Trucillo Silva I, Rocha F, Rincent R, Ali B, Hu H, Runcie DE, Gusev K, Slabodkin A, Bax P, Aubert J, Gangloff H, Mary-Huard T, Vanrenterghem T, Quesada-Traver C, Yates S, Ariza-Suárez D, Ulrich A, Wyler M, Kick DR, Bellis ES, Causey JL, Soriano Chavez E, Wang Y, Piyush V, Fernando GD, Hu RK, Kumar R, Timon AJ, Venkatesh R, Segura Abá K, Chen H, Ranaweera T, Shiu SH, Wang P, Gordon MJ, Amos BK, Busato S, Perondi D, Gogna A, Psaroudakis D, Chen CPJ, Al-Mamun HA, Danilevicz MF, Upadhyaya SR, Edwards D, de Leon N. Global genotype by environment prediction competition reveals that diverse modeling strategies can deliver satisfactory maize yield estimates. Genetics 2025; 229:iyae195. [PMID: 39576009 PMCID: PMC12054733 DOI: 10.1093/genetics/iyae195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Accepted: 11/13/2024] [Indexed: 11/27/2024] Open
Abstract
Predicting phenotypes from a combination of genetic and environmental factors is a grand challenge of modern biology. Slight improvements in this area have the potential to save lives, improve food and fuel security, permit better care of the planet, and create other positive outcomes. In 2022 and 2023, the first open-to-the-public Genomes to Fields initiative Genotype by Environment prediction competition was held using a large dataset including genomic variation, phenotype and weather measurements, and field management notes gathered by the project over 9 years. The competition attracted registrants from around the world with representation from academic, government, industry, and nonprofit institutions as well as unaffiliated individuals. These participants came from diverse disciplines, including plant science, animal science, breeding, statistics, computational biology, and others. Some participants had no formal genetics or plant-related training, and some were just beginning their graduate education. The teams applied varied methods and strategies, providing a wealth of modeling knowledge based on a common dataset. The winner's strategy involved 2 models combining machine learning and traditional breeding tools: 1 model emphasized environment using features extracted by random forest, ridge regression, and least squares, and 1 focused on genetics. Other high-performing teams' methods included quantitative genetics, machine learning/deep learning, mechanistic models, and model ensembles. The dataset factors used, such as genetics, weather, and management data, were also diverse, demonstrating that no single model or strategy is far superior to all others within the context of this competition.
Affiliation(s)
- Jacob D Washburn
- USDA-ARS, MWA-PGRU, 302-A Curtis Hall, University of Missouri, Columbia, MO 65211, USA
- José Ignacio Varela
- Department of Plant and Agroecosystem Sciences, University of Wisconsin—Madison, 1575 Linden Drive, Madison, WI 53706, USA
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA 50131, USA
- Alencar Xavier
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA 50131, USA
- Department of Agronomy, Purdue University, 915 Mitch Daniels Blvd, West Lafayette, IN 47907, USA
- Qiuyue Chen
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- David Ertl
- Iowa Corn Promotion Board, Johnston, IA 50131, USA
- Joseph L Gage
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- James B Holland
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- USDA-ARS, Plant Science Research Unit, Raleigh, NC 27695, USA
- Dayane Cristina Lima
- Department of Plant and Agroecosystem Sciences, University of Wisconsin—Madison, 1575 Linden Drive, Madison, WI 53706, USA
- Maria Cinta Romay
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- Marco Lopez-Cruz
- Departments of Epidemiology and Biostatistics and Statistics and Probability, and Institute for Quantitative Health Science and Engineering, Michigan State University, 775 Woodlot Dr, East Lansing, MI 48823, USA
- Gustavo de los Campos
- Departments of Epidemiology and Biostatistics and Statistics and Probability, and Institute for Quantitative Health Science and Engineering, Michigan State University, 775 Woodlot Dr, East Lansing, MI 48823, USA
- Wesley Barber
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA 50131, USA
- Fabiani Rocha
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA 50131, USA
- Renaud Rincent
- Université Paris—Saclay, INRAE, CNRS, AgroParisTech, GQE—Le Moulon, 91190 Gif-sur-Yvette, France
- Baber Ali
- Université Paris—Saclay, INRAE, CNRS, AgroParisTech, GQE—Le Moulon, 91190 Gif-sur-Yvette, France
- Haixiao Hu
- Department of Plant Sciences, University of California Davis, One Shield Drive, Davis, CA 95616, USA
- Daniel E Runcie
- Department of Plant Sciences, University of California Davis, One Shield Drive, Davis, CA 95616, USA
- Kirill Gusev
- Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE 19808, USA
- Andrei Slabodkin
- Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE 19808, USA
- Phillip Bax
- Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE 19808, USA
- Julie Aubert
- Université Paris—Saclay, AgroParisTech, INRAE, UMR MIA Paris—Saclay, 91120 Palaiseau, France
- Hugo Gangloff
- Université Paris—Saclay, AgroParisTech, INRAE, UMR MIA Paris—Saclay, 91120 Palaiseau, France
- Tristan Mary-Huard
- Université Paris—Saclay, INRAE, CNRS, AgroParisTech, GQE—Le Moulon, 91190 Gif-sur-Yvette, France
- Université Paris—Saclay, AgroParisTech, INRAE, UMR MIA Paris—Saclay, 91120 Palaiseau, France
- Theodore Vanrenterghem
- Université Paris—Saclay, AgroParisTech, INRAE, UMR MIA Paris—Saclay, 91120 Palaiseau, France
- Carles Quesada-Traver
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Steven Yates
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Daniel Ariza-Suárez
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Argeo Ulrich
- Puregene AG, Etzmatt 273, CH-4314 Zeiningen, Switzerland
- Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zürich, Switzerland
- Michele Wyler
- MWSchmid GmbH, Hauptstrasse 34, CH-8750 Glarus, Switzerland
- Daniel R Kick
- USDA-ARS, MWA-PGRU, 302-A Curtis Hall, University of Missouri, Columbia, MO 65211, USA
- Emily S Bellis
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd, Jonesboro, AR 72401, USA
- Jason L Causey
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd, Jonesboro, AR 72401, USA
- Emilio Soriano Chavez
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd, Jonesboro, AR 72401, USA
- Yixing Wang
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd, Jonesboro, AR 72401, USA
- Ved Piyush
- Department of Statistics, University of Nebraska—Lincoln, 340 Hardin Hall North Wing, Lincoln, NE 68583, USA
- Gayara D Fernando
- Department of Statistics, University of Nebraska—Lincoln, 340 Hardin Hall North Wing, Lincoln, NE 68583, USA
- Robert K Hu
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Rachit Kumar
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Medical Scientist Training Program, Perelman School of Medicine at the University of Pennsylvania, University of Pennsylvania, 3400 Civic Center Blvd, Philadelphia, PA 19104, USA
- Annan J Timon
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Rasika Venkatesh
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Kenia Segura Abá
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
- Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI 48824, USA
- Huan Chen
- Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI 48824, USA
- Thilanka Ranaweera
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- Shin-Han Shiu
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Peiran Wang
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC 27606, USA
- Max J Gordon
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC 27606, USA
- B Kirtley Amos
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC 27606, USA
- Sebastiano Busato
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC 27606, USA
- Daniel Perondi
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC 27606, USA
- Abhishek Gogna
- Department of Breeding Research, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung, Corrensstraße 3, Gatersleben 6466, Germany
- Dennis Psaroudakis
- Department of Molecular Genetics, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung, Corrensstraße 3, Gatersleben 6466, Germany
- Hawlader A Al-Mamun
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA 6009, Australia
- Monica F Danilevicz
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA 6009, Australia
- Shriprabha R Upadhyaya
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA 6009, Australia
- David Edwards
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA 6009, Australia
- Natalia de Leon
- Department of Plant and Agroecosystem Sciences, University of Wisconsin—Madison, 1575 Linden Drive, Madison, WI 53706, USA
12
Tian Z, Nepomuceno AL, Song Q, Stupar RM, Liu B, Kong F, Ma J, Lee SH, Jackson SA. Soybean2035: A decadal vision for soybean functional genomics and breeding. MOLECULAR PLANT 2025; 18:245-271. [PMID: 39772289 DOI: 10.1016/j.molp.2025.01.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/12/2024] [Revised: 12/29/2024] [Accepted: 01/05/2025] [Indexed: 01/31/2025]
Abstract
Soybean, the fourth most important crop in the world, uniquely serves as a source of both plant oil and plant protein for the world's food and animal feed. Although soybean production has increased approximately 13-fold over the past 60 years, the continually growing global population necessitates further increases in soybean production. In the past, especially in the last decade, significant progress has been made in both functional genomics and molecular breeding. However, many more challenges should be overcome to meet the anticipated future demand. Here, we summarize past achievements in the areas of soybean omics, functional genomics, and molecular breeding. Furthermore, we analyze trends in these areas, including shortages and challenges, and propose new directions, potential approaches, and possible outputs toward 2035. Our views and perspectives provide insight into accelerating the development of elite soybean varieties to meet the increasing demands of soybean production.
Affiliation(s)
- Zhixi Tian
- Yazhouwan National Laboratory, Sanya, Hainan, China.
- Qingxin Song
- State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing Agricultural University, Nanjing, Jiangsu, China.
- Robert M Stupar
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, USA.
- Bin Liu
- State Key Laboratory of Crop Gene Resources and Breeding, Key Laboratory of Soybean Biology (Beijing) (MARA), Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China.
- Fanjiang Kong
- Guangdong Provincial Key Laboratory of Plant Adaptation and Molecular Design, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou, China.
- Jianxin Ma
- Department of Agronomy, Purdue University, West Lafayette, IN, USA.
- Suk-Ha Lee
- Department of Agriculture, Forestry and Bioresources and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea.
- Scott A Jackson
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA, USA.
13
Crossa J, Montesinos-Lopez OA, Costa-Neto G, Vitale P, Martini JWR, Runcie D, Fritsche-Neto R, Montesinos-Lopez A, Pérez-Rodríguez P, Gerard G, Dreisigacker S, Crespo-Herrera L, Pierre CS, Lillemo M, Cuevas J, Bentley A, Ortiz R. Machine learning algorithms translate big data into predictive breeding accuracy. TRENDS IN PLANT SCIENCE 2025; 30:167-184. [PMID: 39462718 DOI: 10.1016/j.tplants.2024.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 08/23/2024] [Accepted: 09/23/2024] [Indexed: 10/29/2024]
Abstract
Statistical machine learning (ML) extracts patterns from extensive genomic, phenotypic, and environmental data. ML algorithms automatically identify relevant features and use cross-validation to ensure robust models and improve prediction reliability in new lines. Furthermore, ML analyses of genotype-by-environment (G×E) interactions can offer insights into the genetic factors that affect performance in specific environments. By leveraging historical breeding data, ML streamlines strategies and automates analyses to reveal genomic patterns. In this review we examine the transformative impact of big data, including multi-trait genomics, phenomics, and environmental covariables, on genomic-enabled prediction in plant breeding. We discuss how big data and ML are revolutionizing the field by enhancing prediction accuracy, deepening our understanding of G×E interactions, and optimizing breeding strategies through the analysis of extensive and diverse datasets.
Affiliation(s)
- José Crossa
- Louisiana State University, College of Agriculture, Baton Rouge, LA, USA; Colegio de Postgraduados, Montecillos, CP 56230, Estado de México, Mexico; International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico; Department of Statistics and Operations Research and Distinguished Scientist Fellowship Program, King Saud University, Riyadh 11451, Saudi Arabia
- Paolo Vitale
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico
- Daniel Runcie
- Department of Plant Sciences, University of California Davis, Davis, CA, USA
- Abelardo Montesinos-Lopez
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430 Guadalajara, Jalisco, Mexico
- Guillermo Gerard
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico
- Susanna Dreisigacker
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico
- Leonardo Crespo-Herrera
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico
- Carolina Saint Pierre
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico
- Morten Lillemo
- Norwegian University of Life Science (NMBU), Department of Plant Science, Ås, Norway
- Jaime Cuevas
- Universidad de Quintana Roo, Chetumal, Quintana Roo, 77019, Mexico
- Alison Bentley
- Australian National University, Research School of Biology, Canberra, NSW, Australia.
- Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), PO Box 190 Sundsvagen 10, SE 23422 Lomma, Sweden.
14
Jung M, Quesada-Traver C, Roth M, Aranzana MJ, Muranty H, Rymenants M, Guerra W, Holzknecht E, Pradas N, Lozano L, Didelot F, Laurens F, Yates S, Studer B, Broggini GAL, Patocchi A. Integrative multi-environmental genomic prediction in apple. HORTICULTURE RESEARCH 2025; 12:uhae319. [PMID: 40041603 PMCID: PMC11879405 DOI: 10.1093/hr/uhae319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2024] [Accepted: 11/07/2024] [Indexed: 03/06/2025]
Abstract
Genomic prediction for multiple environments can aid the selection of genotypes suited to specific soil and climate conditions. Methodological advances allow effective integration of phenotypic, genomic (additive, nonadditive), and large-scale environmental (enviromic) data into multi-environmental genomic prediction models. These models can also account for genotype-by-environment interaction, utilize alternative relationship matrices (kernels), or substitute statistical approaches with deep learning. However, the application of multi-environmental genomic prediction in apple has remained limited, likely due to the challenge of building multi-environmental datasets and structurally complex models. Here, we applied efficient statistical and deep learning models for multi-environmental genomic prediction of eleven apple traits with contrasting genetic architectures by integrating genomic- and enviromic-based model components. Incorporating genotype-by-environment interaction effects into statistical models improved predictive ability by up to 0.08 for nine traits compared to the benchmark model. This outcome, based on Gaussian and Deep kernels, shows these alternatives can effectively substitute the standard genomic best linear unbiased predictor (G-BLUP). Including nonadditive and enviromic-based effects resulted in a predictive ability very similar to the benchmark model. The deep learning approach achieved the highest predictive ability for three traits with oligogenic genetic architectures, outperforming the benchmark by up to 0.10. Our results demonstrate that the tested statistical models capture genotype-by-environment interactions particularly well, and the deep learning models efficiently integrate data from diverse sources. This study will foster the adoption of multi-environmental genomic prediction to select apple cultivars adapted to diverse environmental conditions, providing an opportunity to address climate change impacts.
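To make the contrast between the standard G-BLUP relationship matrix and a Gaussian kernel concrete, the following minimal Python sketch builds both from a toy marker matrix. It is an illustration under stated assumptions, not the authors' pipeline: the linear kernel follows the commonly used VanRaden-style scaling, the Gaussian kernel scales squared Euclidean distances by their median (one of several conventions), and the function names, the `bandwidth` parameter, and the marker values are hypothetical.

```python
import math
from statistics import median


def linear_kernel(markers):
    """VanRaden-style genomic relationship matrix from 0/1/2 marker codes."""
    n, p = len(markers), len(markers[0])
    means = [sum(row[j] for row in markers) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in markers]
    # Allele frequencies estimated from the data; scale = 2 * sum(p * (1 - p)).
    scale = 2 * sum((m / 2) * (1 - m / 2) for m in means)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / scale
             for j in range(n)] for i in range(n)]


def gaussian_kernel(markers, bandwidth=1.0):
    """Gaussian kernel exp(-bandwidth * d^2 / median(d^2)) between genotypes."""
    n = len(markers)
    d2 = [[sum((a - b) ** 2 for a, b in zip(markers[i], markers[j]))
           for j in range(n)] for i in range(n)]
    # Assumption: distances are scaled by the median off-diagonal squared distance.
    med = median(d2[i][j] for i in range(n) for j in range(i + 1, n))
    return [[math.exp(-bandwidth * d2[i][j] / med) for j in range(n)]
            for i in range(n)]


# Toy marker matrix (hypothetical): 4 genotypes x 5 SNPs coded 0/1/2.
markers = [
    [0, 1, 2, 1, 0],
    [0, 1, 2, 2, 0],
    [2, 1, 0, 0, 1],
    [1, 2, 0, 1, 2],
]
G = linear_kernel(markers)
K = gaussian_kernel(markers)
print("G[0][1] =", round(G[0][1], 3), " K[0][1] =", round(K[0][1], 3))
```

Either matrix can serve as the covariance structure of the genotype effect in a mixed model; the Gaussian kernel decays nonlinearly with marker distance, which is one reason it may capture some nonadditive signal.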
Affiliation(s)
- Michaela Jung
- Fruit Breeding, Agroscope, Mueller-Thurgau-Strasse 29, 8820 Waedenswil, Switzerland
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitaetstrasse 2, 8092 Zurich, Switzerland
- Carles Quesada-Traver
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitaetstrasse 2, 8092 Zurich, Switzerland
- Morgane Roth
- INRAE, Research Unit for Genetics and Improvement of Fruit and Vegetable (GAFL), 67 Allée des Chênes, 84143 Montfavet, France
- Maria José Aranzana
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra, 08193 Barcelona, Spain
- IRTA (Institut de Recerca i Tecnologia Agroalimentàries), Caldes de Montbui, 08140 Barcelona, Spain
- Hélène Muranty
- Univ Angers, Institut Agro, INRAE, IRHS, SFR QuaSaV, F-49000 Angers, France
- Marijn Rymenants
- Better3fruit N.V., Steenberg 36, 3202 Rillaar, Belgium
- Laboratory for Plant Genetics and Crop Improvement, Division of Crop Biotechnics, Department of Biosystems, University of Leuven, Willem de Croylaan 42 - bus 2427, 3001 Leuven, Belgium
- Walter Guerra
- Research Centre Laimburg, Institute for Fruit Growing and Viticulture, Laimburg 1, 39040 Auer, Italy
- Elias Holzknecht
- Research Centre Laimburg, Institute for Fruit Growing and Viticulture, Laimburg 1, 39040 Auer, Italy
- Nicole Pradas
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra, 08193 Barcelona, Spain
- Lidia Lozano
- IRTA (Institut de Recerca i Tecnologia Agroalimentàries), Caldes de Montbui, 08140 Barcelona, Spain
- François Laurens
- Univ Angers, Institut Agro, INRAE, IRHS, SFR QuaSaV, F-49000 Angers, France
- Steven Yates
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitaetstrasse 2, 8092 Zurich, Switzerland
- Bruno Studer
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitaetstrasse 2, 8092 Zurich, Switzerland
- Giovanni A L Broggini
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitaetstrasse 2, 8092 Zurich, Switzerland
- Andrea Patocchi
- Fruit Breeding, Agroscope, Mueller-Thurgau-Strasse 29, 8820 Waedenswil, Switzerland
15
Hu H, Rincent R, Runcie DE. MegaLMM improves genomic predictions in new environments using environmental covariates. Genetics 2025; 229:1-41. [PMID: 39471330 PMCID: PMC11708919 DOI: 10.1093/genetics/iyae171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 09/19/2024] [Accepted: 09/25/2024] [Indexed: 11/01/2024] Open
Abstract
Multienvironment trials (METs) are crucial for identifying varieties that perform well across a target population of environments. However, METs are typically too small to sufficiently represent all relevant environment-types, and face challenges from changing environment-types due to climate change. Statistical methods that enable prediction of variety performance for new environments beyond the METs are needed. We recently developed MegaLMM, a statistical model that can leverage hundreds of trials to significantly improve genetic value prediction accuracy within METs. Here, we extend MegaLMM to enable genomic prediction in new environments by learning regressions of latent factor loadings on Environmental Covariates (ECs) across trials. We evaluated the extended MegaLMM using the maize Genome-To-Fields dataset, consisting of 4,402 varieties cultivated in 195 trials with 87.1% of phenotypic values missing, and demonstrated its high accuracy in genomic prediction under various breeding scenarios. Furthermore, we showcased MegaLMM's superiority over univariate GBLUP in predicting trait performance of experimental genotypes in new environments. Finally, we explored the use of higher-dimensional quantitative ECs and discussed when and how detailed environmental data can be leveraged for genomic prediction from METs. We propose that MegaLMM can be applied to plant breeding of diverse crops and different fields of genetics where large-scale linear mixed models are utilized.
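The idea of regressing latent factor loadings on environmental covariates can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the MegaLMM implementation: it assumes the genotype factor scores and trial loadings have already been estimated, uses a single EC with ordinary least squares, and ignores residual genetic terms and priors. All names (`fit_line`, `factor_scores`, `loadings`, `ec`, `new_ec`) and all numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# One environmental covariate per observed trial (hypothetical values).
ec = [10.0, 20.0, 30.0, 40.0]
# loadings[k][t]: loading of latent factor k on trial t (assumed already estimated).
loadings = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 0.5, 0.5],
]
# factor_scores[i][k]: score of genotype i on latent factor k (assumed already estimated).
factor_scores = [
    [1.0, 2.0],
    [0.0, -1.0],
]

# Step 1: regress each factor's loadings on the EC across observed trials,
# then extrapolate the loading to a new, unobserved environment.
new_ec = 50.0
new_loadings = []
for factor_loadings in loadings:
    intercept, slope = fit_line(ec, factor_loadings)
    new_loadings.append(intercept + slope * new_ec)

# Step 2: predicted genetic value in the new environment = scores x predicted loadings.
predicted = [sum(s * l for s, l in zip(scores, new_loadings))
             for scores in factor_scores]
print("Predicted loadings:", new_loadings)
print("Predicted genetic values:", predicted)
```

The design point is that environments are linked through a small number of shared factors, so predicting a new environment reduces to predicting a few loadings from its covariates rather than fitting a separate model per trial.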
Affiliation(s)
- Haixiao Hu
- Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA
- Renaud Rincent
- GQE - Le Moulon Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Gif-sur-Yvette 91190, France
- Daniel E Runcie
- Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA
16
Resende RT, Xavier A, Silva PIT, Resende MPM, Jarquin D, Marcatti GE. GIS-based G × E modeling of maize hybrids through enviromic markers engineering. THE NEW PHYTOLOGIST 2025; 245:102-116. [PMID: 39014516 PMCID: PMC11617650 DOI: 10.1111/nph.19951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Accepted: 06/22/2024] [Indexed: 07/18/2024]
Abstract
Through enviromics, precision breeding leverages innovative geotechnologies to customize crop varieties to specific environments, potentially improving both crop yield and genetic selection gains. In Brazil's four southernmost states, data from 183 distinct geographic field trials (also accounting for 2017-2021) covered information on 164 genotypes: 79 phenotyped maize hybrid genotypes for grain yield and their 85 nonphenotyped parents. Additionally, 1342 envirotypic covariates from weather, soil, sensor-based, and satellite sources were collected to engineer 10 K synthetic enviromic markers via machine learning. Soil, radiation light, and surface temperature variations remarkably affect differential genotype yield, hinting at ecophysiological adjustments including evapotranspiration and photosynthesis. The enviromic ensemble-based random regression model showcases superior predictive performance and efficiency compared to the baseline and kernel models, matching the best genotypes to specific geographic coordinates. Clustering analysis has identified regions that minimize genotype-environment (G × E) interactions. These findings underscore the potential of enviromics in crafting specific parental combinations to breed new, higher-yielding hybrid crops. The adequate use of envirotypic information can enhance the precision and efficiency of maize breeding by providing important inputs about the environmental factors that affect the average crop performance. Generating enviromic markers associated with grain yield can enable a better selection of hybrids for specific environments.
Affiliation(s)
- Rafael T. Resende
  - Plant Breeding Sector, School of Agronomy (EA), Federal University of Goiás (UFG), Av. Esperança, s/n, Samambaia Campus, Goiânia, GO 74690‐900, Brazil
  - TheCROP, A Precision Breeding Project, Av. Esperança, n° 1533, FUNAPE, Samambaia Technological Park, Samambaia Campus – UFG, Goiânia, GO 74690‐612, Brazil
- Alencar Xavier
  - Corteva Agriscience, 8305 NW 62nd Ave, Johnston, IA 50131, USA
  - Purdue University, 915 Mitch Daniels Blvd, West Lafayette, IN 47907, USA
- Marcela P. M. Resende
  - Plant Breeding Sector, School of Agronomy (EA), Federal University of Goiás (UFG), Av. Esperança, s/n, Samambaia Campus, Goiânia, GO 74690‐900, Brazil
- Diego Jarquin
  - University of Florida, 1604 McCarty Drive, G052B McCarty Hall D, Gainesville, FL 32611, USA
- Gustavo E. Marcatti
  - TheCROP, A Precision Breeding Project, Av. Esperança, n° 1533, FUNAPE, Samambaia Technological Park, Samambaia Campus – UFG, Goiânia, GO 74690‐612, Brazil
  - Forest Engineering Department, Federal University of São João del Rei (UFSJ), Sete Lagoas Campus, MG‐424 Highway, Km 47, Sete Lagoas, MG 35701‐970, Brazil
|
17
|
Nannuru VKR, Dieseth JA, Lillemo M, Meuwissen THE. Evaluating genomic selection and speed breeding for Fusarium head blight resistance in wheat using stochastic simulations. MOLECULAR BREEDING : NEW STRATEGIES IN PLANT IMPROVEMENT 2025; 45:14. [PMID: 39803632 PMCID: PMC11717775 DOI: 10.1007/s11032-024-01527-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Accepted: 12/15/2024] [Indexed: 01/16/2025]
Abstract
Genomic selection-based breeding programs offer significant advantages over conventional phenotypic selection, particularly in accelerating genetic gains in plant breeding, as demonstrated by simulations focused on combating Fusarium head blight (FHB) in wheat. FHB resistance, a crucial trait, is challenging to breed for due to its quantitative inheritance and environmental influence, leading to slow progress using conventional breeding methods. Stochastic simulations in our study compared various breeding schemes, incorporating genomic selection (GS) and combining it with speed breeding, against conventional phenotypic selection. Two datasets were simulated, reflecting real-life genotypic data (MASBASIS) and a simulated wheat breeding program (EXAMPLE). An initial 20-year burn-in phase using a conventional phenotypic selection method was followed by a 20-year advancement phase, in which three GS-based breeding programs (GSF2F8, GSF8, and SpeedBreeding + GS) were evaluated alongside a conventional phenotypic selection method. Results consistently showed significant increases in genetic gain with GS-based programs compared to phenotypic selection, irrespective of the selection strategies employed. Among the GS schemes, SpeedBreeding + GS consistently outperformed others, generating the highest genetic gains. This combination effectively minimized generation intervals within the breeding cycle, enhancing efficiency. This study underscores the advantages of genomic selection in accelerating breeding gains for wheat, particularly in combating FHB. By leveraging genomic information and innovative techniques like speed breeding, breeders can efficiently select for desired traits, significantly reducing testing time and costs associated with conventional phenotypic methods. Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-024-01527-z.
Affiliation(s)
- Morten Lillemo
  - Department of Plant Sciences, Norwegian University of Life Sciences, 1432 Ås, Norway
- Theodorus H. E. Meuwissen
  - Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, 1432 Ås, Norway
|
18
|
de Verdal H, Segura V, Pot D, Salas N, Garin V, Rakotoson T, Raboin LM, VomBrocke K, Dusserre J, Castro Pacheco SA, Grenier C. Performance of phenomic selection in rice: Effects of population size and genotype-environment interactions on predictive ability. PLoS One 2024; 19:e0309502. [PMID: 39715250 DOI: 10.1371/journal.pone.0309502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2024] [Accepted: 12/03/2024] [Indexed: 12/25/2024]
Abstract
Phenomic prediction (PP), a novel approach utilizing Near Infrared Spectroscopy (NIRS) data, offers an alternative to genomic prediction (GP) for breeding applications. In PP, a hyperspectral relationship matrix replaces the genomic relationship matrix, potentially capturing both additive and non-additive genetic effects. While PP boasts advantages in cost and throughput compared to GP, the factors influencing its accuracy remain unclear and need to be defined. This study investigated the impact of various factors, namely the training population size, the integration of multi-environment information, and the incorporation of genotype × environment (G×E) effects, on PP compared to GP. We evaluated the prediction accuracies for several agronomically important traits (days to flowering, plant height, yield, harvest index, thousand-grain weight, and grain nitrogen content) in a rice diversity panel grown in four distinct environments. Training population size and the inclusion of G×E effects had minimal influence on PP accuracy. The key factor impacting the accuracy of PP was the number of environments included. Using data from a single environment, GP generally outperformed PP. However, with data from multiple environments, using a genotypic random effect and a relationship matrix per environment, PP achieved accuracies comparable to GP. Combining PP and GP information did not significantly improve predictions compared to the best model using a single source of information (e.g., the average predictive abilities of GP, PP, and combined GP and PP for grain yield were 0.44, 0.42, and 0.44, respectively). Our findings suggest that PP can be as accurate as GP when all genotypes have at least one NIRS measurement, potentially offering significant advantages for rice breeding programs by shortening breeding cycles and lowering program costs.
Affiliation(s)
- Hugues de Verdal
  - AGAP Institut, Université Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
  - CIRAD, UMR AGAP Institut, Montpellier, France
  - AfricaRice Centre Régional Sénégal, BP96, Saint-Louis, Sénégal
- Vincent Segura
  - AGAP Institut, Université Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
  - Geno-Vigne®, IFV-INRAE-Institut Agro, Montpellier, France
- David Pot
  - AGAP Institut, Université Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
  - CIRAD, UMR AGAP Institut, Montpellier, France
- Niclolas Salas
  - AGAP Institut, Université Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
  - CIRAD, UMR AGAP Institut, Montpellier, France
- Vincent Garin
  - AGAP Institut, Université Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
  - CIRAD, UMR AGAP Institut, Montpellier, France
- Tatiana Rakotoson
  - Institut d'Enseignement Supérieur d'Antsirabe Vakinankaratra (IESAV), Université d'Antananarivo, Antananarivo, Madagascar
- Kirsten VomBrocke
  - AGAP Institut, Université Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
  - CIRAD, UMR AGAP Institut, Montpellier, France
- Sergio Antonion Castro Pacheco
  - AGAP Institut, Université Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
  - CIRAD, UMR AGAP Institut, Antsirabe, Madagascar
  - Dispositif en Partenariat Système de Production d'Altitudes Durable (DP-SPAD), Antsirabe, Madagascar
- Cecile Grenier
  - AGAP Institut, Université Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
  - CIRAD, UMR AGAP Institut, Montpellier, France
|
19
|
Shahi D, Guo J, Pradhan S, Avci M, Bai G, Khan J, Baik BK, Mergoum M, Babar MA. Genome-Wide Association Study and Genomic Prediction of Soft Wheat End-Use Quality Traits Under Post-Anthesis Heat-Stressed Conditions. BIOLOGY 2024; 13:962. [PMID: 39765629 PMCID: PMC11727209 DOI: 10.3390/biology13120962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2024] [Revised: 11/15/2024] [Accepted: 11/17/2024] [Indexed: 01/15/2025]
Abstract
Wheat end-use quality is an important component of a wheat breeding program. Heat stress during grain filling impacts wheat quality traits, making it crucial to understand the genetic basis of wheat quality traits under post-anthesis heat stress. This study aimed to identify the genomic regions associated with wheat quality traits using genome-wide association studies (GWASs) and evaluate the prediction accuracy of different genomic selection (GS) models. A panel of 236 soft red facultative wheat genotypes was evaluated for end-use quality traits across four heat-stressed environments over three years. Significant phenotypic variation was observed across environments for traits such as grain yield (GY), grain protein (GP), grain hardness (GH), and flour yield (AFY). Heritability estimates ranged from 0.52 (GY) to 0.91 (GH). The GWASs revealed 136 significant marker-trait associations (MTAs) across all 21 chromosomes, with several MTAs located within candidate genes involved in stress responses and quality traits. Genomic selection models showed prediction accuracy values up to 0.60, with within-environment prediction outperforming across-environment prediction. These results suggest that integrating GWAS and GS approaches can enhance the selection of wheat quality traits under heat stress, contributing to the development of heat-tolerant varieties.
Affiliation(s)
- Dipendra Shahi
  - School of Plant, Environmental and Soil Sciences, Louisiana State Agricultural Center, Baton Rouge, LA 70803, USA
- Jia Guo
  - Inari Agriculture, 1281 Win Hentschel Blvd w1108, West Lafayette, IN 47906, USA
- Sumit Pradhan
  - Department of Agronomy, University of Florida, 3105 McCarty Hall B, Gainesville, FL 32611, USA
- Muhsin Avci
  - Department of Agronomy, University of Florida, 3105 McCarty Hall B, Gainesville, FL 32611, USA
- Guihua Bai
  - USDA-ARS Hard Winter Wheat Genetics Research Unit, Manhattan, KS 66506, USA
- Jahangir Khan
  - PARC-Balochistan Agricultural Research and Development Center, Quetta 87300, Pakistan
- Byung-Kee Baik
  - USDA-ARS, Corn, Soybean and Wheat Quality Research Laboratory Unit, Wooster, OH 44691, USA
- Mohamed Mergoum
  - 0260 Redding Building, Department of Agronomy, 1109 Experiment St, Griffin, GA 30223, USA
- Md Ali Babar
  - Department of Agronomy, University of Florida, 3105 McCarty Hall B, Gainesville, FL 32611, USA
|
20
|
Kimutai JJC, Makumbi D, Burgueño J, Pérez-Rodríguez P, Crossa J, Gowda M, Menkir A, Pacheco A, Ifie BE, Tongoona P, Danquah EY, Prasanna BM. Genomic prediction of the performance of tropical doubled haploid maize lines under artificial Striga hermonthica (Del.) Benth. infestation. G3 (BETHESDA, MD.) 2024; 14:jkae186. [PMID: 39129203 PMCID: PMC11457060 DOI: 10.1093/g3journal/jkae186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 07/23/2024] [Accepted: 07/31/2024] [Indexed: 08/13/2024]
Abstract
Striga hermonthica (Del.) Benth., a parasitic weed, causes substantial yield losses in maize production in sub-Saharan Africa. Breeding for Striga resistance in maize is constrained by limited genetic diversity for Striga resistance within the elite germplasm and by limited phenotyping capacity under artificial Striga infestation. Genomics-enabled approaches have the potential to accelerate identification of Striga-resistant lines for hybrid development. The objectives of this study were to evaluate the accuracy of genomic selection for traits associated with Striga resistance and grain yield (GY) and to predict genetic values of tested and untested doubled haploid maize lines. We genotyped 606 doubled haploid lines with 8,439 rAmpSeq markers. A training set of 116 doubled haploid lines crossed to 2 testers was phenotyped under artificial Striga infestation at 3 locations in Kenya. Heritability for Striga resistance parameters ranged from 0.38 to 0.65, while that for GY was 0.54. The prediction accuracies for Striga resistance-associated traits across locations, as determined by cross-validation (CV), ranged from 0.24 to 0.53 for CV0 and from 0.20 to 0.37 for CV2. For GY, the prediction accuracies were 0.59 and 0.56 for CV0 and CV2, respectively. The results revealed 300 doubled haploid lines with desirable genomic estimated breeding values for reduced number of emerged Striga plants (STR) at 8, 10, and 12 weeks after planting. The genomic estimated breeding values of doubled haploid lines for Striga resistance-associated traits in the training and testing sets were similar in magnitude. These results highlight the potential application of genomic selection in breeding for Striga resistance in maize. The integration of genomic-assisted strategies and doubled haploid technology for line development coupled with forward breeding for major adaptive traits will enhance genetic gains in breeding for Striga resistance in maize.
Affiliation(s)
- Joan J C Kimutai
  - Global Maize Program, International Maize and Wheat Improvement Center (CIMMYT), P.O. Box 1041–00621, Nairobi, Kenya
  - West Africa Centre for Crop Improvement (WACCI), University of Ghana, PMB 30 Legon, Accra, Ghana
- Dan Makumbi
  - Global Maize Program, International Maize and Wheat Improvement Center (CIMMYT), P.O. Box 1041–00621, Nairobi, Kenya
- Juan Burgueño
  - Biometrics and Statistics Unit, CIMMYT, Apdo. Postal 6–641, 06600 Mexico DF, Mexico
- Paulino Pérez-Rodríguez
  - Socioeconomía, Estadística e Informática, Colegio de Postgraduados, Edo. de México 56230, Montecillos, Mexico
- Jose Crossa
  - Biometrics and Statistics Unit, CIMMYT, Apdo. Postal 6–641, 06600 Mexico DF, Mexico
  - Socioeconomía, Estadística e Informática, Colegio de Postgraduados, Edo. de México 56230, Montecillos, Mexico
- Manje Gowda
  - Global Maize Program, International Maize and Wheat Improvement Center (CIMMYT), P.O. Box 1041–00621, Nairobi, Kenya
- Abebe Menkir
  - International Institute of Tropical Agriculture (IITA), Oyo Road, PMB 5320, Ibadan, 200001, Nigeria
- Angela Pacheco
  - Biometrics and Statistics Unit, CIMMYT, Apdo. Postal 6–641, 06600 Mexico DF, Mexico
- Beatrice E Ifie
  - West Africa Centre for Crop Improvement (WACCI), University of Ghana, PMB 30 Legon, Accra, Ghana
  - Institute of Biological, Environmental & Rural Sciences (IBERS), Aberystwyth University, Aberystwyth, SY23 3EE Wales, UK
- Pangirayi Tongoona
  - West Africa Centre for Crop Improvement (WACCI), University of Ghana, PMB 30 Legon, Accra, Ghana
- Eric Y Danquah
  - West Africa Centre for Crop Improvement (WACCI), University of Ghana, PMB 30 Legon, Accra, Ghana
- Boddupalli M Prasanna
  - Global Maize Program, International Maize and Wheat Improvement Center (CIMMYT), P.O. Box 1041–00621, Nairobi, Kenya
|
21
|
Ndlovu N, Gowda M, Beyene Y, Das B, Mahabaleswara SL, Makumbi D, Ogugo V, Burgueno J, Crossa J, Spillane C, McKeown PC, Brychkova G, Prasanna BM. A combination of joint linkage and genome-wide association study reveals putative candidate genes associated with resistance to northern corn leaf blight in tropical maize. FRONTIERS IN PLANT SCIENCE 2024; 15:1448961. [PMID: 39421144 PMCID: PMC11484028 DOI: 10.3389/fpls.2024.1448961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Accepted: 09/05/2024] [Indexed: 10/19/2024]
Abstract
Northern corn leaf blight (NCLB), caused by Setosphaeria turcica, is a major fungal disease affecting maize production in sub-Saharan Africa. Utilizing host plant resistance to mitigate yield losses associated with NCLB can serve as a cost-effective strategy. In this study, we conducted a high-resolution genome-wide association study (GWAS) in an association mapping panel and linkage mapping with three doubled haploid (DH) and three F3 populations of tropical maize. These populations were phenotyped for NCLB resistance across six hotspot environments in Kenya. Across environments and genotypes, NCLB scores ranged from 2.12 to 5.17 (on a scale of 1-9). NCLB disease severity scores exhibited significant genotypic variance and moderate-to-high heritability. From the six biparental populations, 23 quantitative trait loci (QTLs) were identified, each explaining between 2.7% and 15.8% of the observed phenotypic variance. Collectively, the detected QTLs explained 34.28%, 51.37%, 41.12%, 12.46%, 12.11%, and 14.66% of the total phenotypic variance in DH populations 1, 2, and 3 and F3 populations 4, 5, and 6, respectively. GWAS, using 337,110 high-quality single nucleotide polymorphisms (SNPs), identified 15 marker-trait associations and several putative candidate genes linked to NCLB resistance in maize. Joint linkage association mapping (JLAM) identified 37 QTLs for NCLB resistance. Using linkage mapping, JLAM, and GWAS, several QTLs were identified within the genomic region spanning 4 to 15 Mbp on chromosome 2. This genomic region represents a promising target for enhancing NCLB resistance via marker-assisted breeding. Genome-wide predictions revealed moderate correlations with mean values of 0.45, 0.44, 0.55, and 0.42 for within GWAS panel, DH pop1, DH pop2, and DH pop3, respectively. Prediction by incorporating marker-by-environment interactions did not show much improvement. 
Overall, our findings indicate that NCLB resistance is quantitative in nature and is controlled by a few major-effect and many minor-effect QTLs. We conclude that genomic regions consistently detected across mapping approaches and populations should be prioritized for improving NCLB resistance, while genome-wide prediction results can help incorporate both major- and minor-effect genes. This study contributes to a deeper understanding of the genetic and molecular mechanisms driving maize resistance to NCLB.
Affiliation(s)
- Noel Ndlovu
  - Global Maize Program, International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
  - Agriculture & Bioeconomy Research Centre, Ryan Institute, University of Galway, Galway, Ireland
- Manje Gowda
  - Global Maize Program, International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
- Yoseph Beyene
  - Global Maize Program, International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
- Biswanath Das
  - Global Maize Program, International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
- Suresh L. Mahabaleswara
  - Global Maize Program, International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
- Dan Makumbi
  - Global Maize Program, International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
- Veronica Ogugo
  - Global Maize Program, International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
- Juan Burgueno
  - Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Estado. de México, Mexico
- Jose Crossa
  - Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Estado. de México, Mexico
- Charles Spillane
  - Agriculture & Bioeconomy Research Centre, Ryan Institute, University of Galway, Galway, Ireland
- Peter C. McKeown
  - Agriculture & Bioeconomy Research Centre, Ryan Institute, University of Galway, Galway, Ireland
- Galina Brychkova
  - Agriculture & Bioeconomy Research Centre, Ryan Institute, University of Galway, Galway, Ireland
- Boddupalli M. Prasanna
  - Global Maize Program, International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
|
22
|
Kusmec A, Yeh CT'E, Schnable PS. Data-driven identification of environmental variables influencing phenotypic plasticity to facilitate breeding for future climates. THE NEW PHYTOLOGIST 2024; 244:618-634. [PMID: 39183371 DOI: 10.1111/nph.19937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 05/20/2024] [Indexed: 08/27/2024]
Abstract
Phenotypic plasticity describes a genotype's ability to produce different phenotypes in response to different environments. Breeding crops that exhibit appropriate levels of plasticity for future climates will be crucial to meeting global demand, but knowledge of the critical environmental factors is limited to a handful of well-studied major crops. Using 727 maize (Zea mays L.) hybrids phenotyped for grain yield in 45 environments, we investigated the ability of a genetic algorithm and two other methods to identify environmental determinants of grain yield from a large set of candidate environmental variables constructed using minimal assumptions. The genetic algorithm identified pre- and postanthesis maximum temperature, mid-season solar radiation, and whole season net evapotranspiration as the four most important variables from a candidate set of 9150. Importantly, these four variables are supported by previous literature. After calculating reaction norms for each environmental variable, candidate genes were identified and gene annotations investigated to demonstrate how this method can generate insights into phenotypic plasticity. The genetic algorithm successfully identified known environmental determinants of hybrid maize grain yield. This demonstrates that the methodology could be applied to other less well-studied phenotypes and crops to improve understanding of phenotypic plasticity and facilitate breeding crops for future climates.
Affiliation(s)
- Aaron Kusmec
  - Department of Agronomy, Iowa State University, Ames, IA, 50011-3650, USA
- Patrick S Schnable
  - Department of Agronomy, Iowa State University, Ames, IA, 50011-3650, USA
  - Plant Sciences Institute, Iowa State University, Ames, IA, 50011-3650, USA
|
23
|
Tibbs-Cortes LE, Guo T, Andorf CM, Li X, Yu J. Comprehensive identification of genomic and environmental determinants of phenotypic plasticity in maize. Genome Res 2024; 34:1253-1263. [PMID: 39271292 PMCID: PMC11444181 DOI: 10.1101/gr.279027.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 08/06/2024] [Indexed: 09/15/2024]
Abstract
Maize phenotypes are plastic, determined by the complex interplay of genetics and environmental variables. Uncovering the genes responsible and understanding how their effects change across a large geographic region are challenging. In this study, we conducted systematic analysis to identify environmental indices that strongly influence 19 traits (including flowering time, plant architecture, and yield component traits) measured in the maize nested association mapping (NAM) population grown in 11 environments. Identified environmental indices based on day length, temperature, moisture, and combinations of these are biologically meaningful. Next, we leveraged a total of more than 20 million SNP and SV markers derived from recent de novo sequencing of the NAM founders for trait prediction and dissection. When combined with identified environmental indices, genomic prediction enables accurate performance predictions. Genome-wide association studies (GWASs) detected genetic loci associated with the plastic response to the identified environmental indices for all examined traits. By systematically uncovering the major environmental and genomic factors underlying phenotypic plasticity in a wide variety of traits and depositing our results as a track on the MaizeGDB genome browser, we provide a community resource as well as a comprehensive analytical framework to facilitate continuing complex trait dissection and prediction in maize and other crops. Our findings also provide a conceptual framework for the genetic architecture of phenotypic plasticity by accommodating two alternative models, regulatory gene model and allelic sensitivity model, as special cases of a continuum.
Affiliation(s)
- Laura E Tibbs-Cortes
  - Department of Agronomy, Iowa State University, Ames, Iowa 50011, USA
  - USDA-ARS, Wheat Health, Genetics, and Quality Research Unit, Pullman, Washington 99164, USA
  - USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, Iowa 50011, USA
- Tingting Guo
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Carson M Andorf
  - USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, Iowa 50011, USA
  - Department of Computer Science, Iowa State University, Ames, Iowa 50011, USA
- Xianran Li
  - USDA-ARS, Wheat Health, Genetics, and Quality Research Unit, Pullman, Washington 99164, USA
- Jianming Yu
  - Department of Agronomy, Iowa State University, Ames, Iowa 50011, USA
|
24
|
Washburn JD, Varela JI, Xavier A, Chen Q, Ertl D, Gage JL, Holland JB, Lima DC, Romay MC, Lopez-Cruz M, de los Campos G, Barber W, Zimmer C, Silva IT, Rocha F, Rincent R, Ali B, Hu H, Runcie DE, Gusev K, Slabodkin A, Bax P, Aubert J, Gangloff H, Mary-Huard T, Vanrenterghem T, Quesada-Traver C, Yates S, Ariza-Suárez D, Ulrich A, Wyler M, Kick DR, Bellis ES, Causey JL, Chavez ES, Wang Y, Piyush V, Fernando GD, Hu RK, Kumar R, Timon AJ, Venkatesh R, Abá KS, Chen H, Ranaweera T, Shiu SH, Wang P, Gordon MJ, Amos BK, Busato S, Perondi D, Gogna A, Psaroudakis D, Chen CPJ, Al-Mamun HA, Danilevicz MF, Upadhyaya SR, Edwards D, de Leon N. Global Genotype by Environment Prediction Competition Reveals That Diverse Modeling Strategies Can Deliver Satisfactory Maize Yield Estimates. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.09.13.612969. [PMID: 39345633 PMCID: PMC11429743 DOI: 10.1101/2024.09.13.612969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Predicting phenotypes from a combination of genetic and environmental factors is a grand challenge of modern biology. Slight improvements in this area have the potential to save lives, improve food and fuel security, permit better care of the planet, and create other positive outcomes. In 2022 and 2023, the first open-to-the-public Genomes to Fields (G2F) initiative Genotype by Environment (GxE) prediction competition was held using a large dataset including genomic variation, phenotype and weather measurements, and field management notes, gathered by the project over nine years. The competition attracted registrants from around the world, with representation from academic, government, industry, and non-profit institutions as well as unaffiliated individuals. These participants came from diverse disciplines, including plant science, animal science, breeding, statistics, computational biology, and others. Some participants had no formal genetics or plant-related training, and some were just beginning their graduate education. The teams applied varied methods and strategies, providing a wealth of modeling knowledge based on a common dataset. The winner's strategy involved two models combining machine learning and traditional breeding tools: one model emphasized environment using features extracted by Random Forest, Ridge Regression and Least-squares, and one focused on genetics. Other high-performing teams' methods included quantitative genetics, classical machine learning/deep learning, mechanistic models, and model ensembles. The dataset factors used, such as genetics, weather, and management data, were also diverse, demonstrating that no single model or strategy is far superior to all others within the context of this competition.
Affiliation(s)
- Jacob D. Washburn
- USDA-ARS-MWA-PGRU, 302-A Curtis Hall, U. of MO., Columbia, MO, 65211, USA
- José Ignacio Varela
- Department of Plant and Agroecosystem Sciences, University of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
- Alencar Xavier
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
- Department of Agronomy, Purdue University, 915 Mitch Daniels Blvd, West Lafayette, IN 47907, United States
- Qiuyue Chen
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
- David Ertl
- Iowa Corn Promotion Board, Johnston, IA, 50131, USA
- Joseph L. Gage
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
- James B. Holland
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
- USDA-ARS Plant Science Research Unit, Raleigh, NC, 27695, USA
- Dayane Cristina Lima
- Department of Plant and Agroecosystem Sciences, University of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
- Maria Cinta Romay
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Marco Lopez-Cruz
- Departments of Epidemiology & Biostatistics and Statistics & Probability, and Institute for Quantitative Health Science and Engineering, Michigan State University, 775 Woodlot Dr., East Lansing, MI, 48823, USA
- Gustavo de los Campos
- Departments of Epidemiology & Biostatistics and Statistics & Probability, and Institute for Quantitative Health Science and Engineering, Michigan State University, 775 Woodlot Dr., East Lansing, MI, 48823, USA
- Wesley Barber
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
- Cristiano Zimmer
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
- Fabiani Rocha
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
- Renaud Rincent
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
- Baber Ali
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
- Haixiao Hu
- Department of Plant Sciences, University of California Davis, One Shield Drive, Davis, CA, 95616, USA
- Daniel E Runcie
- Department of Plant Sciences, University of California Davis, One Shield Drive, Davis, CA, 95616, USA
- Kirill Gusev
- Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE, 19808, USA
- Andrei Slabodkin
- Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE, 19808, USA
- Phillip Bax
- Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE, 19808, USA
- Julie Aubert
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Hugo Gangloff
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Tristan Mary-Huard
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Theodore Vanrenterghem
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Carles Quesada-Traver
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Steven Yates
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Daniel Ariza-Suárez
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Argeo Ulrich
- Puregene AG, Etzmatt 273, CH-4314 Zeiningen, Switzerland
- Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zürich, Switzerland
- Michele Wyler
- MWSchmid GmbH, Hauptstrasse 34, CH-8750 Glarus, Switzerland
- Daniel R. Kick
- USDA-ARS-MWA-PGRU, 302-A Curtis Hall, U. of MO., Columbia, MO, 65211, USA
- Emily S. Bellis
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
- Jason L. Causey
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
- Emilio Soriano Chavez
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
- Yixing Wang
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
- Ved Piyush
- Department of Statistics, University of Nebraska - Lincoln, 340 Hardin Hall North Wing, Lincoln, NE, 68583, USA
- Gayara D. Fernando
- Department of Statistics, University of Nebraska - Lincoln, 340 Hardin Hall North Wing, Lincoln, NE, 68583, USA
- Robert K Hu
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
- Rachit Kumar
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
- Medical Scientist Training Program, Perelman School of Medicine at the University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA, 19104, USA
- Annan J. Timon
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
- Rasika Venkatesh
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
- Kenia Segura Abá
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, 48824, USA
- Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI, 48824, USA
- Huan Chen
- Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI, 48824, USA
- Thilanka Ranaweera
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, 48824, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Shin-Han Shiu
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, 48824, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Peiran Wang
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
- Max J. Gordon
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
- B K. Amos
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
- Sebastiano Busato
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
- Daniel Perondi
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
- Abhishek Gogna
- Department of Breeding Research, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung, Corrensstraße 3, Gatersleben, 6466, Germany
- Dennis Psaroudakis
- Department of Molecular Genetics, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung, Corrensstraße 3, Gatersleben, 6466, Germany
- C. P. James Chen
- School of Animal Sciences, Virginia Tech, Blacksburg, VA, 24061, USA
- Hawlader A. Al-Mamun
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
- Monica F. Danilevicz
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
- Shriprabha R. Upadhyaya
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
- David Edwards
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
- Natalia de Leon
- Department of Plant and Agroecosystem Sciences, University of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
|
25
|
Bartholomé J, Ospina JO, Sandoval M, Espinosa N, Arcos J, Ospina Y, Frouin J, Beartschi C, Ghneim T, Grenier C. Genomic selection for tolerance to aluminum toxicity in a synthetic population of upland rice. PLoS One 2024; 19:e0307009. [PMID: 39173048 PMCID: PMC11341055 DOI: 10.1371/journal.pone.0307009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Accepted: 06/28/2024] [Indexed: 08/24/2024]
Abstract
Over half of the world's arable land is acidic, which constrains cereal production. In South America, different rice-growing regions (Cerrado in Brazil and Llanos in Colombia and Venezuela) are particularly affected due to high aluminum toxicity levels. For this reason, efforts have been made to breed for tolerance to aluminum toxicity using synthetic populations. The breeding program of CIAT-CIRAD is a good example of the use of recurrent selection to increase productivity for the Llanos in Colombia. In this study, we evaluated the performance of genomic prediction models to optimize the breeding scheme by hastening the development of an improved synthetic population and elite lines. We characterized 334 families at the S0:4 generation in two conditions. One condition was the control, managed with liming, while the other had high aluminum toxicity. Four traits were considered: days to flowering (FL), plant height (PH), grain yield (YLD), and zinc concentration in the polished grain (ZN). The population presented a high tolerance to aluminum toxicity, with more than 72% of the families showing a higher yield under aluminum conditions. The performance of the families under the aluminum toxicity condition was predicted using four different models: a single-environment model and three multi-environment models. The multi-environment models differed in the way they integrated genotype-by-environment interactions. The best predictive abilities were achieved using multi-environment models: 0.67 for FL, 0.60 for PH, 0.53 for YLD, and 0.65 for ZN. The gain of multi-environment over single-environment models ranged from 71% for YLD to 430% for FL. The selection of the best-performing families based on multi-trait indices, including the four traits mentioned above, facilitated the identification of suitable families for recombination. This information will be used to develop a new cycle of recurrent selection through genomic selection.
Affiliation(s)
- Jérôme Bartholomé
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Alliance Bioversity CIAT, Cali, Colombia
- Natalia Espinosa
- Alliance Bioversity CIAT, Cali, Colombia
- FEDEARROZ–Fondo Nacional del Arroz, Bogotá, Colombia
- Jairo Arcos
- HarvestPlus Program, Alliance Bioversity CIAT, Cali, Colombia
- Julien Frouin
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Cédric Beartschi
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Thaura Ghneim
- Departamento de Ciencias Biológicas, Universidad ICESI, Cali, Colombia
- Cécile Grenier
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
|
26
|
Hudson O, Resende MFR, Messina C, Holland J, Brawner J. Prediction of resistance, virulence, and host-by-pathogen interactions using dual-genome prediction models. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:196. [PMID: 39105819 PMCID: PMC11303470 DOI: 10.1007/s00122-024-04698-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Accepted: 07/17/2024] [Indexed: 08/07/2024]
Abstract
KEY MESSAGE Integrating disease screening data and genomic data for host and pathogen populations into prediction models provides breeders and pathologists with a unified framework to develop disease resistance. Developing disease resistance in crops typically consists of exposing breeding populations to a virulent strain of the pathogen that is causing disease. While including a diverse set of pathogens in the experiments would be desirable for developing broad and durable disease resistance, it is logistically complex and uncommon, and limits our capacity to implement dual (host-by-pathogen)-genome prediction models. Data from an alternative disease screening system that challenges a diverse sweet corn population with a diverse set of pathogen isolates are provided to demonstrate the changes in genetic parameter estimates that result from using genomic data to provide connectivity across sparsely tested experimental treatments. An inflation in genetic variance estimates was observed when among isolate relatedness estimates were included in prediction models, which was moderated when host-by-pathogen interaction effects were incorporated into models. The complete model that included genomic similarity matrices for host, pathogen, and interaction effects indicated that the proportion of phenotypic variation in lesion size that is attributable to host, pathogen, and interaction effects was similar. Estimates of the stability of lesion size predictions for host varieties inoculated with different isolates and the stability of isolates used to inoculate different hosts were also similar. In this pathosystem, genetic parameter estimates indicate that host, pathogen, and host-by-pathogen interaction predictions may be used to identify crop varieties that are resistant to specific virulence mechanisms and to guide the deployment of these sources of resistance into pathogen populations where they will be more effective.
Affiliation(s)
- Owen Hudson
- Plant Pathology, University of Florida, Gainesville, FL, USA
- Marcio F R Resende
- Horticultural Sciences Department, University of Florida, Gainesville, FL, USA
- Plant Breeding Graduate Program, University of Florida, Gainesville, FL, USA
- Charlie Messina
- Horticultural Sciences Department, University of Florida, Gainesville, FL, USA
- Plant Breeding Graduate Program, University of Florida, Gainesville, FL, USA
- James Holland
- USDA-ARS Plant Science Research Unit and Department of Crop and Soil Sciences, Raleigh, USA
- North Carolina Plant Sciences Initiative, North Carolina State University, Raleigh, NC, 27695, USA
- Jeremy Brawner
- Plant Pathology, University of Florida, Gainesville, FL, USA.
- Genetic Solutions, Genics, St Lucia, Australia.
|
27
|
Fernandes IK, Vieira CC, Dias KOG, Fernandes SB. Using machine learning to combine genetic and environmental data for maize grain yield predictions across multi-environment trials. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:189. [PMID: 39044035 PMCID: PMC11266441 DOI: 10.1007/s00122-024-04687-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Accepted: 06/29/2024] [Indexed: 07/25/2024]
Abstract
KEY MESSAGE Incorporating feature-engineered environmental data into machine learning-based genomic prediction models is an efficient approach to indirectly model genotype-by-environment interactions. Complementing phenotypic traits and molecular markers with high-dimensional data such as climate and soil information is becoming a common practice in breeding programs. This study explored new ways to combine non-genetic information in genomic prediction models using machine learning. Using the multi-environment trial data from the Genomes To Fields initiative, different models to predict maize grain yield were adjusted using various inputs: genetic, environmental, or a combination of both, either in an additive (genetic-and-environmental; G+E) or a multiplicative (genotype-by-environment interaction; GEI) manner. When including environmental data, the mean prediction accuracy of machine learning genomic prediction models increased up to 7% over the well-established Factor Analytic Multiplicative Mixed Model among the three cross-validation scenarios evaluated. Moreover, using the G+E model was more advantageous than the GEI model given the superior, or at least comparable, prediction accuracy, the lower usage of computational memory and time, and the flexibility of accounting for interactions by construction. Our results illustrate the flexibility provided by the ML framework, particularly with feature engineering. We show that the feature engineering stage offers a viable option for envirotyping and generates valuable information for machine learning-based genomic prediction models. Furthermore, we verified that the genotype-by-environment interactions may be considered using tree-based approaches without explicitly including interactions in the model. These findings support the growing interest in merging high-dimensional genotypic and environmental data into predictive modeling.
Affiliation(s)
- Igor K Fernandes
- Department of Crop, Soil, and Environmental Sciences, Center for Agricultural Data Analytics, University of Arkansas, Fayetteville, AR, USA
- Caio C Vieira
- Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, USA
- Kaio O G Dias
- Department of General Biology, Federal University of Viçosa, Viçosa, Brazil
- Samuel B Fernandes
- Department of Crop, Soil, and Environmental Sciences, Center for Agricultural Data Analytics, University of Arkansas, Fayetteville, AR, USA.
|
28
|
Nascimento M, Nascimento ACC, Azevedo CF, de Oliveira ACB, Caixeta ET, Jarquin D. Enhancing genomic prediction with Stacking Ensemble Learning in Arabica Coffee. FRONTIERS IN PLANT SCIENCE 2024; 15:1373318. [PMID: 39086911 PMCID: PMC11288849 DOI: 10.3389/fpls.2024.1373318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 06/12/2024] [Indexed: 08/02/2024]
Abstract
Coffee breeding programs have traditionally relied on observing plant characteristics over years, a slow and costly process. Genomic selection (GS) offers a DNA-based alternative for faster selection of superior cultivars. Stacking Ensemble Learning (SEL) combines multiple models for potentially even more accurate selection. This study explores the potential of SEL in coffee breeding, aiming to improve prediction accuracy for important traits [yield (YL), total number of the fruits (NF), leaf miner infestation (LM), and cercosporiosis incidence (Cer)] in Coffea arabica. We analyzed data from 195 individuals genotyped for 21,211 single-nucleotide polymorphism (SNP) markers. To comprehensively assess model performance, we employed a cross-validation (CV) scheme. Genomic Best Linear Unbiased Prediction (GBLUP), multivariate adaptive regression splines (MARS), Quantile Random Forest (QRF), and Random Forest (RF) served as base learners. For the meta-learner within the SEL framework, various options were explored, including Ridge Regression, RF, GBLUP, and Single Average. The SEL method was able to predict important traits in Coffea arabica, with performance measured as predictive ability (PA). SEL presented higher PA than all base learner methods. The gains in PA in relation to GBLUP were 87.44% (the ratio between the PA obtained from the best Stacking model and the GBLUP), 37.83%, 199.82%, and 14.59% for YL, NF, LM and Cer, respectively. Overall, SEL presents a promising approach for GS. By combining predictions from multiple models, SEL can potentially enhance the PA of GS for complex traits.
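As a rough illustration of the stacking idea described in this abstract (not the authors' implementation), the sketch below builds out-of-fold predictions from two toy base learners and combines them by plain averaging, one of the meta-learner options the abstract lists. The single predictor, both base learners, the fold assignment, and all data values are hypothetical; the study itself used GBLUP, MARS, QRF, and RF on 21,211 SNP markers.

```python
from statistics import mean

def fit_linear(xs, ys):
    """Toy base learner 1: least-squares line on a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_nearest(xs, ys):
    """Toy base learner 2: 1-nearest-neighbour lookup on the same predictor."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def out_of_fold(fit, xs, ys, k):
    """Predict every record with a model trained without that record's fold."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        test = [i for i in range(len(xs)) if i % k == fold]
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = model(xs[i])
    return preds

def pearson(a, b):
    """Pearson correlation, used here as the predictive ability (PA)."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5

# Hypothetical data: one genomic score (xs) and one phenotype (ys) per individual.
xs = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.7, 2.0, 2.3, 2.6]
ys = [1.0, 1.3, 1.2, 1.9, 2.1, 2.8, 2.6, 3.1, 3.3, 3.9]

# Level 0: out-of-fold predictions from each base learner.
base_preds = [out_of_fold(f, xs, ys, k=5) for f in (fit_linear, fit_nearest)]
# Level 1: simple-average meta-learner over the base predictions.
stacked = [mean(p) for p in zip(*base_preds)]
pa = pearson(stacked, ys)
print(f"PA of the stacked predictions: {pa:.3f}")
```

A trained meta-learner, such as ridge regression, would be fitted on the out-of-fold columns in `base_preds` instead of averaging them. That is why the base predictions are generated out of fold in the first place.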
Affiliation(s)
- Moyses Nascimento
- Laboratory of Intelligence Computational and Statistical Learning (LICAE), Department of Statistics, Federal University of Viçosa, Viçosa, Brazil
- Agronomy Department, University of Florida, Gainesville, FL, United States
- Ana Carolina Campana Nascimento
- Laboratory of Intelligence Computational and Statistical Learning (LICAE), Department of Statistics, Federal University of Viçosa, Viçosa, Brazil
- Agronomy Department, University of Florida, Gainesville, FL, United States
- Camila Ferreira Azevedo
- Laboratory of Intelligence Computational and Statistical Learning (LICAE), Department of Statistics, Federal University of Viçosa, Viçosa, Brazil
- Diego Jarquin
- Agronomy Department, University of Florida, Gainesville, FL, United States
|
29
|
Tadese D, Piepho HP, Hartung J. Accuracy of prediction from multi-environment trials for new locations using pedigree information and environmental covariates: the case of sorghum (Sorghum bicolor (L.) Moench) breeding. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:181. [PMID: 38985188 PMCID: PMC11236881 DOI: 10.1007/s00122-024-04684-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 06/25/2024] [Indexed: 07/11/2024]
Abstract
KEY MESSAGES We investigate a method of extracting and fitting synthetic environmental covariates and pedigree information in multilocation trial data analysis to predict genotype performances in untested locations. Plant breeding trials are usually conducted across multiple testing locations to predict genotype performances in the targeted population of environments. The predictive accuracy can be increased by the use of adequate statistical models. We compared linear mixed models with and without synthetic covariates (SCs) and pedigree information under the identity, the diagonal and the factor-analytic variance-covariance structures of the genotype-by-location interactions. A comparison was made to evaluate the accuracy of different models in predicting genotype performances in untested locations using the mean squared error of predicted differences (MSEPD) and the Spearman rank correlation between predicted and adjusted means. A multi-environmental trial (MET) dataset evaluated for yield performance in the dry lowland sorghum (Sorghum bicolor (L.) Moench) breeding program of Ethiopia was used. For validating our models, we followed a leave-one-location-out cross-validation strategy. A total of 65 environmental covariates (ECs) obtained from the sorghum test locations were considered. The SCs were extracted from the ECs using multivariate partial least squares analysis and subsequently fitted in the linear mixed model. Then, the model was extended to account for pedigree information. According to the MSEPD, models accounting for SCs improved the predictive accuracy of genotype performances under all three variance-covariance structures compared to models without SCs. The rank correlation was also higher for the models with SCs. When the SCs were fitted, the rank correlation was 0.58 for the factor-analytic, 0.51 for the diagonal and 0.46 for the identity variance-covariance structures. Our approach indicates improvement in predictive accuracy with SCs in the context of genotype-by-location interactions of a sorghum breeding program in Ethiopia.
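The validation scheme named in this abstract, leave-one-location-out cross-validation scored by Spearman rank correlation, can be sketched as follows. This is a minimal illustration under stated assumptions: the predictor is a plain across-location genotype mean standing in for the linear mixed models with synthetic covariates, and the location names, genotype names, and yield values are made up.

```python
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5

def spearman(a, b):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def leave_one_location_out(data):
    """data: {location: {genotype: adjusted mean}} -> {location: Spearman rho}."""
    rho = {}
    for held_out, observed in data.items():
        training = [loc for loc in data if loc != held_out]
        # Keep genotypes that were also tested in at least one training location.
        genotypes = [g for g in sorted(observed)
                     if any(g in data[loc] for loc in training)]
        # Stand-in predictor (assumption): mean of the genotype over training locations.
        predicted = [mean(data[loc][g] for loc in training if g in data[loc])
                     for g in genotypes]
        rho[held_out] = spearman(predicted, [observed[g] for g in genotypes])
    return rho

# Hypothetical adjusted means for four genotypes at three locations.
data = {
    "loc_A": {"g1": 3.1, "g2": 2.4, "g3": 4.0, "g4": 3.5},
    "loc_B": {"g1": 2.9, "g2": 2.0, "g3": 3.8, "g4": 3.9},
    "loc_C": {"g1": 3.3, "g2": 2.6, "g3": 3.6, "g4": 3.0},
}
rho = leave_one_location_out(data)
for loc, value in rho.items():
    print(loc, round(value, 2))
```

Replacing the stand-in predictor with a fitted model leaves the loop unchanged: each location is held out in turn, the model is trained on the rest, and the predicted ranking is compared with the adjusted means observed at the held-out location.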
Affiliation(s)
- Diriba Tadese
- Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Fruwirthstraße 23, 70599, Stuttgart, Germany.
- Hans-Peter Piepho
- Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Fruwirthstraße 23, 70599, Stuttgart, Germany
- Jens Hartung
- Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Fruwirthstraße 23, 70599, Stuttgart, Germany
|
30
|
Adak A, DeSalvio AJ, Arik MA, Murray SC. Field-based high-throughput phenotyping enhances phenomic and genomic predictions for grain yield and plant height across years in maize. G3 (BETHESDA, MD.) 2024; 14:jkae092. [PMID: 38776257 PMCID: PMC11228873 DOI: 10.1093/g3journal/jkae092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Accepted: 04/24/2024] [Indexed: 05/24/2024]
Abstract
Field-based phenomic prediction employs novel features, like vegetation indices (VIs) from drone images, to predict key agronomic traits in maize, despite challenges in matching biomarker measurement time points across years or environments. This study utilized functional principal component analysis (FPCA) to summarize the variation of temporal VIs, uniquely allowing the integration of this data into phenomic prediction models tested across multiple years (2018-2021) and environments. The models, which included 1 genomic, 2 phenomic, 2 multikernel, and 1 multitrait type, were evaluated in 4 prediction scenarios (CV2, CV1, CV0, and CV00), relevant for plant breeding programs, assessing both tested and untested genotypes in observed and unobserved environments. Two hybrid populations (415 and 220 hybrids) demonstrated the visible atmospherically resistant index's strong temporal correlation with grain yield (up to 0.59) and plant height. The first 2 FPCAs explained 59.3 ± 13.9% and 74.2 ± 9.0% of the temporal variation in the VIs, respectively, facilitating predictions where flight times varied. Phenomic data, particularly when combined with genomic data, were often comparable to or numerically exceeded the base genomic model in prediction accuracy, particularly for grain yield in untested hybrids, although no significant differences in these models' performance were consistently observed. Overall, this approach underscores the effectiveness of FPCA and combined models in enhancing the prediction of grain yield and plant height across environments and diverse agricultural settings.
Affiliation(s)
- Alper Adak
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843-2474, USA
- Aaron J DeSalvio
- Interdisciplinary Graduate Program in Genetics and Genomics (Department of Biochemistry and Biophysics), Texas A&M University, College Station, TX 77843-2128, USA
- Mustafa A Arik
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843-2474, USA
- Seth C Murray
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843-2474, USA
|
31
|
Gaur A, Jindal Y, Singh V, Tiwari R, Juliana P, Kaushik D, Kumar KJY, Ahlawat OP, Singh G, Sheoran S. GWAS elucidated grain yield genetics in Indian spring wheat under diverse water conditions. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:177. [PMID: 38972024 DOI: 10.1007/s00122-024-04680-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Accepted: 06/11/2024] [Indexed: 07/08/2024]
Abstract
KEY MESSAGE Underpinned natural variations and key genes associated with yield under different water regimes, and identified genomic signatures of genetic gain in the Indian wheat breeding program. A novel KASP marker for TKW under water stress was developed and validated. A comprehensive genome-wide association study was conducted on 300 spring wheat genotypes to elucidate the natural variations associated with grain yield and its eleven contributing traits under fully irrigated, restricted water, and simulated no water conditions. Utilizing the 35K Wheat Breeders' Array, we identified 1155 quantitative trait nucleotides (QTNs), with 207 QTNs exhibiting stability across diverse conditions. These QTNs were further delimited into 539 genomic regions using a genome-wide LD value of 3.0 Mbp, revealing pleiotropic control across traits and conditions. Sub-genome A was significantly associated with traits under irrigated conditions, while sub-genome B showed more QTNs under water stressed conditions. Favourable alleles with significantly associated QTNs were delineated, with a notable pyramiding effect for enhancing trait performance. Additionally, alleles of only 921 QTNs significantly affected the population mean. Allele profiling highlighted C-306 as the most promising source of drought tolerance. Moreover, 762 genes overlapping significant QTNs were identified, narrowing down to 27 putative candidate genes overlapping 29 novel and functional SNPs expressing (≥ 0.5 tpm) relevance across various growth conditions. A new KASP assay was developed, targeting the gene TraesCS2A03G1123700, which regulates thousand kernel weight under severe drought conditions. Genomic selection models (GBLUP, BayesB, MxE, and R-Norm) demonstrated an average prediction accuracy of 0.06-0.58 across environments, indicating potential for trait selection. Retrospective analysis of the Indian wheat breeding program supported a genetic gain in GY at the rate of ca. 0.56% per breeding cycle since 1960, supporting the identification of genomic signatures driving trait selection and genetic gain. These findings offer insight into improving the rate of genetic gain in wheat breeding programs globally.
Affiliation(s)
- Arpit Gaur
- Department of Genetics and Plant Breeding, CCS Haryana Agricultural University, Hisar, India
- Crop Improvement, ICAR- Indian Institute of Wheat and Barley Research, Karnal, India
- Yogesh Jindal
- Department of Genetics and Plant Breeding, CCS Haryana Agricultural University, Hisar, India
- Vikram Singh
- Department of Genetics and Plant Breeding, CCS Haryana Agricultural University, Hisar, India
- Ratan Tiwari
- Crop Improvement, ICAR- Indian Institute of Wheat and Barley Research, Karnal, India
- Deepak Kaushik
- Department of Genetics and Plant Breeding, CCS Haryana Agricultural University, Hisar, India
- Om Parkash Ahlawat
- Crop Improvement, ICAR- Indian Institute of Wheat and Barley Research, Karnal, India
- Gyanendra Singh
- Crop Improvement, ICAR- Indian Institute of Wheat and Barley Research, Karnal, India
- Sonia Sheoran
- Crop Improvement, ICAR- Indian Institute of Wheat and Barley Research, Karnal, India.
|
32
|
Ali B, Huguenin-Bizot B, Laurent M, Chaumont F, Maistriaux LC, Nicolas S, Duborjal H, Welcker C, Tardieu F, Mary-Huard T, Moreau L, Charcosset A, Runcie D, Rincent R. High-dimensional multi-omics measured in controlled conditions are useful for maize platform and field trait predictions. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:175. [PMID: 38958724 DOI: 10.1007/s00122-024-04679-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Accepted: 06/15/2024] [Indexed: 07/04/2024]
Abstract
KEY MESSAGE Transcriptomics and proteomics information collected on a platform can predict additive and non-additive effects for platform traits and additive effects for field traits. The effects of climate change in the form of drought, heat stress, and irregular seasonal changes threaten global crop production. The ability of multi-omics data, such as transcripts and proteins, to reflect a plant's response to such climatic factors can be capitalized on in prediction models to maximize crop improvement. Implementing multi-omics characterization in field evaluations is challenging due to high costs. It is, however, possible to do it on reference genotypes in controlled conditions. Using omics measured on a platform, we tested different multi-omics-based prediction approaches, using a high dimensional linear mixed model (MegaLMM) to predict genotypes for platform traits and agronomic field traits in a panel of 244 maize hybrids. We considered two prediction scenarios: in the first one, new hybrids are predicted (CV-NH), and in the second one, partially observed hybrids are predicted (CV-POH). For both scenarios, all hybrids were characterized for omics on the platform. We observed that omics can predict both additive and non-additive genetic effects for the platform traits, resulting in much higher predictive abilities than GBLUP. This highlights their efficiency in capturing regulatory processes in relation to growth conditions. For the field traits, we observed that the additive components of omics only slightly improved predictive abilities for predicting new hybrids (CV-NH, model MegaGAO) and for predicting partially observed hybrids (CV-POH, model GAOxW-BLUP) in comparison to GBLUP. We conclude that measuring the omics in the fields would be of considerable interest in predicting productivity if the costs of omics drop significantly.
Affiliation(s)
- Baber Ali
- INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Université Paris-Saclay, 91190, Gif-Sur-Yvette, France
| | - Bertrand Huguenin-Bizot
- Laboratoire Reproduction Et Développement Des Plantes, CNRS, ENS de Lyon-46, Allée d'Italie, 69364, Lyon, France
| | - Maxime Laurent
- Louvain Institute of Biomolecular Science and Technology, UCLouvain, Louvain-La-Neuve, Belgium
| | - François Chaumont
- Louvain Institute of Biomolecular Science and Technology, UCLouvain, Louvain-La-Neuve, Belgium
| | - Laurie C Maistriaux
- Louvain Institute of Biomolecular Science and Technology, UCLouvain, Louvain-La-Neuve, Belgium
| | - Stéphane Nicolas
- INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Université Paris-Saclay, 91190, Gif-Sur-Yvette, France
| | - Hervé Duborjal
- Limagrain, Limagrain Fields Seeds, Research Centre, 63720, Chappes, France
| | | | | | - Tristan Mary-Huard
- INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Université Paris-Saclay, 91190, Gif-Sur-Yvette, France
| | - Laurence Moreau
- INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Université Paris-Saclay, 91190, Gif-Sur-Yvette, France
| | - Alain Charcosset
- INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Université Paris-Saclay, 91190, Gif-Sur-Yvette, France
| | - Daniel Runcie
- Department of Plant Sciences, University of California Davis, Davis, CA, USA
| | - Renaud Rincent
- INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Université Paris-Saclay, 91190, Gif-Sur-Yvette, France.
| |
|
33
|
Sadeh R, Ben-David R, Herrmann I, Peleg Z. Spectral-genomic chain-model approach enhances the wheat yield component prediction under the Mediterranean climate. PHYSIOLOGIA PLANTARUM 2024; 176:e14480. [PMID: 39187437 DOI: 10.1111/ppl.14480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/27/2024] [Revised: 06/25/2024] [Accepted: 06/27/2024] [Indexed: 08/28/2024]
Abstract
In light of the changing climate that jeopardizes future food security, genomic selection is emerging as a valuable tool for breeders to enhance genetic gains and introduce high-yielding varieties. However, predicting grain yield is challenging due to the genetic and physiological complexities involved and the effect of genetic-by-environment interactions on prediction accuracy. We utilized a chained model approach to address these challenges, breaking down the complex prediction task into simpler steps. A diversity panel with a narrow phenological range was phenotyped across three Mediterranean environments for various morpho-physiological and yield-related traits. The results indicated that a multi-environment model outperformed a single-environment model in prediction accuracy for most traits. However, prediction accuracy for grain yield was not improved. Thus, in an attempt to ameliorate the grain yield prediction accuracy, we integrated a spectral estimation of spike number, being a major wheat yield component, with genomic data. A machine learning approach was used for spike number estimation from canopy hyperspectral reflectance captured by an unmanned aerial vehicle. The spectral-based estimated spike number was utilized as a secondary trait in a multi-trait genomic selection, significantly improving grain yield prediction accuracy. Moreover, the ability to predict the spike number based on data from previous seasons implies that it could be applied to new trials at various scales, even in small plot sizes. Overall, we demonstrate here that incorporating a novel spectral-genomic chain-model workflow, which utilizes spectral-based phenotypes as a secondary trait, improves the predictive accuracy of wheat grain yield.
Affiliation(s)
- Roy Sadeh
- The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture, The Hebrew University of Jerusalem, Rehovot, Israel
| | - Roi Ben-David
- Institute of Plant Sciences, Agriculture Research Organization (ARO)-Volcani Institute, Rishon LeZion, Israel
| | - Ittai Herrmann
- The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture, The Hebrew University of Jerusalem, Rehovot, Israel
| | - Zvi Peleg
- The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture, The Hebrew University of Jerusalem, Rehovot, Israel
| |
|
34
|
Tiezzi F, Goda K, Morgante F. Using lifestyle information in polygenic modeling of blood pressure traits: a simple method to reduce bias. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.05.597631. [PMID: 38895222 PMCID: PMC11185601 DOI: 10.1101/2024.06.05.597631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/21/2024]
Abstract
Complex traits are determined by the effects of multiple genetic variants, multiple environmental factors, and potentially their interaction. Predicting complex trait phenotypes from genotypes is a fundamental task in quantitative genetics that was pioneered in agricultural breeding for selection purposes. However, it has recently become important in human genetics. While prediction accuracy for some human complex traits is appreciable, this remains low for most traits. A promising way to improve prediction accuracy is by including not only genetic information but also environmental information in prediction models. However, environmental factors can, in turn, be genetically determined. This phenomenon gives rise to a correlation between the genetic and environmental components of the phenotype, which violates the assumption of independence between the genetic and environmental components of most statistical methods for polygenic modeling. In this work, we investigated the impact of including 27 lifestyle variables as well as genotype information (and their interaction) for predicting diastolic blood pressure, systolic blood pressure, and pulse pressure in older individuals in UK Biobank. The 27 lifestyle variables were included as either raw variables or adjusted by genetic and other non-genetic factors. The results show that including both lifestyle and genetic data improved prediction accuracy compared to using either piece of information alone. Both prediction accuracy and bias can improve substantially for some traits when the models account for the lifestyle variables after their proper adjustment. Our work confirms the utility of including environmental information in polygenic models of complex traits and highlights the importance of proper handling of the environmental variables.
Affiliation(s)
- Francesco Tiezzi
- Department of Agriculture, Food, Environment and Forestry (DAGRI), University of Florence, Florence, Italy
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Khushi Goda
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
| | - Fabio Morgante
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
| |
|
35
|
Carvalho HF, Rio S, García-Abadillo J, Isidro Y Sánchez J. Revisiting superiority and stability metrics of cultivar performances using genomic data: derivations of new estimators. PLANT METHODS 2024; 20:85. [PMID: 38844940 PMCID: PMC11155189 DOI: 10.1186/s13007-024-01207-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/27/2023] [Accepted: 05/08/2024] [Indexed: 06/10/2024]
Abstract
The selection of highly productive genotypes with stable performance across environments is a major challenge of plant breeding programs due to genotype-by-environment (GE) interactions. Over the years, different metrics have been proposed that aim at characterizing the superiority and/or stability of genotype performance across environments. However, these metrics are traditionally estimated using phenotypic values only and are not well suited to an unbalanced design in which genotypes are not observed in all environments. The objective of this research was to propose and evaluate new estimators of the following GE metrics: Ecovalence, Environmental Variance, Finlay-Wilkinson regression coefficient, and Lin-Binns superiority measure. Drawing from a multi-environment genomic prediction model, we derived the best linear unbiased prediction for each GE metric. These derivations included both a squared expectation and a variance term. To assess the effectiveness of our new estimators, we conducted simulations that varied in traits and environment parameters. In our results, new estimators consistently outperformed traditional phenotype-based estimators in terms of accuracy. By incorporating a variance term into our new estimators, in addition to the squared expectation term, we were able to improve the precision of our estimates, particularly for Ecovalence in situations where heritability was low and/or sparseness was high. All methods are implemented in a new R-package: GEmetrics. These genomic-based estimators enable estimating GE metrics in unbalanced designs and predicting GE metrics for new genotypes, which should help improve the selection efficiency of high-performance and stable genotypes across environments.
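For orientation, the four metrics named in this abstract can be computed in their traditional, phenotype-only form as sketched below. This is a minimal illustration assuming a balanced genotype-by-environment table; it is not the genomic estimators derived in the paper and not the GEmetrics package API. The function name `ge_metrics` and the output keys are hypothetical.

```python
from statistics import mean


def ge_metrics(y):
    """Phenotype-based GE metrics for a balanced table.

    y[i][j] is the phenotype of genotype i in environment j.
    Hypothetical helper for illustration only.
    """
    n_g, n_e = len(y), len(y[0])
    grand = mean(v for row in y for v in row)
    g_mean = [mean(row) for row in y]
    e_mean = [mean(y[i][j] for i in range(n_g)) for j in range(n_e)]
    e_index = [m - grand for m in e_mean]  # environmental index, sums to zero
    e_max = [max(y[i][j] for i in range(n_g)) for j in range(n_e)]
    ss_index = sum(t * t for t in e_index)

    out = []
    for i in range(n_g):
        # Ecovalence: squared interaction residuals summed over environments.
        eco = sum((y[i][j] - g_mean[i] - e_mean[j] + grand) ** 2
                  for j in range(n_e))
        # Environmental variance: variance of the genotype across environments.
        env_var = sum((y[i][j] - g_mean[i]) ** 2 for j in range(n_e)) / (n_e - 1)
        # Finlay-Wilkinson slope: regression on the environmental index.
        fw = sum(y[i][j] * e_index[j] for j in range(n_e)) / ss_index
        # Lin-Binns superiority: mean squared distance to the best genotype
        # in each environment, divided by two.
        lb = sum((y[i][j] - e_max[j]) ** 2 for j in range(n_e)) / (2 * n_e)
        out.append({"ecovalence": eco, "env_variance": env_var,
                    "fw_slope": fw, "lin_binns": lb})
    return out


table = [[1, 2, 3], [2, 4, 6], [3, 3, 3]]  # made-up toy data
for genotype, metrics in enumerate(ge_metrics(table)):
    print(genotype, metrics)
```

Because these formulas need every genotype observed in every environment, they illustrate the limitation the abstract raises for unbalanced designs.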
Affiliation(s)
- Humberto Fanelli Carvalho
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA)-Universidad Politécnica de Madrid (UPM)-Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Pozuelo de Alarcón, Madrid, Spain
| | - Simon Rio
- CIRAD, UMR AGAP Institut, 34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
| | - Julian García-Abadillo
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA)-Universidad Politécnica de Madrid (UPM)-Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Pozuelo de Alarcón, Madrid, Spain
| | - Julio Isidro Y Sánchez
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA)-Universidad Politécnica de Madrid (UPM)-Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223, Pozuelo de Alarcón, Madrid, Spain.
| |
|
36
|
Resende RT, Hickey L, Amaral CH, Peixoto LL, Marcatti GE, Xu Y. Satellite-enabled enviromics to enhance crop improvement. MOLECULAR PLANT 2024; 17:848-866. [PMID: 38637991 DOI: 10.1016/j.molp.2024.04.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Revised: 04/04/2024] [Accepted: 04/11/2024] [Indexed: 04/20/2024]
Abstract
Enviromics refers to the characterization of micro- and macroenvironments based on large-scale environmental datasets. By providing genotypic recommendations with predictive extrapolation at a site-specific level, enviromics could inform plant breeding decisions across varying conditions and anticipate productivity in a changing climate. Enviromics-based integration of statistics, envirotyping (i.e., determining environmental factors), and remote sensing could help unravel the complex interplay of genetics, environment, and management. To support this goal, exhaustive envirotyping to generate precise environmental profiles would significantly improve predictions of genotype performance and genetic gain in crops. Already, informatics management platforms aggregate diverse environmental datasets obtained using optical, thermal, radar, and light detection and ranging (LiDAR) sensors that capture detailed information about vegetation, surface structure, and terrain. This wealth of information, coupled with freely available climate data, fuels innovative enviromics research. While enviromics holds immense potential for breeding, a few obstacles remain, such as the need for (1) integrative methodologies to systematically collect field data to scale and expand observations across the landscape with satellite data; (2) state-of-the-art AI models for data integration, simulation, and prediction; (3) cyberinfrastructure for processing big data across scales and providing seamless interfaces to deliver forecasts to stakeholders; and (4) collaboration and data sharing among farmers, breeders, physiologists, geoinformatics experts, and programmers across research institutions. Overcoming these challenges is essential for leveraging the full potential of big data captured by satellites to transform 21st century agriculture and crop improvement through enviromics.
Affiliation(s)
- Rafael T Resende
- Universidade Federal de Goiás (UFG), Agronomy Department, Plant Breeding Sector, Goiânia (GO) 74690-900, Brazil; TheCROP, a Precision-Breeding Startup: Enviromics, Phenomics, and Genomics, No Zip-code, Operating Virtually, Goiânia (GO) and Sete Lagoas (MG), Brazil.
| | - Lee Hickey
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
| | - Cibele H Amaral
- Earth Lab, Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, Boulder, CO 80303, USA; Environmental Data Science Innovation & Inclusion Lab, Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, Boulder, CO 80303, USA
| | - Lucas L Peixoto
- Universidade Federal de Goiás (UFG), Agronomy Department, Plant Breeding Sector, Goiânia (GO) 74690-900, Brazil
| | - Gustavo E Marcatti
- TheCROP, a Precision-Breeding Startup: Enviromics, Phenomics, and Genomics, No Zip-code, Operating Virtually, Goiânia (GO) and Sete Lagoas (MG), Brazil; Universidade Federal de São João del-Rei, Forest Engineering Department, Campus Sete Lagoas, Sete Lagoas (MG) 35701-970, Brazil
| | - Yunbi Xu
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China; Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China; BGI Bioverse, Shenzhen 518083, China.
| |
|
37
|
Montesinos-López OA, Crespo-Herrera L, Pierre CS, Cano-Paez B, Huerta-Prado GI, Mosqueda-González BA, Ramos-Pulido S, Gerard G, Alnowibet K, Fritsche-Neto R, Montesinos-López A, Crossa J. Feature engineering of environmental covariates improves plant genomic-enabled prediction. FRONTIERS IN PLANT SCIENCE 2024; 15:1349569. [PMID: 38812738 PMCID: PMC11135473 DOI: 10.3389/fpls.2024.1349569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/04/2023] [Accepted: 04/11/2024] [Indexed: 05/31/2024]
Abstract
Introduction Because genomic selection (GS) is a predictive methodology, it needs to guarantee high prediction accuracies for practical implementations. However, since many factors affect the prediction performance of this methodology, its practical implementation still needs to be improved in many breeding programs. For this reason, many strategies have been explored to improve the prediction performance of this methodology. Methods When environmental covariates are incorporated as inputs in the genomic prediction models, this information only sometimes helps increase prediction performance. For this reason, this investigation explores the use of feature engineering on the environmental covariates to enhance the prediction performance of genomic prediction models. Results and discussion We found that across data sets, feature engineering helps reduce prediction error, relative to including the environmental covariates without feature engineering, by 761.625% across predictors. These results are very promising regarding the potential of feature engineering to enhance prediction accuracy. However, since a significant gain in prediction accuracy was observed in only some data sets, further research is required to guarantee a robust feature engineering strategy to incorporate the environmental covariates.
Affiliation(s)
| | | | - Carolina Saint Pierre
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Edo. de Mexico, Mexico
| | - Bernabe Cano-Paez
- Facultad de Ciencias, Universidad Nacional Autónoma de México (UNAM), México City, Mexico
| | | | | | - Sofia Ramos-Pulido
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
| | - Guillermo Gerard
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Edo. de Mexico, Mexico
| | - Khalid Alnowibet
- Department of Statistics and Operations Research, King Saud University, Riyadh, Saudi Arabia
| | | | - Abelardo Montesinos-López
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
| | - José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Edo. de Mexico, Mexico
- Louisiana State University, Baton Rouge, LA, United States
- Distinguished Scientist Fellowship Program, King Saud University, Riyadh, Saudi Arabia
- Instituto de Socieconomia, Estadistica e Informatica, Colegio de Postgraduados, Montecillos, Edo. de México, Texcoco, Mexico
| |
|
38
|
Peixoto MA, Leach KA, Jarquin D, Flannery P, Zystro J, Tracy WF, Bhering L, Resende MFR. Utilizing genomic prediction to boost hybrid performance in a sweet corn breeding program. FRONTIERS IN PLANT SCIENCE 2024; 15:1293307. [PMID: 38726298 PMCID: PMC11080654 DOI: 10.3389/fpls.2024.1293307] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/12/2023] [Accepted: 03/26/2024] [Indexed: 05/12/2024]
Abstract
Sweet corn breeding programs, like field corn, focus on the development of elite inbred lines to produce commercial hybrids. For this reason, genomic selection models can help the in silico prediction of hybrid crosses from the elite lines, which is hypothesized to improve the test cross scheme, leading to higher genetic gain in a breeding program. This study aimed to explore the potential of implementing genomic selection in a sweet corn breeding program through hybrid prediction in a within-site across-year and across-site framework. A total of 506 hybrids were evaluated in six environments (California, Florida, and Wisconsin, in the years 2020 and 2021). A total of 20 traits from three different groups were measured (plant-, ear-, and flavor-related traits) across the six environments. Eight statistical models were considered for prediction, as the combination of two genomic prediction models (GBLUP and RKHS) with two different kernels (additive and additive + dominance), and in a single- and multi-trait framework. Also, three different cross-validation schemes were tested (CV1, CV0, and CV00). The different models were then compared based on the correlation between the estimated breeding values/total genetic values and phenotypic measurements. Overall, heritabilities and correlations varied among the traits. The models implemented showed good accuracies for trait prediction. The GBLUP implementation outperformed RKHS in all cross-validation schemes and models. Models with additive plus dominance kernels presented a slight improvement over the models with only additive kernels for some of the models examined. In addition, models for within-site across-year and across-site performed better in the CV0 than the CV00 scheme, on average. Hence, GBLUP should be considered as a standard model for sweet corn hybrid prediction. In addition, we found that the implementation of genomic prediction in a sweet corn breeding program presented reliable results, which can improve the testcross stage by identifying the top candidates that will reach advanced field-testing stages.
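The abstract names the CV1, CV0, and CV00 schemes without defining them. The sketch below shows how these labels are commonly used in the multi-environment genomic prediction literature: CV1 predicts untested genotypes in tested environments, CV0 predicts tested genotypes in an untested environment, and CV00 predicts untested genotypes in an untested environment. The paper's exact fold construction may differ, and the function name `cv_split` and its arguments are hypothetical.

```python
def cv_split(records, scheme, held_out_genotypes, held_out_envs):
    """Split (genotype, environment) records into train and test sets.

    Assumed conventions (not taken from the paper):
      CV1  - test on held-out genotypes, all environments stay in training.
      CV0  - test on held-out environments, all genotypes stay in training.
      CV00 - test on held-out genotypes in held-out environments; any other
             record touching a held-out genotype or environment is dropped.
    """
    train, test = [], []
    for g, e in records:
        g_out = g in held_out_genotypes
        e_out = e in held_out_envs
        if scheme == "CV1":
            to_test, to_drop = g_out, False
        elif scheme == "CV0":
            to_test, to_drop = e_out, False
        elif scheme == "CV00":
            to_test = g_out and e_out
            to_drop = (g_out or e_out) and not to_test
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        if to_test:
            test.append((g, e))
        elif not to_drop:
            train.append((g, e))
    return train, test


# Toy layout: three genotypes in two environments, all combinations observed.
records = [(g, e) for g in ("G1", "G2", "G3") for e in ("E1", "E2")]
for scheme in ("CV1", "CV0", "CV00"):
    train, test = cv_split(records, scheme, {"G3"}, {"E2"})
    print(scheme, "train:", train, "test:", test)
```

CV00 leaves the least information in the training set, which is consistent with the abstract's observation that models performed better under CV0 than under CV00 on average.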
Affiliation(s)
- Marco Antônio Peixoto
- Laboratório de Biometria, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Department of Horticultural Sciences, University of Florida, Gainesville, FL, United States
| | - Kristen A. Leach
- Department of Horticultural Sciences, University of Florida, Gainesville, FL, United States
| | - Diego Jarquin
- Department of Agronomy, University of Florida, Gainesville, FL, United States
| | - Patrick Flannery
- Department of Plant and Agroecosystem Sciences, University of Wisconsin-Madison, Madison, WI, United States
| | - Jared Zystro
- Organic Seed Alliance, Port Townsend, WA, United States
| | - William F. Tracy
- Department of Plant and Agroecosystem Sciences, University of Wisconsin-Madison, Madison, WI, United States
| | - Leonardo Bhering
- Laboratório de Biometria, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
| | - Márcio F. R. Resende
- Department of Horticultural Sciences, University of Florida, Gainesville, FL, United States
| |
|
39
|
Liu X, Wang M, Qin J, Liu Y, Wang S, Wu S, Zhang M, Zhong J, Wang J. GbyE: an integrated tool for genome widely association study and genome selection based on genetic by environmental interaction. BMC Genomics 2024; 25:386. [PMID: 38641604 PMCID: PMC11027269 DOI: 10.1186/s12864-024-10310-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2023] [Accepted: 04/15/2024] [Indexed: 04/21/2024] Open
Abstract
BACKGROUND The growth and development of organisms depend on the effects of genetics, the environment, and their interaction. In recent decades, many candidate additive genetic markers and genes have been detected using genome-wide association studies (GWAS). However, owing to limited computing power and a lack of practical tools, the interactive effects of markers and genes have not been clearly revealed, and using these interactive markers in breeding and prediction, such as genomic selection (GS), remains difficult. RESULTS Through the Power-FDR curve, the GbyE algorithm can detect more significant genetic loci at different levels of genetic correlation and heritability, especially at low heritability levels. The additive effect of GbyE exhibits high significance on certain chromosomes, while the interactive effect detects more significant sites on other chromosomes, which were not detected in the first two parts. In prediction accuracy testing, in most cases of heritability and genetic correlation, the majority of prediction accuracy of GbyE is significantly higher than that of the mean method, regardless of whether the rrBLUP model or BGLR model is used for statistics. The GbyE algorithm improves the prediction accuracy of the three Bayesian models BRR, BayesA, and BayesLASSO using information from genetic by environmental interaction (G × E) and increases the prediction accuracy by 9.4%, 9.1%, and 11%, respectively, relative to the mean value method. The GbyE algorithm is significantly superior to the mean method in the absence of a single environment, regardless of the combination of heritability and genetic correlation, especially in the case of high genetic correlation and heritability. CONCLUSIONS Therefore, this study constructed a new genotype design model program (GbyE) for GWAS and GS using the Kronecker product, which was able to clearly estimate the additive and interactive effects separately. The results showed that GbyE can provide higher statistical power for GWAS and higher prediction accuracy for GS models. In addition, GbyE gives varying degrees of improvement in prediction accuracy in three Bayesian models (BRR, BayesA, and BayesCpi). Whether phenotypes were missing in a single environment or in multiple environments, GbyE also makes better predictions for the inference population set. This study helps us understand the interactive relationship between genomics and environment in complex traits. The GbyE source code is available at the GitHub website (https://github.com/liu-xinrui/GbyE).
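The abstract states that GbyE builds its genotype design with a Kronecker product so that additive and interactive effects are estimated separately, but gives no formula. The sketch below shows one generic way such a design could look for two environments; it is an assumption for illustration, not the GbyE source code. The environment coding matrix, the toy genotype matrix, and the helper `kron` are all hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


# Assumed environment coding: the first column carries the effect shared by
# both environments (additive part), the second column carries the contrast
# between environments (interaction part).
env_coding = [[1, 1],
              [1, -1]]

# Toy genotype matrix: 2 individuals x 2 markers, coded as allele counts.
genotypes = [[0, 1],
             [2, 1]]

# Rows: individuals stacked by environment. Columns: additive marker block
# followed by interaction marker block.
design = kron(env_coding, genotypes)
for row in design:
    print(row)
```

With this layout, marker effects fitted on the first column block are common to both environments, and effects fitted on the second block capture how a marker's effect differs between them.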
Affiliation(s)
- Xinrui Liu
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Sichuan Province and Ministry of Education, Southwest Minzu University, Chengdu, 6110041, China
- Nanchong Academy of Agricultural Sciences, Nanchong, 637000, China
| | - Mingxiu Wang
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Sichuan Province and Ministry of Education, Southwest Minzu University, Chengdu, 6110041, China
| | - Jie Qin
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Sichuan Province and Ministry of Education, Southwest Minzu University, Chengdu, 6110041, China
| | - Yaxin Liu
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Sichuan Province and Ministry of Education, Southwest Minzu University, Chengdu, 6110041, China
| | - Shikai Wang
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Sichuan Province and Ministry of Education, Southwest Minzu University, Chengdu, 6110041, China
| | - Shiyu Wu
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Sichuan Province and Ministry of Education, Southwest Minzu University, Chengdu, 6110041, China
| | - Ming Zhang
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Sichuan Province and Ministry of Education, Southwest Minzu University, Chengdu, 6110041, China
| | - Jincheng Zhong
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Sichuan Province and Ministry of Education, Southwest Minzu University, Chengdu, 6110041, China
| | - Jiabo Wang
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Sichuan Province and Ministry of Education, Southwest Minzu University, Chengdu, 6110041, China.
| |
|
40
|
Alemu A, Åstrand J, Montesinos-López OA, Isidro Y Sánchez J, Fernández-Gónzalez J, Tadesse W, Vetukuri RR, Carlsson AS, Ceplitis A, Crossa J, Ortiz R, Chawade A. Genomic selection in plant breeding: Key factors shaping two decades of progress. MOLECULAR PLANT 2024; 17:552-578. [PMID: 38475993 DOI: 10.1016/j.molp.2024.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/03/2023] [Revised: 01/22/2024] [Accepted: 03/08/2024] [Indexed: 03/14/2024]
Abstract
Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has significantly advanced in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the associated diminishing returns beyond an optimum size. This was done while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP (theoretically reaching one when using Pearson's correlation as a metric) is an active research area as yet far from optimal for various traits. We hypothesize that with ultra-high sizes of genotypic and phenotypic datasets, effective training population optimization methods and support from other omics approaches (transcriptomics, metabolomics and proteomics) coupled with deep-learning algorithms could overcome the boundaries of current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.
Affiliation(s)
- Admas Alemu
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden.
| | - Johanna Åstrand
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden; Lantmännen Lantbruk, Svalöv, Sweden
| | | | - Julio Isidro Y Sánchez
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
| | - Javier Fernández-Gónzalez
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
| | - Wuletaw Tadesse
- International Center for Agricultural Research in the Dry Areas (ICARDA), Rabat, Morocco
| | - Ramesh R Vetukuri
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
| | - Anders S Carlsson
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
| | | | - José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco, México 52640, Mexico
| | - Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden.
| | - Aakash Chawade
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
| |
|
41
|
Crozier D, Winans ND, Hoffmann L, Patil NY, Klein PE, Klein RR, Rooney WL. Evaluating and Predicting the Performance of Sorghum Lines in an Elite by Exotic Backcross-Nested Association Mapping Population. PLANTS (BASEL, SWITZERLAND) 2024; 13:879. [PMID: 38592905 PMCID: PMC10975396 DOI: 10.3390/plants13060879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/25/2024] [Revised: 03/09/2024] [Accepted: 03/14/2024] [Indexed: 04/11/2024]
Abstract
Maintaining or introducing genetic diversity into plant breeding programs is necessary for continual genetic gain; however, diversity at the cost of reduced performance is not something sought by breeders. To this end, backcross-nested association mapping (BC-NAM) populations, in which the recurrent parent is an elite line, can be employed as a strategy to introgress diversity from unadapted accessions while maintaining agronomic performance. This study evaluates (i) the hybrid performance of sorghum lines from 18 BC1-NAM families and (ii) the potential of genomic prediction to screen lines from BC1-NAM families for hybrid performance prior to phenotypic evaluation. Despite the diverse geographical origins and agronomic performance of the unadapted parents for BC1-NAM families, many BC1-derived lines performed significantly better in the hybrid trials than the elite recurrent parent, R.Tx436. The genomic prediction accuracies for grain yield, plant height, and days to mid-anthesis were acceptable, but the prediction accuracies for plant height were lower than expected. While the prediction accuracies increased when including more individuals in the training set, improvements tended to plateau between two and five lines per family, with larger training sets being required for more complex traits such as grain yield. Therefore, genomic prediction models can be optimized in a large BC1-NAM population with a relatively low fraction of individuals needing to be evaluated. These results suggest that genomic prediction is an effective method of pre-screening lines within BC1-NAM families prior to evaluation in extensive hybrid field trials.
Affiliation(s)
- Daniel Crozier
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Noah D. Winans
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Leo Hoffmann
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
  - Department of Horticulture Sciences, University of Florida, Gainesville, FL 32611, USA
- Nikhil Y. Patil
  - Department of Horticultural Sciences, Texas A&M University, College Station, TX 77845, USA
  - Health Sciences Center, University of Oklahoma, Oklahoma City, OK 73104, USA
- Patricia E. Klein
  - Health Sciences Center, University of Oklahoma, Oklahoma City, OK 73104, USA
- Robert R. Klein
  - Crop Germplasm Research Unit, United States Department of Agriculture Agricultural Research Service, College Station, TX 77843, USA
- William L. Rooney
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
42
|
Araújo MS, Chaves SFS, Dias LAS, Ferreira FM, Pereira GR, Bezerra ARG, Alves RS, Heinemann AB, Breseghello F, Carneiro PCS, Krause MD, Costa-Neto G, Dias KOG. GIS-FA: an approach to integrating thematic maps, factor-analytic, and envirotyping for cultivar targeting. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:80. [PMID: 38472532 DOI: 10.1007/s00122-024-04579-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Accepted: 02/06/2024] [Indexed: 03/14/2024]
Abstract
KEY MESSAGE We propose an "enviromics" prediction model for recommending cultivars based on thematic maps aimed at decision-makers. Parsimonious methods that capture genotype-by-environment interaction (GEI) in multi-environment trials (MET) are important in breeding programs. Understanding the causes and factors of GEI allows the utilization of genotype adaptations in the target population of environments through environmental features and factor-analytic (FA) models. Here, we present a novel predictive breeding approach called GIS-FA, which integrates geographic information systems (GIS) techniques, FA models, partial least squares (PLS) regression, and enviromics to predict phenotypic performance in untested environments. The GIS-FA approach enables: (i) the prediction of the phenotypic performance of tested genotypes in untested environments, (ii) the selection of the best-ranking genotypes based on their overall performance and stability using the FA selection tools, and (iii) the creation of thematic maps showing overall or pairwise performance and stability for decision-making. We exemplify the usage of the GIS-FA approach using two datasets of rice [Oryza sativa (L.)] and soybean [Glycine max (L.) Merr.] in MET spread over tropical areas. In summary, our novel predictive method allows the identification of new breeding scenarios by pinpointing groups of environments where genotypes demonstrate superior predicted performance. It also facilitates and optimizes cultivar recommendations by utilizing thematic maps.
Affiliation(s)
- Maurício S Araújo
  - Department of Agronomy, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Saulo F S Chaves
  - Department of Agronomy, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Luiz A S Dias
  - Department of Agronomy, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Filipe M Ferreira
  - Department of Crop Science - College of Agricultural Sciences, São Paulo State University, Botucatu, São Paulo, Brazil
- Guilherme R Pereira
  - Department of Agronomy, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Rodrigo S Alves
  - Department of General Biology, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Alexandre B Heinemann
  - Brazilian Agricultural Research Corporation (Embrapa Rice and Beans), Santo Antônio de Goiás, Goiás, Brazil
- Flávio Breseghello
  - Brazilian Agricultural Research Corporation (Embrapa Rice and Beans), Santo Antônio de Goiás, Goiás, Brazil
- Pedro C S Carneiro
  - Department of General Biology, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Kaio O G Dias
  - Department of General Biology, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
43
|
Toda Y, Sasaki G, Ohmori Y, Yamasaki Y, Takahashi H, Takanashi H, Tsuda M, Kajiya-Kanegae H, Tsujimoto H, Kaga A, Hirai M, Nakazono M, Fujiwara T, Iwata H. Reaction norm for genomic prediction of plant growth: modeling drought stress response in soybean. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:77. [PMID: 38460027 PMCID: PMC10924738 DOI: 10.1007/s00122-024-04565-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 01/30/2024] [Indexed: 03/11/2024]
Abstract
KEY MESSAGE We proposed models to predict the effects of genomic and environmental factors on daily soybean growth and applied them to soybean growth data obtained with unmanned aerial vehicles. Advances in high-throughput phenotyping technology have made it possible to obtain time-series plant growth data in field trials, enabling genotype-by-environment interaction (G × E) modeling of plant growth. Although the reaction norm is an effective method for quantitatively evaluating G × E and has been implemented in genomic prediction models, no reaction norm models have been applied to plant growth data. Here, we propose a novel reaction norm model for plant growth using spline and random forest models, in which daily growth is explained by environmental factors one day prior. The proposed model was applied to soybean canopy area and height to evaluate the influence of drought stress levels. Changes in the canopy area and height of 198 cultivars were measured by remote sensing using unmanned aerial vehicles. Multiple drought stress levels were set as treatments, and their time-series soil moisture was measured. The models were evaluated using three cross-validation schemes. Although accuracy of the proposed models did not surpass that of single-trait genomic prediction, the results suggest that our model can capture G × E, especially the latter growth period for the random forest model. Also, significant variations in the G × E of the canopy height during the early growth period were visualized using the spline model. This result indicates the effectiveness of the proposed models on plant growth data and the possibility of revealing G × E in various growth stages in plant breeding by applying statistical or machine learning models to time-series phenotype data.
Affiliation(s)
- Yusuke Toda
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Goshi Sasaki
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Yoshihiro Ohmori
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Yuji Yamasaki
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
  - Arid Land Research Center, Tottori University, Tottori, Japan
- Hirokazu Takahashi
  - Graduate School of Bioagricultural Sciences, Nagoya University, Nagoya, Japan
- Hideki Takanashi
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Mai Tsuda
  - Tsukuba-Plant Innovation Research Center (T-PIRC), University of Tsukuba, Tsukuba, Japan
- Akito Kaga
  - Institute of Crop Science, NARO, Tsukuba, Japan
- Masami Hirai
  - RIKEN Center for Sustainable Resource Science, Tsukuba, Japan
- Mikio Nakazono
  - Graduate School of Bioagricultural Sciences, Nagoya University, Nagoya, Japan
- Toru Fujiwara
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Hiroyoshi Iwata
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
44
|
Lopez-Cruz M, Pérez-Rodríguez P, de los Campos G. A fast algorithm to factorize high-dimensional tensor product matrices used in genetic models. G3 (BETHESDA, MD.) 2024; 14:jkae001. [PMID: 38180089 PMCID: PMC11090460 DOI: 10.1093/g3journal/jkae001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 12/26/2023] [Accepted: 12/28/2023] [Indexed: 01/06/2024]
Abstract
Many genetic models (including models for epistatic effects as well as genetic-by-environment) involve covariance structures that are Hadamard products of lower rank matrices. Implementing these models requires factorizing large Hadamard product matrices. The available algorithms for factorization do not scale well for big data, making the use of some of these models not feasible with large sample sizes. Here, based on properties of Hadamard products and (related) Kronecker products, we propose an algorithm that produces an approximate decomposition that is orders of magnitude faster than the standard eigenvalue decomposition. In this article, we describe the algorithm, show how it can be used to factorize large Hadamard product matrices, present benchmarks, and illustrate the use of the method by presenting an analysis of data from the northern testing locations of the G × E project from the Genomes to Fields Initiative (n ∼ 60,000). We implemented the proposed algorithm in the open-source "tensorEVD" R package.
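The property this abstract builds on can be illustrated with a minimal NumPy sketch. It is not the tensorEVD algorithm itself. Under assumed toy inputs (`K1`, `K2`, `idx1`, `idx2` and the helper `hadamard_factor` are hypothetical names, not part of the package), it only shows that a Hadamard product of two expanded covariance matrices is a submatrix of their Kronecker product, so eigendecompositions of the two small matrices are enough to factor the large one.

```python
import numpy as np

def hadamard_factor(K1, K2, idx1, idx2):
    """Return X such that X @ X.T equals K1[idx1, idx1] * K2[idx2, idx2].

    K1, K2     : small symmetric positive semi-definite matrices
                 (for example genotype and environment kinships).
    idx1, idx2 : for each of the n records, the row of K1 and of K2 it maps to.
    """
    d1, V1 = np.linalg.eigh(K1)
    d2, V2 = np.linalg.eigh(K2)
    d1 = np.clip(d1, 0.0, None)  # guard against tiny negative round-off
    d2 = np.clip(d2, 0.0, None)
    # Row i is kron(V1[idx1[i]], V2[idx2[i]]), i.e. one row of the
    # eigenvectors of the Kronecker product K1 (x) K2.
    rows = np.einsum("ia,ib->iab", V1[idx1], V2[idx2]).reshape(len(idx1), -1)
    # Scale columns by the square roots of the Kronecker eigenvalues.
    return rows * np.sqrt(np.kron(d1, d2))

# Toy data (illustrative only): 4 genotypes, 3 environments, 10 records.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(3, 3))
K1, K2 = A @ A.T, B @ B.T
idx1 = rng.integers(0, 4, size=10)
idx2 = rng.integers(0, 3, size=10)

K = K1[np.ix_(idx1, idx1)] * K2[np.ix_(idx2, idx2)]  # Hadamard product (n x n)
X = hadamard_factor(K1, K2, idx1, idx2)
print(np.allclose(K, X @ X.T))
```

Keeping only the columns of `X` that correspond to the largest values of `np.kron(d1, d2)` would give a low-rank approximation of `K`. How the published method selects and orthogonalizes components is described in the paper and is not reproduced here.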
Affiliation(s)
- Marco Lopez-Cruz
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
- Paulino Pérez-Rodríguez
  - Socioeconomía, Estadística e Informática, Colegio de Postgraduados, Montecillos, Edo. de México 56230, Mexico
- Gustavo de los Campos
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
45
|
Guo T, Wei J, Li X, Yu J. Environmental context of phenotypic plasticity in flowering time in sorghum and rice. JOURNAL OF EXPERIMENTAL BOTANY 2024; 75:1004-1015. [PMID: 37819624 PMCID: PMC10837014 DOI: 10.1093/jxb/erad398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 10/17/2023] [Indexed: 10/13/2023]
Abstract
Phenotypic plasticity is an important topic in biology and evolution. However, how to generate broadly applicable insights from individual studies remains a challenge. Here, with flowering time observed from a large geographical region for sorghum and rice genetic populations, we examine the consistency of parameter estimation for reaction norms of genotypes across different subsets of environments and searched for potential strategies to inform the study design. Both sample size and environmental mean range of the subset affected the consistency. The subset with either a large range of environmental mean or a large sample size resulted in genetic parameters consistent with the overall pattern. Furthermore, high accuracy through genomic prediction was obtained for reaction norm parameters of untested genotypes using models built from tested genotypes under the subsets of environments with either a large range or a large sample size. With 1428 and 1674 simulated settings, our analyses suggested that the distribution of environmental index values of a site should be considered in designing experiments. Overall, we showed that environmental context was critical, and considerations should be given to better cover the intended range of the environmental variable. Our findings have implications for the genetic architecture of complex traits, plant-environment interaction, and climate adaptation.
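A reaction norm of the kind discussed in this abstract can be sketched as a per-genotype linear regression of phenotype on an environmental index. The abstract does not state the exact model, so the linear form, the use of the environmental mean as the index, the function name `reaction_norms`, and the simulated data below are all assumptions made for illustration.

```python
import numpy as np

def reaction_norms(Y):
    """Fit y_ij = a_i + b_i * e_j for each genotype i.

    Y : (n_genotypes, n_environments) phenotype array with no missing values.
    e_j is taken to be the mean phenotype of environment j (an assumption).
    Returns (intercepts, slopes, env_index).
    """
    env_index = Y.mean(axis=0)
    # np.polyfit fits every column of a 2-D y at once; row 0 holds slopes.
    slopes, intercepts = np.polyfit(env_index, Y.T, deg=1)
    return intercepts, slopes, env_index

# Simulated flowering-time-like data: 3 genotypes, 8 environments.
rng = np.random.default_rng(1)
true_slopes = np.array([0.5, 1.0, 1.5])
env = np.linspace(50, 90, 8)
Y = 70 + true_slopes[:, None] * (env - 70) + rng.normal(0, 1.0, size=(3, 8))

intercepts, slopes, env_index = reaction_norms(Y)
print(np.round(slopes, 2))
```

Refitting on a subset of columns, for example `reaction_norms(Y[:, :3])`, gives a quick feel for how a narrow range of the environmental index changes the estimated slopes, which is the design question the abstract raises.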
Affiliation(s)
- Tingting Guo
  - Hubei Hongshan Laboratory, Wuhan, Hubei, China
  - College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, Hubei, China
- Jialu Wei
  - Department of Agronomy, Iowa State University, Ames, IA, USA
- Xianran Li
  - USDA, Agricultural Research Service, Wheat Health, Genetics, and Quality Research Unit, Pullman, WA, USA
- Jianming Yu
  - Department of Agronomy, Iowa State University, Ames, IA, USA
46
|
Kang HI, Kim IS, Shim D, Kang KS, Cheon KS. Genomic selection for growth characteristics in Korean red pine (Pinus densiflora Seibold & Zucc.). FRONTIERS IN PLANT SCIENCE 2024; 15:1285094. [PMID: 38322820 PMCID: PMC10844423 DOI: 10.3389/fpls.2024.1285094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 01/05/2024] [Indexed: 02/08/2024]
Abstract
Traditionally, selective breeding has been used to improve tree growth. However, traditional selection methods are time-consuming and limit annual genetic gain. Genomic selection (GS) offers an alternative to progeny testing by estimating the genotype-based breeding values of individuals based on genomic information using molecular markers. In the present study, we introduced GS to an open-pollinated breeding population of Korean red pine (Pinus densiflora), which is in high demand in South Korea, to shorten the breeding cycle. We compared the prediction accuracies of GS for growth characteristics (diameter at breast height [DBH], height, straightness, and volume) in Korean red pines under various conditions (marker set, model, and training set) and evaluated the selection efficiency of GS compared to traditional selection methods. Training the GS model to include individuals from various environments using genomic best linear unbiased prediction (GBLUP) and markers with a minor allele frequency larger than 0.05 was effective. The optimized model had an accuracy of 0.164-0.498 and a predictive ability of 0.018-0.441. The predictive ability of GBLUP against that of additive best linear unbiased prediction (ABLUP) was 0.86-5.10, and against the square root of heritability was 0.19-0.76, indicating that GS for Korean red pine was as efficient as in previous studies on forest trees. Moreover, the response to GS was higher than that to traditional selection regarding the annual genetic gain. Therefore, we conclude that the trained GS model is more effective than the traditional breeding methods for Korean red pines. We anticipate that the next generation of trees selected by GS will lay the foundation for the accelerated breeding of Korean red pine.
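The marker filtering and GBLUP setup mentioned in this abstract can be sketched as follows. The abstract does not say how the genomic relationship matrix was built, so the VanRaden-style formula, the 0/1/2 genotype coding, the function name `grm_vanraden`, and the simulated genotypes are assumptions for illustration only. The 0.05 minor allele frequency threshold is the one the abstract reports.

```python
import numpy as np

def grm_vanraden(M, maf_min=0.05):
    """Genomic relationship matrix from markers passing a MAF filter.

    M : (n_individuals, n_markers) genotypes coded 0/1/2, no missing values.
    Returns (G, keep), where keep is the boolean mask of retained markers.
    """
    p = M.mean(axis=0) / 2.0            # frequency of the counted allele
    maf = np.minimum(p, 1.0 - p)        # minor allele frequency
    keep = maf > maf_min
    M, p = M[:, keep], p[keep]
    Z = M - 2.0 * p                     # center each marker column
    G = Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))
    return G, keep

# Simulated genotypes: 50 individuals, 500 markers.
rng = np.random.default_rng(2)
freqs = rng.uniform(0.01, 0.5, size=500)
M = rng.binomial(2, freqs, size=(50, 500))

G, keep = grm_vanraden(M)
print(G.shape, int(keep.sum()))
```

In a GBLUP analysis, `G` would then serve as the covariance structure of the genomic breeding values in a mixed model. Fitting that model is left to dedicated software and is outside this sketch.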
Affiliation(s)
- Hye-In Kang
  - Division of Tree Improvement and Biotechnology, Department of Forest Bio-resources, National Institute of Forest Science, Suwon, Republic of Korea
- In Sik Kim
  - Division of Tree Improvement and Biotechnology, Department of Forest Bio-resources, National Institute of Forest Science, Suwon, Republic of Korea
- Donghwan Shim
  - Department of Biological Sciences, Chungnam National University, Daejeon, Republic of Korea
- Kyu-Suk Kang
  - Department of Agriculture, Forestry and Bioresources, College of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- Kyeong-Seong Cheon
  - Division of Tree Improvement and Biotechnology, Department of Forest Bio-resources, National Institute of Forest Science, Suwon, Republic of Korea
47
|
Barreto CAV, das Graças Dias KO, de Sousa IC, Azevedo CF, Nascimento ACC, Guimarães LJM, Guimarães CT, Pastina MM, Nascimento M. Genomic prediction in multi-environment trials in maize using statistical and machine learning methods. Sci Rep 2024; 14:1062. [PMID: 38212638 PMCID: PMC10784464 DOI: 10.1038/s41598-024-51792-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 01/09/2024] [Indexed: 01/13/2024] Open
Abstract
In the context of multi-environment trials (MET), genomic prediction is proposed as a tool that allows the prediction of the phenotype of single cross hybrids that were not tested in field trials. This approach saves time and costs compared to traditional breeding methods. Thus, this study aimed to evaluate the genomic prediction of single cross maize hybrids not tested in MET, grain yield and female flowering time. We also aimed to propose an application of machine learning methodologies in MET in the prediction of hybrids and compare their performance with Genomic best linear unbiased prediction (GBLUP) with non-additive effects. Our results highlight that both methodologies are efficient and can be used in maize breeding programs to accurately predict the performance of hybrids in specific environments. The best methodology is case-dependent, specifically, to explore the potential of GBLUP, it is important to perform accurate modeling of the variance components to optimize the prediction of new hybrids. On the other hand, machine learning methodologies can capture non-additive effects without making any assumptions at the outset of the model. Overall, predicting the performance of new hybrids that were not evaluated in any field trials was more challenging than predicting hybrids in sparse test designs.
Affiliation(s)
- Ithalo Coelho de Sousa
  - Department of Mathematics and Statistics, Universidade Federal de Rondônia, Ji-Paraná, RO, Brazil
- Moysés Nascimento
  - Department of Statistics, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
48
|
Lopez-Cruz M, Aguate FM, Washburn JD, de Leon N, Kaeppler SM, Lima DC, Tan R, Thompson A, De La Bretonne LW, de Los Campos G. Leveraging data from the Genomes-to-Fields Initiative to investigate genotype-by-environment interactions in maize in North America. Nat Commun 2023; 14:6904. [PMID: 37903778 PMCID: PMC10616096 DOI: 10.1038/s41467-023-42687-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/08/2023] [Accepted: 10/18/2023] [Indexed: 11/01/2023] Open
Abstract
Genotype-by-environment (G×E) interactions can significantly affect crop performance and stability. Investigating G×E requires extensive data sets with diverse cultivars tested over multiple locations and years. The Genomes-to-Fields (G2F) Initiative has tested maize hybrids in more than 130 year-locations in North America since 2014. Here, we curate and expand this data set by generating environmental covariates (using a crop model) for each of the trials. The resulting data set includes DNA genotypes and environmental data linked to more than 70,000 phenotypic records of grain yield and flowering traits for more than 4000 hybrids. We show how this valuable data set can serve as a benchmark in agricultural modeling and prediction, paving the way for countless G×E investigations in maize. We use multivariate analyses to characterize the data set's genetic and environmental structure, study the association of key environmental factors with traits, and provide benchmarks using genomic prediction models.
Affiliation(s)
- Marco Lopez-Cruz
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Fernando M Aguate
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Jacob D Washburn
  - United States Department of Agriculture, Agricultural Research Service, University of Missouri, Columbia, MO, 65211, USA
- Natalia de Leon
  - Department of Agronomy, University of Wisconsin, Madison, WI, 53706, USA
- Shawn M Kaeppler
  - Department of Agronomy, University of Wisconsin, Madison, WI, 53706, USA
  - Wisconsin Crop Innovation Center, University of Wisconsin, Middleton, WI, 53562, USA
- Ruijuan Tan
  - Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, 48824, USA
- Addie Thompson
  - Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, 48824, USA
  - Plant Resilience Institute, Michigan State University, East Lansing, MI, 48824, USA
- Gustavo de Los Campos
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI, 48824, USA
49
|
Lozano AC, Ding H, Abe N, Lipka AE. Regularized multi-trait multi-locus linear mixed models for genome-wide association studies and genomic selection in crops. BMC Bioinformatics 2023; 24:399. [PMID: 37884874 PMCID: PMC10604903 DOI: 10.1186/s12859-023-05519-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Accepted: 10/03/2023] [Indexed: 10/28/2023] Open
Abstract
BACKGROUND We consider two key problems in genomics involving multiple traits: multi-trait genome wide association studies (GWAS), where the goal is to detect genetic variants associated with the traits; and multi-trait genomic selection (GS), where the emphasis is on accurately predicting trait values. Multi-trait linear mixed models build on the linear mixed model to jointly model multiple traits. Existing estimation methods, however, are limited to the joint analysis of a small number of genotypes; in fact, most approaches consider one SNP at a time. Estimating multi-dimensional genetic and environment effects also results in considerable computational burden. Efficient approaches that incorporate regularization into multi-trait linear models (no random effects) have been recently proposed to identify genomic loci associated with multiple traits (Yu et al. in Multitask learning using task clustering with applications to predictive modeling and GWAS of plant varieties. arXiv:1710.01788, 2017; Yu et al. in Front Big Data 2:27, 2019), but these ignore population structure and familial relatedness (Yu et al. in Nat Genet 38:203-208, 2006). RESULTS This work addresses this gap by proposing a novel class of regularized multi-trait linear mixed models along with scalable approaches for estimation in the presence of high-dimensional genotypes and a large number of traits. We evaluate the effectiveness of the proposed methods using datasets in maize and sorghum diversity panels, and demonstrate benefits in both achieving high prediction accuracy in GS and in identifying relevant marker-trait associations. CONCLUSIONS The proposed regularized multivariate linear mixed models are relevant for both GWAS and GS. We hope that they will facilitate agronomy-related research in plant biology and crop breeding endeavors.
Affiliation(s)
- Aurélie C Lozano
  - IBM Research AI, IBM T.J. Watson Research Center, Yorktown Heights, USA
- Naoki Abe
  - IBM Research AI, IBM T.J. Watson Research Center, Yorktown Heights, USA
- Alexander E Lipka
  - Department of Crop Sciences, University of Illinois, Urbana-Champaign, USA
50
|
Bartholomé J, Frouin J, Brottier L, Cao TV, Boisnard A, Ahmadi N, Courtois B. Genomic selection for salinity tolerance in japonica rice. PLoS One 2023; 18:e0291833. [PMID: 37756295 PMCID: PMC10530037 DOI: 10.1371/journal.pone.0291833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Accepted: 09/06/2023] [Indexed: 09/29/2023] Open
Abstract
Improving plant performance in salinity-prone conditions is a significant challenge in breeding programs. Genomic selection is currently integrated into many plant breeding programs as a tool for increasing selection intensity and precision for complex traits and for reducing breeding cycle length. A rice reference panel (RP) of 241 Oryza sativa L. japonica accessions genotyped with 20,255 SNPs grown in control and mild salinity stress conditions was evaluated at the vegetative stage for eight morphological traits and ion mass fractions (Na and K). Weak to strong genotype-by-condition interactions were found for the traits considered. Cross-validation showed that the predictive ability of genomic prediction methods ranged from 0.25 to 0.64 for multi-environment models with morphological traits and from 0.05 to 0.40 for indices of stress response and ion mass fractions. The performances of a breeding population (BP) comprising 393 japonica accessions were predicted with models trained on the RP. For validation of the predictive performances of the models, a subset of 41 accessions was selected from the BP and phenotyped under the same experimental conditions as the RP. The predictive abilities estimated on this subset ranged from 0.00 to 0.66 for the multi-environment models, depending on the traits, and were strongly correlated with the predictive abilities on cross-validation in the RP in salt condition (r = 0.69). We show here that genomic selection is efficient for predicting the salt stress tolerance of breeding lines. Genomic selection could improve the efficiency of rice breeding strategies for salinity-prone environments.
Affiliation(s)
- Jérôme Bartholomé
  - UMR AGAP Institut, CIRAD, Cali, Colombia
  - UMR AGAP Institut, Institut Agro, Univ Montpellier, CIRAD, INRAE, Montpellier, France
  - Alliance Bioversity-CIAT, Recta Palmira Cali, Colombia
- Julien Frouin
  - UMR AGAP Institut, Institut Agro, Univ Montpellier, CIRAD, INRAE, Montpellier, France
  - CIRAD, UMR AGAP Institut, Montpellier, France
- Laurent Brottier
  - UMR AGAP Institut, Institut Agro, Univ Montpellier, CIRAD, INRAE, Montpellier, France
  - CIRAD, UMR AGAP Institut, Montpellier, France
- Tuong-Vi Cao
  - UMR AGAP Institut, Institut Agro, Univ Montpellier, CIRAD, INRAE, Montpellier, France
  - CIRAD, UMR AGAP Institut, Montpellier, France
- Nourollah Ahmadi
  - UMR AGAP Institut, Institut Agro, Univ Montpellier, CIRAD, INRAE, Montpellier, France
  - CIRAD, UMR AGAP Institut, Montpellier, France
- Brigitte Courtois
  - UMR AGAP Institut, Institut Agro, Univ Montpellier, CIRAD, INRAE, Montpellier, France
  - CIRAD, UMR AGAP Institut, Montpellier, France