1
Shokor F, Croiseau P, Gangloff H, Saintilan R, Tribout T, Mary-Huard T, Cuyabano BCD. Deep learning and genomic best linear unbiased prediction integration: An approach to identify potential nonlinear genetic relationships between traits. J Dairy Sci 2025:S0022-0302(25)00260-7. [PMID: 40252763 DOI: 10.3168/jds.2024-26057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2024] [Accepted: 03/24/2025] [Indexed: 04/21/2025]
Abstract
Genomic prediction (GP) aims to predict the breeding values of multiple complex traits, which the widely used statistical methods usually assume to be multivariate normally distributed, thus imposing linear genetic relationships between traits. Although these methods are valuable for GP, they do not account for potential nonlinear genetic relationships between traits. For individual traits, this oversight may minimally affect prediction accuracy, but it can limit genetic progress when selection involves multiple traits. Deep learning (DL) offers a promising alternative for capturing nonlinear genetic relationships due to its ability to identify complex patterns without prior assumptions about the data structure. We proposed a novel hybrid DLGBLUP model, which uses the output of the traditional GBLUP and enhances its predicted genetic values (PGV) by accounting for nonlinear genetic relationships between traits using DL. We simulated data with linear and nonlinear genetic relationships between traits in order to verify whether DLGBLUP was able to identify nonlinearity when present and avoid inducing it when absent. We found that DLGBLUP consistently provided more accurate PGV for traits simulated with strong nonlinear genetic relationships, accurately identifying these relationships. Over 7 generations of selection, greater genetic progress was achieved with PGV that accounted for nonlinear relationships (DLGBLUP), compared with GBLUP. When applied to a real dataset from the French Holstein dairy cattle population, DLGBLUP detected nonlinear genetic relationships between pairs of traits, such as conception rate and protein content, and somatic cell count and fat yield, although no significant increase in prediction accuracy was observed. The integration of DL into GP enabled the modeling of nonlinear genetic relationships between traits, a possibility not previously discussed, given the linear nature of GBLUP.
The detection of nonlinear genetic relationships between traits in the French Holstein population when using DLGBLUP indicates the presence of such relationships in real breeding data, suggesting that it may be relevant to further explore nonlinear relationships. This possibility of nonlinear genetic relationships between traits offers a different perspective into multitrait evaluations, with potential to further improve selection strategies in commercial livestock breeding programs. This is particularly relevant when integrating new traits into multitrait evaluations or incorporating new subpopulations, which may introduce different forms of nonlinearity. Finally, it is shown that DL can be used as a complement to the statistical methods deployed in routine genetic evaluations, rather than as an alternative, by enhancing their performance.
Affiliation(s)
- F Shokor
- Eliance, 75012 Paris, France; Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350 Jouy-en-Josas, France.
- P Croiseau
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350 Jouy-en-Josas, France
- H Gangloff
- Université Paris-Saclay, INRAE, AgroParisTech, UMR MIA Paris-Saclay, 91120 Palaiseau, France
- R Saintilan
- Eliance, 75012 Paris, France; Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350 Jouy-en-Josas, France
- T Tribout
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350 Jouy-en-Josas, France
- T Mary-Huard
- Université Paris-Saclay, INRAE, AgroParisTech, UMR MIA Paris-Saclay, 91120 Palaiseau, France; Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
- B C D Cuyabano
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350 Jouy-en-Josas, France
2
da Costa WG, Bandeira e Souza M, Azevedo CF, Nascimento M, Morgante CV, Borel JC, de Oliveira EJ. Optimizing drought tolerance in cassava through genomic selection. Front Plant Sci 2024; 15:1483340. [PMID: 39737377 PMCID: PMC11683140 DOI: 10.3389/fpls.2024.1483340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2024] [Accepted: 11/29/2024] [Indexed: 01/01/2025]
Abstract
The complexity of selecting for drought tolerance in cassava, influenced by multiple factors, demands innovative approaches to plant selection. This study aimed to identify cassava clones with tolerance to water stress by employing truncated selection and selection based on genomic values for population improvement and genotype evaluation per se. The Best Linear Unbiased Predictions (BLUPs), Genomic Estimated Breeding Values (GEBVs), and Genomic Estimated Genotypic Values (GETGVs) were obtained based on different prediction models via genomic selection. The selection intensity ranged from 10 to 30%. A wide range of BLUPs for agronomic traits indicates desirable genetic variability for initiating genomic selection cycles to improve cassava's drought tolerance. SNP-based heritability (h²) and broad-sense heritability (H²) under water deficit were of low magnitude (<0.40) for 8 to 12 agronomic traits evaluated. Genomic predictive abilities were below the levels of phenotypic heritability, varying by trait and prediction model, with the lowest and highest predictive abilities observed for starch content (0.15 - 0.22) and root length (0.34 - 0.36). Some agronomic traits of greater importance, such as fresh root yield (0.29 - 0.31) and shoot yield (0.31 - 0.32), showed good predictive ability, while dry matter content had lower predictive ability (0.16 - 0.22). The G-BLUP and RKHS methods presented higher predictive abilities, suggesting that incorporating kinship effects can be beneficial, especially in challenging environments. The selection differential based on a 15% selection intensity (62 genotypes) was higher for economically significant traits, such as starch content, shoot yield, and fresh root yield, both for population improvement (GEBVs) and for evaluating genotype performance per se (GETGVs).
The lower costs of genotyping offer advantages over conventional phenotyping, making genomic selection a promising approach to increasing genetic gains for drought tolerance in cassava and reducing the breeding cycle to at least half the conventional time.
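As an illustration of the truncated selection described in the abstract above, the following minimal sketch computes a selection differential (mean of the selected fraction minus the mean of all candidates). The function name `selection_differential`, the simulated genomic values, and the population size of 413 are assumptions made for illustration only; the size was chosen so that a 15% intensity yields the 62 genotypes mentioned in the abstract, and the study's actual data and procedure may differ.

```python
import random
from statistics import mean

def selection_differential(values, intensity):
    """Return (differential, n_selected) for truncated selection.

    The top `intensity` fraction of candidates is kept, and the differential
    is the mean of the kept candidates minus the mean of all candidates.
    """
    if not 0 < intensity <= 1:
        raise ValueError("intensity must be in (0, 1]")
    n_selected = max(1, round(len(values) * intensity))
    selected = sorted(values, reverse=True)[:n_selected]
    return mean(selected) - mean(values), n_selected

# Hypothetical genomic values: 413 simulated candidates (not data from the study).
random.seed(0)
gebvs = [random.gauss(0.0, 1.0) for _ in range(413)]

diff, n = selection_differential(gebvs, 0.15)
print(f"selected {n} genotypes, selection differential = {diff:.2f}")
```

Running the same function over intensities from 0.10 to 0.30 would show how the differential shrinks as more candidates are retained.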
Affiliation(s)
- Weverton Gomes da Costa
- Laboratório de Inteligência Computacional e Aprendizado Estatístico - LICAE, Departamento de Estatística, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Camila Ferreira Azevedo
- Laboratório de Inteligência Computacional e Aprendizado Estatístico - LICAE, Departamento de Estatística, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Moyses Nascimento
- Laboratório de Inteligência Computacional e Aprendizado Estatístico - LICAE, Departamento de Estatística, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
3
Dwivedi SL, Heslop‐Harrison P, Amas J, Ortiz R, Edwards D. Epistasis and pleiotropy-induced variation for plant breeding. Plant Biotechnol J 2024; 22:2788-2807. [PMID: 38875130 PMCID: PMC11536456 DOI: 10.1111/pbi.14405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Revised: 05/07/2024] [Accepted: 05/24/2024] [Indexed: 06/16/2024]
Abstract
Epistasis refers to nonallelic interaction between genes that causes bias in estimates of genetic parameters for a phenotype with interactions of two or more genes affecting the same trait. Partitioning of epistatic effects allows true estimation of the genetic parameters affecting phenotypes. Multigenic variation plays a central role in the evolution of complex characteristics, among which pleiotropy, where a single gene affects several phenotypic characters, has a large influence. While pleiotropic interactions provide functional specificity, they increase the challenge of gene discovery and functional analysis. Overcoming pleiotropy-based phenotypic trade-offs offers potential for assisting breeding for complex traits. Modelling higher order nonallelic epistatic interaction, pleiotropy and non-pleiotropy-induced variation, and genotype × environment interaction in genomic selection may provide new paths to increase the productivity and stress tolerance of the next generation of crop cultivars. Advances in statistical models, software and algorithm developments, and genomic research have facilitated dissecting the nature and extent of pleiotropy and epistasis. We overview emerging approaches to exploit positive (and avoid negative) epistatic and pleiotropic interactions in a plant breeding context, including developing avenues of artificial intelligence, novel exploitation of large-scale genomics and phenomics data, and involvement of genes with minor effects to analyse epistatic interactions and pleiotropic quantitative trait loci, including missing heritability.
Affiliation(s)
- Pat Heslop‐Harrison
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
- Department of Genetics and Genome Biology, Institute for Environmental Futures, University of Leicester, Leicester, UK
- Junrey Amas
- Centre for Applied Bioinformatics, School of Biological Sciences, University of Western Australia, Perth, WA, Australia
- Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- David Edwards
- Centre for Applied Bioinformatics, School of Biological Sciences, University of Western Australia, Perth, WA, Australia
4
Nascimento M, Nascimento ACC, Azevedo CF, de Oliveira ACB, Caixeta ET, Jarquin D. Enhancing genomic prediction with Stacking Ensemble Learning in Arabica Coffee. Front Plant Sci 2024; 15:1373318. [PMID: 39086911 PMCID: PMC11288849 DOI: 10.3389/fpls.2024.1373318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 06/12/2024] [Indexed: 08/02/2024]
Abstract
Coffee breeding programs have traditionally relied on observing plant characteristics over years, a slow and costly process. Genomic selection (GS) offers a DNA-based alternative for faster selection of superior cultivars. Stacking Ensemble Learning (SEL) combines multiple models for potentially even more accurate selection. This study explores the potential of SEL in coffee breeding, aiming to improve prediction accuracy for important traits [yield (YL), total number of fruits (NF), leaf miner infestation (LM), and cercosporiosis incidence (Cer)] in Coffea arabica. We analyzed data from 195 individuals genotyped for 21,211 single-nucleotide polymorphism (SNP) markers. To comprehensively assess model performance, we employed a cross-validation (CV) scheme. Genomic Best Linear Unbiased Prediction (GBLUP), multivariate adaptive regression splines (MARS), Quantile Random Forest (QRF), and Random Forest (RF) served as base learners. For the meta-learner within the SEL framework, various options were explored, including Ridge Regression, RF, GBLUP, and Single Average. The SEL method was able to predict important traits in Coffea arabica, with performance measured as predictive ability (PA). SEL presented higher PA than all base learner methods. The gains in PA in relation to GBLUP were 87.44% (the ratio between the PA obtained from the best Stacking model and the GBLUP), 37.83%, 199.82%, and 14.59% for YL, NF, LM and Cer, respectively. Overall, SEL presents a promising approach for GS. By combining predictions from multiple models, SEL can potentially enhance the PA of GS for complex traits.
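The stacking idea summarized in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are simulated, the base learners are three ridge regressions on markers with different penalties (stand-ins for the GBLUP, MARS, QRF, and RF learners named in the abstract), the meta-learner is a ridge regression, and all function names (`ridge_fit`, `out_of_fold`) are hypothetical. Predictive ability is taken here as the correlation between predictions and held-out phenotypes.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression without intercept; returns coefficients."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def out_of_fold(X, y, lam, folds):
    """Predict every sample from a model trained on the other folds only."""
    preds = np.empty(len(y), dtype=float)
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        preds[test_idx] = X[test_idx] @ ridge_fit(X[train], y[train], lam)
    return preds

# Simulated data (assumption): 120 individuals, 300 markers coded 0/1/2.
rng = np.random.default_rng(0)
n, p, n_train = 120, 300, 100
X = rng.integers(0, 3, size=(n, p)).astype(float)
X -= X.mean(axis=0)  # center marker columns
y = X @ rng.normal(0.0, 0.1, size=p) + rng.normal(0.0, 1.0, size=n)

X_train, X_test = X[:n_train], X[n_train:]
y_mean = y[:n_train].mean()  # center phenotypes with the training mean only
y_train, y_test = y[:n_train] - y_mean, y[n_train:] - y_mean

folds = np.array_split(rng.permutation(n_train), 5)
base_lambdas = [1.0, 100.0, 10000.0]  # one hypothetical base learner per penalty

# Level 0: out-of-fold predictions of each base learner become meta-features.
Z_train = np.column_stack(
    [out_of_fold(X_train, y_train, lam, folds) for lam in base_lambdas]
)
# Level 1: the meta-learner weights the base learners.
meta_weights = ridge_fit(Z_train, y_train, lam=1.0)

# Refit base learners on all training data, then stack their test predictions.
Z_test = np.column_stack(
    [X_test @ ridge_fit(X_train, y_train, lam) for lam in base_lambdas]
)
stacked_test = Z_test @ meta_weights
pa = np.corrcoef(stacked_test, y_test)[0, 1]
print(f"predictive ability of the stacked model: {pa:.2f}")
```

Using out-of-fold predictions as meta-features keeps the meta-learner from simply rewarding whichever base learner overfits the training data most.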
Affiliation(s)
- Moyses Nascimento
- Laboratory of Intelligence Computational and Statistical Learning (LICAE), Department of Statistics, Federal University of Viçosa, Viçosa, Brazil
- Agronomy Department, University of Florida, Gainesville, FL, United States
- Ana Carolina Campana Nascimento
- Laboratory of Intelligence Computational and Statistical Learning (LICAE), Department of Statistics, Federal University of Viçosa, Viçosa, Brazil
- Agronomy Department, University of Florida, Gainesville, FL, United States
- Camila Ferreira Azevedo
- Laboratory of Intelligence Computational and Statistical Learning (LICAE), Department of Statistics, Federal University of Viçosa, Viçosa, Brazil
- Diego Jarquin
- Agronomy Department, University of Florida, Gainesville, FL, United States
5
Barreto CAV, das Graças Dias KO, de Sousa IC, Azevedo CF, Nascimento ACC, Guimarães LJM, Guimarães CT, Pastina MM, Nascimento M. Genomic prediction in multi-environment trials in maize using statistical and machine learning methods. Sci Rep 2024; 14:1062. [PMID: 38212638 PMCID: PMC10784464 DOI: 10.1038/s41598-024-51792-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 01/09/2024] [Indexed: 01/13/2024] Open
Abstract
In the context of multi-environment trials (MET), genomic prediction is proposed as a tool that allows the prediction of the phenotype of single cross hybrids that were not tested in field trials. This approach saves time and costs compared to traditional breeding methods. Thus, this study aimed to evaluate the genomic prediction of single cross maize hybrids not tested in MET for grain yield and female flowering time. We also aimed to propose an application of machine learning methodologies in MET for the prediction of hybrids and to compare their performance with genomic best linear unbiased prediction (GBLUP) with non-additive effects. Our results highlight that both methodologies are efficient and can be used in maize breeding programs to accurately predict the performance of hybrids in specific environments. The best methodology is case-dependent. Specifically, to explore the potential of GBLUP, it is important to perform accurate modeling of the variance components to optimize the prediction of new hybrids. On the other hand, machine learning methodologies can capture non-additive effects without making any assumptions at the outset of the model. Overall, predicting the performance of new hybrids that were not evaluated in any field trials was more challenging than predicting hybrids in sparse test designs.
Affiliation(s)
- Ithalo Coelho de Sousa
- Department of Mathematics and Statistics, Universidade Federal de Rondônia, Ji-Paraná, RO, Brazil
- Moysés Nascimento
- Department of Statistics, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil.