1
Mora M, González P, Quevedo JR, Montañés E, Tusell L, Bergsma R, Piles M. Impact of multi-output and stacking methods on feed efficiency prediction from genotype using machine learning algorithms. J Anim Breed Genet 2023; 140:638-652. [PMID: 37403756 DOI: 10.1111/jbg.12815] [Received: 02/13/2023] [Revised: 05/23/2023] [Accepted: 06/23/2023] [Indexed: 07/06/2023]
Abstract
Feeding represents the largest economic cost in meat production; therefore, selection to improve traits related to feed efficiency is a goal in most livestock breeding programs. Residual feed intake (RFI), that is, the difference between the actual feed intake and the feed intake expected from the animal's requirements, has been used as the selection criterion to improve feed efficiency since it was proposed by Koch in 1963. In growing pigs, it is computed as the residual of the multiple regression of daily feed intake (DFI) on average daily gain (ADG), backfat thickness (BFT), and metabolic body weight (MW). Recently, prediction using single-output machine learning algorithms with SNP information as predictor variables has been proposed for genomic selection in growing pigs but, as in other species, the prediction quality achieved for RFI has generally been poor. However, it has been suggested that it could be improved through multi-output or stacking methods. To test this, four strategies were implemented to predict RFI. Two of them compute RFI indirectly from the predicted values of its components, obtained from (i) individual predictions (multiple single-output strategy) or (ii) simultaneous predictions (multi-output strategy). The other two predict RFI directly, using (iii) the individual predictions of its components as predictor variables jointly with the genotype (stacking strategy), or (iv) only the genotypes as predictors (single-output strategy). The single-output strategy was considered the benchmark. This research aimed to test the first three strategies against that benchmark using data recorded from 5828 growing pigs and 45,610 SNPs. For all strategies, two learning methods were fitted: random forest (RF) and support vector regression (SVR). A nested cross-validation (CV), with an outer 10-fold CV and an inner threefold CV for hyperparameter tuning, was implemented to test all strategies. This scheme was repeated using as predictor variables different subsets with an increasing number (from 200 to 3000) of the most informative SNPs identified with RF. Results showed that the highest prediction performance was achieved with 1000 SNPs, although the stability of feature selection was poor (0.13 points out of 1). For all SNP subsets, the benchmark showed the best prediction performance. Using RF as the learner and the 1000 most informative SNPs as predictors, the mean (SD) of the 10 values obtained in the test sets were: 0.23 (0.04) for the Spearman correlation, 0.83 (0.04) for the zero-one loss, and 0.33 (0.03) for the rank distance loss. We conclude that the information on predicted components of RFI (DFI, ADG, MW, and BFT) does not improve the quality of the prediction of this trait relative to that obtained with the single-output strategy.
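The RFI definition given in this abstract (the residual of a multiple regression of DFI on ADG, BFT and MW) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the intercept-plus-three-covariates model and the toy records are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def residual_feed_intake(dfi, adg, bft, mw):
    """Return RFI as the residuals of DFI regressed on ADG, BFT and MW.

    Assumption: ordinary least squares with an intercept and no other effects.
    """
    rows = [[1.0, g, b, w] for g, b, w in zip(adg, bft, mw)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, dfi)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)  # normal equations
    return [y - sum(bi * xi for bi, xi in zip(beta, r)) for r, y in zip(rows, dfi)]


# Hypothetical toy records for six pigs (not data from the study).
adg = [0.80, 0.95, 0.70, 1.05, 0.90, 0.85]   # kg/day
bft = [10.0, 12.5, 9.0, 14.0, 11.0, 13.0]    # mm
mw = [25.0, 27.0, 23.5, 29.0, 24.0, 28.0]    # kg^0.75
dfi = [2.1, 2.6, 1.9, 2.9, 2.3, 2.5]         # kg/day

rfi = residual_feed_intake(dfi, adg, bft, mw)
```

A negative RFI marks an animal that eats less than expected for its growth, fatness and size. Because the model includes an intercept, the residuals sum to zero over the animals used to fit it.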
Affiliation(s)
- Mónica Mora
  - Departamento de Ciencia Animal, Universidad Politècnica de València, Valencia, Spain
  - Animal Breeding and Genetics, Institute of Agrifood Research and Technology (IRTA), Barcelona, Spain
- Pablo González
  - Artificial Intelligence Centre, University of Oviedo, Gijón, Spain
- Elena Montañés
  - Artificial Intelligence Centre, University of Oviedo, Gijón, Spain
- Llibertat Tusell
  - Animal Breeding and Genetics, Institute of Agrifood Research and Technology (IRTA), Barcelona, Spain
- Rob Bergsma
  - Topigs Norsvin Research Center, Beuningen, Netherlands
- Miriam Piles
  - Animal Breeding and Genetics, Institute of Agrifood Research and Technology (IRTA), Barcelona, Spain

2
Montesinos-López OA, Crossa J, Saint Pierre C, Gerard G, Valenzo-Jiménez MA, Vitale P, Valladares-Cellis PE, Buenrostro-Mariscal R, Montesinos-López A, Crespo-Herrera L. Multivariate Genomic Hybrid Prediction with Kernels and Parental Information. Int J Mol Sci 2023; 24:13799. [PMID: 37762107 PMCID: PMC10531250 DOI: 10.3390/ijms241813799] [Received: 07/22/2023] [Revised: 08/28/2023] [Accepted: 09/01/2023] [Indexed: 09/29/2023]
Abstract
Genomic selection (GS) plays a pivotal role in hybrid prediction. It can enhance the selection of parental lines, accurately predict hybrid performance, and harness hybrid vigor. Likewise, it can optimize breeding strategies by reducing field trial requirements, expediting hybrid development, facilitating targeted trait improvement, and enhancing adaptability to diverse environments. Leveraging genomic information empowers breeders to make informed decisions and significantly improve the efficiency and success rate of hybrid breeding programs. To improve genomic prediction performance, we explored the incorporation of parental phenotypic information as covariates under a multi-trait framework. Approach 1, referred to as Pmean, directly used parental phenotypic information without any preprocessing, whereas approach 2, denoted as BV, replaced the phenotypic values of both parents with their respective breeding values. Prediction performance improved under both approaches, with a reduction of at least 4.24% in the normalized root mean square error (NRMSE), and the direct incorporation of parental phenotypic information in the Pmean approach slightly outperformed the BV approach. We also compared these two approaches using linear and nonlinear kernels, but no relevant gain was observed. Finally, our results add to the empirical evidence that integrating parental phenotypic information helps increase the prediction performance of hybrids.
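The NRMSE reduction reported in this abstract can be illustrated with a short sketch. The abstract does not state how the RMSE was normalized, so dividing by the mean of the observed values is an assumption here, and both function names are hypothetical.

```python
import math


def nrmse(observed, predicted):
    """Root mean square error divided by the mean of the observed values.

    Assumption: normalization by the observed mean; other conventions
    (range, standard deviation) exist and give different numbers.
    """
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline
```

Under this convention, a model whose NRMSE falls from 0.50 to 0.40 shows a 20% reduction, which is the kind of relative figure the abstract quotes.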
Affiliation(s)
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco 52640, México, Mexico
  - Colegio de Postgraduados, Montecillos 56230, México, Mexico
- Carolina Saint Pierre
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco 52640, México, Mexico
- Guillermo Gerard
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco 52640, México, Mexico
- Marco Alberto Valenzo-Jiménez
  - Universidad Michoacana de San Nicolas de Hidalgo (UMSNH), Avenida Francisco J. Mujica S/N Ciudad Universitaria, Morelia 58030, Michoacán, Mexico
- Paolo Vitale
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco 52640, México, Mexico
- Abelardo Montesinos-López
  - Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Jalisco, Mexico
- Leonardo Crespo-Herrera
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco 52640, México, Mexico

3
Pravia MI, Navajas EA, Aguilar I, Ravagnolo O. Prediction ability of an alternative multi-trait genomic evaluation for residual feed intake. J Anim Breed Genet 2023; 140:508-518. [PMID: 37186475 DOI: 10.1111/jbg.12775] [Received: 02/23/2023] [Revised: 04/04/2023] [Accepted: 04/06/2023] [Indexed: 05/17/2023]
Abstract
Selection for feed efficiency is a goal of many genetic breeding programs in beef cattle. Residual feed intake (RFI) has been included in genetic evaluations to reduce feed intake without compromising performance traits such as liveweight, body gain or carcass traits. However, measuring feed intake is expensive, and only a small percentage of selection candidates are phenotyped. Genomic selection has become a very important tool to achieve effective genetic progress in these traits. Another effective strategy has been the implementation of multi-trait prediction using easily recordable predictor traits on both reference animals and candidates without phenotypes, and this could be another inexpensive way to increase accuracy. The objective of this work was to analyse and compare the prediction ability of two alternative approaches to predict genomic estimated breeding values (GEBVs) for RFI. The population of inference was Hereford bulls in Uruguay that were genotyped candidates for selection. The first model was the conventional univariate model for RFI and the second was a multi-trait model that included a predictor trait (weaning weight, WW) in addition to the traits used in the first one (dry matter intake, DMI; metabolic mid-test weight, MWT; average daily gain, ADG; and ultrasound backfat, UBF). GEBVs from the multi-trait model were combined using selection index theory to derive RFI values. All analyses were performed using the ssGBLUP procedure. The prediction ability of both models was tested using two validation strategies (30 different replicates of random groups of animals and validation across 9 different feed intake tests). The prediction quality was assessed by the following parameters: bias, dispersion, ratio of accuracies and the relative increase in accuracy by adding phenotypic information. All parameters showed that the univariate model outperformed the multi-trait model, regardless of the validation strategy considered. These results indicate that including WW as a proxy trait in a multi-trait analysis does not improve the prediction ability when all animals to be predicted are genotyped.
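The abstract lists the validation parameters (bias, dispersion, ratio of accuracies) without giving formulas. The sketch below assumes one common set of definitions, in which GEBVs estimated from a partial dataset (validation phenotypes removed) are compared with GEBVs estimated from the whole dataset. Whether the paper used exactly these definitions is an assumption, and all names are hypothetical.

```python
from statistics import mean


def _cov(x, y):
    """Sample covariance of two equally long sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def validation_statistics(gebv_partial, gebv_whole):
    """Compare partial-data and whole-data GEBVs of the validation animals.

    Assumed definitions:
      bias       = mean(partial) - mean(whole)      (ideal value 0)
      dispersion = cov(whole, partial) / var(partial)  (ideal value 1)
      ratio_acc  = corr(whole, partial)
    """
    cov_wp = _cov(gebv_whole, gebv_partial)
    var_p = _cov(gebv_partial, gebv_partial)
    var_w = _cov(gebv_whole, gebv_whole)
    return {
        "bias": mean(gebv_partial) - mean(gebv_whole),
        "dispersion": cov_wp / var_p,
        "ratio_acc": cov_wp / (var_w * var_p) ** 0.5,
    }


# Hypothetical GEBVs for four validation animals.
stats = validation_statistics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

In this toy case the partial GEBVs are shifted by a constant, so the sketch reports a non-zero bias with a dispersion and a ratio of accuracies of 1.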
Affiliation(s)
- Maria Isabel Pravia
  - Instituto Nacional de Investigación Agropecuaria, INIA Uruguay, Canelones, Uruguay
- Elly Ana Navajas
  - Instituto Nacional de Investigación Agropecuaria, INIA Uruguay, Canelones, Uruguay
- Ignacio Aguilar
  - Instituto Nacional de Investigación Agropecuaria, INIA Uruguay, Canelones, Uruguay
- Olga Ravagnolo
  - Instituto Nacional de Investigación Agropecuaria, INIA Uruguay, Canelones, Uruguay

4
Li Y, Yang H, Guo J, Yang Y, Yu Q, Guo Y, Zhang C, Wang Z, Zuo P. Uncovering the candidate genes related to sheep body weight using multi-trait genome-wide association analysis. Front Vet Sci 2023; 10:1206383. [PMID: 37662987 PMCID: PMC10469697 DOI: 10.3389/fvets.2023.1206383] [Received: 04/15/2023] [Accepted: 08/04/2023] [Indexed: 09/05/2023]
Abstract
In sheep, body weight is an economically important trait. This study sought to map genetic loci related to weaning weight and yearling weight. To this end, single-trait and multi-trait genome-wide association studies (GWAS) were performed using a high-density 600 K single nucleotide polymorphism (SNP) chip. The results showed that 43 and 56 SNPs were significantly associated with weaning weight and yearling weight, respectively. A region associated with both weaning and yearling traits (OARX: 6.74-7.04 Mb) was identified, suggesting that the same genes could play a role in regulating both traits. This region was found to contain three genes (TBL1X, SHROOM2 and GPR143). The most significant SNP was Affx-281066395, located at 6.94 Mb (p = 1.70 × 10⁻¹⁷), corresponding to the SHROOM2 gene. We also identified 93 novel SNPs related to sheep weight using multi-trait GWAS analysis. A new genomic region (OAR10: 76.04-77.23 Mb) with 22 significant SNPs was discovered. Combining transcriptomic data from multiple tissues with genomic data in sheep, we found that the HINT1, ASB11 and GPR143 genes may be involved in sheep body weight. Multi-omic analysis is therefore a valuable strategy for identifying candidate genes related to body weight.
Affiliation(s)
- Yunna Li
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
- Hua Yang
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
- Jing Guo
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
- Yonglin Yang
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
- Qian Yu
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
- Yuanyuan Guo
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
- Chaoxin Zhang
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
- Zhipeng Wang
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
- Peng Zuo
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
  - College of Science, Northeast Agricultural University, Harbin, China

5
Mora-Poblete F, Maldonado C, Henrique L, Uhdre R, Scapim CA, Mangolim CA. Multi-trait and multi-environment genomic prediction for flowering traits in maize: a deep learning approach. Front Plant Sci 2023; 14:1153040. [PMID: 37593046 PMCID: PMC10428628 DOI: 10.3389/fpls.2023.1153040] [Received: 01/28/2023] [Accepted: 07/12/2023] [Indexed: 08/19/2023]
Abstract
Maize (Zea mays L.), the third most widely cultivated cereal crop in the world, plays a critical role in global food security. To improve the efficiency of selecting superior genotypes in breeding programs, researchers have aimed to identify key genomic regions that impact agronomic traits. In this study, the performance of multi-trait, multi-environment deep learning models was compared to that of Bayesian models (Markov Chain Monte Carlo generalized linear mixed models (MCMCglmm), Bayesian Genomic Genotype-Environment Interaction (BGGE), and Bayesian Multi-Trait and Multi-Environment (BMTME)) in terms of the prediction accuracy of flowering-related traits (Anthesis-Silking Interval: ASI, Female Flowering: FF, and Male Flowering: MF). A tropical maize panel of 258 inbred lines from Brazil was evaluated in three sites (Cambira-2018, Sabaudia-2018, and Iguatemi-2020 and 2021) using approximately 290,000 single nucleotide polymorphisms (SNPs). The results demonstrated a 14.4% increase in prediction accuracy when employing multi-trait models compared to the use of a single trait in a single environment approach. The accuracy of predictions also improved by 6.4% when using a single trait in a multi-environment scheme compared to using multi-trait analysis. Additionally, deep learning models consistently outperformed Bayesian models in both single and multiple trait and environment approaches. A complementary genome-wide association study identified associations with 26 candidate genes related to flowering time traits, and 31 marker-trait associations were identified, accounting for 37%, 37%, and 22% of the phenotypic variation of ASI, FF and MF, respectively. In conclusion, our findings suggest that deep learning models have the potential to significantly improve the accuracy of predictions, regardless of the approach used and provide support for the efficacy of this method in genomic selection for flowering-related traits in tropical maize.
Affiliation(s)
- Carlos Maldonado
  - Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
- Luma Henrique
  - Department of Agronomy, State University of Maringá, Paraná, Brazil
- Renan Uhdre
  - Department of Agronomy, State University of Maringá, Paraná, Brazil

6
Zali H, Barati A, Pour-Aboughadareh A, Gholipour A, Koohkan S, Marzoghiyan A, Bocianowski J, Bujak H, Nowosad K. Identification of Superior Barley Genotypes Using Selection Index of Ideal Genotype (SIIG). Plants (Basel) 2023; 12:1843. [PMID: 37176901 PMCID: PMC10181048 DOI: 10.3390/plants12091843] [Received: 04/13/2023] [Revised: 04/24/2023] [Accepted: 04/25/2023] [Indexed: 05/15/2023]
Abstract
The main objective of the study was to evaluate and select superior barley genotypes based on grain yield and some pheno-morphological traits using a newly proposed selection index, the selection index of ideal genotype (SIIG). For this purpose, 108 pure lines and four local cultivars (Norouz, Auxin, Nobahar, and WB-97-11), the latter used as reference genotypes, were evaluated in four warm regions of Iran (Ahvaz, Darab, Zabol, and Gonbad) during the 2020-2021 cropping seasons. The results of REML analysis showed that the heritability of all traits (except plant height) was higher in Gonbad than in other environments, while the lowest values were estimated in the Ahvaz and Zabol environments. In addition, among the measured traits, thousand kernel weight and grain filling period showed the highest and lowest values of heritability (0.83 and 0.01, respectively). The results showed that the seed yield of genotypes 1, 108, 3, 86, 5, 87, 19, 16, 15, 56, and 18 was higher than that of the four reference genotypes, and that the SIIG index of these genotypes was greater than or equal to 0.60. Based on the SIIG discriminator index, 4, 8, 31, and 28 genotypes with values greater than or equal to 0.60 were identified as superior for the Darab, Ahvaz, Zabol, and Gonbad environments, respectively. In conclusion, our results revealed that the SIIG index has ideal potential to identify genotypes with high yield and desirable traits. Therefore, the use of this index can be beneficial in screening better genotypes in the early stages of any breeding program for any crop.
Affiliation(s)
- Hassan Zali
  - Crop and Horticultural Science Research Department, Fars Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education and Extension Organization (AREEO), Darab P.O. Box 71558-63511, Iran
- Ali Barati
  - Seed and Plant Improvement Institute, Agricultural Research, Education and Extension Organization (AREEO), Karaj P.O. Box 31587-77871, Iran
- Alireza Pour-Aboughadareh
  - Seed and Plant Improvement Institute, Agricultural Research, Education and Extension Organization (AREEO), Karaj P.O. Box 31587-77871, Iran
- Ahmad Gholipour
  - Crop and Horticultural Science Research Department, Golestan Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education and Extension Organization (AREEO), Gonbad P.O. Box 49156-77555, Iran
- Shirali Koohkan
  - Crop and Horticultural Science Research Department, Sistan Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education and Extension Organization (AREEO), Zabol P.O. Box 98616-44534, Iran
- Akbar Marzoghiyan
  - Crop and Horticultural Science Research Department, Khuzestan Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education and Extension Organization (AREEO), Ahvaz P.O. Box 61335-3341, Iran
- Jan Bocianowski
  - Department of Mathematical and Statistical Methods, Poznań University of Life Sciences, Wojska Polskiego 28, 60-637 Poznań, Poland
- Henryk Bujak
  - Department of Genetics, Plant Breeding and Seed Production, Wrocław University of Environmental and Life Sciences, Grunwaldzki 24A, 53-363 Wrocław, Poland
  - Research Center for Cultivar Testing, Słupia Wielka 34, 63-022 Słupia Wielka, Poland
- Kamila Nowosad
  - Department of Genetics, Plant Breeding and Seed Production, Wrocław University of Environmental and Life Sciences, Grunwaldzki 24A, 53-363 Wrocław, Poland

7
Montesinos-López OA, Saint Pierre C, Gezan SA, Bentley AR, Mosqueda-González BA, Montesinos-López A, van Eeuwijk F, Beyene Y, Gowda M, Gardner K, Gerard GS, Crespo-Herrera L, Crossa J. Optimizing Sparse Testing for Genomic Prediction of Plant Breeding Crops. Genes (Basel) 2023; 14:927. [PMID: 37107685 PMCID: PMC10137724 DOI: 10.3390/genes14040927] [Received: 03/01/2023] [Revised: 04/07/2023] [Accepted: 04/13/2023] [Indexed: 04/29/2023]
Abstract
While sparse testing methods have been proposed by researchers to improve the efficiency of genomic selection (GS) in breeding programs, there are several factors that can hinder this. In this research, we evaluated four methods (M1-M4) for the sparse testing allocation of lines to environments under multi-environment trials for genomic prediction of unobserved lines. The sparse testing methods described in this study are applied in a two-stage analysis to build the genomic training and testing sets in a strategy that allows each location or environment to evaluate only a subset of all genotypes rather than all of them. To ensure a valid implementation, the sparse testing methods presented here require BLUEs (or BLUPs) of the lines to be computed at the first stage using an appropriate experimental design and statistical analyses in each location (or environment). The four methods for allocating cultivars to environments at the second stage were evaluated with four data sets (two large and two small) under a multi-trait and a uni-trait framework. We found that the multi-trait model produced better genomic prediction (GP) accuracy than the uni-trait model and that methods M3 and M4 were slightly better than methods M1 and M2 for the allocation of lines to environments. One of the most important findings, however, was that even under a scenario with a training-testing split of 15-85%, the prediction accuracy of the four methods barely decreased. This indicates that genomic sparse testing methods for data sets under these scenarios can save considerable operational and financial resources with only a small loss in precision, as shown in our cost-benefit analysis.
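The idea of letting each environment evaluate only a subset of lines can be sketched as follows. The abstract does not describe how methods M1-M4 allocate lines, so the simple random allocation below is an assumption and is not one of those methods; the function name, parameters and line labels are hypothetical.

```python
import random


def allocate_sparse_testing(lines, environments, train_fraction=0.15, seed=0):
    """Randomly pick, per environment, the lines to be field-tested.

    Assumption: independent simple random sampling in each environment.
    Tested lines form the training set; the rest are left to be predicted.
    """
    rng = random.Random(seed)
    n_tested = max(1, round(train_fraction * len(lines)))
    allocation = {}
    for env in environments:
        tested = set(rng.sample(lines, n_tested))
        allocation[env] = {
            "tested": sorted(tested),
            "predicted": sorted(line for line in lines if line not in tested),
        }
    return allocation


# Hypothetical example: 20 lines, 3 environments, 15% tested per environment.
lines = [f"L{i:02d}" for i in range(20)]
allocation = allocate_sparse_testing(lines, ["E1", "E2", "E3"])
```

With 20 lines and a 15-85% split, each environment tests 3 lines and the remaining 17 are predicted from genomic information.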
Affiliation(s)
- Carolina Saint Pierre
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
- Alison R Bentley
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
- Brandon A Mosqueda-González
  - Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Mexico City 07738, Mexico
- Abelardo Montesinos-López
  - Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Mexico
- Fred van Eeuwijk
  - Department of Plant Science Mathematical and Statistical Methods-Biometrics, P.O. Box 16, 6700AA Wageningen, The Netherlands
- Yoseph Beyene
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
- Manje Gowda
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
- Keith Gardner
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
- Guillermo S Gerard
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
- Leonardo Crespo-Herrera
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
  - Colegio de Postgraduados, Montecillos 56230, Mexico

8
Liang M, Cao S, Deng T, Du L, Li K, An B, Du Y, Xu L, Zhang L, Gao X, Li J, Guo P, Gao H. MAK: a machine learning framework improved genomic prediction via multi-target ensemble regressor chains and automatic selection of assistant traits. Brief Bioinform 2023; 24:7031157. [PMID: 36752363 DOI: 10.1093/bib/bbad043] [Received: 10/22/2022] [Revised: 01/13/2023] [Accepted: 01/20/2023] [Indexed: 02/09/2023]
Abstract
Incorporating the genotypic and phenotypic information of correlated traits into a multi-trait model can significantly improve the prediction accuracy of the target trait in animal and plant breeding, as well as in human genetics. However, in most cases, the phenotypes of both the correlated traits and the target trait are missing for the individual to be evaluated, particularly for newborns. Therefore, we propose a machine learning framework, MAK, that improves the prediction accuracy of the target trait by constructing multi-target ensemble regressor chains and selecting the assistant traits automatically, and that predicts the genomic estimated breeding values of the target trait using genotypic information only. The prediction ability of MAK was significantly more robust than that of genomic best linear unbiased prediction, BayesB, BayesRR and the multi-trait Bayesian method in four real animal and plant datasets, and MAK was roughly 100 times faster than BayesB and BayesRR.
Affiliation(s)
- Mang Liang
  - Chinese Academy of Agricultural Sciences Institute of Animal Science
- Sheng Cao
  - Chinese Academy of Agricultural Sciences Institute of Animal Science
- Tianyu Deng
  - Chinese Academy of Agricultural Sciences Institute of Animal Science
- Lili Du
  - Chinese Academy of Agricultural Sciences Institute of Animal Science
- Keanning Li
  - Chinese Academy of Agricultural Sciences Institute of Animal Science
- Bingxing An
  - Chinese Academy of Agricultural Sciences Institute of Animal Science
- Yueying Du
  - Chinese Academy of Agricultural Sciences Institute of Animal Science
- Lingyang Xu
  - Chinese Academy of Agricultural Sciences Institute of Animal Science
- Lupei Zhang
  - Chinese Academy of Agricultural Sciences Institute of Animal Science
- Xue Gao
  - Chinese Academy of Agricultural Sciences Institute of Animal Science
- Junya Li
  - Chinese Academy of Agricultural Sciences Institute of Animal Science
- Huijiang Gao
  - Chinese Academy of Agricultural Sciences Institute of Animal Science

9
Qu J, Runcie D, Cheng H. Mega-scale Bayesian regression methods for genome-wide prediction and association studies with thousands of traits. Genetics 2023; 223:6931802. [PMID: 36529897 PMCID: PMC9991502 DOI: 10.1093/genetics/iyac183] [Received: 05/06/2022] [Revised: 05/06/2022] [Accepted: 11/17/2022] [Indexed: 12/23/2022]
Abstract
Large-scale phenotype data are expected to increase the accuracy of genome-wide prediction and the power of genome-wide association analyses. However, genomic analyses of high-dimensional, highly correlated traits are challenging. We developed a method for implementing high-dimensional Bayesian multivariate regression to simultaneously analyze genetic variants underlying thousands of traits. As a demonstration, we implemented the BayesC prior in the R package MegaLMM. Applied to Genomic Prediction, MegaBayesC effectively integrated hyperspectral reflectance data from 620 hyperspectral wavelengths to improve the accuracy of genetic value prediction on grain yield in a wheat dataset. Applied to Genome-Wide Association Studies, we used simulations to show that MegaBayesC can accurately estimate the effect sizes of QTL across a range of genetic architectures and causes of correlations among traits. To apply MegaBayesC to a realistic scenario involving whole-genome marker data, we developed a 2-stage procedure involving a preliminary step of candidate marker selection prior to multivariate regression. We then used MegaBayesC to identify genetic associations with flowering time in Arabidopsis thaliana, leveraging expression data from 20,843 genes. MegaBayesC selected 15 single nucleotide polymorphisms as important for flowering time, with 13 located within 100 kb of known flowering-time related genes, a higher validation rate than achieved by a single-stage analysis using only the flowering time data itself. These results demonstrate that MegaBayesC can efficiently and effectively leverage high-dimensional phenotypes in genetic analyses.
Affiliation(s)
- Jiayi Qu
  - Department of Animal Science, University of California Davis, Davis, CA 95616, USA
- Daniel Runcie
  - Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA
- Hao Cheng
  - Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA

10
Castro-Urrea FA, Urricariet MP, Stefanova KT, Li L, Moss WM, Guzzomi AL, Sass O, Siddique KHM, Cowling WA. Accuracy of Selection in Early Generations of Field Pea Breeding Increases by Exploiting the Information Contained in Correlated Traits. Plants (Basel) 2023; 12:1141. [PMID: 36903999 PMCID: PMC10005560 DOI: 10.3390/plants12051141] [Received: 01/18/2023] [Revised: 02/21/2023] [Accepted: 02/27/2023] [Indexed: 06/18/2023]
Abstract
Accuracy of predicted breeding values (PBV) for low heritability traits may be increased in early generations by exploiting the information available in correlated traits. We compared the accuracy of PBV for 10 correlated traits with low to medium narrow-sense heritability (h²) in a genetically diverse field pea (Pisum sativum L.) population after univariate or multivariate linear mixed model (MLMM) analysis with pedigree information. In the contra-season, we crossed and selfed S1 parent plants, and in the main season we evaluated spaced plants of S0 cross progeny and S2+ (S2 or higher) self progeny of parent plants for the 10 traits. Stem strength traits included stem buckling (SB) (h² = 0.05), compressed stem thickness (CST) (h² = 0.12), internode length (IL) (h² = 0.61) and angle of the main stem above horizontal at first flower (EAngle) (h² = 0.46). Significant genetic correlations of the additive effects occurred between SB and CST (0.61), IL and EAngle (-0.90) and IL and CST (-0.36). From univariate to MLMM analysis, the average accuracy of PBVs increased from 0.799 to 0.841 in S0 progeny and from 0.835 to 0.875 in S2+ progeny. An optimized mating design was constructed with optimal contribution selection based on an index of PBV for the 10 traits, and predicted genetic gain in the next cycle was 1.4% (SB), 5.0% (CST), 10.5% (EAngle) and -10.5% (IL), with low achieved parental coancestry of 0.12. MLMM improved the potential genetic gain in annual cycles of early generation selection in field pea by increasing the accuracy of PBV.
Affiliation(s)
- Felipe A. Castro-Urrea
- The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6009, Australia
- School of Agriculture and Environment, The University of Western Australia, Perth, WA 6009, Australia
| | - Maria P. Urricariet
- School of Agriculture and Environment, The University of Western Australia, Perth, WA 6009, Australia
- General Genetics Unit, Pontificia Universidad Católica Argentina, Buenos Aires C1107AAZ, Argentina
| | - Katia T. Stefanova
- The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6009, Australia
- SAGI West, School of Molecular and Life Sciences, Curtin University, Perth, WA 6845, Australia
| | - Li Li
- Animal Genetics and Breeding Unit, University of New England, Armidale, NSW 2351, Australia
| | - Wesley M. Moss
- Centre for Engineering Innovation: Agriculture & Ecological Restoration, The University of Western Australia, Shenton Park, WA 6008, Australia
- School of Engineering, The University of Western Australia, Perth, WA 6009, Australia
| | - Andrew L. Guzzomi
- The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6009, Australia
- Centre for Engineering Innovation: Agriculture & Ecological Restoration, The University of Western Australia, Shenton Park, WA 6008, Australia
- School of Engineering, The University of Western Australia, Perth, WA 6009, Australia
| | - Olaf Sass
- Norddeutsche Pflanzenzucht Hans-Georg Lembke KG, Hohenlieth-Hof 1, 24363 Holtsee, Germany
| | - Kadambot H. M. Siddique
- The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6009, Australia
- School of Agriculture and Environment, The University of Western Australia, Perth, WA 6009, Australia
| | - Wallace A. Cowling
- The UWA Institute of Agriculture, The University of Western Australia, Perth, WA 6009, Australia
- School of Agriculture and Environment, The University of Western Australia, Perth, WA 6009, Australia
| |
|
11
|
Kismiantini, Montesinos-López A, Cano-Páez B, Montesinos-López JC, Chavira-Flores M, Montesinos-López OA, Crossa J. A Multi-Trait Gaussian Kernel Genomic Prediction Model under Three Tunning Strategies. Genes (Basel) 2022; 13. [PMID: 36553548 DOI: 10.3390/genes13122279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Revised: 11/27/2022] [Accepted: 12/01/2022] [Indexed: 12/12/2022]
Abstract
While genomic selection (GS) began revolutionizing plant breeding when it was proposed around 20 years ago, its practical implementation is still challenging as many factors affect its accuracy. One such factor is the choice of the statistical machine learning method. For this reason, we explore the tuning process under a multi-trait framework using the Gaussian kernel with a multi-trait Bayesian Best Linear Unbiased Predictor (GBLUP) model. We explored three methods of tuning (manual, grid search and Bayesian optimization) using 5 real datasets from breeding programs. We found that grid search and Bayesian optimization improved prediction accuracy by between 1.9 and 6.8% relative to manual tuning. While the improvement in prediction accuracy can be marginal in some cases, it is very important to carry out the tuning process carefully to improve the accuracy of the GS methodology, even though this entails greater computational resources.
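The grid-search tuning described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: it swaps the Bayesian multi-trait GBLUP for plain kernel ridge regression, fits each trait column with a shared Gaussian kernel (so trait covariances are not modeled), and uses a single hold-out split. All names, the toy marker data, the bandwidth grid and the ridge value are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    """K[i, j] = exp(-bandwidth * ||a_i - b_j||^2 / p), with p the number of markers."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-bandwidth * np.maximum(sq, 0.0) / A.shape[1])

def fit_predict(X_tr, Y_tr, X_te, bandwidth, ridge=1.0):
    """Kernel ridge regression; Y_tr holds one column per trait."""
    K = gaussian_kernel(X_tr, X_tr, bandwidth)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X_tr)), Y_tr)
    return gaussian_kernel(X_te, X_tr, bandwidth) @ alpha

def grid_search(X, Y, bandwidths, n_val):
    """Return the bandwidth with the lowest hold-out MSE averaged over traits."""
    X_tr, Y_tr, X_val, Y_val = X[:-n_val], Y[:-n_val], X[-n_val:], Y[-n_val:]
    scores = {}
    for h in bandwidths:
        pred = fit_predict(X_tr, Y_tr, X_val, h)
        scores[h] = float(np.mean((pred - Y_val) ** 2))
    return min(scores, key=scores.get), scores

# Toy data (hypothetical): 120 lines, 50 markers coded 0/1/2, two correlated traits.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(120, 50)).astype(float)
signal = np.sin(X[:, :5].sum(axis=1))
Y = np.column_stack([signal, 0.5 * signal]) + 0.1 * rng.normal(size=(120, 2))
best, scores = grid_search(X, Y, bandwidths=[0.01, 0.1, 1.0, 10.0], n_val=30)
```

Manual tuning corresponds to fixing one bandwidth in advance; the grid search simply evaluates every candidate on the same validation split and keeps the best one, which is where the extra computational cost comes from.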
|
12
|
Raffo MA, Sarup P, Andersen JR, Orabi J, Jahoor A, Jensen J. Integrating a growth degree-days based reaction norm methodology and multi-trait modeling for genomic prediction in wheat. Front Plant Sci 2022; 13:939448. [PMID: 36119585 PMCID: PMC9481302 DOI: 10.3389/fpls.2022.939448] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/09/2022] [Accepted: 08/08/2022] [Indexed: 05/26/2023]
Abstract
Multi-trait and multi-environment analyses can improve genomic prediction by exploiting between-trait correlations and genotype-by-environment interactions. In the context of reaction norm models, genotype-by-environment interactions can be described as functions of high-dimensional sets of markers and environmental covariates. However, comprehensive multi-trait reaction norm models accounting for marker × environmental covariates interactions are lacking. In this article, we propose to extend a reaction norm model incorporating genotype-by-environment interactions through (co)variance structures of markers and environmental covariates to a multi-trait reaction norm case. To do that, we propose a novel methodology for characterizing the environment at different growth stages based on growth degree-days (GDD). The proposed models were evaluated by variance components estimation and predictive performance for winter wheat grain yield and protein content in a set of 2,015 F6-lines. Cross-validation analyses were performed using leave-one-year-location-out (CV1) and leave-one-breeding-cycle-out (CV2) strategies. The modeling of genomic [SNPs] × environmental covariates interactions significantly improved predictive ability and reduced the variance inflation of predicted genetic values for grain yield and protein content in both cross-validation schemes. Trait-assisted genomic prediction was carried out for multi-trait models, and it significantly enhanced predictive ability and reduced variance inflation in all scenarios. The genotype by environment interaction modeling via genomic [SNPs] × environmental covariates interactions, combined with trait-assisted genomic prediction, boosted the benefits in predictive performance. 
The proposed multi-trait reaction norm methodology is a comprehensive approach that allows capitalizing on the benefits of multi-trait models accounting for between-trait correlations and reaction norm models exploiting high-dimensional genomic and environmental information.
Affiliation(s)
- Miguel Angel Raffo
- Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
| | - Pernille Sarup
- Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Nordic Seed A/S, Odder, Denmark
| | | | | | - Ahmed Jahoor
- Nordic Seed A/S, Odder, Denmark
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Just Jensen
- Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
| |
|
13
|
Montesinos-López OA, Montesinos-López A, Cano-Paez B, Hernández-Suárez CM, Santana-Mancilla PC, Crossa J. A Comparison of Three Machine Learning Methods for Multivariate Genomic Prediction Using the Sparse Kernels Method (SKM) Library. Genes (Basel) 2022; 13:genes13081494. [PMID: 36011405 PMCID: PMC9407886 DOI: 10.3390/genes13081494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 08/10/2022] [Accepted: 08/19/2022] [Indexed: 11/30/2022]
Abstract
Genomic selection (GS) changed the way plant breeders select genotypes. GS takes advantage of phenotypic and genotypic information to train a statistical machine learning model, which is used to predict phenotypic (or breeding) values of new lines for which only genotypic information is available. Therefore, many statistical machine learning methods have been proposed for this task. Multi-trait (MT) genomic prediction models take advantage of correlated traits to improve prediction accuracy. Therefore, some multivariate statistical machine learning methods are popular for GS. In this paper, we compare the prediction performance of three MT methods: the MT genomic best linear unbiased predictor (GBLUP), the MT partial least squares (PLS) and the multi-trait random forest (RF) methods. Benchmarking was performed with six real datasets. We found that the three investigated methods produce similar results, but under predictors with genotype (G) and environment (E), that is, E + G, the MT GBLUP achieved superior performance, whereas under predictors E + G + genotype × environment (GE) and G + GE, random forest achieved the best results. We also found that the best predictions were achieved under the predictors E + G and E + G + GE. Here, we also provide the R code for the implementation of these three statistical machine learning methods in the sparse kernel method (SKM) library, which offers not only options for single-trait prediction with various statistical machine learning methods but also some options for MT predictions that can help to better capture complex patterns in datasets that are common in genomic selection.
Affiliation(s)
| | - Abelardo Montesinos-López
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44100, Mexico
- Correspondence: (A.M.-L.); (J.C.)
| | - Bernabe Cano-Paez
- Facultad de Ciencias, Universidad Nacional Autónoma de México (UNAM), México City 04510, Mexico
| | - Carlos Moisés Hernández-Suárez
- Instituto de Ciencias Tecnología e Innovación, Universidad Francisco Gavidia, El Progreso St., No. 2748, Colonia Flor Blanca, San Salvador CP 1101, El Salvador
| | | | - José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco 56237, Mexico
- Colegio de Postgraduados, Montecillo 56230, Mexico
- Correspondence: (A.M.-L.); (J.C.)
| |
|
14
|
Maalouf F, Abou-Khater L, Babiker Z, Jighly A, Alsamman AM, Hu J, Ma Y, Rispail N, Balech R, Hamweih A, Baum M, Kumar S. Genetic Dissection of Heat Stress Tolerance in Faba Bean (Vicia faba L.) Using GWAS. Plants (Basel) 2022; 11:1108. [PMID: 35567109 PMCID: PMC9103424 DOI: 10.3390/plants11091108] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/02/2022] [Revised: 03/31/2022] [Accepted: 04/01/2022] [Indexed: 05/19/2023]
Abstract
Heat waves are expected to become more frequent and intense, which will impact faba bean cultivation globally. Conventional breeding methods are effective but take considerable time to achieve breeding goals; therefore, the identification of molecular markers associated with key genes controlling heat tolerance can facilitate and accelerate efficient variety development. We phenotyped 134 accessions in six open field experiments during summer seasons at Terbol, Lebanon, at Hudeiba, Sudan, and at Central Ferry, WA, USA from 2015 to 2018. These accessions were genotyped using genotyping by sequencing (GBS), and 10,794 high quality single nucleotide polymorphisms (SNPs) were discovered. These accessions were clustered in one diverse large group, although several discrete groups may exist surrounding it. Fifteen lines belonging to different botanical groups were identified as tolerant to heat. Single-trait (ST) and multi-trait (MT) genome-wide association studies (GWASs) of heat tolerance showed 9 and 11 significant SNP associations, respectively. Through the annotation of the discovered significant SNPs, we found that SNPs from transcription factor helix-loop-helix bHLH143-like, S-adenosylmethionine carrier, putative pentatricopeptide repeat-containing protein At5g08310, protein NLP8-like, and photosystem II reaction center PSB28 proteins are associated with heat tolerance.
Affiliation(s)
- Fouad Maalouf
- International Center for Agricultural Research in the Dry Areas (ICARDA), Beirut 1108-2010, Lebanon; (L.A.-K.); (R.B.)
| | - Lynn Abou-Khater
- International Center for Agricultural Research in the Dry Areas (ICARDA), Beirut 1108-2010, Lebanon; (L.A.-K.); (R.B.)
| | - Zayed Babiker
- Agricultural Research Cooperation (ARC)-Hudeiba Sudan, Wad Madani 21111, Sudan;
| | - Abdulqader Jighly
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia;
| | - Alsamman M. Alsamman
- Agricultural Genetic Engineering Research Institute, Cairo P.O. Box 12619, Egypt;
| | - Jinguo Hu
- USDA-ARS Plant Germplasm Introduction & Testing Research Unit, Pullman, WA 99163, USA;
| | - Yu Ma
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA;
| | - Nicolas Rispail
- Institute for Sustainable Agriculture, CSIC, 14004 Córdoba, Spain;
| | - Rind Balech
- International Center for Agricultural Research in the Dry Areas (ICARDA), Beirut 1108-2010, Lebanon; (L.A.-K.); (R.B.)
| | | | - Michael Baum
- Biodiversity and Integrated Gene Management Program, ICARDA, 10106 Rabat, Morocco; (M.B.); (S.K.)
| | - Shiv Kumar
- Biodiversity and Integrated Gene Management Program, ICARDA, 10106 Rabat, Morocco; (M.B.); (S.K.)
| |
|
15
|
Sandhu KS, Patil SS, Aoun M, Carter AH. Multi-Trait Multi-Environment Genomic Prediction for End-Use Quality Traits in Winter Wheat. Front Genet 2022; 13:831020. [PMID: 35173770 PMCID: PMC8841657 DOI: 10.3389/fgene.2022.831020] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 12/07/2021] [Accepted: 01/06/2022] [Indexed: 11/13/2022]
Abstract
Soft white wheat is a wheat class used in foreign and domestic markets to make various end products requiring specific quality attributes. Due to the associated cost, time, and amount of seed needed, phenotyping for end-use quality traits is delayed until later generations. Previously, we explored the potential of using genomic selection (GS) for selecting superior genotypes earlier in the breeding program. Breeders typically measure multiple traits across various locations, which opens up an avenue for exploring multi-trait-based GS models. This study's main objective was to explore the potential of using multi-trait GS models for predicting seven different end-use quality traits using cross-validation, independent prediction, and across-location predictions in a wheat breeding program. The population used consisted of 666 soft white wheat genotypes planted for 5 years at two locations in Washington, United States. We optimized and compared the performances of four uni-trait- and multi-trait-based GS models, namely, Bayes B, genomic best linear unbiased prediction (GBLUP), multilayer perceptron (MLP), and random forests. The prediction accuracies for multi-trait GS models were 5.5 and 7.9% superior to uni-trait models for the within-environment and across-location predictions, respectively. Multi-trait machine and deep learning models outperformed GBLUP and Bayes B for across-location predictions, but their advantages diminished when the genotype by environment component was included in the model. The highest improvement in prediction accuracy (35%) was obtained for flour protein content with the multi-trait MLP model. This study showed the potential of using multi-trait-based GS models to enhance prediction accuracy by using information from previously phenotyped traits. It would assist in speeding up the breeding cycle in a cost-effective manner.
Affiliation(s)
- Karansher S. Sandhu
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
| | - Shruti Sunil Patil
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States
| | - Meriem Aoun
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
| | - Arron H. Carter
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
| |
|
16
|
Guo H, Ayalew H, Seethepalli A, Dhakal K, Griffiths M, Ma X, York LM. Functional phenomics and genetics of the root economics space in winter wheat using high-throughput phenotyping of respiration and architecture. New Phytol 2021; 232:98-112. [PMID: 33683730 PMCID: PMC8518983 DOI: 10.1111/nph.17329] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/12/2020] [Accepted: 02/26/2021] [Indexed: 05/05/2023]
Abstract
The root economics space is a useful framework for plant ecology but is rarely considered for crop ecophysiology. In order to understand root trait integration in winter wheat, we combined functional phenomics with trait economic theory, utilizing genetic variation, high-throughput phenotyping, and multivariate analyses. We phenotyped a diversity panel of 276 genotypes for root respiration and architectural traits using a novel high-throughput method for CO2 flux and the open-source software RhizoVision Explorer to analyze scanned images. We uncovered substantial variation in specific root respiration (SRR) and specific root length (SRL), which were primary indicators of root metabolic and structural costs. Multiple linear regression analysis indicated that lateral root tips had the greatest SRR, and the residuals from this model were used as a new trait. Specific root respiration was negatively correlated with plant mass. Network analysis, using a Gaussian graphical model, identified root weight, SRL, diameter, and SRR as hub traits. Univariate and multivariate genetic analyses identified genetic regions associated with SRR, SRL, and root branching frequency, and proposed gene candidates. Combining functional phenomics and root economics is a promising approach to improving our understanding of crop ecophysiology. We identified root traits and genomic regions that could be harnessed to breed more efficient crops for sustainable agroecosystems.
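The step in which residuals of a multiple linear regression become a new trait can be sketched as follows. This is an illustrative assumption about the general technique, not the authors' analysis: the function name, the simulated predictors and the coefficients are all hypothetical.

```python
import numpy as np

def residual_trait(y, covariates):
    """Residuals of an ordinary least-squares fit of y on covariates plus an intercept."""
    design = np.column_stack([np.ones(len(y)), covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

# Simulated example (hypothetical): one response and three predictor traits.
rng = np.random.default_rng(1)
covariates = rng.normal(size=(200, 3))
y = 2.0 + covariates @ np.array([0.5, -1.0, 0.3]) + rng.normal(scale=0.2, size=200)
resid = residual_trait(y, covariates)
```

By construction the residual trait has zero mean and is uncorrelated with every predictor in the model, so it captures only the variation in the response that the predictors do not explain.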
Affiliation(s)
- Haichao Guo
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK, 73401, USA
| | - Habtamu Ayalew
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK, 73401, USA
| | | | - Kundan Dhakal
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK, 73401, USA
| | - Marcus Griffiths
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK, 73401, USA
| | - Xue‐Feng Ma
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK, 73401, USA
| | - Larry M. York
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK, 73401, USA
| |
|
17
|
Guo H, Ayalew H, Seethepalli A, Dhakal K, Griffiths M, Ma XF, York LM. Functional phenomics and genetics of the root economics space in winter wheat using high-throughput phenotyping of respiration and architecture. New Phytol 2021. [PMID: 33683730 DOI: 10.1101/2020.11.12.380238] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/16/2023]
Abstract
The root economics space is a useful framework for plant ecology but is rarely considered for crop ecophysiology. In order to understand root trait integration in winter wheat, we combined functional phenomics with trait economic theory, utilizing genetic variation, high-throughput phenotyping, and multivariate analyses. We phenotyped a diversity panel of 276 genotypes for root respiration and architectural traits using a novel high-throughput method for CO2 flux and the open-source software RhizoVision Explorer to analyze scanned images. We uncovered substantial variation in specific root respiration (SRR) and specific root length (SRL), which were primary indicators of root metabolic and structural costs. Multiple linear regression analysis indicated that lateral root tips had the greatest SRR, and the residuals from this model were used as a new trait. Specific root respiration was negatively correlated with plant mass. Network analysis, using a Gaussian graphical model, identified root weight, SRL, diameter, and SRR as hub traits. Univariate and multivariate genetic analyses identified genetic regions associated with SRR, SRL, and root branching frequency, and proposed gene candidates. Combining functional phenomics and root economics is a promising approach to improving our understanding of crop ecophysiology. We identified root traits and genomic regions that could be harnessed to breed more efficient crops for sustainable agroecosystems.
Affiliation(s)
- Haichao Guo
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK, 73401, USA
| | - Habtamu Ayalew
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK, 73401, USA
| | - Anand Seethepalli
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK, 73401, USA
| | - Kundan Dhakal
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK, 73401, USA
| | - Marcus Griffiths
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK, 73401, USA
| | - Xue-Feng Ma
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK, 73401, USA
| | - Larry M York
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK, 73401, USA
| |
|
18
|
Brault C, Doligez A, Cunff L, Coupel-Ledru A, Simonneau T, Chiquet J, This P, Flutre T. Harnessing multivariate, penalized regression methods for genomic prediction and QTL detection of drought-related traits in grapevine. G3 (Bethesda) 2021; 11:6325507. [PMID: 34544146 PMCID: PMC8496232 DOI: 10.1093/g3journal/jkab248] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/11/2021] [Accepted: 07/02/2021] [Indexed: 11/13/2022]
Abstract
Viticulture has to cope with climate change and to decrease pesticide inputs, while maintaining yield and wine quality. Breeding is a key lever to meet this challenge, and genomic prediction a promising tool to accelerate breeding programs. Multivariate methods are potentially more accurate than univariate ones. Moreover, some prediction methods also provide marker selection, thus allowing quantitative trait loci (QTLs) detection and the identification of positional candidate genes. To study both genomic prediction and QTL detection for drought-related traits in grapevine, we applied several methods, interval mapping (IM) as well as univariate and multivariate penalized regression, in a bi-parental progeny. With a dense genetic map, we simulated two traits under four QTL configurations. The simulations supported using the penalized regression method Elastic Net (EN) for genomic prediction, with control of the marginal False Discovery Rate on EN-selected markers to prioritize the QTLs. Indeed, penalized methods were more powerful than IM for QTL detection across various genetic architectures. Multivariate prediction did not perform better than its univariate counterpart, despite strong genetic correlation between traits. Using 14 traits measured in semi-controlled conditions under different watering conditions, penalized regression methods proved very efficient for intra-population prediction whatever the genetic architecture of the trait, with predictive abilities reaching 0.68. Compared to a previous study on the same traits, these methods applied on a denser map found new QTLs controlling traits linked to drought tolerance and provided relevant candidate genes. Overall, these findings provide a strong evidence base for implementing genomic prediction in grapevine breeding.
Affiliation(s)
- Charlotte Brault
- Institut Français de la Vigne et du Vin, Montpellier F-34398, France.,UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier F-34398, France.,UMT Geno-Vigne®, IFV-INRAE-Institut Agro, Montpellier F-34398, France
| | - Agnès Doligez
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier F-34398, France.,UMT Geno-Vigne®, IFV-INRAE-Institut Agro, Montpellier F-34398, France
| | - Le Cunff
- Institut Français de la Vigne et du Vin, Montpellier F-34398, France.,UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier F-34398, France.,UMT Geno-Vigne®, IFV-INRAE-Institut Agro, Montpellier F-34398, France
| | - Aude Coupel-Ledru
- LEPSE, Univ Montpellier, INRAE, Institut Agro, Montpellier 34000, France
| | - Thierry Simonneau
- LEPSE, Univ Montpellier, INRAE, Institut Agro, Montpellier 34000, France
| | | | - Patrice This
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier F-34398, France.,UMT Geno-Vigne®, IFV-INRAE-Institut Agro, Montpellier F-34398, France
| | - Timothée Flutre
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Gif-sur-Yvette 91190, France
| |
|
19
|
Fernandes SB, Zhang KS, Jamann TM, Lipka AE. How Well Can Multivariate and Univariate GWAS Distinguish Between True and Spurious Pleiotropy? Front Genet 2021; 11:602526. [PMID: 33584799 PMCID: PMC7873880 DOI: 10.3389/fgene.2020.602526] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/03/2020] [Accepted: 12/11/2020] [Indexed: 11/13/2022]
Abstract
Quantification of the simultaneous contributions of loci to multiple traits, a phenomenon called pleiotropy, is facilitated by the increased availability of high-throughput genotypic and phenotypic data. To understand the prevalence and nature of pleiotropy, the ability of multivariate and univariate genome-wide association study (GWAS) models to distinguish between pleiotropic and non-pleiotropic loci in linkage disequilibrium (LD) first needs to be evaluated. Therefore, we used publicly available maize and soybean genotypic data to simulate multiple pairs of traits that were either (i) controlled by quantitative trait nucleotides (QTNs) on separate chromosomes, (ii) controlled by QTNs in various degrees of LD with each other, or (iii) controlled by a single pleiotropic QTN. We showed that multivariate GWAS could not distinguish between QTNs in LD and a single pleiotropic QTN. In contrast, a unique QTN detection rate pattern was observed for univariate GWAS whenever the simulated QTNs were in high LD or pleiotropic. Collectively, these results suggest that multivariate and univariate GWAS should both be used to infer whether or not causal mutations underlying peak GWAS associations are pleiotropic. Therefore, we recommend that future studies use a combination of multivariate and univariate GWAS models, as both models could be useful for identifying and narrowing down candidate loci with potential pleiotropic effects for downstream biological experiments.
Affiliation(s)
- Samuel B. Fernandes
- Department of Crop Science, University of Illinois at Urbana-Champaign, Urbana, IL, United States
| | | | | | - Alexander E. Lipka
- Department of Crop Science, University of Illinois at Urbana-Champaign, Urbana, IL, United States
| |
|
20
|
Crossa J, Fritsche-Neto R, Montesinos-Lopez OA, Costa-Neto G, Dreisigacker S, Montesinos-Lopez A, Bentley AR. The Modern Plant Breeding Triangle: Optimizing the Use of Genomics, Phenomics, and Enviromics Data. Front Plant Sci 2021; 12:651480. [PMID: 33936136 PMCID: PMC8085545 DOI: 10.3389/fpls.2021.651480] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/09/2021] [Accepted: 02/11/2021] [Indexed: 05/04/2023]
Affiliation(s)
- Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, de Mexico, Mexico
- Colegio de Postgraduados, Montecillo, Edo. de Mexico, Mexico
| | - Roberto Fritsche-Neto
- Department of Genetics, “Luiz de Queiroz” Agriculture College, University of São Paulo, São Paulo, Brazil
| | | | - Germano Costa-Neto
- Department of Genetics, “Luiz de Queiroz” Agriculture College, University of São Paulo, São Paulo, Brazil
| | - Susanne Dreisigacker
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, de Mexico, Mexico
| | - Abelardo Montesinos-Lopez
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Mexico
| | - Alison R. Bentley
- International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz, de Mexico, Mexico
- *Correspondence: Alison R. Bentley
| |
|
21
|
Pégard M, Segura V, Muñoz F, Bastien C, Jorge V, Sanchez L. Favorable Conditions for Genomic Evaluation to Outperform Classical Pedigree Evaluation Highlighted by a Proof-of-Concept Study in Poplar. Front Plant Sci 2020; 11:581954. [PMID: 33193528 PMCID: PMC7655903 DOI: 10.3389/fpls.2020.581954] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/10/2020] [Accepted: 09/22/2020] [Indexed: 06/11/2023]
Abstract
Forest trees like poplar are particular in many ways compared to other domesticated species. They have long juvenile phases, ongoing crop-wild gene flow, extensive outcrossing, and slow growth. All these particularities tend to make the conduction of breeding programs and evaluation stages costly both in time and resources. Perennials like trees are therefore good candidates for the implementation of genomic selection (GS) which is a good way to accelerate the breeding process, by unchaining selection from phenotypic evaluation without affecting precision. In this study, we tried to compare GS to pedigree-based traditional evaluation, and evaluated under which conditions genomic evaluation outperforms classical pedigree evaluation. Several conditions were evaluated as the constitution of the training population by cross-validation, the implementation of multi-trait, single trait, additive and non-additive models with different estimation methods (G-BLUP or weighted G-BLUP). Finally, the impact of the marker densification was tested through four marker density sets. The population under study corresponds to a pedigree of 24 parents and 1,011 offspring, structured into 35 full-sib families. Four evaluation batches were planted in the same location and seven traits were evaluated on 1 and 2 years old trees. The quality of prediction was reported by the accuracy, the Spearman rank correlation and prediction bias and tested with a cross-validation and an independent individual test set. Our results show that genomic evaluation performance could be comparable to the already well-optimized pedigree-based evaluation under certain conditions. Genomic evaluation appeared to be advantageous when using an independent test set and a set of less precise phenotypes. Genome-based methods showed advantages over pedigree counterparts when ranking candidates at the within-family levels, for most of the families. Our study also showed that looking at ranking criteria as Spearman rank correlation can reveal benefits to genomic selection hidden by biased predictions.
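The three prediction-quality criteria named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the simulated data, and the choice of the regression slope of observed on predicted values as the bias measure are assumptions made for illustration. It shows how a monotone distortion of the predictions changes the slope while leaving the rank correlation unchanged.

```python
import numpy as np

def ranks(x):
    """Return 0-based ranks of x (assumes no ties, for simplicity)."""
    order = np.argsort(x)
    r = np.empty(len(x), dtype=float)
    r[order] = np.arange(len(x))
    return r

def prediction_metrics(observed, predicted):
    """Pearson correlation, Spearman rank correlation, and a slope-based bias measure."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    pearson = np.corrcoef(observed, predicted)[0, 1]
    # Spearman correlation is the Pearson correlation of the ranks.
    spearman = np.corrcoef(ranks(observed), ranks(predicted))[0, 1]
    # Assumed bias measure: slope of observed regressed on predicted (1.0 = no scale bias).
    slope = np.polyfit(predicted, observed, 1)[0]
    return {"pearson": pearson, "spearman": spearman, "slope": slope}

# Hypothetical data: noisy predictions, then a monotone (order-preserving) distortion of them.
rng = np.random.default_rng(0)
true_values = rng.normal(size=200)
unbiased = true_values + rng.normal(scale=0.5, size=200)
distorted = np.exp(unbiased)  # same ranking of candidates, distorted scale

m_unbiased = prediction_metrics(true_values, unbiased)
m_distorted = prediction_metrics(true_values, distorted)
print(m_unbiased)
print(m_distorted)
```

Because the distortion preserves the ordering of candidates, the Spearman value is identical in both cases, whereas the slope (and in general the Pearson correlation) differs.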
Affiliation(s)
- Vincent Segura
- BioForA, INRA, ONF, Orléans, France
- AGAP, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
|
22
|
Abstract
Genome-wide association studies focusing on a single phenotype have been broadly conducted to identify genetic variants associated with a complex disease. The commonly applied single variant analysis is limited by failing to consider the complex interactions between variants, which motivated the development of association analyses focusing on genes or gene sets. Moreover, when multiple correlated phenotypes are available, methods based on a multi-trait analysis can improve the association power. However, most currently available multi-trait analyses are single variant-based analyses; thus have limited power when disease variants function as a group in a gene or a gene set. In this work, we propose a genome-wide gene-based multi-trait analysis method by considering genes as testing units. For a given phenotype, we adopt a rapid and powerful kernel-based testing method which can evaluate the joint effect of multiple variants within a gene. The joint effect, either linear or nonlinear, is captured through kernel functions. Given a series of candidate kernel functions, we propose an omnibus test strategy to integrate the test results based on different candidate kernels. A p-value combination method is then applied to integrate dependent p-values to assess the association between a gene and multiple correlated phenotypes. Simulation studies show a reasonable type I error control and an excellent power of the proposed method compared to its counterparts. We further show the utility of the method by applying it to two data sets: the Human Liver Cohort and the Alzheimer Disease Neuroimaging Initiative data set, and novel genes are identified. Our method has broad applications in other fields in which the interest is to evaluate the joint effect (linear or nonlinear) of a set of variants.
Affiliation(s)
- Yamin Deng
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Tao He
- Department of Mathematics, San Francisco State University, San Francisco, CA, United States
- Ruiling Fang
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Shaoyu Li
- Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC, United States
- Hongyan Cao
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, United States
|
23
|
Guo H, An J, Yu Z. Identifying Shared Risk Genes for Asthma, Hay Fever, and Eczema by Multi-Trait and Multiomic Association Analyses. Front Genet 2020; 11:270. [PMID: 32373153 PMCID: PMC7176997 DOI: 10.3389/fgene.2020.00270] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/19/2019] [Accepted: 03/05/2020] [Indexed: 12/03/2022]
Abstract
Asthma, hay fever and eczema are three comorbid diseases with high prevalence and heritability. Their common genetic architectures have not been well-elucidated. In this study, we first conducted a linkage disequilibrium score regression analysis to confirm the strong genetic correlations between asthma, hay fever and eczema. We then integrated three distinct association analyses (metaCCA multi-trait association analysis, MAGMA genome-wide and MetaXcan transcriptome-wide gene-based tests) to identify shared risk genes based on the large-scale GWAS results in the GeneATLAS database. MetaCCA can detect pleiotropic genes associated with these three diseases jointly. MAGMA and MetaXcan were performed separately to identify candidate risk genes for each of the three diseases. We finally identified 150 shared risk genes, in which 60 genes are novel. Functional enrichment analysis revealed that the shared risk genes are enriched in inflammatory bowel disease, T cells differentiation and other related biological pathways. Our work may provide help on treatment of asthma, hay fever and eczema in clinical applications.
Affiliation(s)
- Hongping Guo
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, China; School of Mathematics and Computer Science, Hanjiang Normal University, Hubei, China
- Jiyuan An
- Centre for Tropical Crops and Biocommodities, Queensland University of Technology, Brisbane, QLD, Australia
- Zuguo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, China; School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, QLD, Australia
|
24
|
Stevens AK, Blanchard BE, Talley AE, Brown JL, Halvorson MA, Janssen T, King KM, Littlefield AK. State-Level Impulsivity, Affect, and Alcohol: A Psychometric Evaluation of the Momentary Impulsivity Scale Across Two Intensive Longitudinal Samples. J Res Pers 2020; 85:103914. [PMID: 32341603 PMCID: PMC7185258 DOI: 10.1016/j.jrp.2020.103914] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
Abstract
We reexamined the psychometric properties of the Momentary Impulsivity Scale (MIS) in two young adult samples using daily diary (N=77) and ecological momentary assessment (N=147). A one-factor between- and within-person structure was supported, though "I felt impatient" loaded poorly within-person. MIS scores consistently related to emotion-driven trait impulsivity; however, MSSDs of MIS scores were unrelated to outcomes after accounting for aggregate MIS scores. We observed positive, within-person correlations with negative, but not positive, affect. Between-person MIS scores correlated with alcohol problems, though within-person MIS-alcohol relations were inconsistent. MIS scores were unrelated to laboratory-based impulsivity tasks. Findings inform the assessment of state-level impulsivity in young adults. Future research should prioritize expanding the MIS to capture the potential multidimensionality of state-level impulsivity.
Affiliation(s)
- Angela K. Stevens
- Texas Tech University Department of Psychological Sciences, Texas Tech University, Psychology Building, Box 42051, Lubbock, TX 79409
- Center for Alcohol and Addiction Studies, Brown University School of Public Health, Providence, RI, 02912, USA
- Brittany E. Blanchard
- Texas Tech University Department of Psychological Sciences, Texas Tech University, Psychology Building, Box 42051, Lubbock, TX 79409
- Amelia E. Talley
- Texas Tech University Department of Psychological Sciences, Texas Tech University, Psychology Building, Box 42051, Lubbock, TX 79409
- Jennifer L. Brown
- Addiction Sciences Division, Department of Psychiatry & Behavioral Neuroscience, University of Cincinnati College of Medicine, 3131 Harvey Ave, Suite 104, Cincinnati, Ohio 45229
- Tim Janssen
- Center for Alcohol and Addiction Studies, Brown University School of Public Health, Providence, RI, 02912, USA
- Kevin M. King
- Department of Psychology, University of Washington, USA
- Andrew K. Littlefield
- Texas Tech University Department of Psychological Sciences, Texas Tech University, Psychology Building, Box 42051, Lubbock, TX 79409
|
25
|
Srivastava S, Lopez BI, Heras-Saldana SL, Park JE, Shin DH, Chai HH, Park W, Lee SH, Lim D. Estimation of Genetic Parameters by Single-Trait and Multi-Trait Models for Carcass Traits in Hanwoo Cattle. Animals (Basel) 2019; 9:E1061. [PMID: 31810212 DOI: 10.3390/ani9121061] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/26/2019] [Revised: 11/16/2019] [Accepted: 11/27/2019] [Indexed: 11/17/2022]
Abstract
Hanwoo breed is preferred in South Korea because of the high standards in marbling and the palatability of its meat. Numerous studies have been conducted and are ongoing to increase the meat production and quality in this beef population. The aim of this study was to estimate and compare genetic parameters for carcass traits using BLUPF90 software. Four models were constructed, single trait pedigree model (STPM), single-trait genomic model (STGM), multi-trait pedigree model (MTPM), and multi-trait genomic model (MTGM), using the pedigree, phenotype, and genomic information of 7991 Hanwoo cattle. Four carcass traits were evaluated: Back fat thickness (BFT), carcass weight (CWT), eye muscle area (EMA), and marbling score (MS). Heritability estimates of 0.40 and 0.41 for BFT, 0.33 and 0.34 for CWT, 0.36 and 0.37 for EMA, and 0.35 and 0.38 for MS were obtained for the single-trait pedigree model and the multi-trait pedigree model, respectively, in Hanwoo. Further, the genomic model showed more improved results compared to the pedigree model, with heritability of 0.39 (CWT), 0.39 (EMA), and 0.46 (MS), except for 0.39 (BFT), which may be due to random events. Utilization of genomic information in the form of single nucleotide polymorphisms (SNPs) has allowed more capturing of the variance from the traits improving the variance components.
|
26
|
Montesinos-López OA, Montesinos-López A, Tuberosa R, Maccaferri M, Sciara G, Ammar K, Crossa J. Multi-Trait, Multi-Environment Genomic Prediction of Durum Wheat With Genomic Best Linear Unbiased Predictor and Deep Learning Methods. Front Plant Sci 2019; 10:1311. [PMID: 31787990 PMCID: PMC6856087 DOI: 10.3389/fpls.2019.01311] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 06/18/2019] [Accepted: 09/20/2019] [Indexed: 05/23/2023]
Abstract
Although durum wheat (Triticum turgidum var. durum Desf.) is a minor cereal crop representing just 5-7% of the world's total wheat crop, it is a staple food in Mediterranean countries, where it is used to produce pasta, couscous, bulgur and bread. In this paper, we cover multi-trait prediction of grain yield (GY), days to heading (DH) and plant height (PH) of 270 durum wheat lines that were evaluated in 43 environments (country-location-year combinations) across a broad range of water regimes in the Mediterranean Basin and other locations. Multi-trait prediction analyses were performed by implementing a multi-trait deep learning model (MTDL) with a feed-forward network topology and a rectified linear unit activation function with a grid search approach for the selection of hyper-parameters. The results of the multi-trait deep learning method were also compared with univariate predictions of the genomic best linear unbiased predictor (GBLUP) method and the univariate counterpart of the multi-trait deep learning method (UDL). All models were implemented with and without the genotype × environment interaction term. We found that the best predictions were observed without the genotype × environment interaction term in the UDL and MTDL methods. However, under the GBLUP method, the best predictions were observed when the genotype × environment interaction term was taken into account. We also found that in general the best predictions were observed under the GBLUP model; however, the predictions of the MTDL were very similar to those of the GBLUP model. This result provides more evidence that the GBLUP model is a powerful approach for genomic prediction, but also that the deep learning method is a practical approach for predicting univariate and multivariate traits in the context of genomic selection.
Affiliation(s)
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Mexico
- Roberto Tuberosa
- Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy
- Marco Maccaferri
- Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy
- Giuseppe Sciara
- Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy
- Karim Ammar
- Global Wheat Breeding Program, International Maize and Wheat Improvement Center (CIMMYT), Mexico City, Mexico
- José Crossa
- Global Wheat Breeding Program, International Maize and Wheat Improvement Center (CIMMYT), Mexico City, Mexico
|
27
|
Runcie D, Cheng H. Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods. G3 (Bethesda) 2019; 9:3727-3741. [PMID: 31511297 PMCID: PMC6829121 DOI: 10.1534/g3.119.400598] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 08/02/2019] [Accepted: 09/10/2019] [Indexed: 01/08/2023]
Abstract
Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model, and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method which we call CV2*: validating model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.
Affiliation(s)
- Hao Cheng
- Department of Animal Science, University of California Davis, Davis, CA 95616
|
28
|
Montesinos-López OA, Montesinos-López A, Crossa J, Cuevas J, Montesinos-López JC, Gutiérrez ZS, Lillemo M, Philomin J, Singh R. A Bayesian Genomic Multi-output Regressor Stacking Model for Predicting Multi-trait Multi-environment Plant Breeding Data. G3 (Bethesda) 2019; 9:3381-93. [PMID: 31427455 DOI: 10.1534/g3.119.400336] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 11/18/2022]
Abstract
In this paper we propose a Bayesian multi-output regressor stacking (BMORS) model that is a generalization of the multi-trait regressor stacking method. The proposed BMORS model consists of two stages: in the first stage, a univariate genomic best linear unbiased prediction (GBLUP including genotype × environment interaction GE) model is implemented for each of the L traits under study; then the predictions of all traits are included as covariates in the second stage, by implementing a Ridge regression model. The main objectives of this research were to study alternative models to the existing multi-trait multi-environment (BMTME) model with respect to (1) genomic-enabled prediction accuracy, and (2) potential advantages in terms of computing resources and implementation. We compared the predictions of the BMORS model to those of the univariate GBLUP model using 7 maize and wheat datasets. We found that the proposed BMORS produced similar predictions to the univariate GBLUP model and to the BMTME model in terms of prediction accuracy; however, the best predictions were obtained under the BMTME model. In terms of computing resources, we found that the BMORS is at least 9 times faster than the BMTME method. Based on our empirical findings, the proposed BMORS model is an alternative for predicting multi-trait and multi-environment data, which are very common in genomic-enabled prediction in plant and animal breeding programs.
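The two-stage structure described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the BMORS implementation: ridge regression on marker genotypes stands in for the univariate GBLUP of the first stage, no genotype × environment term is modeled, the second stage is trained on in-sample first-stage predictions, and all function names, penalty values, and simulated data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return y_mean - x_mean @ beta, beta

def ridge_predict(model, X):
    intercept, beta = model
    return intercept + X @ beta

def stacked_predict(M_train, Y_train, M_test, lam1=10.0, lam2=1.0):
    """Two-stage regressor stacking across traits (columns of Y_train)."""
    n_traits = Y_train.shape[1]
    # Stage 1: one univariate model per trait, markers as predictors
    # (ridge is an assumed stand-in for GBLUP).
    stage1 = [ridge_fit(M_train, Y_train[:, t], lam1) for t in range(n_traits)]
    Z_train = np.column_stack([ridge_predict(m, M_train) for m in stage1])
    Z_test = np.column_stack([ridge_predict(m, M_test) for m in stage1])
    # Stage 2: for each trait, ridge regression on the stage-1 predictions of all traits.
    stage2 = [ridge_fit(Z_train, Y_train[:, t], lam2) for t in range(n_traits)]
    return np.column_stack([ridge_predict(m, Z_test) for m in stage2])

# Hypothetical toy data: 120 lines, 50 markers coded 0/1/2, 3 traits.
rng = np.random.default_rng(42)
M = rng.integers(0, 3, size=(120, 50))
B = rng.normal(scale=0.3, size=(50, 3))
Y = M @ B + rng.normal(size=(120, 3))

preds = stacked_predict(M[:100], Y[:100], M[100:])
print(preds.shape)
```

Each stage only solves small linear systems, which is consistent with the abstract's point that stacking is computationally lighter than a full multi-trait, multi-environment model.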
|
29
|
Montesinos-López OA, Montesinos-López A, Luna-Vázquez FJ, Toledo FH, Pérez-Rodríguez P, Lillemo M, Crossa J. An R Package for Bayesian Analysis of Multi-environment and Multi-trait Multi-environment Data for Genome-Based Prediction. G3 (Bethesda) 2019; 9:1355-69. [PMID: 30819822 DOI: 10.1534/g3.119.400126] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 01/11/2023]
Abstract
Evidence that genomic selection (GS) is a technology that is revolutionizing plant breeding continues to grow. However, it is very well documented that its success strongly depends on statistical models, which are used by GS to perform predictions of candidate genotypes that were not phenotyped. Because there is no universally better model for prediction and models for each type of response variable are needed (continuous, binary, ordinal, count, etc.), an active area of research aims to develop statistical models for the prediction of univariate and multivariate traits in GS. However, most of the models developed so far are for univariate and continuous (Gaussian) traits. Therefore, to overcome the lack of multivariate statistical models for genome-based prediction by improving the original version of the BMTME, we propose an improved Bayesian multi-trait and multi-environment (BMTME) R package for analyzing breeding data with multiple traits and multiple environments. We also introduce Bayesian multi-output regressor stacking (BMORS) functions that are considerably efficient in terms of computational resources. The package allows parameter estimation and evaluates the prediction performance of multi-trait and multi-environment data in a reliable, efficient and user-friendly way. We illustrate the use of the BMTME with real toy datasets to show all the facilities that the software offers the user. However, for large datasets, the BME() and BMTME() functions of the BMTME R package are very intense in terms of computing time; on the other hand, less intensive computing is required with BMORS functions BMORS() and BMORS_Env() that are also included in the BMTME package.
|
30
|
Montesinos-López OA, Montesinos-López A, Crossa J, Gianola D, Hernández-Suárez CM, Martín-Vallejo J. Multi-trait, Multi-environment Deep Learning Modeling for Genomic-Enabled Prediction of Plant Traits. G3 (Bethesda) 2018; 8:3829-3840. [PMID: 30291108 PMCID: PMC6288830 DOI: 10.1534/g3.118.200728] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Received: 07/30/2018] [Accepted: 10/03/2018] [Indexed: 11/27/2022]
Abstract
Multi-trait and multi-environment data are common in animal and plant breeding programs. However, what is lacking are more powerful statistical models that can exploit the correlation between traits to improve prediction accuracy in the context of genomic selection (GS). Multi-trait models are more complex than univariate models and usually require more computational resources, but they are preferred because they can exploit the correlation between traits, which many times helps improve prediction accuracy. For this reason, in this paper we explore the power of multi-trait deep learning (MTDL) models in terms of prediction accuracy. The prediction performance of MTDL models was compared to the performance of the Bayesian multi-trait and multi-environment (BMTME) model proposed by Montesinos-López et al. (2016), which is a multi-trait version of the genomic best linear unbiased prediction (GBLUP) univariate model. Both models were evaluated with predictors with and without the genotype×environment interaction term. The prediction performance of both models was evaluated in terms of Pearson's correlation using cross-validation. We found that the best predictions in two of the three data sets were found under the BMTME model, but in general the predictions of both models, BMTME and MTDL, were similar. Among models without the genotype×environment interaction, the MTDL model was the best, while among models with genotype×environment interaction, the BMTME model was superior. These results indicate that the MTDL model is very competitive for performing predictions in the context of GS, with the important practical advantage that it requires less computational resources than the BMTME model.
Affiliation(s)
- Abelardo Montesinos-López
- Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, Jalisco, México
- José Crossa
- Biometrics and Statistics Unit, Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, 06600, Ciudad de México, México
- Daniel Gianola
- Departments of Animal Sciences, Dairy Science, and Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53706
- Javier Martín-Vallejo
- Departamento de Estadística, Universidad de Salamanca, c/Espejo 2, Salamanca, 37007, España
|
31
|
Abstract
Bayesian multiple-regression methods incorporating different mixture priors for marker effects are used widely in genomic prediction. Improvement in prediction accuracies from using those methods, such as BayesB, BayesC, and BayesCπ, have been shown in single-trait analyses with both simulated and real data. These methods have been extended to multi-trait analyses, but only under the restrictive assumption that a locus simultaneously affects all the traits or none of them. This assumption is not biologically meaningful, especially in multi-trait analyses involving many traits. In this paper, we develop and implement a more general multi-trait BayesCπ and BayesB methods allowing a broader range of mixture priors. Our methods allow a locus to affect any combination of traits, e.g., in a 5-trait analysis, the "restrictive" model only allows two situations, whereas ours allow all 32 situations. Further, we compare our methods to single-trait methods and the "restrictive" multi-trait formulation using real and simulated data. In the real data analysis, higher prediction accuracies were observed from both our new broad-based multi-trait methods and the "restrictive" formulation. The broad-based and restrictive multi-trait methods showed similar prediction accuracies. In the simulated data analysis, higher prediction accuracies to the "restrictive" method were observed from our general multi-trait methods for intermediate training population size. The software tool JWAS offers open-source routines to perform these analyses.
Affiliation(s)
- Hao Cheng
- Department of Animal Science, University of California Davis, California 95616
- Kadir Kizilkaya
- Department of Animal Science, Adnan Menderes University, 9100 Aydin, Turkey
- Jian Zeng
- Program in Complex Trait Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, QLD 4072, Australia
- Dorian Garrick
- School of Agriculture, Massey University, Palmerston North 4442 New Zealand
- Rohan Fernando
- Department of Animal Science, Iowa State University, Ames, Iowa 50011-1050
|
32
|
Cheng H, Kizilkaya K, Zeng J, Garrick D, Fernando R. Genomic Prediction from Multiple-Trait Bayesian Regression Methods Using Mixture Priors. Genetics 2018; 209:89-103. [PMID: 29514861 DOI: 10.1534/genetics.118.300650] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Indexed: 11/18/2022]
Abstract
Bayesian multiple-regression methods incorporating different mixture priors for marker effects are used widely in genomic prediction. Improvement in prediction accuracies from using those methods, such as BayesB, BayesC, and BayesCπ, have been shown in single-trait analyses with both simulated and real data. These methods have been extended to multi-trait analyses, but only under the restrictive assumption that a locus simultaneously affects all the traits or none of them. This assumption is not biologically meaningful, especially in multi-trait analyses involving many traits. In this paper, we develop and implement a more general multi-trait BayesCπ and BayesB methods allowing a broader range of mixture priors. Our methods allow a locus to affect any combination of traits, e.g., in a 5-trait analysis, the "restrictive" model only allows two situations, whereas ours allow all 32 situations. Further, we compare our methods to single-trait methods and the "restrictive" multi-trait formulation using real and simulated data. In the real data analysis, higher prediction accuracies were observed from both our new broad-based multi-trait methods and the "restrictive" formulation. The broad-based and restrictive multi-trait methods showed similar prediction accuracies. In the simulated data analysis, higher prediction accuracies to the "restrictive" method were observed from our general multi-trait methods for intermediate training population size. The software tool JWAS offers open-source routines to perform these analyses.
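The count of "two situations" versus "all 32 situations" in a 5-trait analysis follows from enumerating which traits a locus may affect: each trait is either affected or not, giving 2^5 = 32 patterns, of which only "none" and "all" are permitted under the restrictive assumption. A short enumeration, with variable names chosen here purely for illustration:

```python
from itertools import product

n_traits = 5

# Each pattern is a tuple of 0/1 indicators: 1 means the locus affects that trait.
all_patterns = list(product([0, 1], repeat=n_traits))

# The restrictive assumption keeps only "affects no trait" and "affects every trait".
restrictive = [p for p in all_patterns if sum(p) in (0, n_traits)]

print(len(restrictive), len(all_patterns))  # 2 32
```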
|
33
|
Montesinos-López OA, Montesinos-López A, Crossa J, Montesinos-López JC, Mota-Sanchez D, Estrada-González F, Gillberg J, Singh R, Mondal S, Juliana P. Prediction of Multiple-Trait and Multiple-Environment Genomic Data Using Recommender Systems. G3 (Bethesda) 2018; 8:131-47. [PMID: 29097376 DOI: 10.1534/g3.117.300309] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/25/2022]
Abstract
In genomic-enabled prediction, the task of improving the accuracy of the prediction of lines in environments is difficult because the available information is generally sparse and usually has low correlations between traits. In current genomic selection, although researchers have a large amount of information and appropriate statistical models to process it, there is still limited computing efficiency to do so. Although some statistical models are usually mathematically elegant, many of them are also computationally inefficient, and they are impractical for many traits, lines, environments, and years because they need to sample from huge normal multivariate distributions. For these reasons, this study explores two recommender systems: item-based collaborative filtering (IBCF) and the matrix factorization algorithm (MF) in the context of multiple traits and multiple environments. The IBCF and MF methods were compared with two conventional methods on simulated and real data. Results of the simulated and real data sets show that the IBCF technique was slightly better in terms of prediction accuracy than the two conventional methods and the MF method when the correlation was moderately high. The IBCF technique is very attractive because it produces good predictions when there is high correlation between items (environment–trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets.
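The idea behind item-based collaborative filtering can be sketched as below, with rows as lines and columns as items (environment–trait combinations). This is a generic textbook-style version, not the procedure used in the paper: the cosine similarity, the choice of k, the function name, and the simulated matrix are all assumptions for illustration.

```python
import numpy as np

def ibcf_predict(R, row, col, k=2):
    """Predict the missing entry R[row, col] from the k items most similar to `col`.

    R is a lines x items matrix with np.nan marking missing values.
    """
    target = R[:, col]
    sims = []
    for j in range(R.shape[1]):
        if j == col or np.isnan(R[row, j]):
            continue  # an item is usable only if the line was observed for it
        both = ~np.isnan(target) & ~np.isnan(R[:, j])
        if both.sum() < 2:
            continue
        a, b = target[both], R[both, j]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom > 0:
            sims.append((a @ b / denom, j))  # cosine similarity between items
    top = sorted(sims, reverse=True)[:k]
    if not top:
        return float("nan")
    weights = np.array([s for s, _ in top])
    values = np.array([R[row, j] for _, j in top])
    # Similarity-weighted average of the line's values on the most similar items.
    return float(weights @ values / np.abs(weights).sum())

# Hypothetical standardized data: 6 lines, 4 correlated items, one missing cell.
rng = np.random.default_rng(1)
base = rng.normal(size=(6, 1))
R = base + 0.3 * rng.normal(size=(6, 4))
R[0, 3] = np.nan

pred = ibcf_predict(R, row=0, col=3, k=2)
print(pred)
```

The prediction relies entirely on correlation between items, which matches the abstract's remark that the approach works well when that correlation is high.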
|
34
|
Montesinos-López OA, Montesinos-López A, Crossa J, Toledo FH, Montesinos-López JC, Singh P, Juliana P, Salinas-Ruiz J. A Bayesian Poisson-lognormal Model for Count Data for Multiple-Trait Multiple-Environment Genomic-Enabled Prediction. G3 (Bethesda) 2017; 7:1595-606. [PMID: 28364037 DOI: 10.1534/g3.117.039974] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/26/2022]
Abstract
When a plant scientist wishes to make genomic-enabled predictions of multiple traits measured in multiple individuals in multiple environments, the most common strategy for performing the analysis is to use a single trait at a time taking into account genotype × environment interaction (G × E), because there is a lack of comprehensive models that simultaneously take into account the correlated counting traits and G × E. For this reason, in this study we propose a multiple-trait and multiple-environment model for count data. The proposed model was developed under the Bayesian paradigm for which we developed a Markov Chain Monte Carlo (MCMC) with noninformative priors. This allows obtaining all required full conditional distributions of the parameters leading to an exact Gibbs sampler for the posterior distribution. Our model was tested with simulated data and a real data set. Results show that the proposed multi-trait, multi-environment model is an attractive alternative for modeling multiple count traits measured in multiple environments.
|
35
|
Montesinos-López OA, Montesinos-López A, Crossa J, Toledo FH, Pérez-Hernández O, Eskridge KM, Rutkoski J. A Genomic Bayesian Multi-trait and Multi-environment Model. G3 (Bethesda) 2016; 6:2725-44. [PMID: 27342738 DOI: 10.1534/g3.116.032359] [Citation(s) in RCA: 86] [Impact Index Per Article: 10.8] [Indexed: 01/02/2023]
Abstract
When information on multiple genotypes evaluated in multiple environments is recorded, a multi-environment single trait model for assessing genotype × environment interaction (G × E) is usually employed. Comprehensive models that simultaneously take into account the correlated traits and trait × genotype × environment interaction (T × G × E) are lacking. In this research, we propose a Bayesian model for analyzing multiple traits and multiple environments for whole-genome prediction (WGP) model. For this model, we used Half-t priors on each standard deviation term and uniform priors on each correlation of the covariance matrix. These priors were not informative and led to posterior inferences that were insensitive to the choice of hyper-parameters. We also developed a computationally efficient Markov Chain Monte Carlo (MCMC) under the above priors, which allowed us to obtain all required full conditional distributions of the parameters leading to an exact Gibbs sampling for the posterior distribution. We used two real data sets to implement and evaluate the proposed Bayesian method and found that when the correlation between traits was high (>0.5), the proposed model (with unstructured variance–covariance) improved prediction accuracy compared to the model with diagonal and standard variance–covariance structures. The R-software package Bayesian Multi-Trait and Multi-Environment (BMTME) offers optimized C++ routines to efficiently perform the analyses.
|
36
|
Cemal I, Karaman E, Firat MZ, Yilmaz O, Ata N, Karaca O. Bayesian inference of genetic parameters for ultrasound scanning traits of Kivircik lambs. Animal 2017; 11:375-81. [PMID: 27510851 DOI: 10.1017/S1751731116001774] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/07/2022]
Abstract
Ultrasound scanning traits have been adopted in selection programs in many countries to improve carcass traits for lean meat production. Because the genetic parameters of the traits of interest are important for breeding programs, the present investigation aimed to estimate them. The estimated parameters were direct and maternal heritability as well as genetic correlations between the studied traits. The traits were backfat thickness (BFT), skin+backfat thickness (SBFT), eye muscle depth (MD) and live weight on the day of scanning (LW). The breed investigated was Kivircik, which produces high-quality meat. Six different multi-trait animal models were fitted using a Bayesian approach to determine the most suitable model for the data. Based on the deviance information criterion, a model that includes direct additive genetic effects, maternal additive genetic effects, direct-maternal genetic covariance and maternal permanent environmental effects proved to be the most appropriate for the data, and inferences were therefore based on the results of that model. The direct heritability estimates for BFT, SBFT, MD and LW were 0.26, 0.26, 0.23 and 0.09, whereas the maternal heritability estimates were 0.27, 0.27, 0.24 and 0.20, respectively. Negative genetic correlations were obtained between direct and maternal effects for BFT, SBFT and MD. Both direct and maternal genetic correlations between traits were favorable, whereas BFT-MD and SBFT-MD had negligible direct genetic correlation. The highest direct and maternal genetic correlations were between BFT and SBFT (0.39) and between MD and LW (0.48), respectively. Our results, in general, indicated that maternal effects should be accounted for in the estimation of genetic parameters of ultrasound scanning traits in Kivircik lambs, and that SBFT can be used as a selection criterion to improve BFT.
|