1
Mbebi AJ, Mercado F, Hobby D, Tong H, Nikoloski Z. Advances in multi-trait genomic prediction approaches: classification, comparative analysis, and perspectives. Brief Bioinform 2025; 26:bbaf211. [PMID: 40358423 PMCID: PMC12070487 DOI: 10.1093/bib/bbaf211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2025] [Revised: 03/24/2025] [Accepted: 04/20/2025] [Indexed: 05/15/2025] Open
Abstract
Traits in any organism are not independent but show considerable integration, observed in the form of couplings and trade-offs. Therefore, improvement in one trait may affect other traits, often in an undesired direction. To account for this problem, crop breeding increasingly relies on multi-trait genomic prediction (MT-GP) approaches that leverage the availability of genetic markers from different populations along with advances in high-throughput precision phenotyping. While significant progress has been made to jointly model multiple traits using a variety of statistical and machine learning approaches, there is no systematic comparison of advantages and shortcomings of the existing classes of MT-GP models. Here, we fill this knowledge gap by first classifying the existing MT-GP models and briefly summarizing their general principles, modeling assumptions, and potential limitations. We then perform an extensive comparative analysis with 10 traits measured in an Oryza sativa diversity panel using cross-validation scenarios relevant in breeding practice. Finally, we discuss directions that can enable the building of next-generation MT-GP models for addressing pressing challenges in crop breeding.
Affiliation(s)
- Alain J Mbebi
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam-Golm, Brandenburg, Germany
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Brandenburg, Germany
- Facundo Mercado
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam-Golm, Brandenburg, Germany
- David Hobby
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam-Golm, Brandenburg, Germany
- Hao Tong
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam-Golm, Brandenburg, Germany
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Brandenburg, Germany
- Zoran Nikoloski
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam-Golm, Brandenburg, Germany
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Brandenburg, Germany
2
Wang X, Shi S, Ali Khan MY, Zhang Z, Zhang Y. Improving the accuracy of genomic prediction in dairy cattle using the biologically annotated neural networks framework. J Anim Sci Biotechnol 2024; 15:87. [PMID: 38945998 PMCID: PMC11215832 DOI: 10.1186/s40104-024-01044-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 05/05/2024] [Indexed: 07/02/2024] Open
Abstract
BACKGROUND Biologically annotated neural networks (BANNs) are feedforward Bayesian neural network models that utilize partially connected architectures based on SNP-set annotations. As an interpretable neural network, BANNs model SNP and SNP-set effects in their input and hidden layers, respectively. Furthermore, the weights and connections of the network are regarded as random variables with prior distributions reflecting the manifestation of genetic effects at various genomic scales. However, its application in genomic prediction has yet to be explored. RESULTS This study extended the BANNs framework to the area of genomic selection and explored the optimal SNP-set partitioning strategies by using dairy cattle datasets. The SNP-sets were partitioned based on two strategies, gene annotations and 100 kb windows, denoted as BANN_gene and BANN_100kb, respectively. The BANNs model was compared with GBLUP, random forest (RF), BayesB and BayesCπ through five replicates of five-fold cross-validation using genotypic and phenotypic data on milk production traits, type traits, and one health trait of 6,558, 6,210 and 5,962 Chinese Holsteins, respectively. Results showed that the BANNs framework achieves higher genomic prediction accuracy compared to GBLUP, RF and Bayesian methods. Specifically, the BANN_100kb demonstrated superior accuracy and the BANN_gene exhibited generally suboptimal accuracy compared to GBLUP, RF, BayesB and BayesCπ across all traits. The average accuracy improvements of BANN_100kb over GBLUP, RF, BayesB and BayesCπ were 4.86%, 3.95%, 3.84% and 1.92%, and the accuracy of BANN_gene was improved by 3.75%, 2.86%, 2.73% and 0.85% compared to GBLUP, RF, BayesB and BayesCπ, respectively, across all seven traits. Meanwhile, both BANN_100kb and BANN_gene yielded lower overall mean square error values than GBLUP, RF and Bayesian methods.
CONCLUSION Our findings demonstrated that the BANNs framework performed better than traditional genomic prediction methods in our tested scenarios, and might serve as a promising alternative approach for genomic prediction in dairy cattle.
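The evaluation protocol named in this abstract, five replicates of five-fold cross-validation, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the data are simulated, ridge regression on markers stands in for the models that were actually compared, accuracy is assumed to be the Pearson correlation between predicted and observed phenotypes within each validation fold, and all function names are hypothetical.

```python
import numpy as np

def ridge_predict(X_train, y_train, X_test, lam):
    """Fit ridge regression on centred markers (dual form) and predict the test set."""
    mu = y_train.mean()
    x_mean = X_train.mean(axis=0)
    Xc = X_train - x_mean
    # Dual solution: alpha = (Xc Xc' + lam I)^-1 (y - mu); marker effects = Xc' alpha
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y_train)), y_train - mu)
    return mu + (X_test - x_mean) @ (Xc.T @ alpha)

def replicated_cv_accuracy(X, y, lam=1.0, k=5, reps=5, seed=0):
    """Return one accuracy per fold over `reps` random k-fold partitions."""
    rng = np.random.default_rng(seed)
    n = len(y)
    accs = []
    for _ in range(reps):
        # A fresh random partition for every replicate
        folds = np.array_split(rng.permutation(n), k)
        for test_idx in folds:
            train_mask = np.ones(n, dtype=bool)
            train_mask[test_idx] = False
            pred = ridge_predict(X[train_mask], y[train_mask], X[test_idx], lam)
            accs.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return np.array(accs)

# Simulated genotypes (0/1/2) and phenotypes; purely illustrative
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 50)).astype(float)
y = X @ rng.normal(size=50) + rng.normal(size=200)

accs = replicated_cv_accuracy(X, y)
print(f"mean accuracy over {accs.size} folds: {accs.mean():.3f}")
```

Averaging over replicates reduces the dependence of the reported accuracy on any single random partition of the animals into folds.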
Affiliation(s)
- Xue Wang
- State Key Laboratory of Animal Biotech Breeding, National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Shaolei Shi
- State Key Laboratory of Animal Biotech Breeding, National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Md Yousuf Ali Khan
- State Key Laboratory of Animal Biotech Breeding, National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Bangladesh Livestock Research Institute, Dhaka 1341, Bangladesh
- Zhe Zhang
- Guangdong Laboratory of Lingnan Modern Agriculture, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
- Yi Zhang
- State Key Laboratory of Animal Biotech Breeding, National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China.
3
Deng T, Li K, Du L, Liang M, Qian L, Xue Q, Qiu S, Xu L, Zhang L, Gao X, Lan X, Li J, Gao H. Genome-Wide Gene-Environment Interaction Analysis Identifies Novel Candidate Variants for Growth Traits in Beef Cattle. Animals (Basel) 2024; 14:1695. [PMID: 38891742 PMCID: PMC11171348 DOI: 10.3390/ani14111695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 05/24/2024] [Accepted: 05/30/2024] [Indexed: 06/21/2024] Open
Abstract
Complex traits are widely considered to be the result of a compound regulation of genes, environmental factors, and genotype-by-environment interaction (G × E). The inclusion of G × E in genome-wide association analyses is essential to understand animal environmental adaptations and improve the efficiency of breeding decisions. Here, we systematically investigated the G × E of growth traits (including weaning weight, yearling weight, 18-month body weight, and 24-month body weight) with environmental factors (farm and temperature) using genome-wide genotype-by-environment interaction association studies (GWEIS) with a dataset of 1350 cattle. We validated the robust estimator's effectiveness in GWEIS and detected 29 independent interacting SNPs with a significance threshold of 1.67 × 10⁻⁶, indicating that these SNPs, which do not show main effects in traditional genome-wide association studies (GWAS), may have non-additive effects across genotypes but are obliterated by environmental means. The gene-based analysis using MAGMA identified three genes that overlapped with the GWEIS results exhibiting G × E, namely SMAD2, PALMD, and MECOM. Further, the results of functional exploration in gene-set analysis revealed the bio-mechanisms of how cattle growth responds to environmental changes, such as mitotic or cytokinesis, fatty acid β-oxidation, neurotransmitter activity, gap junction, and keratan sulfate degradation. This study not only reveals novel genetic loci and underlying mechanisms influencing growth traits but also transforms our understanding of environmental adaptation in beef cattle, thereby paving the way for more targeted and efficient breeding strategies.
Affiliation(s)
- Tianyu Deng
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Shaanxi Key Laboratory of Molecular Biology for Agriculture, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang 712100, China;
- Keanning Li
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Lili Du
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Mang Liang
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Li Qian
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Qingqing Xue
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Shiyuan Qiu
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Lingyang Xu
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Lupei Zhang
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Xue Gao
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Xianyong Lan
- Shaanxi Key Laboratory of Molecular Biology for Agriculture, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang 712100, China;
- Junya Li
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Huijiang Gao
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
4
Teng J, Zhai T, Zhang X, Zhao C, Wang W, Tang H, Wang D, Shang Y, Ning C, Zhang Q. Improving multi-population genomic prediction accuracy using multi-trait GBLUP models which incorporate global or local genetic correlation information. Brief Bioinform 2024; 25:bbae276. [PMID: 38856170 PMCID: PMC11163384 DOI: 10.1093/bib/bbae276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 05/05/2024] [Accepted: 05/24/2024] [Indexed: 06/11/2024] Open
Abstract
In the application of genomic prediction, a situation often faced is that there are multiple populations in which genomic prediction (GP) needs to be conducted. A common way to handle multi-population GP is simply to combine the multiple populations into a single population. However, since these populations may be subject to different environments, there may exist genotype-environment interactions which may affect the accuracy of genomic prediction. In this study, we demonstrated that multi-trait genomic best linear unbiased prediction (MTGBLUP) can be used for multi-population genomic prediction, whereby the performances of a trait in different populations are regarded as different traits, and thus multi-population prediction is regarded as multi-trait prediction by employing the between-population genetic correlation. Using real datasets, we proved that MTGBLUP outperformed the conventional multi-population model that simply combines different populations together. We further proposed that MTGBLUP can be improved by partitioning the global between-population genetic correlation into local genetic correlations (LGC). We suggested two LGC models, LGC-model-1 and LGC-model-2, which partition the genome into regions with and without significant LGC (LGC-model-1) or regions with and without strong LGC (LGC-model-2). In analysis of real datasets, we demonstrated that the LGC models could universally increase the prediction accuracy, and the relative improvement over MTGBLUP reached up to 163.86% (25.64% on average).
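The central idea of this abstract, treating records of one trait in different populations as different traits linked by a between-population genetic correlation, can be sketched as follows. This is an illustration under strong assumptions, not the authors' implementation: variance components are taken as known rather than estimated, population means are removed with simple averages, genotypes and phenotypes are simulated, the relationship matrix uses a VanRaden-style construction (an assumption; the abstract does not specify one), and all names are hypothetical.

```python
import numpy as np

def genomic_relationship(M):
    """Genomic relationship matrix from 0/1/2 genotypes (VanRaden-style scaling)."""
    p = M.mean(axis=0) / 2.0                 # allele frequency per marker
    Z = M - 2.0 * p                          # centre each marker
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def multipop_covariance(G, pop, sigma_g):
    """Genetic covariance when each population's records form a separate trait.

    Entry (i, j) equals G[i, j] * sigma_g[pop[i], pop[j]].
    """
    return G * sigma_g[np.ix_(pop, pop)]

def blup(K, y, pop, sigma_e):
    """BLUP of genetic values given known (co)variances (simplified fixed effects)."""
    means = np.array([y[pop == k].mean() for k in range(len(sigma_e))])
    R = np.diag(sigma_e[pop])                # population-specific residual variance
    return K @ np.linalg.solve(K + R, y - means[pop])

# Simulated data: two populations of 30 individuals, 500 markers
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(60, 500)).astype(float)
pop = np.repeat([0, 1], 30)
y = rng.normal(size=60)                      # placeholder phenotypes
G = genomic_relationship(M)

r_g = 0.6                                    # assumed between-population genetic correlation
sigma_g = np.array([[1.0, r_g], [r_g, 1.0]])
sigma_e = np.array([1.0, 1.0])
K = multipop_covariance(G, pop, sigma_g)
u_hat = blup(K, y, pop, sigma_e)
```

In this sketch, setting `r_g = 1` with equal variances makes `K` identical to `G`, which corresponds to simply pooling the populations, while `r_g = 0` removes all information sharing between them.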
Affiliation(s)
- Jun Teng
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Shandong Futeng Food Co. Ltd., Zaozhuang 277500, Shandong, China
- Tingting Zhai
- National Key Laboratory of Wheat Improvement, College of Life Science, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Xinyi Zhang
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Changheng Zhao
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Wenwen Wang
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Hui Tang
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Dan Wang
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Yingli Shang
- College of Veterinary Medicine, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Chao Ning
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Qin Zhang
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
5
Zhao W, Zhang Z, Wang Z, Ma P, Pan Y, Wang Q, Zhang Z. Factors affecting the accuracy of genomic prediction in joint pig populations. Animal 2023; 17:100980. [PMID: 37797495 DOI: 10.1016/j.animal.2023.100980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Revised: 08/28/2023] [Accepted: 08/31/2023] [Indexed: 10/07/2023] Open
Abstract
Genomic prediction (GP) has greatly advanced animal and plant breeding over the past two decades. GP in joint populations is a feasible method to improve the accuracy of genomic estimated breeding values in small populations. However, there is still a need to understand the factors that influence GP in joint populations. This study used simulated data and real data from Duroc pig populations to examine the impact of linkage disequilibrium (LD), causal variant effect sizes (CVESs), and minor allele frequencies (MAF) of SNPs on the accuracy of genomic prediction in joint populations. Three prediction methods were used: genomic best linear unbiased prediction (GBLUP), single-step GBLUP and multi-trait GBLUP. Results from the simulated datasets showed that the accuracies of GP in joint populations were always higher than those in a single population when only LD inconsistencies existed. However, single-step GBLUP accuracy in joint populations decreased as the correlation of MAF between populations decreased, while the accuracy of GBLUP was consistently higher in joint populations than in a single population. As the correlation of CVES between populations decreased, the accuracy of both GBLUP and single-step GBLUP in joint populations declined. Analysis of real Duroc populations showed low genetic correlation, similar to the simulated relationship between the most distant populations. In most cases in Duroc populations, GP had higher accuracy in joint populations than in individual populations. In conclusion, the consistency of CVES plays a more important role in multi-population GP. The genetic relatedness of the Duroc populations is so weak that the prediction accuracy of GP in joint populations is reduced in some traits. Multi-trait GBLUP is a competitive method for the joint breeding evaluation.
Affiliation(s)
- Wei Zhao
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiaotong University, 800# Dongchuan Road, Shang, East 200240, China
- Zhenyang Zhang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, East 310058, China
- Zhen Wang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, East 310058, China
- Peipei Ma
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiaotong University, 800# Dongchuan Road, Shang, East 200240, China
- Yuchun Pan
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, East 310058, China; Hainan Institute, Zhejiang University, Yongyou Industrial Park, Yazhou Bay Sci-Tech City, Sanya 572000, China
- Qishan Wang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, East 310058, China; Hainan Institute, Zhejiang University, Yongyou Industrial Park, Yazhou Bay Sci-Tech City, Sanya 572000, China
- Zhe Zhang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, East 310058, China.
6
Improving Genomic Prediction Accuracy in the Chinese Holstein Population by Combining with the Nordic Holstein Reference Population. Animals (Basel) 2023; 13:ani13040636. [PMID: 36830423 PMCID: PMC9951650 DOI: 10.3390/ani13040636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Revised: 01/29/2023] [Accepted: 02/01/2023] [Indexed: 02/16/2023] Open
Abstract
The size of the reference population is critical in order to improve the accuracy of genomic prediction. Indeed, improving genomic prediction accuracy by combining multinational reference populations has proven to be effective. In this study, we investigated the improvement of genomic prediction accuracy in seven complex traits (i.e., milk yield; fat yield; protein yield; somatic cell count; body conformation; feet and legs; and mammary system conformation) by combining the Chinese and Nordic Holstein reference populations. The estimated genetic correlations between the Chinese and Nordic Holstein populations are high with respect to protein yield, fat yield, and milk yield (ranging from 0.621 to 0.720), moderate with respect to somatic cell count (0.449), and low for the three conformation traits (ranging from 0.144 to 0.236). When utilizing the joint reference data and a two-trait GBLUP model, the genomic prediction accuracy in the Chinese Holsteins improves considerably with respect to the traits with moderate-to-high genetic correlations, whereas the improvement in Nordic Holsteins is small. When compared with the single population analysis, using the joint reference population for genomic prediction in younger animals results in a 2.3 to 8.1 percent improvement in accuracy. Meanwhile, 10 replications of five-fold cross-validation were also implemented in order to evaluate the performance of joint genomic prediction, resulting in a 1.6 to 5.2 percent increase in accuracy. With respect to joint genomic prediction, the bias was found to be quite low. However, for traits with low genetic correlations, the joint reference data do not improve the prediction accuracy substantially for either population.
7
Zhao W, Zhang Z, Ma P, Wang Z, Wang Q, Zhang Z, Pan Y. The effect of high-density genotypic data and different methods on joint genomic prediction: A case study in large white pigs. Anim Genet 2023; 54:45-54. [PMID: 36414135 DOI: 10.1111/age.13275] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/03/2022] [Revised: 11/07/2022] [Accepted: 11/07/2022] [Indexed: 11/24/2022]
Abstract
Joint genomic prediction (GP) is an attractive method to improve the accuracy of GP by combining information from multiple populations. However, many factors can negatively influence the accuracy of joint GP, such as differences in linkage disequilibrium phasing between single nucleotide polymorphisms (SNPs) and causal variants, minor allele frequencies and causal variants' effect sizes across different populations. The objective of this study was to investigate whether the imputed high-density genotype data can improve the accuracy of joint GP using genomic best linear unbiased prediction (GBLUP), single-step GBLUP (ssGBLUP), multi-trait GBLUP (MT-GBLUP) and GBLUP based on genomic relationship matrix considering heterogeneous minor allele frequencies across different populations (wGBLUP). Three traits, including days taken to reach slaughter weight, backfat thickness and loin muscle area, were measured on 67 276 Large White pigs from two different populations, of which 3334 were genotyped by SNP array. The results showed that a combined population could substantially improve the accuracy of GP compared with a single-population GP, especially for the population with a smaller size. The imputed SNP data had no effect for single population GP but helped to yield higher accuracy than the medium-density array data for joint GP. Of the four methods, ssGBLUP performed the best, but the advantage of ssGBLUP decreased as more individuals were genotyped. In some cases, MT-GBLUP and wGBLUP performed better than GBLUP. In conclusion, our results confirmed that joint GP could benefit from imputed high-density genotype data, and the wGBLUP and MT-GBLUP methods are promising for joint GP in pig breeding.
Affiliation(s)
- Wei Zhao
- Department of Animal Science, College of Animal Science, Zhejiang University, Hangzhou, China
- Zhenyang Zhang
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- Peipei Ma
- Department of Animal Science, College of Animal Science, Zhejiang University, Hangzhou, China
- Zhen Wang
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- Qishan Wang
- Department of Animal Science, College of Animal Science, Zhejiang University, Hangzhou, China
- Zhe Zhang
- Department of Animal Science, College of Animal Science, Zhejiang University, Hangzhou, China
- Yuchun Pan
- Department of Animal Science, College of Animal Science, Zhejiang University, Hangzhou, China; Hainan Research Institute, Zhejiang University, Sanya, China
8
Song H, Wang X, Guo Y, Ding X. G × EBLUP: A novel method for exploring genotype by environment interactions and genomic prediction. Front Genet 2022; 13:972557. [PMID: 36171888 PMCID: PMC9510768 DOI: 10.3389/fgene.2022.972557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2022] [Accepted: 08/01/2022] [Indexed: 11/17/2022] Open
Abstract
Genotype by environment (G × E) interaction is fundamental in the biology of complex traits and diseases. However, most of the existing methods for genomic prediction tend to ignore G × E interaction (GEI). In this study, we proposed the genomic prediction method G × EBLUP by considering GEI. Meanwhile, G × EBLUP can also detect the genome-wide single nucleotide polymorphisms (SNPs) subject to GEI. Using comprehensive simulations and analysis of real data from pigs and maize, we showed that G × EBLUP achieved higher efficiency in mapping GEI SNPs and higher prediction accuracy than the existing methods, and its superiority was more obvious when the GEI variance was large. For pig and maize real data, compared with GBLUP, G × EBLUP showed improvement by 3% in the prediction accuracy for backfat thickness, while our findings indicated that the trait of days to 100 kg of pig was not affected by GEI and G × EBLUP did not improve the accuracy of genomic prediction for the trait. A significant advantage was observed for G × EBLUP in maize; the prediction accuracy was improved by ∼5.0 and 7.7% for grain weight and water content, respectively. Furthermore, G × EBLUP was not influenced by the number of environment levels. It could determine a favourable environment using SNP Bayes factors for each environment, implying that it is a robust and useful method for market-specific animal and plant breeding. We proposed G × EBLUP, a novel method for the estimation of genomic breeding value by considering GEI. This method identified the genome-wide SNPs that were susceptible to GEI and yielded higher genomic prediction accuracies and lower mean squared error compared with the GBLUP method.
Affiliation(s)
- Hailiang Song
- Beijing Key Laboratory of Fisheries Biotechnology, Fisheries Science Institute, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
- Xue Wang
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Yi Guo
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Xiangdong Ding
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- *Correspondence: Xiangdong Ding, orcid.org/0000-0002-2684-2551
9
Wang X, Shi S, Wang G, Luo W, Wei X, Qiu A, Luo F, Ding X. Using machine learning to improve the accuracy of genomic prediction of reproduction traits in pigs. J Anim Sci Biotechnol 2022; 13:60. [PMID: 35578371 PMCID: PMC9112588 DOI: 10.1186/s40104-022-00708-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 11/15/2021] [Accepted: 03/13/2022] [Indexed: 12/02/2022] Open
Abstract
Background Recently, machine learning (ML) has become attractive in genomic prediction, but its superiority in genomic prediction over conventional (ss) GBLUP methods and the choice of optimal ML methods need to be investigated. Results In this study, 2566 Chinese Yorkshire pigs with reproduction trait records were genotyped with the GenoBaits Porcine SNP 50 K and PorcineSNP50 panels. Four ML methods, including support vector regression (SVR), kernel ridge regression (KRR), random forest (RF) and Adaboost.R2 were implemented. Through 20 replicates of fivefold cross-validation (CV) and one prediction for younger individuals, the utility of ML methods in genomic prediction was explored. In CV, compared with genomic BLUP (GBLUP), single-step GBLUP (ssGBLUP) and the Bayesian method BayesHE, ML methods significantly outperformed these conventional methods. ML methods improved the genomic prediction accuracy of GBLUP, ssGBLUP, and BayesHE by 19.3%, 15.0% and 20.8%, respectively. In addition, ML methods yielded smaller mean squared error (MSE) and mean absolute error (MAE) in all scenarios. ssGBLUP yielded an improvement of 3.8% on average in accuracy compared to that of GBLUP, and the accuracy of BayesHE was close to that of GBLUP. In genomic prediction of younger individuals, RF and Adaboost.R2_KRR performed better than GBLUP and BayesHE, while ssGBLUP performed comparably with RF, and ssGBLUP yielded slightly higher accuracy and lower MSE than Adaboost.R2_KRR in the prediction of total number of piglets born, while for number of piglets born alive, Adaboost.R2_KRR performed significantly better than ssGBLUP. Among ML methods, Adaboost.R2_KRR consistently performed well in our study. Our findings also demonstrated that optimal hyperparameters are useful for ML methods. After tuning hyperparameters in CV and in predicting genomic outcomes of younger individuals, the average improvement was 14.3% and 21.8% over those using default hyperparameters, respectively. 
Conclusion Our findings demonstrated that ML methods had better overall prediction performance than conventional genomic selection methods, and could be new options for genomic prediction. Among ML methods, Adaboost.R2_KRR consistently performed well in our study, and tuning hyperparameters is necessary for ML methods. The optimal hyperparameters depend on the character of traits, datasets etc. Supplementary Information The online version contains supplementary material available at 10.1186/s40104-022-00708-0.
Affiliation(s)
- Xue Wang
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Shaolei Shi
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Guijiang Wang
- Hebei Province Animal Husbandry and Improved Breeds Work Station, Shijiazhuang, Hebei, China
| | - Wenxue Luo
- Hebei Province Animal Husbandry and Improved Breeds Work Station, Shijiazhuang, Hebei, China
| | - Xia Wei
- Zhangjiakou Dahao Heshan New Agricultural Development Co., Ltd, Zhangjiakou, Hebei, China
| | - Ao Qiu
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Fei Luo
- Hebei Province Animal Husbandry and Improved Breeds Work Station, Shijiazhuang, Hebei, China
| | - Xiangdong Ding
- Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China.
| |
|
10
|
Li X, Song H, Zhang Z, Huang Y, Zhang Q, Ding X. The theory on and software simulating large-scale genomic data for genotype-by-environment interactions. BMC Genomics 2021; 22:877. [PMID: 34865618 PMCID: PMC8647494 DOI: 10.1186/s12864-021-08191-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2021] [Accepted: 11/19/2021] [Indexed: 11/10/2022]
Abstract
Background With the emphasis on analysing genotype-by-environment interactions within the framework of genomic selection and genome-wide association analysis, there is an increasing demand for reliable tools that can be used to simulate large-scale genomic data in order to assess related approaches. Results We proposed a theory to simulate large-scale genomic data on genotype-by-environment interactions and added this new function to our developed tool GPOPSIM. Additionally, a simulated threshold trait with large-scale genomic data was also added. The validation of the simulated data indicated that GPOPSIM2.0 is an efficient tool for mimicking the phenotypic data of quantitative traits, threshold traits, and genetically correlated traits with large-scale genomic data while taking genotype-by-environment interactions into account. Conclusions This tool is useful for assessing methods for genotype-by-environment interactions and threshold traits.
Affiliation(s)
- Xiujin Li
- Guangdong Provincial Key Laboratory of Waterfowl Healthy Breeding, College of Animal Science & Technology, Zhongkai University of Agriculture and Engineering, Guangdong, 510225, Guangzhou, People's Republic of China
| | - Hailiang Song
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, 100193, Beijing, China
| | - Zhe Zhang
- Guangdong Provincial Key Lab of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, People's Republic of China
| | - Yunmao Huang
- Guangdong Provincial Key Laboratory of Waterfowl Healthy Breeding, College of Animal Science & Technology, Zhongkai University of Agriculture and Engineering, Guangdong, 510225, Guangzhou, People's Republic of China
| | - Qin Zhang
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Veterinary Medicine, Shandong Agricultural University, 271001, Taian, China
| | - Xiangdong Ding
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, 100193, Beijing, China.
| |
|
11
|
Song H, Hu H. Strategies to improve the accuracy and reduce costs of genomic prediction in aquaculture species. Evol Appl 2021; 15:578-590. [PMID: 35505889 PMCID: PMC9046917 DOI: 10.1111/eva.13262] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/30/2020] [Revised: 05/30/2021] [Accepted: 06/07/2021] [Indexed: 11/27/2022]
Affiliation(s)
- Hailiang Song
- Beijing Fisheries Research Institute & Beijing Key Laboratory of Fishery Biotechnology, Beijing, China
| | - Hongxia Hu
- Beijing Fisheries Research Institute & Beijing Key Laboratory of Fishery Biotechnology, Beijing, China
| |
|
12
|
Salek Ardestani S, Jafarikia M, Sargolzaei M, Sullivan B, Miar Y. Genomic Prediction of Average Daily Gain, Back-Fat Thickness, and Loin Muscle Depth Using Different Genomic Tools in Canadian Swine Populations. Front Genet 2021; 12:665344. [PMID: 34149806 PMCID: PMC8209496 DOI: 10.3389/fgene.2021.665344] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/07/2021] [Accepted: 04/15/2021] [Indexed: 12/12/2022]
Abstract
Improvement of prediction accuracy of estimated breeding values (EBVs) can lead to increased profitability for swine breeding companies. This study was performed to compare the accuracy of different popular genomic prediction methods and traditional best linear unbiased prediction (BLUP) for future performance of back-fat thickness (BFT), average daily gain (ADG), and loin muscle depth (LMD) in Canadian Duroc, Landrace, and Yorkshire swine breeds. In this study, 17,019 pigs were genotyped using Illumina 60K and Affymetrix 50K panels. After quality control and imputation steps, a total of 41,304, 48,580, and 49,102 single-nucleotide polymorphisms remained for Duroc (n = 6,649), Landrace (n = 5,362), and Yorkshire (n = 5,008) breeds, respectively. The breeding values of animals in the validation groups (n = 392–774) were predicted before performance test using BLUP, BayesC, BayesCπ, genomic BLUP (GBLUP), and single-step GBLUP (ssGBLUP) methods. The prediction accuracies were obtained using the correlation between the predicted breeding values and their deregressed EBVs (dEBVs) after performance test. The genomic prediction methods showed higher prediction accuracies than traditional BLUP for all scenarios. Although the accuracies of genomic prediction methods were not significantly (P > 0.05) different, ssGBLUP was the most accurate method for Duroc-ADG, Duroc-LMD, Landrace-BFT, Landrace-ADG, and Yorkshire-BFT scenarios, and BayesCπ was the most accurate method for Duroc-BFT, Landrace-LMD, and Yorkshire-ADG scenarios. Furthermore, BayesCπ method was the least biased method for Duroc-LMD, Landrace-BFT, Landrace-ADG, Yorkshire-BFT, and Yorkshire-ADG scenarios. Our findings can be beneficial for accelerating the genetic progress of BFT, ADG, and LMD in Canadian swine populations by selecting more accurate and unbiased genomic prediction methods.
Affiliation(s)
| | - Mohsen Jafarikia
- Canadian Centre for Swine Improvement, Ottawa, ON, Canada; Centre for Genetic Improvement of Livestock (CGIL), Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada
| | - Mehdi Sargolzaei
- Department of Pathobiology, University of Guelph, Guelph, ON, Canada; Select Sires Inc., Plain City, OH, United States
| | - Brian Sullivan
- Canadian Centre for Swine Improvement, Ottawa, ON, Canada
| | - Younes Miar
- Department of Animal Science and Aquaculture, Dalhousie University, Truro, NS, Canada
| |
|