1
Fonseca PADS, Suarez-Vega A, Esteban-Blanco C, Marina H, Pelayo R, Gutiérrez-Gil B, Arranz JJ. Integration of epigenomic and genomic data to predict residual feed intake and the feed conversion ratio in dairy sheep via machine learning algorithms. BMC Genomics 2025; 26:313. [PMID: 40165084 PMCID: PMC11956460 DOI: 10.1186/s12864-025-11520-1] [Received: 11/18/2024] [Accepted: 03/24/2025] [Indexed: 04/02/2025]
Abstract
BACKGROUND Feed efficiency (FE) is an essential trait in livestock species because of the constant demand to increase the productivity and sustainability of livestock production systems. A better understanding of the biological mechanisms associated with FE might help improve the estimation and selection of superior animals. In this work, differentially methylated regions (DMRs) were identified via genome-wide bisulfite sequencing (GWBS) by comparing the DNA methylation profiles of milk somatic cells from dairy ewes that were divergent in terms of feed efficiency. The DMRs were identified by comparing divergent groups for residual feed intake (RFI), the feed conversion ratio (FCR), and the consensus between both metrics (Cons). Additionally, the predictive performance of these DMRs and genetic variants mapped within these regions was evaluated via three machine learning (ML) models (xgboost, random forest (RF), and multilayer feedforward artificial neural network (deeplearning)). The average performance of each model was based on the root mean squared error (RMSE) and squared Spearman correlation (rho2). Finally, the best model for each scenario was selected on the basis of the highest ratio between rho2 and RMSE. RESULTS In total, 12,257, 9,328, and 6,723 genes were annotated for DMRs detected in the RFI, FCR, and Cons groups, respectively. These genes are associated with important pathways for regulating FE in dairy sheep, such as protein digestion and absorption, hormone synthesis and secretion, control of energy availability, cellular signaling, and feed behavior pathways. With respect to the ML predictions, the smallest mean RMSE (0.17) was obtained using RF to predict RFI. The highest mean rho2 (0.20) was obtained when RFI was predicted from the mean methylation within the DMRs identified by comparing the consensus groups, together with the genetic variants mapped within these DMRs. The best overall models were obtained for the prediction of RFI using the DMRs obtained in the comparison of RFI groups (RMSE = 0.10, rho2 = 0.86) using xgboost and the DMRs plus the genetic variants identified via the Cons groups (RMSE = 0.07, rho2 = 0.62) using RF. CONCLUSIONS The results provide new insights into the biological mechanisms associated with FE and the control of these processes through epigenetic mechanisms. Additionally, the results suggest that epigenetic information could be used as a biomarker for the prediction of FE.
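The evaluation rule described in this abstract (RMSE, squared Spearman correlation, and selection by the highest rho2/RMSE ratio) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, and the candidate scores in the usage lines are made-up placeholders rather than values from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman_rho2(observed, predicted):
    """Squared Spearman correlation: Pearson correlation of the ranks, squared."""
    return pearson(average_ranks(observed), average_ranks(predicted)) ** 2

def best_model(scores):
    """scores maps a model name to (rmse, rho2); return the name with the highest rho2/RMSE."""
    return max(scores, key=lambda name: scores[name][1] / scores[name][0])

# Hypothetical candidate scores for one scenario (placeholders, not study results).
candidates = {"model_a": (0.20, 0.40), "model_b": (0.10, 0.30)}
print(best_model(candidates))
```

The ratio rewards models that rank animals well while keeping the error small, so a model with a lower rho2 can still be preferred if its RMSE is proportionally lower.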
Affiliation(s)
- Aroa Suarez-Vega
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana, Leon, 24007, Spain
- Cristina Esteban-Blanco
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana, Leon, 24007, Spain
- Héctor Marina
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana, Leon, 24007, Spain
- Rocío Pelayo
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana, Leon, 24007, Spain
- Beatriz Gutiérrez-Gil
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana, Leon, 24007, Spain
- Juan-José Arranz
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana, Leon, 24007, Spain
2
Peng J, Lei X, Liu T, Xiong Y, Wu J, Xiong Y, You M, Zhao J, Zhang J, Ma X. Integration of machine learning and genome-wide association study to explore the genomic prediction accuracy of agronomic trait in oats (Avena sativa L.). THE PLANT GENOME 2025; 18:e20549. [PMID: 39780036 PMCID: PMC11711298 DOI: 10.1002/tpg2.20549] [Received: 07/03/2024] [Revised: 09/22/2024] [Accepted: 12/04/2024] [Indexed: 01/11/2025]
Abstract
Machine learning (ML) has garnered significant attention for its potential to enhance the accuracy of genomic predictions (GPs) in various economic crops with the use of complete genomic information. Genome-wide association studies (GWAS) are widely used to pinpoint trait-related causal variant loci in genomes. However, the simultaneous integration of both methods for crop genome prediction necessitates further research. In this study, we integrated ML and GWAS to assess the efficiency of GP for seven key agronomic traits in 195 oat (Avena sativa) cultivars from major oat-growing regions around the world. A total of 94 trait-associated single nucleotide polymorphisms were identified through GWAS. GP studies were conducted using the classical model genomic best linear unbiased prediction (GBLUP) and six ML models. GBLUP performed poorly in predicting all traits except flag leaf width, while none of the ML models consistently provided the best prediction accuracy across all traits. Prediction accuracy with GWAS-derived markers was better than with genome-wide markers; plant height reached its highest prediction accuracy with 100 GWAS-derived markers, whereas the remaining traits required more markers. These results play an important role in advancing the use of GP in small oat breeding programs by optimizing the prediction accuracy of GP and reducing the number of markers, confirming that high prediction accuracy can be achieved with smaller datasets.
Affiliation(s)
- Jinghan Peng
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, China
  - Sichuan Academy of Grassland Science, Chengdu, China
- Xiong Lei
  - Sichuan Academy of Grassland Science, Chengdu, China
- Tianqi Liu
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, China
- Yi Xiong
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, China
- Jiqiang Wu
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, China
  - Sichuan Academy of Grassland Science, Chengdu, China
- Yanli Xiong
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, China
- Minghong You
  - Sichuan Academy of Grassland Science, Chengdu, China
- Junming Zhao
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, China
- Jian Zhang
  - Sichuan Provincial Research Center for Forestry and Grassland Development, Chengdu, China
- Xiao Ma
  - College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, China
3
Wang J, Chai J, Chen L, Zhang T, Long X, Diao S, Chen D, Guo Z, Tang G, Wu P. Enhancing Genomic Prediction Accuracy of Reproduction Traits in Rongchang Pigs Through Machine Learning. Animals (Basel) 2025; 15:525. [PMID: 40003007 PMCID: PMC11852217 DOI: 10.3390/ani15040525] [Received: 01/17/2025] [Revised: 02/02/2025] [Accepted: 02/10/2025] [Indexed: 02/27/2025]
Abstract
The increasing volume of genome sequencing data presents challenges for traditional genome-wide prediction methods in handling large datasets. Machine learning (ML) techniques, which can process high-dimensional data, offer promising solutions. This study aimed to find a genome-wide prediction method for local pig breeds, using 10 datasets with varying SNP densities derived from imputed sequencing data of 515 Rongchang pigs and the Pig QTL database. Three reproduction traits (litter weight, total number of piglets born, and number of piglets born alive) were predicted using six traditional methods and five ML methods, including kernel ridge regression, random forest, Gradient Boosting Decision Tree (GBDT), Light Gradient Boosting Machine, and Adaboost. The methods' efficacy was evaluated using fivefold cross-validation and independent tests. The predictive performance of both traditional and ML methods initially increased with SNP density, peaking at 800-900 k SNPs. ML methods outperformed traditional ones, showing improvements of 0.4-4.1%. The integration of GWAS and the Pig QTL database enhanced ML robustness. ML models exhibited superior generalizability, with high correlation coefficients (0.935-0.998) between cross-validation and independent test results. GBDT and random forest showed high computational efficiency, making them promising methods for genomic prediction in livestock breeding.
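The fivefold cross-validation mentioned in this abstract can be illustrated with a short index-splitting sketch. It assumes a simple random partition of animals into folds; the abstract does not say how folds were formed, and the function name, the seed, and the use of 515 animals as the sample size in the usage lines are choices made here for illustration only.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k roughly equal random folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Usage: each animal appears in exactly one test fold across the five splits.
for train, test in k_fold_indices(515, k=5, seed=1):
    print(len(train), len(test))
```

Each model is then fitted on the training indices and scored on the held-out fold, and the scores are averaged over the five folds.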
Affiliation(s)
- Junge Wang
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- Jie Chai
  - Chongqing Academy of Animal Sciences, Chongqing 402460, China
  - National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Li Chen
  - Chongqing Academy of Animal Sciences, Chongqing 402460, China
  - National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Tinghuan Zhang
  - Chongqing Academy of Animal Sciences, Chongqing 402460, China
  - National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Xi Long
  - Chongqing Academy of Animal Sciences, Chongqing 402460, China
  - National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Shuqi Diao
  - Chongqing Academy of Animal Sciences, Chongqing 402460, China
  - National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Dong Chen
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- Zongyi Guo
  - Chongqing Academy of Animal Sciences, Chongqing 402460, China
  - National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Guoqing Tang
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- Pingxian Wu
  - Chongqing Academy of Animal Sciences, Chongqing 402460, China
  - National Center of Technology Innovation for Pigs, Chongqing 402460, China
4
Shirzadifar A, Manafiazar G, Davoudi P, Do D, Hu G, Miar Y. Prediction of growth and feed efficiency in mink using machine learning algorithms. Animal 2025; 19:101330. [PMID: 39862571 DOI: 10.1016/j.animal.2024.101330] [Received: 09/04/2023] [Revised: 09/06/2024] [Accepted: 09/10/2024] [Indexed: 01/27/2025]
Abstract
Feed efficiency (FE) is expressed as the amount of feed required per unit of BW gain. Since feed cost is the major input cost in the mink industry, evaluating FE is a crucial step for the competitiveness of the industry. However, FE measures have not been widely adopted for mink due to the high cost of periodically measuring BW and daily feed intake. Measuring individual daily feed intake and BW is time-consuming, labor-intensive, and stressful for the animals and mink producers. The main objectives of this study were to (1) evaluate the application of machine learning (ML) algorithms to predict the average daily gain (ADG), feed conversion ratio (FCR), and residual feed intake (RFI) values during the whole growing and furring period (15 weeks from August 1st to November 14th) using less expensive features such as sex, color type, age, BW and length; (2) find the most significant contributing feature within the growth and furring period to predict the ADG, FCR and RFI. The color and sex features were recorded on 1 088 mink, and the mink's age, BW and length were measured every 3 weeks from August 1st to November 14th (denoted P1-P5). The ADG, FCR, and RFI were then predicted by the selected ML algorithms using multiple combinations of the observed and measured features from P1 to P5. By comparing the calculated ADG, FCR, and RFI values with the predicted values, it was determined that the most accurate combination of features included all features (sex, color, age, BW and body length) on August 1st (at the beginning of P1). Among the selected ML algorithms, the extreme gradient boosting (XGB) algorithm provided the most accurate and reliable prediction for the ADG (R2 = 0.71, RMSE = 0.10), FCR (R2 = 0.74, RMSE = 0.14), and RFI (R2 = 0.76, RMSE = 0.10). The XGB algorithm can therefore predict the ADG, FCR, and RFI values accurately without measuring costly daily feed intake. In addition, sex was identified as the most significant feature to predict the ADG, FCR, and RFI values, with importance scores of 0.85, 0.67, and 0.79, respectively.
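The three target traits in this abstract can be made concrete with a small sketch. ADG and FCR follow directly from the definition given above (feed required per unit of BW gain). The RFI function is a deliberate simplification and an assumption made here: it takes the residual from a one-predictor least-squares regression of feed intake on ADG, whereas the model the authors actually used is not stated in the abstract. All function names and numbers are hypothetical.

```python
def average_daily_gain(bw_start, bw_end, days):
    """Body weight gained per day over the measurement period."""
    return (bw_end - bw_start) / days

def feed_conversion_ratio(total_feed, bw_start, bw_end):
    """Feed consumed per unit of body weight gain (lower means more efficient)."""
    return total_feed / (bw_end - bw_start)

def residual_feed_intake(feed_intakes, adgs):
    """Observed minus expected intake, where expected intake comes from a
    simple least-squares line of intake on ADG (illustrative simplification)."""
    n = len(feed_intakes)
    mean_x = sum(adgs) / n
    mean_y = sum(feed_intakes) / n
    sxx = sum((x - mean_x) ** 2 for x in adgs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(adgs, feed_intakes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(adgs, feed_intakes)]

# Hypothetical animal: 1.00 kg to 2.05 kg over 105 days on 21 kg of feed.
print(average_daily_gain(1.0, 2.05, 105))
print(feed_conversion_ratio(21.0, 1.0, 2.05))
```

A negative RFI marks an animal that ate less than expected for its growth, which is why RFI can rank animals differently from FCR.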
Affiliation(s)
- A Shirzadifar
  - Department of Animal Science and Aquaculture, Dalhousie University, Truro, Nova Scotia B2N 5E3, Canada; Biosystems Engineering Department, Shiraz University, Shiraz, Iran
- G Manafiazar
  - Department of Animal Science and Aquaculture, Dalhousie University, Truro, Nova Scotia B2N 5E3, Canada
- P Davoudi
  - Department of Animal Science and Aquaculture, Dalhousie University, Truro, Nova Scotia B2N 5E3, Canada
- D Do
  - Department of Animal Science and Aquaculture, Dalhousie University, Truro, Nova Scotia B2N 5E3, Canada
- G Hu
  - Department of Animal Science and Aquaculture, Dalhousie University, Truro, Nova Scotia B2N 5E3, Canada
- Y Miar
  - Department of Animal Science and Aquaculture, Dalhousie University, Truro, Nova Scotia B2N 5E3, Canada
5
Cetintav B, Yalcin A. From Prediction to Precision: Explainable AI-Driven Insights for Targeted Treatment in Equine Colic. Animals (Basel) 2025; 15:126. [PMID: 39858126 PMCID: PMC11758311 DOI: 10.3390/ani15020126] [Received: 11/24/2024] [Revised: 12/30/2024] [Accepted: 01/06/2025] [Indexed: 01/27/2025]
Abstract
Colic is a leading cause of mortality in horses, demanding precise and timely interventions. This study integrates machine learning and explainable artificial intelligence (XAI) to predict survival outcomes in horses with colic, using clinical, procedural, and diagnostic data. Random forest and XGBoost emerged as top-performing models, achieving F1 scores of 85.9% and 86.1%, respectively. SHAP (Shapley additive explanations) was employed to provide interpretable insights, offering both global and local explanations for model predictions. The analysis revealed that key features, such as pulse rate, lesion type, and total protein levels, significantly influenced survival likelihood. Local interpretations highlighted the unique contribution of clinical factors to individual cases, enabling personalized insights that guide targeted treatment strategies. These tailored predictions empower veterinarians to prioritize interventions based on the specific conditions of each horse, moving beyond generalized care protocols. By combining predictive accuracy with interpretability, this study advances precision veterinary medicine, enhancing outcomes for equine colic cases and setting a benchmark for future applications of AI in animal health.
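The F1 scores reported in this abstract combine precision and recall into one number. A minimal sketch of the standard binary definition is given below; the function name, the label encoding (1 = survived), and the example labels are assumptions for illustration, and the abstract does not say whether the reported scores were averaged across classes.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical outcomes for five horses (1 = survived, 0 = did not survive).
print(f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

Because F1 ignores true negatives, it is often preferred over plain accuracy when one outcome class is much rarer than the other.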
Affiliation(s)
- Bekir Cetintav
  - Department of Biostatistics, Veterinary Faculty, Burdur Mehmet Akif Ersoy University, 15030 Burdur Merkez, Turkey
- Ahmet Yalcin
  - Institute of Science, Burdur Mehmet Akif Ersoy University, 15030 Burdur Merkez, Turkey
6
Bréhélin L. Advancing Regulatory Genomics With Machine Learning. Bioinform Biol Insights 2024; 18:11779322241249562. [PMID: 39735654 PMCID: PMC11672376 DOI: 10.1177/11779322241249562] [Received: 08/28/2023] [Accepted: 04/09/2024] [Indexed: 12/31/2024]
Abstract
In recent years, several machine learning (ML) approaches have been proposed to predict gene expression signal and chromatin features from the DNA sequence alone. These models are often used to deduce and, to some extent, assess putative new biological insights about gene regulation, and they have led to very interesting advances in regulatory genomics. This article reviews a selection of these methods, ranging from linear models to random forests, kernel methods, and more advanced deep learning models. Specifically, we detail the different techniques and strategies that can be used to extract new gene-regulation hypotheses from these models. Furthermore, because these putative insights need to be validated with wet-lab experiments, we emphasize that it is important to have a measure of confidence associated with the extracted hypotheses. We review the procedures that have been proposed to measure this confidence for the different types of ML models, and we discuss the fact that they do not provide the same kind of information.
7
Do DT, Yang MR, Vo TNS, Le NQK, Wu YW. Unitig-centered pan-genome machine learning approach for predicting antibiotic resistance and discovering novel resistance genes in bacterial strains. Comput Struct Biotechnol J 2024; 23:1864-1876. [PMID: 38707536 PMCID: PMC11067008 DOI: 10.1016/j.csbj.2024.04.035] [Received: 10/11/2023] [Revised: 04/13/2024] [Accepted: 04/13/2024] [Indexed: 05/07/2024]
Abstract
In current genomic research, the widely used methods for predicting antimicrobial resistance (AMR) often rely on prior knowledge of known AMR genes or reference genomes. However, these methods have limitations, potentially resulting in imprecise predictions owing to incomplete coverage of AMR mechanisms and genetic variations. To overcome these limitations, we propose a pan-genome-based machine learning approach to advance our understanding of AMR gene repertoires and uncover possible feature sets for precise AMR classification. By building compacted de Bruijn graphs (cDBGs) from thousands of genomes and collecting the presence/absence patterns of unique sequences (unitigs) for Pseudomonas aeruginosa, we determined that using machine learning models on unitig-centered pan-genomes showed significant promise for accurately predicting the antibiotic resistance or susceptibility of microbial strains. Applying a feature-selection-based machine learning algorithm led to satisfactory predictive performance for the training dataset (with an area under the receiver operating characteristic curve (AUC) of > 0.929) and an independent validation dataset (AUC, approximately 0.77). Furthermore, the selected unitigs revealed previously unidentified resistance genes, allowing for the expansion of the resistance gene repertoire to those that have not previously been described in the literature on antibiotic resistance. These results demonstrate that our proposed unitig-based pan-genome feature set was effective in constructing machine learning predictors that could accurately identify AMR pathogens. Gene sets extracted using this approach may offer valuable insights into expanding known AMR genes and forming new hypotheses to uncover the underlying mechanisms of bacterial AMR.
Affiliation(s)
- Duyen Thi Do
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Ming-Ren Yang
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
  - Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
- Tran Nam Son Vo
  - Department of Business Administration, College of Management, Lunghwa University of Science and Technology, Taoyuan City, Taiwan
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Yu-Wei Wu
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
  - Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, Taiwan
  - TMU Research Center for Digestive Medicine, Taipei Medical University, Taipei, Taiwan
8
Hay EH. Machine Learning for the Genomic Prediction of Growth Traits in a Composite Beef Cattle Population. Animals (Basel) 2024; 14:3014. [PMID: 39457945 PMCID: PMC11505319 DOI: 10.3390/ani14203014] [Received: 09/18/2024] [Revised: 10/14/2024] [Accepted: 10/17/2024] [Indexed: 10/28/2024]
Abstract
The adoption of genomic selection is prevalent across various plant and livestock species, yet existing models for predicting genomic breeding values often remain suboptimal. Machine learning models present a promising avenue to enhance prediction accuracy due to their ability to accommodate both linear and non-linear relationships. In this study, we evaluated four machine learning models (Random Forest, Support Vector Machine, Convolutional Neural Networks, and Multi-Layer Perceptrons) for predicting genomic values related to birth weight (BW), weaning weight (WW), and yearling weight (YW), and compared them with conventional models: GBLUP (Genomic Best Linear Unbiased Prediction), Bayes A, and Bayes B. The results demonstrated that the GBLUP model achieved the highest prediction accuracy for both BW and YW, whereas the Random Forest model exhibited a superior prediction accuracy for WW. Furthermore, GBLUP outperformed the other models in terms of model fit, as evidenced by the lower mean square error values and regression coefficients of the corrected phenotypes on predicted values. Overall, the GBLUP model delivered a superior prediction accuracy and model fit compared to the machine learning models tested.
Affiliation(s)
- El Hamidi Hay
  - USDA Agricultural Research Service, Fort Keogh Livestock and Range Research Laboratory, Miles City, MT 59301, USA
9
Chan YL, Ho CSH, Tay GWN, Tan TWK, Tang TB. MicroRNA classification and discovery for major depressive disorder diagnosis: Towards a robust and interpretable machine learning approach. J Affect Disord 2024; 360:326-335. [PMID: 38788856 DOI: 10.1016/j.jad.2024.05.066] [Received: 02/08/2024] [Revised: 04/08/2024] [Accepted: 05/15/2024] [Indexed: 05/26/2024]
Abstract
BACKGROUND Major depressive disorder (MDD) is notably underdiagnosed and undertreated due to its complex nature and subjective diagnostic methods. Biomarker identification would help provide a clearer understanding of MDD aetiology. Although machine learning (ML) has been implemented in previous studies to study the alteration of microRNA (miRNA) levels in MDD cases, clinical translation has not been feasible due to the lack of interpretability (i.e. too many miRNAs for consideration) and stability. METHODS This study applied a logistic regression (LR) model to the blood miRNA expression profile to differentiate patients with MDD (n = 60) from healthy controls (HCs, n = 60). An embedded feature selector (L1-regularised logistic regression) was used to extract clinically relevant miRNAs and was optimised for clinical application. RESULTS Patients with MDD could be differentiated from HCs with an area under the receiver operating characteristic curve (AUC) of 0.81 on testing data when all available miRNAs were considered (which served as a benchmark). Our LR model restricted to at most 5 selected miRNAs (the LR-5 model) emerged as the best model because it achieved a moderate classification ability (AUC = 0.75), relatively high interpretability (feature number = 5) and stability (ϕ̂Z = 0.55) compared to the benchmark. The top-ranking miRNAs identified by our model have demonstrated associations with MDD pathways involving cytokine signalling in the immune system, the reelin signalling pathway, programmed cell death and cellular responses to stress. CONCLUSION The LR-5 model, which is optimised based on ML design factors, may lead to a robust and clinically usable MDD diagnostic tool.
Affiliation(s)
- Yee Ling Chan
  - Centre for Intelligent Signal and Imaging Research (CISIR), Universiti Teknologi PETRONAS (UTP), Bandar Seri Iskandar 32610, Perak, Malaysia
- Cyrus S H Ho
  - Department of Psychological Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117543, Singapore
- Gabrielle W N Tay
  - Department of Psychological Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117543, Singapore
- Trevor W K Tan
  - Centre for Sleep and Cognition, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117543, Singapore; Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117543, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117583, Singapore; N.1 Institute for Health & Institute for Digital Medicine (WisDM), National University of Singapore, Singapore 117456, Singapore; Integrative Sciences and Engineering Programme (ISEP), National University of Singapore, Singapore 119077, Singapore
- Tong Boon Tang
  - Centre for Intelligent Signal and Imaging Research (CISIR), Universiti Teknologi PETRONAS (UTP), Bandar Seri Iskandar 32610, Perak, Malaysia
10
Suárez-Vega A, Gutiérrez-Gil B, Fonseca PAS, Hervás G, Pelayo R, Toral PG, Marina H, de Frutos P, Arranz JJ. Milk transcriptome biomarker identification to enhance feed efficiency and reduce nutritional costs in dairy ewes. Animal 2024; 18:101250. [PMID: 39096599 DOI: 10.1016/j.animal.2024.101250] [Received: 02/05/2024] [Revised: 07/03/2024] [Accepted: 07/05/2024] [Indexed: 08/05/2024]
Abstract
In recent years, rising prices for high-quality protein-based feeds have significantly increased nutrition costs. Consequently, investigating strategies to reduce these expenses and improve feed efficiency (FE) has become increasingly important for the dairy sheep industry. This research investigates the impact of nutritional protein restriction (NPR) during prepuberty and FE on the milk transcriptome of dairy Assaf ewes (sampled during the first lactation). To this end, we first compared transcriptomic differences between NPR and control ewes. Subsequently, we evaluated gene expression differences between ewes with divergent FE, using feed conversion ratio (FCR), residual feed intake (RFI), and consensus classifications of high- and low-FE animals for both indices. Lastly, we assessed milk gene expression as a predictor of FE phenotype using random forest. No effect was found for the prepubertal NPR on milk performance or FE. Moreover, at the milk transcriptome level, only one gene, HBB, was differentially expressed between the NPR (n = 14) and the control group (n = 14). Further, the transcriptomic analysis between divergent FE sheep revealed 114 differentially expressed genes (DEGs) for the RFI index (high-FE_RFI = 10 vs low-FE_RFI = 10), 244 for FCR (high-FE_FCR = 10 vs low-FE_FCR = 10), and 1 016 DEGs between divergent consensus ewes for both indices (high-FE_consensus = 8 vs low-FE_consensus = 8). These results underscore the critical role of selected FE indices for RNA-Seq analyses, revealing that consensus divergent animals for both indices maximise differences in transcriptomic responses. Genes overexpressed in high-FE_consensus ewes were associated with milk production and mammary gland development, while low-FE_consensus genes were linked to higher metabolic expenditure for tissue organisation and repair. The best prediction accuracy for FE phenotype using random forest was obtained for a set of 44 genes consistently differentially expressed across lactations, with Spearman correlations of 0.37 and 0.22 for FCR and RFI, respectively. These findings provide insights into potential sustainability strategies for dairy sheep, highlighting the utility of transcriptomic markers as FE proxies.
Affiliation(s)
- A Suárez-Vega
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 Leon, Spain
- B Gutiérrez-Gil
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 Leon, Spain
- P A S Fonseca
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 Leon, Spain
- G Hervás
  - Instituto de Ganadería de Montaña (CSIC-University of León), Finca Marzanas s/n, 24346 Grulleros, León, Spain
- R Pelayo
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 Leon, Spain
- P G Toral
  - Instituto de Ganadería de Montaña (CSIC-University of León), Finca Marzanas s/n, 24346 Grulleros, León, Spain
- H Marina
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 Leon, Spain
- P de Frutos
  - Instituto de Ganadería de Montaña (CSIC-University of León), Finca Marzanas s/n, 24346 Grulleros, León, Spain
- J J Arranz
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 Leon, Spain
11
Marina H, Arranz JJ, Suárez-Vega A, Pelayo R, Gutiérrez-Gil B, Toral PG, Hervás G, Frutos P, Fonseca PAS. Assessment of milk metabolites as biomarkers for predicting feed efficiency in dairy sheep. J Dairy Sci 2024; 107:4743-4757. [PMID: 38369116 DOI: 10.3168/jds.2023-23984] [Received: 07/28/2023] [Accepted: 01/11/2024] [Indexed: 02/20/2024]
Abstract
Estimating feed efficiency (FE) in dairy sheep is challenging due to the high cost of systems that measure individual feed intake. Identifying proxies that can serve as effective predictors of FE could make it possible to introduce FE into breeding programs. Here, 39 Assaf ewes in first lactation were evaluated regarding their FE by 2 metrics, residual feed intake (RFI) and feed conversion ratio (FCR). The ewes were classified into high, medium and low groups for each metric. Milk samples of the 39 ewes were subjected to untargeted metabolomics analysis. The complete milk metabolomic signature was used to discriminate the FE groups using partial least squares discriminant analysis. A total of 41 and 26 features were selected as the most relevant features for the discrimination of RFI and FCR groups, respectively. The predictive ability of the complete milk metabolomic signature and of the reduced data sets was investigated using 4 machine learning (ML) algorithms and a multivariate regression method. The orthogonal partial least squares algorithm outperformed the other ML algorithms for FCR prediction in the scenarios using the complete milk metabolite signature (R2 = 0.62 ± 0.06) and the 26 selected features (R2 = 0.62 ± 0.15). Regarding RFI predictions, the scenarios using the 41 selected features outperformed the scenario with the complete milk metabolite signature, and within those scenarios the multilayer feedforward artificial neural network (R2 = 0.18 ± 0.14) and extreme gradient boosting (R2 = 0.17 ± 0.15) outperformed the other algorithms. The functionality of the selected metabolites implied that the metabolism of glucose, galactose, fructose, sphingolipids, amino acids, insulin, and thyroid hormones was at play. Compared with the use of traditional methods, practical applications of these biomarkers might simplify and reduce costs in selecting feed-efficient ewes.
Affiliation(s)
- H Marina
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 León, Spain
- J J Arranz
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 León, Spain.
- A Suárez-Vega
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 León, Spain
- R Pelayo
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 León, Spain
- B Gutiérrez-Gil
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 León, Spain
- P G Toral
  - Instituto de Ganadería de Montaña (CSIC-University of León), Finca Marzanas s/n, 24346 Grulleros, León, Spain
- G Hervás
  - Instituto de Ganadería de Montaña (CSIC-University of León), Finca Marzanas s/n, 24346 Grulleros, León, Spain
- P Frutos
  - Instituto de Ganadería de Montaña (CSIC-University of León), Finca Marzanas s/n, 24346 Grulleros, León, Spain
- P A S Fonseca
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 León, Spain
|
12
|
Mota LFM, Giannuzzi D, Pegolo S, Toledo-Alvarado H, Schiavon S, Gallo L, Trevisi E, Arazi A, Katz G, Rosa GJM, Cecchinato A. Combining genetic markers, on-farm information and infrared data for the in-line prediction of blood biomarkers of metabolic disorders in Holstein cattle. J Anim Sci Biotechnol 2024; 15:83. [PMID: 38851729 PMCID: PMC11162571 DOI: 10.1186/s40104-024-01042-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Accepted: 04/28/2024] [Indexed: 06/10/2024] Open
Abstract
BACKGROUND Various blood metabolites are known to be useful indicators of health status in dairy cattle, but their routine assessment at the herd level is time-consuming, expensive, and stressful for the cows. Thus, we evaluated the effectiveness of combining in-line near infrared (NIR) milk spectra with on-farm information (days in milk [DIM] and parity) and genetic markers for predicting blood metabolites in Holstein cattle. Data were obtained from 388 Holstein cows from a farm with an AfiLab system. NIR spectra, on-farm information, and single nucleotide polymorphism (SNP) markers were blended to develop calibration equations for blood metabolites using the elastic net (ENet) approach, considering 3 models: (1) Model 1 (M1) including only NIR information, (2) Model 2 (M2) with both NIR and on-farm information, and (3) Model 3 (M3) combining NIR, on-farm and genomic information. Dimension reduction was considered for M3 by preselecting SNP markers from genome-wide association study (GWAS) results. RESULTS Results indicate that M2 improved the predictive ability by an average of 19% for energy-related metabolites (glucose, cholesterol, NEFA, BHB, urea, and creatinine), 20% for liver function/hepatic damage, 7% for inflammation/innate immunity, 24% for oxidative stress metabolites, and 23% for minerals compared to M1. Meanwhile, M3 further enhanced the predictive ability by 34% for energy-related metabolites, 32% for liver function/hepatic damage, 22% for inflammation/innate immunity, 42.1% for oxidative stress metabolites, and 41% for minerals, compared to M1. Selecting SNP markers from the GWAS results with a threshold of -log10(P-value) > 2.0 improved the predictive ability of M3 by 5% for energy-related metabolites, 9% for liver function/hepatic damage, 8% for inflammation/innate immunity, 22% for oxidative stress metabolites, and 9% for minerals. Slight reductions were observed for phosphorus (2%), ferric-reducing antioxidant power (1%), and glucose (3%). Furthermore, prediction accuracies were influenced by the use of more restrictive thresholds (-log10(P-value) > 2.5 and 3.0), which gave a smaller increase in predictive ability. CONCLUSION Our results highlight the potential of combining several sources of information, such as genetic markers, on-farm information, and in-line NIR data, to improve the predictive ability for blood metabolites in dairy cattle, representing an effective strategy for large-scale in-line health monitoring in commercial herds.
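The GWAS-based preselection described in this abstract can be illustrated with a minimal sketch. The marker names and P-values below are hypothetical and chosen only to show how raising the -log10(P-value) threshold from 2.0 to 2.5 and 3.0 shrinks the retained SNP set; the function name `preselect_snps` is likewise an assumption, not the authors' code.

```python
import math

def preselect_snps(gwas_pvalues, threshold):
    """Return the SNP ids whose -log10(P-value) exceeds the threshold."""
    return [snp for snp, p in gwas_pvalues.items()
            if -math.log10(p) > threshold]

# Hypothetical GWAS P-values for five markers (illustrative only).
gwas_pvalues = {
    "snp_a": 0.5,
    "snp_b": 0.008,
    "snp_c": 0.002,
    "snp_d": 0.0005,
    "snp_e": 0.02,
}

# Apply the three thresholds mentioned in the abstract.
selected = {t: preselect_snps(gwas_pvalues, t) for t in (2.0, 2.5, 3.0)}

for threshold, snps in selected.items():
    print(f"-log10(P) > {threshold}: {len(snps)} SNPs kept -> {snps}")
```

A stricter threshold keeps fewer, more strongly associated markers, which is one plausible reason the gain in predictive ability reported above is smaller at 2.5 and 3.0.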
Affiliation(s)
- Lucio F M Mota
  - Department of Agronomy, Food, Natural resources, Animals and Environment (DAFNAE), University of Padova, Legnaro, Padova, 35020, Italy
- Diana Giannuzzi
  - Department of Agronomy, Food, Natural resources, Animals and Environment (DAFNAE), University of Padova, Legnaro, Padova, 35020, Italy.
- Sara Pegolo
  - Department of Agronomy, Food, Natural resources, Animals and Environment (DAFNAE), University of Padova, Legnaro, Padova, 35020, Italy
- Hugo Toledo-Alvarado
  - Department of Genetics and Biostatistics, School of Veterinary Medicine and Zootechnics, National Autonomous University of Mexico, Ciudad Universitaria, Mexico City, 04510, Mexico
- Stefano Schiavon
  - Department of Agronomy, Food, Natural resources, Animals and Environment (DAFNAE), University of Padova, Legnaro, Padova, 35020, Italy
- Luigi Gallo
  - Department of Agronomy, Food, Natural resources, Animals and Environment (DAFNAE), University of Padova, Legnaro, Padova, 35020, Italy
- Erminio Trevisi
  - Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, Piacenza, 29122, Italy
- Gil Katz
  - Afimilk LTD, Afikim, 15148, Israel
- Guilherme J M Rosa
  - Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI, 53706, USA
- Alessio Cecchinato
  - Department of Agronomy, Food, Natural resources, Animals and Environment (DAFNAE), University of Padova, Legnaro, Padova, 35020, Italy
|
13
|
Mota LFM, Giannuzzi D, Pegolo S, Sturaro E, Gianola D, Negrini R, Trevisi E, Ajmone Marsan P, Cecchinato A. Genomic prediction of blood biomarkers of metabolic disorders in Holstein cattle using parametric and nonparametric models. Genet Sel Evol 2024; 56:31. [PMID: 38684971 PMCID: PMC11057143 DOI: 10.1186/s12711-024-00903-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 04/12/2024] [Indexed: 05/02/2024] Open
Abstract
BACKGROUND Metabolic disturbances adversely impact productive and reproductive performance of dairy cattle due to changes in endocrine status and immune function, which increase the risk of disease. This may occur in the post-partum phase, but also throughout lactation, with sub-clinical symptoms. Recently, increased attention has been directed towards improved health and resilience in dairy cattle, and genomic selection (GS) could be a helpful tool for selecting animals that are more resilient to metabolic disturbances throughout lactation. Hence, we evaluated the genomic prediction of serum biomarker levels for metabolic distress in 1353 Holsteins genotyped with the 100K single nucleotide polymorphism (SNP) chip assay. The GS was evaluated using the parametric models genomic best linear unbiased prediction (GBLUP), Bayesian B (BayesB), and elastic net (ENET), and the nonparametric models gradient boosting machine (GBM) and stacking ensemble (Stack), the latter combining the ENET and GBM approaches. RESULTS The results show that the Stack approach outperformed other methods with a relative difference (RD), calculated as an increment in prediction accuracy, of approximately 18.0% compared to GBLUP, 12.6% compared to BayesB, 8.7% compared to ENET, and 4.4% compared to GBM. The highest RD in prediction accuracy between other models with respect to GBLUP was observed for haptoglobin (hapto) from 17.7% for BayesB to 41.2% for Stack; for Zn from 9.8% (BayesB) to 29.3% (Stack); for ceruloplasmin (CuCp) from 9.3% (BayesB) to 27.9% (Stack); for ferric reducing antioxidant power (FRAP) from 8.0% (BayesB) to 40.0% (Stack); and for total protein (PROTt) from 5.7% (BayesB) to 22.9% (Stack). Using a subset of top SNPs (1.5k) selected from the GBM approach improved the accuracy for GBLUP from 1.8 to 76.5%. However, for the other models reductions in prediction accuracy of 4.8% for ENET (average of 10 traits), 5.9% for GBM (average of 21 traits), and 6.6% for Stack (average of 16 traits) were observed. CONCLUSIONS Our results indicate that the Stack approach was more accurate in predicting metabolic disturbances than GBLUP, BayesB, ENET, and GBM and seemed to be competitive for predicting complex phenotypes with various modes of inheritance, i.e. additive and non-additive effects. Selecting markers based on GBM improved the accuracy of GBLUP.
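The stacking idea in this abstract (base learners whose out-of-fold predictions feed a second-level model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: ridge regression stands in for ENET and a k-nearest-neighbour regressor stands in for GBM so that the example needs only NumPy, and the genotypes and phenotypes are simulated.

```python
import numpy as np

def ridge_fit_predict(X_tr, y_tr, X_te, lam=1.0):
    """Closed-form ridge regression on centered data (stand-in for ENET)."""
    mu_x, mu_y = X_tr.mean(axis=0), y_tr.mean()
    Xc = X_tr - mu_x
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X_tr.shape[1]),
                           Xc.T @ (y_tr - mu_y))
    return (X_te - mu_x) @ beta + mu_y

def knn_fit_predict(X_tr, y_tr, X_te, k=5):
    """k-nearest-neighbour regression (stand-in for the nonparametric GBM)."""
    dist = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_tr[nearest].mean(axis=1)

BASE_LEARNERS = (ridge_fit_predict, knn_fit_predict)

def stack_predict(X_tr, y_tr, X_te, n_folds=5, seed=0):
    """Stack the base learners with a linear meta-learner."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y_tr)), n_folds)
    # Out-of-fold predictions: each animal is predicted by models
    # that never saw its own phenotype.
    oof = np.zeros((len(y_tr), len(BASE_LEARNERS)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y_tr)), held_out)
        for j, learner in enumerate(BASE_LEARNERS):
            oof[held_out, j] = learner(X_tr[train], y_tr[train], X_tr[held_out])
    # Meta-learner: least squares (with intercept) on the out-of-fold predictions.
    design = np.column_stack([np.ones(len(y_tr)), oof])
    weights, *_ = np.linalg.lstsq(design, y_tr, rcond=None)
    # Refit base learners on all training data, then combine their test predictions.
    base_te = np.column_stack([f(X_tr, y_tr, X_te) for f in BASE_LEARNERS])
    return np.column_stack([np.ones(len(X_te)), base_te]) @ weights

# Simulated genotypes (0/1/2) and an additive phenotype, for illustration only.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(240, 20)).astype(float)
y = X @ rng.normal(size=20) + rng.normal(size=240)

pred = stack_predict(X[:200], y[:200], X[200:])
accuracy = np.corrcoef(pred, y[200:])[0, 1]
print(f"correlation between stacked predictions and phenotypes: {accuracy:.2f}")
```

Training the meta-learner on out-of-fold predictions rather than in-sample fits is the design choice that keeps the second level from simply rewarding the base learner that overfits most.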
Affiliation(s)
- Lucio F M Mota
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy.
- Diana Giannuzzi
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
- Sara Pegolo
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy.
- Enrico Sturaro
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
- Daniel Gianola
  - Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI, 53706, USA
- Riccardo Negrini
  - Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Erminio Trevisi
  - Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
  - Nutrigenomics and Proteomics Research Center, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Paolo Ajmone Marsan
  - Department of Animal Science, Food and Nutrition (DIANA) and the Romeo and Enrica Invernizzi Research Center for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food, and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
  - Nutrigenomics and Proteomics Research Center, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Alessio Cecchinato
  - Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, 35020, Legnaro, PD, Italy
|
14
|
Abdoli N, Zhang K, Gilley P, Chen X, Sadri Y, Thai T, Dockery L, Moore K, Mannel R, Qiu Y. Evaluating the Effectiveness of 2D and 3D CT Image Features for Predicting Tumor Response to Chemotherapy. Bioengineering (Basel) 2023; 10:1334. [PMID: 38002458 PMCID: PMC10669238 DOI: 10.3390/bioengineering10111334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 11/08/2023] [Accepted: 11/09/2023] [Indexed: 11/26/2023] Open
Abstract
Background and Objective: 2D and 3D tumor features are widely used in a variety of medical image analysis tasks. However, for chemotherapy response prediction, the relative effectiveness of different kinds of 2D and 3D features has not been comprehensively assessed, especially in ovarian-cancer-related applications. This investigation aims to accomplish such a comprehensive evaluation. Methods: For this purpose, CT images were collected retrospectively from 188 advanced-stage ovarian cancer patients. All the metastatic tumors that occurred in each patient were segmented and then processed by a set of six filters. Next, three categories of features, namely geometric, density, and texture features, were calculated from both the filtered results and the original segmented tumors, generating a total of 1403 and 1595 features for the 2D and 3D tumors, respectively. In addition to the conventional single-slice 2D and full-volume 3D tumor features, we also computed the incomplete-3D tumor features, which were obtained by sequentially adding one individual CT slice and calculating the corresponding features. Support vector machine (SVM)-based prediction models were developed and optimized for each feature set. Five-fold cross-validation was used to assess the performance of each individual model. Results: The results show that the 2D feature-based model achieved an area under the receiver operating characteristic curve (AUC) of 0.84 ± 0.02. When adding more slices, the AUC first increased to reach the maximum and then gradually decreased to 0.86 ± 0.02. The maximum AUC was yielded when adding two adjacent slices, with a value of 0.91 ± 0.01. Conclusions: This initial result provides meaningful information for optimizing machine learning-based decision-making support tools in the future.
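As a reading aid for the AUC figures in this abstract, the sketch below computes the AUC as the probability that a responder receives a higher score than a non-responder, and then summarizes five held-out folds as a mean and standard deviation. The labels and scores are hypothetical, and treating the reported "± 0.02" as a spread across folds is an assumption; the abstract does not state how it was obtained.

```python
from statistics import mean, stdev

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical held-out labels and classifier scores for five folds.
held_out_folds = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7]),
    ([1, 0, 1, 0], [0.6, 0.5, 0.4, 0.1]),
    ([0, 1, 1, 0], [0.3, 0.8, 0.6, 0.2]),
    ([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1]),
]

fold_aucs = [auc(labels, scores) for labels, scores in held_out_folds]
print(f"AUC = {mean(fold_aucs):.2f} ± {stdev(fold_aucs):.2f}")
```

In a real five-fold cross-validation, the scores for each fold would come from a model trained on the other four folds; here they are fixed by hand to keep the example short.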
Affiliation(s)
- Neman Abdoli
  - School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA; (N.A.); (K.Z.); (Y.S.)
- Ke Zhang
  - School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA; (N.A.); (K.Z.); (Y.S.)
  - Stephenson School of Biomedical Engineering, University of Oklahoma, Norman, OK 73019, USA
- Patrik Gilley
  - School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA; (N.A.); (K.Z.); (Y.S.)
- Xuxin Chen
  - School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA; (N.A.); (K.Z.); (Y.S.)
- Youkabed Sadri
  - School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA; (N.A.); (K.Z.); (Y.S.)
- Theresa Thai
  - Department of Radiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA;
- Lauren Dockery
  - Department of Obstetrics and Gynecology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
- Kathleen Moore
  - Department of Obstetrics and Gynecology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
- Robert Mannel
  - Department of Obstetrics and Gynecology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
- Yuchen Qiu
  - School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019, USA; (N.A.); (K.Z.); (Y.S.)
|
15
|
Heinrich F, Lange TM, Kircher M, Ramzan F, Schmitt AO, Gültas M. Exploring the potential of incremental feature selection to improve genomic prediction accuracy. Genet Sel Evol 2023; 55:78. [PMID: 37946104 PMCID: PMC10634161 DOI: 10.1186/s12711-023-00853-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/03/2023] [Accepted: 11/02/2023] [Indexed: 11/12/2023] Open
Abstract
BACKGROUND The ever-increasing availability of high-density genomic markers in the form of single nucleotide polymorphisms (SNPs) enables genomic prediction, i.e. the inference of phenotypes based solely on genomic data, in the field of animal and plant breeding, where it has become an important tool. However, given the limited number of individuals, the abundance of variables (SNPs) can reduce the accuracy of prediction models due to overfitting or irrelevant SNPs. Feature selection can help to reduce the number of irrelevant SNPs and increase the model performance. In this study, we investigated an incremental feature selection approach based on ranking the SNPs according to the results of a genome-wide association study that we combined with random forest as a prediction model, and we applied it on several animal and plant datasets. RESULTS Applying our approach to different datasets yielded a wide range of outcomes, i.e. from a substantial increase in prediction accuracy in a few cases to minor improvements when only a fraction of the available SNPs were used. Compared with models using all available SNPs, our approach was able to achieve comparable performances with a considerably reduced number of SNPs in several cases. Our approach showcased state-of-the-art efficiency and performance while having a faster computation time. CONCLUSIONS The results of our study suggest that our incremental feature selection approach has the potential to improve prediction accuracy substantially. However, this gain seems to depend on the genomic data used. Even for datasets where the number of markers is smaller than the number of individuals, feature selection may still increase the performance of the genomic prediction. Our approach is implemented in R and is available at https://github.com/FelixHeinrich/GP_with_IFS/ .
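The incremental feature selection described in this abstract (rank the SNPs, then grow the marker set step by step and keep the size that predicts best) can be sketched as below. This is not the authors' R implementation from the repository linked above. As labeled assumptions, the ranking uses the absolute marker-phenotype correlation on the training set as a simple stand-in for GWAS results, ridge regression replaces random forest so that only NumPy is needed, and the data are simulated.

```python
import numpy as np

def rank_markers(X, y):
    """Order markers by |correlation| with the phenotype (stand-in for a GWAS ranking)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Monomorphic markers have zero variance; give them a score of zero.
    score = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    return np.argsort(-score)

def ridge_predict(X_tr, y_tr, X_te, lam=1.0):
    """Closed-form ridge regression (stand-in for the random forest model)."""
    mu_x, mu_y = X_tr.mean(axis=0), y_tr.mean()
    Xc = X_tr - mu_x
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X_tr.shape[1]),
                           Xc.T @ (y_tr - mu_y))
    return (X_te - mu_x) @ beta + mu_y

def incremental_selection(X_tr, y_tr, X_val, y_val, step=5):
    """Evaluate the top-k markers for growing k and return the best subset size."""
    order = rank_markers(X_tr, y_tr)  # ranking uses training data only
    results = []
    for k in range(step, X_tr.shape[1] + 1, step):
        cols = order[:k]
        pred = ridge_predict(X_tr[:, cols], y_tr, X_val[:, cols])
        results.append((k, float(np.corrcoef(pred, y_val)[0, 1])))
    best_k, best_acc = max(results, key=lambda r: r[1])
    return best_k, best_acc, results

# Simulated data: 50 markers, of which the first 5 affect the phenotype.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(150, 50)).astype(float)
y = X[:, :5].sum(axis=1) + rng.normal(size=150)

best_k, best_acc, results = incremental_selection(X[:100], y[:100], X[100:], y[100:])
print(f"best subset size: {best_k} markers (validation correlation {best_acc:.2f})")
```

Ranking the markers on the training individuals only matters here: ranking on the full data set would leak validation phenotypes into the selection and inflate the apparent accuracy.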
Affiliation(s)
- Felix Heinrich
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075, Göttingen, Germany.
- Thomas Martin Lange
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075, Göttingen, Germany
- Magdalena Kircher
  - Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Bünteweg 17p, 30559, Hannover, Germany
- Faisal Ramzan
  - Institute of Animal and Dairy Sciences, University of Agriculture Faisalabad, Jail Road, 38000, Faisalabad, Pakistan
- Armin Otto Schmitt
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075, Göttingen, Germany
  - Center for Integrated Breeding Research (CiBreed), Georg-August University, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
- Mehmet Gültas
  - Center for Integrated Breeding Research (CiBreed), Georg-August University, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany.
  - Faculty of Agriculture, South Westphalia University of Applied Sciences, 59494, Soest, Germany.
|
16
|
Mora M, González P, Quevedo JR, Montañés E, Tusell L, Bergsma R, Piles M. Impact of multi-output and stacking methods on feed efficiency prediction from genotype using machine learning algorithms. J Anim Breed Genet 2023; 140:638-652. [PMID: 37403756 DOI: 10.1111/jbg.12815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 05/23/2023] [Accepted: 06/23/2023] [Indexed: 07/06/2023]
Abstract
Feeding represents the largest economic cost in meat production; therefore, selection to improve traits related to feed efficiency is a goal in most livestock breeding programs. Residual feed intake (RFI), that is, the difference between the actual and the expected feed intake based on the animal's requirements, has been used as a selection criterion to improve feed efficiency since it was proposed by Koch in 1963. In growing pigs, it is computed as the residual of the multiple regression model of daily feed intake (DFI) on average daily gain (ADG), backfat thickness (BFT), and metabolic body weight (MW). Recently, prediction using single-output machine learning algorithms and information from SNPs as predictor variables has been proposed for genomic selection in growing pigs, but as in other species, the prediction quality achieved for RFI has been generally poor. However, it has been suggested that it could be improved through multi-output or stacking methods. For this purpose, four strategies were implemented to predict RFI. Two of them correspond to the computation of RFI in an indirect way using the predicted values of its components obtained from (i) individual (multiple single-output strategy) or (ii) simultaneous predictions (multi-output strategy). The other two correspond to the direct prediction of RFI using (iii) the individual predictions of its components as predictor variables jointly with the genotype (stacking strategy), or (iv) using only the genotypes as predictors of RFI (single-output strategy). The single-output strategy was considered the benchmark. This research aimed to test the former three hypotheses using data recorded from 5828 growing pigs and 45,610 SNPs. For all the strategies two different learning methods were fitted: random forest (RF) and support vector regression (SVR). A nested cross-validation (CV) with an outer 10-fold CV and an inner threefold CV for hyperparameter tuning was implemented to test all strategies. This scheme was repeated using as predictor variables different subsets with an increasing number (from 200 to 3000) of the most informative SNPs identified with RF. Results showed that the highest prediction performance was achieved with 1000 SNPs, although the stability of feature selection was poor (0.13 points out of 1). For all SNP subsets, the benchmark showed the best prediction performance. Using the RF as a learner and the 1000 most informative SNPs as predictors, the mean (SD) of the 10 values obtained in the test sets were: 0.23 (0.04) for the Spearman correlation, 0.83 (0.04) for the zero-one loss, and 0.33 (0.03) for the rank distance loss. We conclude that the information on predicted components of RFI (DFI, ADG, MW, and BFT) does not contribute to improving the quality of the prediction of this trait relative to that obtained with the single-output strategy.
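The definition of RFI given in this abstract, the residual of a multiple regression of DFI on ADG, BFT, and MW, can be written out in a few lines. The sketch below fits the regression by ordinary least squares on simulated records; the trait means, effect sizes, and the function name `residual_feed_intake` are illustrative assumptions, not values or code from the study.

```python
import numpy as np

def residual_feed_intake(dfi, adg, bft, mw):
    """RFI = observed DFI minus the DFI expected from ADG, BFT and MW."""
    # Design matrix with an intercept column followed by the three predictors.
    X = np.column_stack([np.ones_like(dfi), adg, bft, mw])
    coef, *_ = np.linalg.lstsq(X, dfi, rcond=None)
    return dfi - X @ coef

# Simulated records for 100 animals (illustrative values only).
rng = np.random.default_rng(42)
n = 100
adg = rng.normal(0.9, 0.1, n)    # average daily gain
bft = rng.normal(12.0, 2.0, n)   # backfat thickness
mw = rng.normal(20.0, 2.0, n)    # metabolic body weight
dfi = 0.5 + 1.2 * adg + 0.02 * bft + 0.05 * mw + rng.normal(0, 0.15, n)

rfi = residual_feed_intake(dfi, adg, bft, mw)
print(f"mean RFI: {rfi.mean():.2e}, SD of RFI: {rfi.std(ddof=1):.3f}")
```

Because the model includes an intercept, the residuals average to zero and are uncorrelated with ADG, BFT, and MW by construction. An animal with negative RFI ate less than expected for its growth, fatness, and size.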
Affiliation(s)
- Mónica Mora
  - Departamento de Ciencia Animal, Universidad Politècnica de València, Valencia, Spain
  - Animal Breeding and Genetics, Institute of Agrifood Research and Technology (IRTA), Barcelona, Spain
- Pablo González
  - Artificial Intelligence Centre, University of Oviedo, Gijón, Spain
- Elena Montañés
  - Artificial Intelligence Centre, University of Oviedo, Gijón, Spain
- Llibertat Tusell
  - Animal Breeding and Genetics, Institute of Agrifood Research and Technology (IRTA), Barcelona, Spain
- Rob Bergsma
  - Topigs Norsvin Research Center, Beuningen, Netherlands
- Miriam Piles
  - Animal Breeding and Genetics, Institute of Agrifood Research and Technology (IRTA), Barcelona, Spain
|
17
|
Sadeqi MB, Ballvora A, Dadshani S, Léon J. Genetic Parameter and Hyper-Parameter Estimation Underlie Nitrogen Use Efficiency in Bread Wheat. Int J Mol Sci 2023; 24:14275. [PMID: 37762585 PMCID: PMC10531695 DOI: 10.3390/ijms241814275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 09/07/2023] [Accepted: 09/14/2023] [Indexed: 09/29/2023] Open
Abstract
Estimation and prediction play a key role in breeding programs. Currently, phenotyping of complex traits such as nitrogen use efficiency (NUE) in wheat is still expensive, requires high-throughput technologies and is very time consuming compared to genotyping. Therefore, researchers are trying to predict phenotypes based on marker information. Genetic parameters such as population structure, genomic relationship matrix, marker density and sample size are major factors that increase the performance and accuracy of a model. However, they play an important role in adjusting the statistically significant false discovery rate (FDR) threshold in estimation. In parallel, there are many genetic hyper-parameters that are hidden and not represented in the given genomic selection (GS) model but have significant effects on the results, such as panel size, number of markers, minor allele frequency, number of call rates for each marker, number of cross validations and batch size in the training set of the genomic file. The main challenge is to ensure the reliability and accuracy of predicted breeding values (BVs) as results. Our study has confirmed the results of bias-variance tradeoff and adaptive prediction error for the ensemble-learning-based model STACK, which has the highest performance when estimating genetic parameters and hyper-parameters in a given GS model compared to other models.
Affiliation(s)
- Mohammad Bahman Sadeqi
  - INRES-Plant Breeding, Rheinische Friedrich-Wilhelms-Universität Bonn, 53113 Bonn, Germany; (M.B.S.); (J.L.)
- Agim Ballvora
  - INRES-Plant Breeding, Rheinische Friedrich-Wilhelms-Universität Bonn, 53113 Bonn, Germany; (M.B.S.); (J.L.)
- Said Dadshani
  - INRES-Plant Nutrition, Rheinische Friedrich-Wilhelms-Universität Bonn, 53113 Bonn, Germany;
- Jens Léon
  - INRES-Plant Breeding, Rheinische Friedrich-Wilhelms-Universität Bonn, 53113 Bonn, Germany; (M.B.S.); (J.L.)
|
18
|
Chafai N, Hayah I, Houaga I, Badaoui B. A review of machine learning models applied to genomic prediction in animal breeding. Front Genet 2023; 14:1150596. [PMID: 37745853 PMCID: PMC10516561 DOI: 10.3389/fgene.2023.1150596] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/24/2023] [Accepted: 08/22/2023] [Indexed: 09/26/2023] Open
Abstract
The advent of modern genotyping technologies has revolutionized genomic selection in animal breeding. Large marker datasets have shown several drawbacks for traditional genomic prediction methods in terms of flexibility, accuracy, and computational power. Recently, the application of machine learning models in animal breeding has gained a lot of interest due to their tremendous flexibility and their ability to capture patterns in large noisy datasets. Here, we present a general overview of a handful of machine learning algorithms and their application in genomic prediction to provide a meta-picture of their performance in genomic estimated breeding values estimation, genotype imputation, and feature selection. Finally, we discuss a potential adoption of machine learning models in genomic prediction in developing countries. The results of the reviewed studies showed that machine learning models have indeed performed well in fitting large noisy data sets and modeling minor nonadditive effects in some of the studies. However, sometimes conventional methods outperformed machine learning models, which confirms that there's no universal method for genomic prediction. In summary, machine learning models have great potential for extracting patterns from single nucleotide polymorphism datasets. Nonetheless, the level of their adoption in animal breeding is still low due to data limitations, complex genetic interactions, a lack of standardization and reproducibility, and the lack of interpretability of machine learning models when trained with biological data. Consequently, there is no remarkable outperformance of machine learning methods compared to traditional methods in genomic prediction. Therefore, more research should be conducted to discover new insights that could enhance livestock breeding programs.
Affiliation(s)
- Narjice Chafai
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
- Ichrak Hayah
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
- Isidore Houaga
  - Centre for Tropical Livestock Genetics and Health, The Roslin Institute, Royal (Dick) School of Veterinary Medicine, The University of Edinburgh, Edinburgh, United Kingdom
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Bouabid Badaoui
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
  - African Sustainable Agriculture Research Institute (ASARI), Mohammed VI Polytechnic University (UM6P), Laayoune, Morocco
|
19
|
GhoshRoy D, Alvi PA, Santosh KC. AI Tools for Assessing Human Fertility Using Risk Factors: A State-of-the-Art Review. J Med Syst 2023; 47:91. [PMID: 37610455 DOI: 10.1007/s10916-023-01983-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/15/2022] [Accepted: 08/02/2023] [Indexed: 08/24/2023]
Abstract
Infertility has massively disrupted social and marital life, resulting in stressful emotional well-being. Early diagnosis is the utmost need for faster adaptation to respond to these changes, which is made possible via AI tools. Our main objective is to comprehend the role of AI in fertility detection since we have primarily worked to find biomarkers and related risk factors associated with infertility. This paper aims to vividly analyse the role of AI as an effective method for screening for and predicting infertility and related risk factors. Three scientific repositories, PubMed, Web of Science, and Scopus, were used to gather relevant articles via the search terms: (human infertility OR human fertility) AND risk factors AND (machine learning OR artificial intelligence OR intelligent system). In this way, we systematically reviewed 42 articles and performed a meta-analysis. The significant findings and recommendations are discussed. These include the rising importance of data augmentation, feature extraction, explainability, and the need to revisit the meaning of an effective system for fertility analysis. Additionally, the paper outlines various mitigation actions that can be employed to tackle infertility and its related risk factors. These insights contribute to a better understanding of the role of AI in fertility analysis and the potential for improving reproductive health outcomes.
Affiliation(s)
- Debasmita GhoshRoy
  - School of Automation, Banasthali Vidyapith, 304022, Rajasthan, India
  - Applied AI Research Lab, Vermillion, SD, 57069, USA
- P A Alvi
  - Department of Physics, Banasthali Vidyapith, 304022, Rajasthan, India
- K C Santosh
  - Department of Computer Science, University of South Dakota, Vermillion, SD, 57069, USA.
  - Applied AI Research Lab, Vermillion, SD, 57069, USA.
|
20
|
Chang L, Fukuoka Y, Aouizerat BE, Zhang L, Flowers E. Prediction Performance of Feature Selectors and Classifiers on Highly Dimensional Transcriptomic Data for Prediction of Weight Loss in Filipino Americans at Risk for Type 2 Diabetes. Biol Res Nurs 2023; 25:393-403. [PMID: 36600204 PMCID: PMC10404908 DOI: 10.1177/10998004221147513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
Background: Accurate prediction of risk for chronic diseases like type 2 diabetes (T2D) is challenging due to the complex underlying etiology. Integration of more complex data types from sensors and leveraging technologies for collection of -omics datasets may provide greater insights into the specific risk profile for complex diseases. Methods: We performed a literature review to identify feature selection methods and machine learning models for prediction of weight loss in a previously completed clinical trial (NCT02278939) of a behavioral intervention for weight loss in Filipinos at risk for T2D. Features included demographic and clinical characteristics, dietary factors, physical activity, and transcriptomics. Results: We identified four feature selection methods: Correlation-based Feature Subset Selection (CfsSubsetEval) with BestFirst, Kolmogorov-Smirnov (KS) test with correlation feature selection (CFS), DESeq2, and max-relevance-min-redundancy (MRMR) with linear forward search and mutual information (MI); and four machine learning algorithms: support vector machine, decision tree, random forest, and extra trees, all of which are applicable to prediction of weight loss using the specified feature types. Conclusion: More accurate prediction of risk for T2D and other complex conditions may be possible by leveraging complex data types from sensors and -omics datasets. Emerging methods for feature selection and machine learning algorithms make this type of modeling feasible.
Affiliation(s)
- Lisa Chang
  - Department of Physiological Nursing, University of California San Francisco, San Francisco, CA, USA
  - Keck Graduate Institute, Claremont, CA, USA
- Yoshimi Fukuoka
  - Department of Physiological Nursing, University of California San Francisco, San Francisco, CA, USA
- Bradley E. Aouizerat
  - Bluestone Center for Clinical Research, New York University, New York, NY, USA
  - Department of Oral and Maxillofacial Surgery, New York University, New York, NY, USA
- Li Zhang
  - Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
  - Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Elena Flowers
  - Department of Physiological Nursing, University of California San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
|
21
|
Feng J, Wang L, Yang X, Chen Q, Cheng X. Prognostic prediction by a novel integrative inflammatory and nutritional score based on least absolute shrinkage and selection operator in esophageal squamous cell carcinoma. Front Nutr 2022; 9:966518. [PMID: 36438741 PMCID: PMC9686353 DOI: 10.3389/fnut.2022.966518] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/11/2022] [Accepted: 10/25/2022] [Indexed: 09/22/2023] Open
Abstract
BACKGROUND This study aimed to establish and validate a novel predictive model, the integrative inflammatory and nutritional score (IINS), for prognostic prediction in esophageal squamous cell carcinoma (ESCC). MATERIALS AND METHODS We retrospectively recruited 494 patients with pathologically confirmed ESCC who underwent surgery and randomly assigned them to a training (n = 346) or validation (n = 148) group. Least absolute shrinkage and selection operator (LASSO) Cox proportional hazards (PH) regression analysis was used to construct the IINS. The clinical features and prognostic factors, with hazard ratios (HRs) and 95% confidence intervals (CIs), grouped by IINS were analyzed. A nomogram was also established to verify the prognostic value of IINS. RESULTS According to the LASSO Cox PH regression analysis, the IINS was constructed from 10 inflammatory and nutritional indicators, with an optimal cut-off level of 2.35. The areas under the curve (AUCs) of IINS for 1-year, 3-year, and 5-year prediction were 0.814 (95% CI: 0.769-0.854), 0.748 (95% CI: 0.698-0.793), and 0.792 (95% CI: 0.745-0.833) in the training cohort and 0.802 (95% CI: 0.733-0.866), 0.702 (95% CI: 0.621-0.774), and 0.748 (95% CI: 0.670-0.816) in the validation cohort, respectively. IINS had the largest AUCs in the two cohorts compared with other prognostic indicators, indicating a higher predictive ability. A better 5-year cancer-specific survival (CSS) was found in patients with IINS ≤ 2.35 compared with those with IINS > 2.35 in both the training cohort (54.3% vs. 11.1%, P < 0.001) and the validation cohort (53.7% vs. 18.2%, P < 0.001). The IINS was then confirmed as a useful independent factor (training cohort: HR: 3.000, 95% CI: 2.254-3.992, P < 0.001; validation cohort: HR: 2.609, 95% CI: 1.693-4.020, P < 0.001). Finally, an IINS-based predictive nomogram model was established and validated for CSS prediction (training set: C-index = 0.71; validation set: C-index = 0.69). CONCLUSION Preoperative IINS is an independent predictor of CSS in ESCC. The nomogram based on IINS may be used as a potential risk stratification tool to predict individual CSS and guide treatment in ESCC with radical resection.
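The C-index reported for the nomogram can be illustrated with a short sketch of Harrell's concordance index for right-censored data. This is an assumption about which concordance definition applies; the abstract does not say how the authors computed it, and the follow-up times, event flags, and risk scores below are invented toy values.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had the event strictly before
    subject j's follow-up time. The pair is concordant when subject i also
    has the higher risk score; tied scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (invented): follow-up in months, event flag (1 = cancer-specific
# death, 0 = censored), and a hypothetical score where higher means higher risk.
times = [5, 12, 20, 30, 42]
events = [1, 1, 0, 1, 0]
scores = [3.1, 2.0, 1.9, 2.5, 1.2]
c_index = concordance_index(times, events, scores)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.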
Affiliation(s)
- Jifeng Feng
  - The Second Clinical Medical College, Zhejiang Chinese Medical University, Hangzhou, China
  - Department of Thoracic Oncological Surgery, Chinese Academy of Science, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institute of Basic Medicine and Cancer (IBMC), Hangzhou, China
  - Chinese Academy of Science, Zhejiang Provincial Research Center for Upper Gastrointestinal Tract Cancer, Key Laboratory of Prevention, Diagnosis and Therapy of Upper Gastrointestinal Cancer of Zhejiang Province, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institute of Basic Medicine and Cancer (IBMC), Hangzhou, China
- Liang Wang
  - Department of Thoracic Oncological Surgery, Chinese Academy of Science, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institute of Basic Medicine and Cancer (IBMC), Hangzhou, China
- Xun Yang
  - Department of Thoracic Oncological Surgery, Chinese Academy of Science, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institute of Basic Medicine and Cancer (IBMC), Hangzhou, China
- Qixun Chen
  - Department of Thoracic Oncological Surgery, Chinese Academy of Science, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institute of Basic Medicine and Cancer (IBMC), Hangzhou, China
- Xiangdong Cheng
  - Chinese Academy of Science, Zhejiang Provincial Research Center for Upper Gastrointestinal Tract Cancer, Key Laboratory of Prevention, Diagnosis and Therapy of Upper Gastrointestinal Cancer of Zhejiang Province, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institute of Basic Medicine and Cancer (IBMC), Hangzhou, China
|
22
|
Integrating genome-wide association study and pathway analysis reveals physiological aspects affecting heifer early calving defined at different ages in Nelore cattle. Genomics 2022; 114:110395. [DOI: 10.1016/j.ygeno.2022.110395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2021] [Revised: 05/23/2022] [Accepted: 06/01/2022] [Indexed: 11/22/2022]
|
23
|
Li F, Yin J, Lu M, Yang Q, Zeng Z, Zhang B, Li Z, Qiu Y, Dai H, Chen Y, Zhu F. ConSIG: consistent discovery of molecular signature from OMIC data. Brief Bioinform 2022; 23:6618243. [PMID: 35758241 DOI: 10.1093/bib/bbac253] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 04/05/2022] [Revised: 05/09/2022] [Accepted: 05/31/2022] [Indexed: 12/12/2022] Open
Abstract
The discovery of a proper molecular signature from OMIC data is indispensable for determining biological state, physiological condition, disease etiology, and therapeutic response. However, identified signatures are reported to be highly inconsistent, and there is little overlap among the signatures identified from different biological datasets. Such inconsistency raises doubts about the reliability of reported signatures and significantly hampers their biological and clinical applications. Herein, an online tool, ConSIG, was constructed to realize consistent discovery of gene/protein signatures from any uploaded transcriptomic/proteomic data. This tool is unique in a) integrating a novel strategy capable of significantly enhancing the consistency of signature discovery, b) determining the optimal signature by collective assessment, and c) confirming the biological relevance by enriching the disease/gene ontology. With the increasingly accumulated concerns about signature consistency and biological relevance, this online tool is expected to be used as an essential complement to other existing tools for OMIC-based signature discovery. ConSIG is freely accessible to all users without login requirement at https://idrblab.org/consig/.
Affiliation(s)
- Fengcheng Li
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Jiayi Yin
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Mingkun Lu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Qingxia Yang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Zhenyu Zeng
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Bing Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Zhaorong Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Yunqing Qiu
  - State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, 79 QingChun Road, Hangzhou, Zhejiang 310000, China
- Haibin Dai
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yuzong Chen
  - State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, The Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China
  - Qian Xuesen Collaborative Research Center of Astrochemistry and Space Life Sciences, Institute of Drug Discovery Technology, Ningbo University, Ningbo 315211, China
- Feng Zhu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
|
24
|
Mancin E, Mota LFM, Tuliozi B, Verdiglione R, Mantovani R, Sartori C. Improvement of Genomic Predictions in Small Breeds by Construction of Genomic Relationship Matrix Through Variable Selection. Front Genet 2022; 13:814264. [PMID: 35664297 PMCID: PMC9158133 DOI: 10.3389/fgene.2022.814264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2021] [Accepted: 03/22/2022] [Indexed: 11/13/2022] Open
Abstract
Genomic selection has been increasingly implemented in the animal breeding industry, and it is becoming a routine method in many livestock breeding contexts. However, its use is still limited in several small-population local breeds, which are, nonetheless, an important source of genetic variability of great economic value. A major roadblock for their genomic selection is accuracy when population size is limited: to improve breeding value accuracy, variable selection models that assume heterogeneous variance have been proposed over the last few years. However, while these models might outperform traditional and genomic predictions in terms of accuracy, they also carry a proportional increase of breeding value bias and dispersion. These mutual increases are especially striking when genomic selection is performed with a low number of phenotypes and a high shrinkage value, which is precisely the situation that occurs with small local breeds. In our study, we tested several alternative methods to improve the accuracy of genomic selection in a small population. First, we investigated the impact of using only a subset of informative markers on prediction accuracy, bias, and dispersion. We used different algorithms to select them, such as recursive feature elimination, penalized regression, and XGBoost. We compared our results with the predictions of pedigree-based BLUP, single-step genomic BLUP, and weighted single-step genomic BLUP in different simulated populations obtained by combining various parameters in terms of number of QTLs and effective population size. We also investigated these approaches on a real data set belonging to the small local Rendena breed. Our results show that the accuracy of GBLUP in small-sized populations increased when performed with SNPs selected via variable selection methods, both in simulated and real data sets. In addition, the use of variable selection models, especially those using XGBoost, in our real data set did not impact bias and the dispersion of estimated breeding values. We have discussed possible explanations for our results and how our study can help estimate breeding values for future genomic selection in small breeds.
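The idea of building a genomic relationship matrix from a subset of selected SNPs can be sketched as follows. The sketch assumes the widely used VanRaden formulation (centred genotypes, scaled by twice the sum of p(1 - p)); the abstract does not state which formula the authors used, and the genotypes and the list of selected SNP indices are invented for illustration.

```python
def genomic_relationship_matrix(genotypes, selected):
    """VanRaden-style G matrix built from a subset of SNP columns.

    genotypes: list of rows (one per animal), alleles coded 0/1/2.
    selected:  column indices kept by some variable-selection step.
    """
    n = len(genotypes)
    sub = [[row[k] for k in selected] for row in genotypes]
    # Frequency of the counted allele at each retained SNP.
    freqs = [sum(col) / (2.0 * n) for col in zip(*sub)]
    # Centre each genotype by twice the allele frequency.
    z = [[g - 2.0 * p for g, p in zip(row, freqs)] for row in sub]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all selected SNPs are monomorphic")
    return [[sum(a * b for a, b in zip(zi, zj)) / scale for zj in z] for zi in z]


# Toy genotypes (invented): 4 animals x 5 SNPs.
genotypes = [
    [0, 1, 2, 1, 0],
    [1, 1, 0, 2, 0],
    [2, 0, 1, 1, 1],
    [1, 2, 1, 0, 1],
]
selected_snps = [0, 2, 3]  # hypothetical output of a feature-selection step
G = genomic_relationship_matrix(genotypes, selected_snps)
```

The resulting matrix would then replace the all-SNP relationship matrix in a GBLUP-type model; that modelling step is outside the scope of this sketch.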
Affiliation(s)
- Enrico Mancin
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Lucio Flavio Macedo Mota
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Beniamino Tuliozi
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Rina Verdiglione
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Roberto Mantovani
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Cristina Sartori
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
|
25
|
Mota LFM, Santos SWB, Júnior GAF, Bresolin T, Mercadante MEZ, Silva JAV, Cyrillo JNSG, Monteiro FM, Carvalheiro R, Albuquerque LG. Meta-analysis across Nellore cattle populations identifies common metabolic mechanisms that regulate feed efficiency-related traits. BMC Genomics 2022; 23:424. [PMID: 35672696 PMCID: PMC9172108 DOI: 10.1186/s12864-022-08671-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/01/2021] [Accepted: 05/03/2022] [Indexed: 11/28/2022] Open
Abstract
Background Feed efficiency (FE) related traits play a key role in the economy and sustainability of beef cattle production systems. Accurate knowledge of the physiological background of FE-related traits can help the development of more efficient selection strategies for them. Hence, multi-trait weighted GWAS (MTwGWAS) and meta-analysis were used to find genomic regions associated with average daily gain (ADG), dry matter intake (DMI), feed conversion ratio (FCR), feed efficiency (FE), and residual feed intake (RFI). The FE-related traits and genomic information belong to two breeding programs that perform the FE test at different ages: post-weaning (1,024 animals for the IZ population) and post-yearling (918 animals for the QLT population). Results The meta-analysis of MTwGWAS identified 14 genomic regions (-log10(p-value) > 5) mapped on BTA 1, 2, 3, 4, 7, 8, 11, 14, 15, 18, 21, and 29. These regions explained a large proportion of the total genetic variance for FE-related traits across populations, ranging from 20% (FCR) to 36% (DMI) in the IZ population and from 22% (RFI) to 28% (ADG) in the QLT population. Relevant candidate genes within these regions (LIPE, LPL, IGF1R, IGF1, IGFBP5, IGF2, INS, INSR, LEPR, LEPROT, POMC, NPY, AGRP, TGFB1, GHSR, JAK1, LYN, MOS, PLAG1, CHCD7, LCAT, and PLA2G15) highlighted that the physiological mechanisms related to neuropeptides and the metabolic signals controlling the body's energy balance are responsible for leading to greater feed efficiency. Integrated meta-analysis results and functional pathway enrichment analysis highlighted the major effect of biological functions linked to energy, lipid metabolism, and hormone signaling that mediate the effects of peptide signals in the hypothalamus and whole-body energy homeostasis affecting the genetic control of FE-related traits in Nellore cattle. Conclusions Genes and pathways associated with common signals for feed efficiency-related traits provide better knowledge about regions with biological relevance in physiological mechanisms associated with differences in energy metabolism and hypothalamus signaling. These pleiotropic regions would support the selection for feed efficiency-related traits by incorporating and weighting causal variants through prior weights in genomic selection approaches. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08671-w.
Affiliation(s)
- Lucio F M Mota
  - School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal - SP, São Paulo, 14884-900, Brazil
- Samuel W B Santos
  - School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal - SP, São Paulo, 14884-900, Brazil
- Gerardo A Fernandes Júnior
  - School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal - SP, São Paulo, 14884-900, Brazil
- Tiago Bresolin
  - School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal - SP, São Paulo, 14884-900, Brazil
- Maria E Z Mercadante
  - Institute of Animal Science, Beef Cattle Research Center, Sertãozinho - SP, São Paulo, 14174-000, Brazil
  - National Council for Science and Technological Development, Brasilia - DF, 71605-001, Brazil
- Josineudson A V Silva
  - National Council for Science and Technological Development, Brasilia - DF, 71605-001, Brazil
  - School of Veterinary Medicine and Animal Science, São Paulo State University (UNESP), Botucatu - SP, 18618-681, Brazil
- Joslaine N S G Cyrillo
  - Institute of Animal Science, Beef Cattle Research Center, Sertãozinho - SP, São Paulo, 14174-000, Brazil
- Fábio M Monteiro
  - Institute of Animal Science, Beef Cattle Research Center, Sertãozinho - SP, São Paulo, 14174-000, Brazil
- Roberto Carvalheiro
  - School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal - SP, São Paulo, 14884-900, Brazil
  - National Council for Science and Technological Development, Brasilia - DF, 71605-001, Brazil
- Lucia G Albuquerque
  - School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal - SP, São Paulo, 14884-900, Brazil
  - National Council for Science and Technological Development, Brasilia - DF, 71605-001, Brazil
|
26
|
Gabur I, Simioniuc DP, Snowdon RJ, Cristea D. Machine Learning Applied to the Search for Nonlinear Features in Breeding Populations. Front Artif Intell 2022; 5:876578. [PMID: 35669178 PMCID: PMC9164111 DOI: 10.3389/frai.2022.876578] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/15/2022] [Accepted: 04/19/2022] [Indexed: 11/13/2022] Open
Abstract
Large plant breeding populations are traditionally a source of novel allelic diversity and are at the core of selection efforts for elite material. Finding rare diversity requires a deep understanding of biological interactions between the genetic makeup of one genotype and its environmental conditions. Most modern breeding programs still rely on linear regression models to solve this problem, generalizing the complex genotype by phenotype interactions through manually constructed linear features. However, the identification of positive alleles vs. background can be addressed using deep learning approaches that have the capacity to learn complex nonlinear functions for the inputs. Machine learning (ML) is an artificial intelligence (AI) approach involving a range of algorithms to learn from input data sets and predict outcomes in other related samples. This paper describes a variety of techniques that include supervised and unsupervised ML algorithms to improve our understanding of nonlinear interactions from plant breeding data sets. Feature selection (FS) methods are combined with linear and nonlinear predictors and compared to traditional prediction methods used in plant breeding. Recent advances in ML allowed the construction of complex models that have the capacity to better differentiate between positive alleles and the genetic background. Using real plant breeding program data, we show that ML methods have the ability to outperform current approaches, increase prediction accuracies, decrease the computing time drastically, and improve the detection of important alleles involved in qualitative or quantitative traits.
Affiliation(s)
- Iulian Gabur
  - Department of Plant Breeding, Justus-Liebig-University, Giessen, Germany
  - Department of Plant Sciences, Iasi University of Life Sciences, Iasi, Romania
  - *Correspondence: Iulian Gabur
- Rod J. Snowdon
  - Department of Plant Breeding, Justus-Liebig-University, Giessen, Germany
- Dan Cristea
  - Institute of Computer Science, Romanian Academy, Iasi Branch, Iasi, Romania
|
27
|
Wang X, Shi S, Wang G, Luo W, Wei X, Qiu A, Luo F, Ding X. Using machine learning to improve the accuracy of genomic prediction of reproduction traits in pigs. J Anim Sci Biotechnol 2022; 13:60. [PMID: 35578371 PMCID: PMC9112588 DOI: 10.1186/s40104-022-00708-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/15/2021] [Accepted: 03/13/2022] [Indexed: 12/02/2022] Open
Abstract
Background Recently, machine learning (ML) has become attractive in genomic prediction, but its superiority in genomic prediction over conventional (ss) GBLUP methods and the choice of optimal ML methods need to be investigated. Results In this study, 2566 Chinese Yorkshire pigs with reproduction trait records were genotyped with the GenoBaits Porcine SNP 50 K and PorcineSNP50 panels. Four ML methods, including support vector regression (SVR), kernel ridge regression (KRR), random forest (RF) and Adaboost.R2 were implemented. Through 20 replicates of fivefold cross-validation (CV) and one prediction for younger individuals, the utility of ML methods in genomic prediction was explored. In CV, compared with genomic BLUP (GBLUP), single-step GBLUP (ssGBLUP) and the Bayesian method BayesHE, ML methods significantly outperformed these conventional methods. ML methods improved the genomic prediction accuracy of GBLUP, ssGBLUP, and BayesHE by 19.3%, 15.0% and 20.8%, respectively. In addition, ML methods yielded smaller mean squared error (MSE) and mean absolute error (MAE) in all scenarios. ssGBLUP yielded an improvement of 3.8% on average in accuracy compared to that of GBLUP, and the accuracy of BayesHE was close to that of GBLUP. In genomic prediction of younger individuals, RF and Adaboost.R2_KRR performed better than GBLUP and BayesHE, while ssGBLUP performed comparably with RF, and ssGBLUP yielded slightly higher accuracy and lower MSE than Adaboost.R2_KRR in the prediction of total number of piglets born, while for number of piglets born alive, Adaboost.R2_KRR performed significantly better than ssGBLUP. Among ML methods, Adaboost.R2_KRR consistently performed well in our study. Our findings also demonstrated that optimal hyperparameters are useful for ML methods. After tuning hyperparameters in CV and in predicting genomic outcomes of younger individuals, the average improvement was 14.3% and 21.8% over those using default hyperparameters, respectively. 
Conclusion Our findings demonstrated that ML methods had better overall prediction performance than conventional genomic selection methods, and could be new options for genomic prediction. Among ML methods, Adaboost.R2_KRR consistently performed well in our study, and tuning hyperparameters is necessary for ML methods. The optimal hyperparameters depend on the characteristics of the traits, the datasets, etc. Supplementary Information The online version contains supplementary material available at 10.1186/s40104-022-00708-0.
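The validation design described in this abstract, 20 replicates of fivefold cross-validation on 2566 animals, can be sketched as an index generator. This is a minimal sketch under stated assumptions: the abstract does not describe how animals were assigned to folds, so plain random partitioning is assumed, and the function name and seed are hypothetical.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=20, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated K-fold CV."""
    rng = random.Random(seed)
    for rep in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every replicate
        folds = [order[k::n_folds] for k in range(n_folds)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, f in enumerate(folds) if j != k for i in f]
            yield rep, k, train_idx, test_idx


# 20 replicates x 5 folds = 100 train/test splits over 2566 animals.
splits = list(repeated_kfold(n_samples=2566))
```

Each split would be used to fit a model on `train_idx` and score it on `test_idx`; averaging the scores over all 100 splits gives the kind of mean accuracy the abstract compares across methods.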
Affiliation(s)
- Xue Wang
  - Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Shaolei Shi
  - Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Guijiang Wang
  - Hebei Province Animal Husbandry and Improved Breeds Work Station, Shijiazhuang, Hebei, China
- Wenxue Luo
  - Hebei Province Animal Husbandry and Improved Breeds Work Station, Shijiazhuang, Hebei, China
- Xia Wei
  - Zhangjiakou Dahao Heshan New Agricultural Development Co., Ltd, Zhangjiakou, Hebei, China
- Ao Qiu
  - Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Fei Luo
  - Hebei Province Animal Husbandry and Improved Breeds Work Station, Shijiazhuang, Hebei, China
- Xiangdong Ding
  - Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
|
28
|
Machine Learning-Based Radiomics for Prediction of Epidermal Growth Factor Receptor Mutations in Lung Adenocarcinoma. DISEASE MARKERS 2022; 2022:2056837. [PMID: 35578691 PMCID: PMC9107363 DOI: 10.1155/2022/2056837] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/26/2022] [Revised: 04/13/2022] [Accepted: 04/23/2022] [Indexed: 12/20/2022]
Abstract
Identifying an epidermal growth factor receptor (EGFR) mutation is important because EGFR tyrosine kinase inhibitors are the first-line treatment of choice for patients with EGFR mutation-positive lung adenocarcinomas (LUAC). This study aimed to develop and validate a radiomics-based machine learning (ML) approach to identify EGFR mutations in patients with LUAC. We retrospectively collected data from 201 patients with positive EGFR mutation LUAC (140 in the training cohort and 61 in the validation cohort). We extracted 1316 radiomics features from preprocessed CT images and, using a filter method, selected the 14 radiomics features and 1 clinical feature most relevant to mutations. Subsequently, we built models using 7 ML approaches and used receiver operating characteristic (ROC) curves to assess the discriminating performance of these models. In terms of predicting EGFR mutation, the model derived from radiomics features and the combined model (radiomics features and relevant clinical factors) had AUCs of 0.79 (95% confidence interval (CI): 0.77-0.82) and 0.86 (0.87-0.88), respectively. Our study offers a radiomics-based ML model using filter methods to detect EGFR mutations in patients with LUAC. This convenient and low-cost method may help to noninvasively identify patients before a tumor sample is obtained for molecular testing.
|
29
|
Xu Z, York LM, Seethepalli A, Bucciarelli B, Cheng H, Samac DA. Objective Phenotyping of Root System Architecture Using Image Augmentation and Machine Learning in Alfalfa (Medicago sativa L.). PLANT PHENOMICS (WASHINGTON, D.C.) 2022; 2022:9879610. [PMID: 35479182 PMCID: PMC9012978 DOI: 10.34133/2022/9879610] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/13/2021] [Accepted: 03/03/2022] [Indexed: 12/28/2022]
Abstract
Active breeding programs specifically for root system architecture (RSA) phenotypes remain rare; however, breeding for branch and taproot types in the perennial crop alfalfa is ongoing. Phenotyping in this and other crops for active RSA breeding has mostly used visual scoring of specific traits or subjective classification into different root types. While image-based methods have been developed, translation to applied breeding is limited. This research is aimed at developing and comparing image-based RSA phenotyping methods using machine and deep learning algorithms for objective classification of 617 root images from mature alfalfa plants collected from the field to support the ongoing breeding efforts. Our results show that unsupervised machine learning tends to incorrectly classify roots into a normal distribution with most lines predicted as the intermediate root type. Encouragingly, random forest and TensorFlow-based neural networks can classify the root types into branch-type, taproot-type, and an intermediate taproot-branch type with 86% accuracy. With image augmentation, the prediction accuracy was improved to 97%. Coupling the predicted root type with its prediction probability will give breeders a confidence level for better decisions to advance the best and exclude the worst lines from their breeding program. This machine and deep learning approach enables accurate classification of the RSA phenotypes for genomic breeding of climate-resilient alfalfa.
Affiliation(s)
- Zhanyou Xu
  - USDA-ARS, Plant Science Research Unit, 1991 Upper Buford Circle, St. Paul, MN 55108, USA
- Larry M. York
  - Biosciences Division and Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
- Bruna Bucciarelli
  - Department of Agronomy and Plant Genetics, University of Minnesota, 1991 Upper Buford Circle, St. Paul, MN 55108, USA
- Hao Cheng
  - Department of Animal Science, University of California, 2251 Meyer Hall, One Shields Ave., Davis, CA 95616, USA
- Deborah A. Samac
  - USDA-ARS, Plant Science Research Unit, 1991 Upper Buford Circle, St. Paul, MN 55108, USA
|
30
|
Graph Based Feature Selection for Reduction of Dimensionality in Next-Generation RNA Sequencing Datasets. ALGORITHMS 2022. [DOI: 10.3390/a15010021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/30/2022]
Abstract
Analysis of high-dimensional data, with more features (p) than observations (N) (p>N), places significant demands on computational cost and memory usage. Feature selection can be used to reduce the dimensionality of the data. We used a graph-based approach, principal component analysis (PCA), and recursive feature elimination to select features for classification from two lung cancer RNAseq datasets. The selected features were discretized for association rule mining, where support and lift were used to generate informative rules. Our results show that the graph-based feature selection improved the performance of sequential minimal optimization (SMO) and multilayer perceptron (MLP) classifiers in both datasets. In association rule mining, features selected using the graph-based approach outperformed the other two feature-selection techniques at a support of 0.5 and lift of 2. The non-redundant rules reflect the inherent relationships between features. Biological features are usually related to functions in living systems, a relationship that cannot be deduced by feature selection and classification alone. Therefore, the graph-based feature-selection approach combined with rule mining is a suitable way of selecting and finding associations between features in high-dimensional RNAseq data.
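The support and lift measures used for rule mining in this abstract follow standard definitions, which the sketch below computes for a single rule over discretized samples. The sample contents and item labels are invented for illustration, and the helper names are hypothetical; the abstract does not describe the authors' discretization or mining software.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent.

    Assumes both sides occur at least once, otherwise the ratio is undefined.
    """
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / (support(transactions, antecedent) * support(transactions, consequent))


# Toy discretized samples (invented): one set of items per sample.
samples = [
    {"GENE_A=high", "GENE_B=high", "class=tumor"},
    {"GENE_A=high", "GENE_B=high", "class=tumor"},
    {"GENE_A=low", "GENE_B=low", "class=normal"},
    {"GENE_A=low", "GENE_B=high", "class=normal"},
]

# Rule: GENE_A=high -> class=tumor
rule_support = support(samples, {"GENE_A=high", "class=tumor"})
rule_lift = lift(samples, {"GENE_A=high"}, {"class=tumor"})
```

In this toy case the rule has support 0.5 and lift 2.0, so it would sit exactly at the thresholds quoted in the abstract; a lift above 1 means the two sides co-occur more often than expected if they were independent.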
|