1. Energy balance of dairy cows predicted by mid-infrared spectra data of milk using Bayesian approaches. J Dairy Sci 2024; 107:1561-1576. [PMID: 37806624 DOI: 10.3168/jds.2023-23772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 09/18/2023] [Indexed: 10/10/2023]
Abstract
Information on dry matter intake (DMI) and energy balance (EB) at the animal and herd level is important for management and breeding decisions. However, routine recording of these traits at commercial farms can be challenging and costly. Fourier-transform mid-infrared (FT-MIR) spectroscopy is a noninvasive technique applicable to a large cohort of animals that is routinely used to analyze milk components and is convenient for predicting complex phenotypes that are typically difficult and expensive to obtain on a large scale. We aimed to develop prediction models for EB and use the predicted phenotypes for genetic analysis. First, we assessed prediction equations using 4,485 phenotypic records from 167 Holstein cows from an experimental station. The phenotypes available were body weight (BW), milk yield (MY) and milk components, weekly-averaged DMI, and FT-MIR data from all milk samples available. We implemented mixed models with Bayesian approaches and assessed them through 50 randomized replicates of a 5-fold cross-validation. Second, we used the best prediction models to obtain predicted phenotypes of EB (EBp) and DMI (DMIp) on 5 commercial farms with 2,365 phenotypic records of MY, milk components and FT-MIR data, and BW from 1,441 Holstein cows. Third, we performed a GWAS and estimated heritability and genetic correlations for energy content in milk (EnM), BW, DMIp, and EBp using the genomic information available on the cows from commercial farms. The highest correlations between the predicted and observed phenotype ($r_{y,\hat{y}}$) were obtained for DMI (0.88) and EB (0.86), while predicting BW was, as anticipated, more challenging (0.69). In our study, models that included FT-MIR information performed better than models without spectra information in the 3 traits analyzed, with increments in prediction correlation ranging from 5% to 10%. For the phenotypes obtained from the prediction equations and the commercial-farm data, heritability ranged from 0.11 to 0.16 for EnM, DMIp, and EBp and was 0.42 for BW. The genetic correlation of EnM was -0.17 with BW, 0.40 with DMIp, and -0.39 with EBp. From the GWAS, we detected one significant QTL region for EnM and 3 for BW, but none for DMIp and EBp. The results obtained in our study support previous evidence that FT-MIR information from milk samples contributes to improving the prediction equations for DMI, BW, and EB, and that these predicted phenotypes may be used for herd management and contribute to the breeding strategy for improving cow performance.
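The validation scheme named in the abstract above (50 randomized replicates of a 5-fold cross-validation, scored by the correlation between predicted and observed phenotypes) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, and it assumes records are assigned to folds at random, whereas the abstract does not say whether records from the same cow were kept together.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def repeated_kfold(n_records, k=5, replicates=50, seed=0):
    """Yield (replicate, fold, train_idx, test_idx) tuples.

    Assumption: folds are drawn by shuffling record indices, so records
    from one animal may land in both training and test sets.
    """
    rng = random.Random(seed)
    for rep in range(replicates):
        idx = list(range(n_records))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]  # every k-th shuffled index forms one fold
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield rep, fold, train, test
```

In use, a model would be fitted on `train_idx`, predictions made for `test_idx`, and `pearson` applied to the observed and predicted values of each fold before averaging across folds and replicates.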
2. Predicting methane emissions of individual grazing dairy cows from spectral analyses of their milk samples. J Dairy Sci 2024; 107:978-991. [PMID: 37709036 DOI: 10.3168/jds.2023-23577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 08/30/2023] [Indexed: 09/16/2023]
Abstract
Data on the enteric methane emissions of individual cows are useful not just in assisting management decisions and calculating herd inventories but also as inputs for animal genetic evaluations. Data generation for many animal characteristics, including enteric methane emissions, can be expensive and time consuming, so being able to extract as much information as possible from available samples or data sources is worthy of investigation. The objective of the present study was to attempt to predict individual cow methane emissions from the information contained within milk samples, specifically the spectrum of light transmittance across different wavelengths of the mid-infrared (MIR) region of the electromagnetic spectrum. A total of 93,888 individual spot measures of methane (i.e., individual samples of an animal's breath when using the GreenFeed technology) from 384 lactations on 277 grazing dairy cows were collapsed into weekly averages expressed as grams per day; each weekly average coincided with a MIR spectral analysis of a morning or evening individual cow milk sample. Associations between the spectra and enteric methane measures were performed separately using partial least squares regression or neural networks with different tuning parameters evaluated. Several alternative definitions of the enteric methane phenotype (i.e., average enteric methane in the 6 d preceding or 6 d following taking the milk sample or the average of the 6 d before and after the milk sample, all of which also included the enteric methane emitted on the day of milk sampling), the candidate model features (e.g., milk yield, milk composition, and milk MIR) as well as validation strategy (i.e., cross-validation or leave-one-experimental treatment-out) were evaluated. 
Irrespective of the validation method, the prediction accuracy was best when the average of the milk MIR from the morning and evening milk sample was used and the prediction model was developed using neural networks; concurrently including milk yield and days in milk in the prediction model generated superior predictions relative to just the spectral information alone. Furthermore, prediction accuracy was best when the enteric methane phenotype was the average of at least 20 methane spot measures across a 6-d period flanking each side of the milk sample with associated spectral data. Based on the strategy that achieved the best accuracy of prediction, the correlation between the actual and predicted daily methane emissions when based on 4-fold cross-validation varied per validation stratum from 0.68 to 0.75; the corresponding range when validated on each of the 8 different experimental treatments focusing on alternative pasture grazing systems represented in the dataset varied from 0.55 to 0.71. The root mean square error of prediction across the 4 folds of cross-validation was 37.46 g/d, whereas the root mean square error averaged across all folds of leave-one-treatment-out was 37.50 g/d. Results suggest that even with the likely measurement errors contained within the MIR spectrum and gold standard enteric methane phenotype, enteric methane can be reasonably well predicted from the infrared spectrum of milk samples. What is yet to be established, however, is whether (a) genetic variation exists in this predicted enteric methane phenotype and (b) selection on estimates of genetic merit for this phenotype translates to actual phenotypic differences in enteric methane emissions.
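The best-performing phenotype definition described above (the mean of at least 20 spot measures within 6 d either side of the milk sample, sampling day included) and the root mean square error of prediction can be sketched as below. This is an illustrative sketch only: the function names, the `(day, grams_per_day)` tuple layout, and the use of integer day numbers are assumptions, not details from the study.

```python
import math


def window_phenotype(spots, sample_day, half_width=6, min_spots=20):
    """Mean methane (g/d) of spot measures within +/- half_width days.

    spots: list of (day, grams_per_day) tuples for one cow (assumed layout).
    Returns None when fewer than min_spots measures fall in the window.
    """
    in_window = [value for day, value in spots
                 if abs(day - sample_day) <= half_width]
    if len(in_window) < min_spots:
        return None
    return sum(in_window) / len(in_window)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

Returning `None` for sparse windows keeps records with too few spot measures out of the training data rather than averaging over a handful of breath samples.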
3. Quality Characterization of Fava Bean-Fortified Bread Using Hyperspectral Imaging. Foods 2024; 13:231. [PMID: 38254532 PMCID: PMC10814855 DOI: 10.3390/foods13020231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/04/2024] [Accepted: 01/08/2024] [Indexed: 01/24/2024]
Abstract
As the demand for alternative protein sources and nutritional improvement in baked goods grows, integrating legume-based ingredients, such as fava beans, into wheat flour presents an innovative alternative. This study investigates the potential of hyperspectral imaging (HSI) to predict the protein content (short-wave infrared [SWIR] range) of fava bean-fortified breads and to classify the breads based on their color characteristics (visible-near-infrared [Vis-NIR] range). Different multivariate analysis tools, such as principal component analysis (PCA), partial least square discriminant analysis (PLS-DA), and partial least square regression (PLSR), were utilized to assess the protein distribution and color quality parameters of bread samples. The PLS-DA in the SWIR range yielded a classification accuracy of >99%, successfully classifying the samples based on their protein contents (low protein and high protein). The PLSR model showed an RMSEC of 0.086% and an RMSECV of 0.094%. Also, the external validation resulted in an RMSEP of 0.064%. The PLSR model possessed the capability to efficiently predict the protein content of the bread samples. The results suggest that HSI can be successfully used to classify bread samples based on their protein content and for the prediction of protein composition. Hyperspectral imaging can therefore be reliably implemented for the quality monitoring of baked goods in commercial bakeries.
4. Possible Alternatives: Identifying and Quantifying Adulteration in Buffalo, Goat, and Camel Milk Using Mid-Infrared Spectroscopy Combined with Modern Statistical Machine Learning Methods. Foods 2023; 12:3856. [PMID: 37893749 PMCID: PMC10606090 DOI: 10.3390/foods12203856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 10/11/2023] [Accepted: 10/13/2023] [Indexed: 10/29/2023]
Abstract
Adulteration of higher priced milks with cheaper ones to obtain extra profit can adversely affect consumer health and the market. In this study, pure buffalo milk (BM), goat milk (GM), camel milk (CM), and their mixtures with 5-50% (vol/vol) cow milk or water were used. Mid-infrared spectroscopy (MIRS) combined with modern statistical machine learning was used for the discrimination and quantification of cow milk or water adulteration in BM, GM, and CM. Compared to partial least squares (PLS), modern statistical machine learning methods, especially support vector machines (SVM), projection pursuit regression (PPR), and Bayesian regularized neural networks (BRNN), exhibited superior performance for the detection of adulteration. The best prediction models for the different predictive traits were as follows. The binary classification models developed by SVM differentiated CM-cow milk and GM/CM-water mixtures, whereas PLS differentiated BM/GM-cow milk and BM-water mixtures. All of the above models had 100% classification accuracy. SVM was used to develop multi-classification models for identifying the high and low proportions of cow milk in BM, GM, and CM, as well as the high and low proportions of water adulteration in BM and GM, with correct classification rates of 94%, 100%, 100%, 99%, and 100%, respectively. In addition, a PLS-based model was developed for identifying the high and low proportions of water adulteration in CM, with a correct classification rate of 100%. A regression model for quantifying cow milk in BM was developed using PCA + BRNN, with RMSEV = 5.42% and $R_V^2$ = 0.88. A regression model for quantifying water adulteration in BM was developed using PCA + PPR, with RMSEV = 1.70% and $R_V^2$ = 0.99. Modern statistical machine learning improved the accuracy of MIRS in predicting BM, GM, and CM adulteration more effectively than PLS.
5. Estimation of body condition score change in dairy cows in a seasonal calving pasture-based system using routinely available milk mid-infrared spectra and machine learning techniques. J Dairy Sci 2023; 106:4232-4244. [PMID: 37105880 DOI: 10.3168/jds.2022-22394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Accepted: 12/22/2022] [Indexed: 04/29/2023]
Abstract
Body condition score (BCS) is a subjective estimate of body reserves in cows. Body condition score and its change in early lactation have been associated with cow fertility and health. The aim of the present study was to estimate change in BCS (ΔBCS) using mid-infrared spectra of the milk, with a particular focus on estimating ΔBCS in cows losing BCS at the fastest rate (i.e., the cows most of interest to the producer). A total of 73,193 BCS records (scale 1 to 5) from 6,572 cows were recorded. Daily BCS was interpolated from cubic splines fitted through the BCS records, and subsequently used to calculate daily ΔBCS. Body condition score change records were merged with milk mid-infrared spectra recorded on the same week. Both morning (a.m.) and evening (p.m.) spectra were available. Two different statistical methods were used to estimate ΔBCS: partial least squares regression and a neural network (NN). Several combinations of variables were included as model features, such as days in milk (DIM) only, a.m. spectra only and DIM, p.m. spectra only and DIM, and a.m. and p.m. spectra as well as DIM. The data used to estimate ΔBCS were either based on the first 120 DIM or all 305 DIM. Daily ΔBCS had a standard deviation of 1.65 × 10⁻³ BCS units in the 305 DIM data set and of 1.98 × 10⁻³ BCS units in the 120 DIM data set. Each data set was divided into 4 sub-data sets, 3 of which were used for training the prediction model and the fourth to test it. This process was repeated until all the sub-data sets were considered as the test data set once. Using all 305 DIM, the lowest root mean square error of validation (RMSEV; 0.96 × 10⁻³ BCS units) and the strongest correlation between actual and estimated ΔBCS (0.82) were achieved with NN using a.m. and p.m. spectra and DIM. Using the 120 DIM data, the lowest RMSEV (0.98 × 10⁻³ BCS units) and the strongest correlation between actual and estimated ΔBCS (0.87) were achieved with NN using DIM and either a.m. spectra only or a.m. and p.m. spectra together. The RMSEV for records in the lowest 2.5% ΔBCS percentile per DIM in early lactation was reduced up to a maximum of 13% when spectra and DIM were both considered in the model compared with a model that considered just DIM. The performance of the NN using DIM and a.m. spectra only with the 120 DIM data was robust across different strata of farm, parity, year of sampling, and breed. Results from the present study demonstrate the ability of mid-infrared spectra of milk coupled with machine learning techniques to estimate ΔBCS; specifically, the inclusion of spectral data reduced the RMSEV over and above using DIM alone, particularly for cows losing BCS at the fastest rate. This approach can be used to routinely generate estimates of ΔBCS that can subsequently be used for farm decisions.
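The derivation of daily ΔBCS from sparse BCS records, as described in the abstract above, can be illustrated with a small sketch. Note the substitution: the study interpolated daily BCS with cubic splines, whereas this sketch uses plain linear interpolation purely to keep the example short and dependency-free, so the resulting ΔBCS is piecewise constant. The function names and the `(dim, bcs)` record layout are hypothetical.

```python
def daily_bcs_linear(records):
    """Interpolate one BCS value per day in milk (DIM).

    records: list of (dim, bcs) tuples sorted by strictly increasing dim.
    Stand-in assumption: linear interpolation replaces the cubic splines
    used in the study.
    """
    daily = {}
    for (d0, b0), (d1, b1) in zip(records, records[1:]):
        for day in range(d0, d1):
            daily[day] = b0 + (b1 - b0) * (day - d0) / (d1 - d0)
    last_day, last_bcs = records[-1]
    daily[last_day] = last_bcs
    return daily


def daily_delta(daily):
    """Daily change in BCS: value on each day minus the previous day."""
    days = sorted(daily)
    return {d1: daily[d1] - daily[d0] for d0, d1 in zip(days, days[1:])}
```

For example, records of 3.00 at 10 DIM and 2.75 at 20 DIM give a daily ΔBCS of -0.025 BCS units over that interval under the linear stand-in.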
6. Parsimonious Bayesian factor analysis for modelling latent structures in spectroscopy data. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
7. Comparison of the genetic characteristics of directly measured and Fourier-transform mid-infrared-predicted bovine milk fatty acids and proteins. J Dairy Sci 2022; 105:9763-9791. [DOI: 10.3168/jds.2022-22089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Accepted: 07/21/2022] [Indexed: 11/17/2022]
8. Application of infrared spectroscopic techniques to cheese authentication: A review. Int J Dairy Technol 2022. [DOI: 10.1111/1471-0307.12859] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
9. Real-time milk analysis integrated with stacking ensemble learning as a tool for the daily prediction of cheese-making traits in Holstein cattle. J Dairy Sci 2022; 105:4237-4255. [DOI: 10.3168/jds.2021-21426] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/13/2021] [Accepted: 01/10/2022] [Indexed: 01/12/2023]
10. The use of milk Fourier-transform mid-infrared spectroscopy to diagnose pregnancy and determine spectral regional associations with pregnancy in US dairy cows. J Dairy Sci 2022; 105:3209-3221. [DOI: 10.3168/jds.2021-21079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2021] [Accepted: 12/21/2021] [Indexed: 11/19/2022]
11. Application of Optical Quality Control Technologies in the Dairy Industry: An Overview. Photonics 2021. [DOI: 10.3390/photonics8120551] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
Sustainable development of the agricultural industry, in particular, the production of milk and feed for farm animals, requires accurate, fast, and non-invasive diagnostic tools. Currently, there is a rapid development of a number of analytical methods and approaches that meet these requirements. Infrared spectrometry in the near and mid-IR range is especially widespread. Progress has been made not only in the physical methods of carrying out measurements, but significant advances have also been achieved in the development of mathematical processing of the received signals. This review is devoted to the comparison of modern methods and devices used to control the quality of milk and feed for farm animals.
12. Application of machine-learning methods to milk mid-infrared spectra for discrimination of cow milk from pasture or total mixed ration diets. J Dairy Sci 2021; 104:12394-12402. [PMID: 34593222 DOI: 10.3168/jds.2021-20812] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/01/2021] [Accepted: 08/10/2021] [Indexed: 11/19/2022]
Abstract
The prevalence of "grass-fed" labeled food products on the market has increased in recent years, often commanding a premium price. To date, the majority of methods used for the authentication of grass-fed source products are driven by auditing and inspection of farm records. As such, the ability to verify grass-fed source claims to ensure consumer confidence will be important in the future. Mid-infrared (MIR) spectroscopy is widely used in the dairy industry as a rapid method for the routine monitoring of individual herd milk composition and quality. Further harnessing the data from individual spectra offers a promising and readily implementable strategy to authenticate the milk source at both farm and processor levels. Herein, the robustness, specificity, and accuracy of 11 machine-learning statistical analysis methods were comprehensively compared for the discrimination of grass-fed versus non-grass-fed milks based on the MIR spectra of 4,320 milk samples collected from cows on pasture or indoor total mixed ration-based feeding systems over a 3-yr period. Linear discriminant analysis and partial least squares discriminant analysis (PLS-DA) were demonstrated to offer the greatest level of accuracy for the prediction of cow diet from MIR spectra. Parsimonious strategies for the selection of the most discriminating wavelengths within the spectra are also highlighted.
13. Sequence-based genome-wide association study of individual milk mid-infrared wavenumbers in mixed-breed dairy cattle. Genet Sel Evol 2021; 53:62. [PMID: 34284721 PMCID: PMC8290608 DOI: 10.1186/s12711-021-00648-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/04/2020] [Accepted: 06/22/2021] [Indexed: 12/30/2022]
Abstract
BACKGROUND Fourier-transform mid-infrared (FT-MIR) spectroscopy provides a high-throughput and inexpensive method for predicting milk composition and other novel traits from milk samples. While there have been many genome-wide association studies (GWAS) conducted on FT-MIR predicted traits, there have been few GWAS for individual FT-MIR wavenumbers. Using imputed whole-genome sequence for 38,085 mixed-breed New Zealand dairy cattle, we conducted GWAS on 895 individual FT-MIR wavenumber phenotypes, and assessed the value of these direct phenotypes for identifying candidate causal genes and variants, and improving our understanding of the physico-chemical properties of milk. RESULTS Separate GWAS conducted for each of 895 individual FT-MIR wavenumber phenotypes, identified 450 1-Mbp genomic regions with significant FT-MIR wavenumber QTL, compared to 246 1-Mbp genomic regions with QTL identified for FT-MIR predicted milk composition traits. Use of mammary RNA-seq data and gene annotation information identified 38 co-localized and co-segregating expression QTL (eQTL), and 31 protein-sequence mutations for FT-MIR wavenumber phenotypes, the latter including a null mutation in the ABO gene that has a potential role in changing milk oligosaccharide profiles. For the candidate causative genes implicated in these analyses, we examined the strength of association between relevant loci and each wavenumber across the mid-infrared spectrum. This revealed shared association patterns for groups of genomically-distant loci, highlighting clusters of loci linked through their biological roles in lactation and their presumed impacts on the chemical composition of milk. CONCLUSIONS This study demonstrates the utility of FT-MIR wavenumber phenotypes for improving our understanding of milk composition, presenting a larger number of QTL and putative causative genes and variants than found from FT-MIR predicted composition traits. 
Examining patterns of significance across the mid-infrared spectrum for loci of interest further highlighted commonalities of association, which likely reflects the physico-chemical properties of milk constituents.
14. Prediction of fatty acid composition using milk spectral data and its associations with various mid-infrared spectral regions in Michigan Holsteins. J Dairy Sci 2021; 104:11242-11258. [PMID: 34275636 DOI: 10.3168/jds.2021-20267] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/07/2021] [Accepted: 05/28/2021] [Indexed: 11/19/2022]
Abstract
Fatty acid composition in milk is not only reflective of nutritional quality but also potentially predictive of other attributes (e.g., the cow's energy balance and its relative output of methane emissions). Furthermore, a higher ratio of long-chain to short-chain fatty acids or mean carbon number has been associated with negative energy balance in dairy cows, whereas enhanced nutritional properties have been generally associated with higher levels of unsaturation. We set out to directly compare Bayesian regression strategies with partial least squares for the prediction of various milk fatty acids using Fourier-transform infrared spectrum data on 777 milk samples taken from 579 cows on 4 Michigan dairy herds between 5 and 90 d in milk. We also set out to identify those spectral regions that might be associated with fatty acids and whether carbon number or level of unsaturation might contribute to the strength of these associations. These associations were based on adaptively clustered windows of wavenumbers to mitigate the distorting effects of severe multicollinearity on marginal associations involving individual wavenumbers. In general, Bayesian regression methods, particularly the variable selection method BayesB, outperformed partial least squares regression for cross-validation prediction accuracy for both individual fatty acids and fatty acid groups. Strong signals for wavenumber associations using BayesB were well distributed throughout the mid-infrared spectrum, particularly between 910 and 3,998 cm⁻¹. Carbon number appeared to be linearly related to strength of wavenumber associations for 38 moderately to highly predicted fatty acids within the spectral regions of 2,286 to 2,376 and 2,984 to 3,100 cm⁻¹, whereas nonlinear associations were determined within 1,141 to 1,205; 1,570 to 1,630; and 1,727 to 1,768 cm⁻¹. However, no such associations were detected with level of unsaturation. 
Spectral regions where there were significant relationships between strength of association and carbon number may be useful targets for inferring the relative proportion of long-chain to short-chain fatty acids, and hence energy balance.
15. Predicting cow milk quality traits from routinely available milk spectra using statistical machine learning methods. J Dairy Sci 2021; 104:7438-7447. [PMID: 33865578 DOI: 10.3168/jds.2020-19576] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/03/2020] [Accepted: 03/09/2021] [Indexed: 11/19/2022]
Abstract
Numerous statistical machine learning methods suitable for application to highly correlated features, as those that exist for spectral data, could potentially improve prediction performance over the commonly used partial least squares approach. Milk samples from 622 individual cows with known detailed protein composition and technological trait data accompanied by mid-infrared spectra were available to assess the predictive ability of different regression and classification algorithms. The regression-based approaches were partial least squares regression (PLSR), ridge regression (RR), least absolute shrinkage and selection operator (LASSO), elastic net, principal component regression, projection pursuit regression, spike and slab regression, random forests, boosting decision trees, neural networks (NN), and a post-hoc approach of model averaging (MA). Several classification methods (i.e., partial least squares discriminant analysis (PLSDA), random forests, boosting decision trees, and support vector machines (SVM)) were also used after stratifying the traits of interest into categories. In the regression analyses, MA was the best prediction method for 6 of the 14 traits investigated [curd firmness at 60 min, αS1-casein (CN), αS2-CN, κ-CN, α-lactalbumin, and β-lactoglobulin B], whereas NN and RR were the best algorithms for 3 traits each (rennet coagulation time, curd-firming time, and heat stability, and curd firmness at 30 min, β-CN, and β-lactoglobulin A, respectively), PLSR was best for pH, and LASSO was best for CN micelle size. When traits were divided into 2 classes, SVM had the greatest accuracy for the majority of the traits investigated. Although the well-established PLSR-based method performed competitively, the application of statistical machine learning methods for regression analyses reduced the root mean square error compared with PLSR from between 0.18% (κ-CN) to 3.67% (heat stability). 
The use of modern statistical machine learning methods for trait prediction from mid-infrared spectroscopy may improve the prediction accuracy for some traits.
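The abstract above lists a post-hoc model averaging (MA) step but does not say how the individual predictions were combined. The sketch below shows one simple possibility, an equal-weight mean of each model's prediction per sample. The equal weighting and the function name are assumptions made for illustration and may differ from the weighting used in the study.

```python
def average_predictions(predictions_by_model):
    """Equal-weight average of per-sample predictions across models.

    predictions_by_model: dict mapping a model label to a list of
    predictions, all lists in the same sample order and of equal length.
    Assumption: every model receives the same weight.
    """
    model_preds = list(predictions_by_model.values())
    n_models = len(model_preds)
    return [sum(sample) / n_models for sample in zip(*model_preds)]
```

Called with, say, `{"plsr": [...], "ridge": [...], "nn": [...]}`, it returns one averaged prediction per sample that can be scored with the same root mean square error used for the individual methods.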
16. Evaluating the performance of machine learning methods and variable selection methods for predicting difficult-to-measure traits in Holstein dairy cattle using milk infrared spectral data. J Dairy Sci 2021; 104:8107-8121. [PMID: 33865589 DOI: 10.3168/jds.2020-19861] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/02/2020] [Accepted: 03/05/2021] [Indexed: 12/11/2022]
Abstract
Fourier-transform infrared (FTIR) spectroscopy is a powerful high-throughput phenotyping tool for predicting traits that are expensive and difficult to measure in dairy cattle. Calibration equations are often developed using standard methods, such as partial least squares (PLS) regression. Methods that employ penalization, rank-reduction, and variable selection, as well as being able to model the nonlinear relations between phenotype and FTIR, might offer improvements in predictive ability and model robustness. This study aimed to compare the predictive ability of 2 machine learning methods, namely random forest (RF) and gradient boosting machine (GBM), and penalized regression against PLS regression for predicting 3 phenotypes differing in terms of biological meaning and relationships with milk composition (i.e., phenotypes measurable directly and not directly in milk, reflecting different biological processes which can be captured using milk spectra) in Holstein-Friesian cattle under 2 cross-validation scenarios. The data set comprised phenotypic information from 471 Holstein-Friesian cows, and 3 target phenotypes were evaluated: (1) body condition score (BCS), (2) blood β-hydroxybutyrate (BHB, mmol/L), and (3) κ-casein expressed as a percentage of nitrogen (κ-CN, % N). The data set was split considering 2 cross-validation scenarios: samples-out random in which the population was randomly split into 10-folds (8-folds for training and 1-fold for validation and testing); and herd/date-out in which the population was randomly assigned to training (70% herd), validation (10%), and testing (20% herd) based on the herd and date in which the samples were collected. The random grid search was performed using the training subset for the hyperparameter optimization and the validation set was used for the generalization of prediction error. The trained model was then used to assess the final prediction in the testing subset. 
The grid search for penalized regression indicated that the elastic net (EN) was the best regularization, with an increase in predictive ability of 5%. The performance of PLS (standard model) was compared against 2 machine learning techniques and penalized regression using 2 cross-validation scenarios. Machine learning methods showed a greater predictive ability for BCS (0.63 for GBM and 0.61 for RF), BHB (0.80 for GBM and 0.79 for RF), and κ-CN (0.81 for GBM and 0.80 for RF) in samples-out cross-validation. Considering a herd/date-out cross-validation, these values were 0.58 (GBM and RF) for BCS, 0.73 (GBM and RF) for BHB, and 0.77 (GBM and RF) for κ-CN. The GBM model tended to outperform other methods in predictive ability by around 4%, 1%, and 7% relative to EN, RF, and PLS, respectively. The prediction accuracies of the GBM and RF models were similar, and differed statistically from the PLS model in samples-out random cross-validation. Although machine learning techniques outperformed PLS in herd/date-out cross-validation, no significant differences were observed in terms of predictive ability due to the large standard deviation observed for predictions. Overall, GBM achieved the highest accuracy of FTIR-based prediction of the different phenotypic traits across the cross-validation scenarios. These results indicate that GBM is a promising method for obtaining more accurate FTIR-based predictions for different phenotypes in dairy cattle.
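The herd/date-out scenario described in this abstract differs from random splitting in that whole groups, not individual samples, are assigned to training, validation, and testing. A minimal sketch of such a grouped split follows; the function name, the use of a single group label per sample, and the rounding of group counts are assumptions for illustration and are not taken from the study.

```python
import random


def group_out_split(sample_groups, fractions=(0.7, 0.1, 0.2), seed=0):
    """Assign whole groups (e.g., herd or herd/date labels) to subsets.

    sample_groups: list with one sortable group label per sample.
    Returns a dict mapping "train", "validation", and "test" to lists of
    sample indices, so no group is shared between subsets.
    """
    groups = sorted(set(sample_groups))
    random.Random(seed).shuffle(groups)
    n_train = round(fractions[0] * len(groups))
    n_valid = round(fractions[1] * len(groups))
    subset_of = {}
    for position, group in enumerate(groups):
        if position < n_train:
            subset_of[group] = "train"
        elif position < n_train + n_valid:
            subset_of[group] = "validation"
        else:
            subset_of[group] = "test"
    split = {"train": [], "validation": [], "test": []}
    for index, group in enumerate(sample_groups):
        split[subset_of[group]].append(index)
    return split
```

Keeping every sample from a herd in the same subset is what makes this scenario a stricter test of generalization than the samples-out random split, which is consistent with the lower predictive abilities reported for herd/date-out validation.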
17. Integrating genomic and infrared spectral data improves the prediction of milk protein composition in dairy cattle. Genet Sel Evol 2021; 53:29. [PMID: 33726672 PMCID: PMC7968271 DOI: 10.1186/s12711-021-00620-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/15/2020] [Accepted: 03/01/2021] [Indexed: 11/20/2022]
Abstract
Background: Over the past decade, Fourier transform infrared (FTIR) spectroscopy has been used to predict novel milk protein phenotypes. Genomic data might help predict these phenotypes when integrated with milk FTIR spectra. The objective of this study was to investigate prediction accuracy for milk protein phenotypes when heterogeneous on-farm, genomic, and pedigree data were integrated with the spectra. To this end, we used the records of 966 Italian Brown Swiss cows with milk FTIR spectra, on-farm information, medium-density genetic markers, and pedigree data. True and total whey protein, five casein, and two whey protein traits were analyzed. Multiple kernel learning constructed from spectral and genomic (pedigree) relationship matrices and multilayer BayesB assigning separate priors for FTIR and markers were benchmarked against a baseline partial least squares (PLS) regression. Seven combinations of covariates were considered, and their predictive abilities were evaluated by repeated random sub-sampling and herd cross-validations (CV). Results: Addition of the on-farm effects such as herd, days in milk, and parity to spectral data improved predictions as compared to those obtained using the spectra alone. Integrating genomics and/or the top three markers with a large effect further enhanced the predictions. Pedigree data also improved prediction, but to a lesser extent than genomic data. Multiple kernel learning and multilayer BayesB increased predictive performance, whereas PLS did not. Overall, multilayer BayesB provided better predictions than multiple kernel learning, and lower prediction performance was observed in herd CV compared to repeated random sub-sampling CV. Conclusions: Integration of genomic information with milk FTIR spectra can enhance milk protein trait predictions by 25% and 7% on average for repeated random sub-sampling and herd CV, respectively. 
Multiple kernel learning and multilayer BayesB outperformed PLS when used to integrate heterogeneous data for phenotypic predictions.
|
18
|
A new integrated data mining model to map spatial variation in the susceptibility of land to act as a source of aeolian dust. Environ Sci Pollut Res Int 2020; 27:42022-42039. [PMID: 32700281 DOI: 10.1007/s11356-020-10168-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/12/2020] [Accepted: 07/16/2020] [Indexed: 06/11/2023]
Abstract
This research developed a more efficient integrated model (IM) based on combining the Nash-Sutcliffe efficiency coefficient (NSEC) and individual data mining (DM) algorithms for the spatial mapping of dust provenance in the Hamoun-e-Hirmand Basin, southeastern Iran. This region experiences severe wind erosion and includes the Sistan plain, which is one of the most PM2.5-polluted regions in the world. Due to a prolonged drought over the last two decades, the frequency of dust storms in the study area is increasing remarkably. Herein, 14 factors controlling dust emissions (FCDEs) including soil characteristics, climatic variables, digital elevation map, normalized difference vegetation index, land use and geology were mapped. Correlation and collinearity among the FCDEs were examined by the Pearson test, tolerance coefficient (TC) and variance inflation factor (VIF), with the results suggesting a lack of collinearity between FCDEs. A tree-based genetic algorithm was applied to prioritize and quantify the importance weights of the FCDEs. Thirteen individual data mining models were applied for mapping dust provenance. The model performance was assessed using root mean square error, mean absolute error and NSEC. Based on clustering analysis, the 13 DM models were grouped into five clusters, and the cluster with the highest NSEC values was then used in an integrated modelling process. Based on the results, the IM (NSEC = 93%) outperformed the individual DM models (NSEC values ranging between 51 and 92%). Using the IM, 11, 5, 7 and 77% of the total study area were classified into low, moderate, high and very high susceptibility classes for dust provenance, respectively. Overall, the results illustrate the benefits of an IM for mapping spatial variation in the susceptibility of catchment areas to act as dust sources.
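The three performance measures named in this abstract (root mean square error, mean absolute error and NSEC) can be illustrated with a minimal Python sketch. It assumes the standard textbook definitions; the function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squares about the observed mean).

    A value of 1 means a perfect fit; 0 means the model is no better than
    always predicting the observed mean.
    """
    obs_mean = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical susceptibility scores, for illustration only.
observed = [0.10, 0.40, 0.35, 0.80, 0.95]
predicted = [0.15, 0.35, 0.40, 0.75, 0.90]
print(rmse(observed, predicted), mae(observed, predicted),
      nash_sutcliffe(observed, predicted))
```

Reported as a percentage, an NSEC of 93% would correspond to a value of 0.93 from `nash_sutcliffe`.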
|
19
|
Infrared Spectrometry as a High-Throughput Phenotyping Technology to Predict Complex Traits in Livestock Systems. Front Genet 2020; 11:923. [PMID: 32973876 PMCID: PMC7468402 DOI: 10.3389/fgene.2020.00923] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/06/2020] [Accepted: 07/24/2020] [Indexed: 12/17/2022] Open
Abstract
High-throughput phenotyping technologies are growing in importance in livestock systems due to their ability to generate real-time, non-invasive, and accurate animal-level information. Collecting such individual-level information can generate novel traits and potentially improve animal selection and management decisions in livestock operations. One of the most relevant tools used in the dairy and beef industry to predict complex traits is infrared spectrometry, which is based on the analysis of the interaction between electromagnetic radiation and matter. The infrared electromagnetic radiation spans an enormous range of wavelengths and frequencies known as the electromagnetic spectrum. The spectrum is divided into different regions, with near- and mid-infrared regions being the main spectral regions used in livestock applications. The advantages of using infrared spectrometry include speed, non-destructive measurement, and great potential for on-line analysis. This paper aims to review the use of mid- and near-infrared spectrometry techniques as tools to predict complex dairy and beef phenotypes, such as milk composition, feed efficiency, methane emission, fertility, energy balance, health status, and meat quality traits. Although several research studies have used these technologies to predict a wide range of phenotypes, most of them are based on Partial Least Squares (PLS) and did not consider other machine learning (ML) techniques to improve prediction quality. Therefore, we will discuss the role of analytical methods employed on spectral data to improve the predictive ability for complex traits in livestock operations. Furthermore, we will discuss different approaches to reduce data dimensionality and the impact of validation strategies on predictive quality.
|
20
|
The evolving role of Fourier-transform mid-infrared spectroscopy in genetic improvement of dairy cattle. J Anim Sci Biotechnol 2020; 11:39. [PMID: 32322393 PMCID: PMC7164258 DOI: 10.1186/s40104-020-00445-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/03/2019] [Accepted: 03/09/2020] [Indexed: 11/22/2022] Open
Abstract
Over the last 100 years, significant advances have been made in the characterisation of milk composition for dairy cattle improvement programs. Technological progress has enabled a shift from labour intensive, on-farm collection and processing of samples that assess yield and fat levels in milk, to large-scale processing of samples through centralised laboratories, with the scope extended to include quantification of other traits. Fourier-transform mid-infrared (FT-MIR) spectroscopy has had a significant role in the transformation of milk composition phenotyping, with spectral-based predictions of major milk components already being widely used in milk payment and animal evaluation systems globally. Increasingly, there is interest in analysing the individual FT-MIR wavenumbers, and in utilising the FT-MIR data to predict other novel traits of importance to breeding programs. This includes traits related to the nutritional value of milk, the processability of milk into products such as cheese, and traits relevant to animal health and the environment. The ability to successfully incorporate these traits into breeding programs is dependent on the heritability of the FT-MIR predicted traits, and the genetic correlations between the FT-MIR predicted and actual trait values. Linking FT-MIR predicted traits to the underlying mutations responsible for their variation can be difficult because the phenotypic expression of these traits is a function of a diverse range of molecular and biological mechanisms that can obscure their genetic basis. The individual FT-MIR wavenumbers give insights into the chemical composition of milk and provide an additional layer of granularity that may assist with establishing causal links between the genome and observed phenotypes. 
Additionally, there are other molecular phenotypes such as those related to the metabolome, chromatin accessibility, and RNA editing that could improve our understanding of the underlying biological systems controlling traits of interest. Here we review topics of importance to phenotyping and genetic applications of FT-MIR spectra datasets, and discuss opportunities for consolidating FT-MIR datasets with other genomic and molecular data sources to improve future dairy cattle breeding programs.
|
21
|
Prediction of dry-cured ham weight loss and prospects of use in a pig breeding program. Animal 2020; 14:1128-1138. [DOI: 10.1017/s1751731120000026] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/06/2022] Open
|
22
|
Prediction of Milk Coagulation Properties and Individual Cheese Yield in Sheep Using Partial Least Squares Regression. Animals (Basel) 2019; 9:ani9090663. [PMID: 31500237 PMCID: PMC6770130 DOI: 10.3390/ani9090663] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/03/2019] [Revised: 09/02/2019] [Accepted: 09/05/2019] [Indexed: 12/31/2022] Open
Abstract
Simple Summary: Considering that all sheep milk in Italy is destined for cheese processing, traits describing rennet coagulation aptitude should be among the most important selection goals for dairy breeds. To reduce the costs and logistics related to the large-scale recording of these traits, mid-infrared (MIR) spectroscopy could be conveniently used to generate reliable predictions without any additional cost. The aims of this research were to predict the milk coagulation properties (MCP) and individual laboratory cheese yield (ILCY) in sheep by MIR spectrometry using partial least squares regression (PLS), and to compare different data pre-treatment procedures. The prediction results observed in the present study, although moderate, suggest the possibility of adding novel phenotypes (e.g., MCP and ILCY) in breeding schemes for dairy sheep breeds. Mid-infrared spectroscopy coupled with PLS regression could allow the prediction of phenotypes at the population level without additional costs.
Abstract: The objectives of this study were (i) the prediction of sheep milk coagulation properties (MCP) and individual laboratory cheese yield (ILCY) from mid-infrared (MIR) spectra by using partial least squares (PLS) regression, and (ii) the comparison of different data pre-treatments on prediction accuracy. Individual milk samples of 970 Sarda breed ewes were analyzed for rennet coagulation time (RCT), curd-firming time (k20), and curd firmness (a30) using the Formagraph instrument; ILCY was measured by micro-manufacturing assays. A Fourier-transform infrared (FTIR) milk analyzer was used for the estimation of the milk gross composition and the recording of the MIR spectrum. The dataset (n = 859, after the exclusion of 111 noncoagulating samples) was divided into two sub-datasets: the data of 700 ewes were used to estimate prediction model parameters, and the data of 159 ewes were used to validate the model. Four prediction scenarios were compared in the validation, differing in the use of the whole or reduced MIR spectrum and the use of raw or corrected data (locally weighted scatterplot smoothing). PLS prediction statistics were moderate. The use of the reduced MIR spectrum yielded the best results for the considered traits, whereas the data correction improved the prediction ability only when the whole MIR spectrum was used. In conclusion, PLS achieves good accuracy of prediction, in particular for ILCY and RCT, and it may enable increasing the number of traits to be included in breeding programs for dairy sheep without additional costs and logistics.
|
23
|
Comparison of Bayesian and partial least squares regression methods for mid-infrared prediction of cheese-making properties in Montbéliarde cows. J Dairy Sci 2019; 102:6943-6958. [PMID: 31178172 DOI: 10.3168/jds.2019-16320] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/17/2019] [Accepted: 04/23/2019] [Indexed: 01/17/2023]
Abstract
Assessing the cheese-making properties (CMP) of milks with a rapid and cost-effective method is of particular interest for the Protected Designation of Origin cheese sector. The aims of this study were to evaluate the potential of mid-infrared (MIR) spectra to estimate coagulation and acidification properties, as well as curd yield (CY) traits of Montbéliarde cow milk. Samples from 250 cows were collected in 216 commercial herds in Franche-Comté with the objectives to maximize the genetic diversity as well as the variation in milk composition. All coagulation and CY traits showed high variability (10 to 43%). Reference analyses performed for soft (SC) and pressed cooked (PCC) cheese technology were matched with MIR spectra. Prediction models were built on 446 informative wavelengths not tainted by the water absorbance, using different approaches such as partial least squares (PLS), uninformative variable elimination PLS, random forest PLS, Bayes A, Bayes B, Bayes C, and Bayes RR. We assessed equation performances for a set of 20 CMP traits (coagulation: 5 for SC and 4 for PCC; acidification: 5 for SC and 3 for PCC; laboratory CY: 3) by comparing prediction accuracies based on cross-validation. Overall, variable selection before PLS did not significantly improve the performances of the PLS regression, the prediction differences between Bayesian methods were negligible, and PLS models always outperformed Bayesian models. This was likely a result of the prior use of informative wavelengths of the MIR spectra. The best accuracies were obtained for curd yields expressed in dry matter (CYDM) or fresh (CYFRESH) and for coagulation traits (curd firmness for PCC and SC) using the PLS regression. Prediction models of other CMP traits were moderately to poorly accurate. Whatever the prediction methodology, the best results were always obtained for CY traits, probably because these traits are closely related to milk composition. 
The CYDM predictions showed coefficient of determination (R²) values up to 0.92 and 0.87, and RSy,x values of 3 and 4% for PLS and Bayes regressions, respectively. Finally, we divided the data set into calibration (2/3) and validation (1/3) sets and developed prediction models in external validation using PLS regression only. In conclusion, we confirmed, in the validation set, an excellent prediction for CYDM [R² = 0.91, ratio of performance to deviation (RPD) = 3.39] and a very good prediction for CYFRESH (R² = 0.84, RPD = 2.49), adequate for analytical purposes. We also obtained good results for both PCC and SC curd firmness traits (R² ≥ 0.70, RPD ≥ 1.8), which enable quantitative prediction.
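How the validation statistics quoted above relate to each other can be shown with a short Python sketch. It assumes a common definition of RPD (standard deviation of the reference values divided by the root mean squared error of prediction) and R² computed as 1 − SSE/SST; the abstract does not state which formulas the authors used, so both are assumptions, and the function names and sample numbers are hypothetical. Under these assumptions R² = 1 − 1/RPD², and the reported pairs (0.91, 3.39) and (0.84, 2.49) are consistent with that identity.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error of prediction."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def population_sd(values):
    """Standard deviation with an n denominator, matching rmse above."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def rpd(observed, predicted):
    """Ratio of performance to deviation (assumed definition: SD / RMSE)."""
    return population_sd(observed) / rmse(observed, predicted)


def r2_from_rpd(rpd_value):
    """R^2 implied by an RPD value when R^2 = 1 - SSE/SST."""
    return 1.0 - 1.0 / rpd_value ** 2


# RPD values quoted in the abstract for CYDM and CYFRESH.
for reported_rpd in (3.39, 2.49):
    print(reported_rpd, round(r2_from_rpd(reported_rpd), 2))

# Hypothetical reference and predicted values, for illustration only.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(rpd(reference, predicted))
```

Using the sample standard deviation (n − 1 denominator) instead would change the RPD slightly, so the identity is then only approximate.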
|
24
|
Prediction of blood β-hydroxybutyrate content and occurrence of hyperketonemia in early-lactation, pasture-grazed dairy cows using milk infrared spectra. J Dairy Sci 2019; 102:6466-6476. [PMID: 31079906 DOI: 10.3168/jds.2018-15988] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/13/2018] [Accepted: 02/12/2019] [Indexed: 11/19/2022]
Abstract
The objective of this study was to evaluate the ability of milk infrared spectra to predict blood β-hydroxybutyrate (BHB) concentration for use as a management tool for cow metabolic health on pasture-grazed dairy farms and for large-scale phenotyping for genetic evaluation purposes. The study involved 542 cows (Holstein-Friesian and Holstein-Friesian × Jersey crossbreds) from 2 farms located in the Waikato and Taranaki regions of New Zealand that operated under a seasonal-calving, pasture-based dairy system. Milk infrared spectra were collected once a week during the first 5 wk of lactation. A blood "prick" sample was taken from the ventral labial vein of each cow 3 times a week for the first 5 wk of lactation. The content of BHB in blood was measured immediately using a handheld device. After outlier elimination, 1,910 spectra records and corresponding BHB measures were used for prediction model development. Partial least squares regression and partial least squares discriminant analysis were used to develop prediction models for quantitative determination of blood BHB content and for identifying cows with hyperketonemia (HYK). Both quantitative and discriminant predictions were developed using the phenotypes and infrared spectra from two-thirds of the cows (randomly assigned to the calibration set) and tested using the remaining one-third (validation set). A moderate accuracy was obtained for prediction of blood BHB. The coefficient of determination (R²) of the prediction model in calibration was 0.56, with a root mean squared error of prediction of 0.28 mmol/L and a ratio of performance to deviation, calculated as the ratio of the standard deviation of the partial least squares model calibration set to the standard error of prediction, of 1.50. In the validation set, the R² was 0.50, with a root mean squared error of prediction of 0.32 mmol/L, which resulted in a ratio of performance to deviation of 1.39. 
When the reference test for HYK was defined as blood concentration of BHB ≥1.2 mmol/L, discriminant models indicated that milk infrared spectra correctly classified 76% of the HYK-positive cows and 82% of the HYK-negative cows. The quantitative models were not able to provide accurate estimates, but they could differentiate between high and low BHB concentrations. Furthermore, the discriminant models allowed the classification of cows with reasonable accuracy. This study indicates that the prediction of blood BHB content or occurrence of HYK from milk spectra is possible with moderate accuracy in pasture-grazed cows and could be used during routine milk testing. Infrared spectroscopy is not likely suited for obtaining accurate BHB measurements at an individual cow level, but discriminant models might be used in the future as herd-level management tools for classification of cows that are at risk of HYK, whereas quantitative models might provide large-scale phenotypes to be used as an indicator trait for breeding cows with improved metabolic health.
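The two classification rates quoted above (share of HYK-positive and of HYK-negative cows correctly classified) correspond to what is usually called sensitivity and specificity. The following Python sketch shows how they could be computed against the BHB ≥1.2 mmol/L reference test; the function name, its arguments and the sample records are hypothetical and do not come from the study.

```python
def sensitivity_specificity(reference_bhb, predicted_positive, threshold=1.2):
    """Return (sensitivity, specificity) for a hyperketonemia classifier.

    reference_bhb: blood BHB concentrations in mmol/L (reference test).
    predicted_positive: booleans, True where the model flags the cow as HYK.
    threshold: reference cut-off; BHB >= threshold counts as HYK-positive.
    """
    tp = fn = tn = fp = 0
    for bhb, flagged in zip(reference_bhb, predicted_positive):
        is_positive = bhb >= threshold
        if is_positive and flagged:
            tp += 1
        elif is_positive and not flagged:
            fn += 1
        elif not is_positive and not flagged:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical records, for illustration only.
blood_bhb = [0.5, 1.2, 1.5, 0.8, 2.0]
model_flags = [False, True, False, True, True]
sens, spec = sensitivity_specificity(blood_bhb, model_flags)
print(sens, spec)
```

In these terms, the abstract reports a sensitivity of 0.76 and a specificity of 0.82.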
|
25
|
Correlation of volatile compound concentrations with bacterial counts in whole pasteurised milk under various storage conditions. Int J Dairy Technol 2018. [DOI: 10.1111/1471-0307.12557] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/27/2022]
|