Reference Citation Analysis

For: Long N, Gianola D, Rosa G, Weigel K. Dimension reduction and variable selection for genomic selection: application to predicting milk yield in Holsteins. J Anim Breed Genet 2011;128:247-57. [DOI: 10.1111/j.1439-0388.2011.00917.x] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Indexed: 01/31/2023]

Cited by Other Article(s)

Yao Z, Zou W, Zhang X, Nie P, Lv H, Wang W, Zhao X, Yang Y, Yang L. Integrating mid-infrared spectroscopy, machine learning, and graphical bias correction for fatty acid prediction in water buffalo milk. J Sci Food Agric 2024. [PMID: 38501395 DOI: 10.1002/jsfa.13471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 02/25/2024] [Accepted: 03/19/2024] [Indexed: 03/20/2024]

Abstract

BACKGROUND

Buffalo milk, which constitutes 15% of global milk production, has a higher fatty acid content than Holstein milk. Fourier-transform mid-infrared (FT-MIR) spectroscopy is widely used for dairy analysis, but its application to buffalo milk, which has larger fat globules, remains understudied. The goal of this study was to develop machine learning models based on FT-MIR spectroscopy for predicting fatty acids in buffalo milk and to assess the accuracy of commercial milk analyzers. This research provides a convenient, fast, and environmentally friendly method for detecting the fatty acid composition of buffalo milk.

RESULTS

We employed six machine learning algorithms to establish detection models for 34 fatty acids in buffalo milk. The predictive models performed robustly for high-content fatty acids [C14:0, C15:0, C16:0, C17:0, C18:0, C18:1, saturated fatty acids (SFA), monounsaturated fatty acids (MUFA)], with errors within 15%. The existing FT6000 detection method showed limitations in measuring SFAs and polyunsaturated fatty acids (PUFA). Applying a mean difference correction of 0.21 for MUFAs and the regression equations SFA × 1.0639 + 0.0705 and PUFA × 0.5472 + 0.0047 significantly improved measurement accuracy.
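As a minimal sketch, the two regression corrections reported above can be applied to FT6000 readings as follows. This assumes each equation maps an FT6000 reading to its corrected value and that inputs are in the units the study used (the abstract does not state them). The function names are hypothetical, and the MUFA mean-difference correction is left out because the abstract does not say in which direction the 0.21 offset is applied.

```python
# Hypothetical helpers; slopes and intercepts are the values reported in the abstract.
SFA_SLOPE, SFA_INTERCEPT = 1.0639, 0.0705
PUFA_SLOPE, PUFA_INTERCEPT = 0.5472, 0.0047


def correct_sfa(ft6000_sfa: float) -> float:
    """Corrected SFA = FT6000 SFA x 1.0639 + 0.0705 (assumed direction of the mapping)."""
    return ft6000_sfa * SFA_SLOPE + SFA_INTERCEPT


def correct_pufa(ft6000_pufa: float) -> float:
    """Corrected PUFA = FT6000 PUFA x 0.5472 + 0.0047 (assumed direction of the mapping)."""
    return ft6000_pufa * PUFA_SLOPE + PUFA_INTERCEPT


if __name__ == "__main__":
    # Illustrative readings only, not data from the study.
    print(round(correct_sfa(5.0), 4))   # 5.39
    print(round(correct_pufa(0.2), 5))  # 0.11414
```

Keeping the coefficients as named constants makes it easy to swap in recalibrated values if the correction is refitted on new reference samples.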

CONCLUSION

This study successfully developed a predictive model for fatty acids in Mediterranean buffalo milk based on FT-MIR spectroscopy. Additionally, a correction was applied to the existing measurement device, FT6000, enabling more accurate measurements of fatty acids in buffalo milk. The findings have practical implications for the food industry, offering a faster and more reliable approach to assess and monitor fatty acid composition in buffalo milk, potentially influencing product development and quality control processes. © 2024 Society of Chemical Industry.

Affiliation(s)

Zhiqiu Yao International Joint Research Center for Animal Genetics, Breeding and Reproduction (IJRCAGBR), Huazhong Agricultural University, Wuhan, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
Wenna Zou International Joint Research Center for Animal Genetics, Breeding and Reproduction (IJRCAGBR), Huazhong Agricultural University, Wuhan, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
Xinxin Zhang International Joint Research Center for Animal Genetics, Breeding and Reproduction (IJRCAGBR), Huazhong Agricultural University, Wuhan, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
Pei Nie International Joint Research Center for Animal Genetics, Breeding and Reproduction (IJRCAGBR), Huazhong Agricultural University, Wuhan, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China; College of Veterinary Medicine, Hunan Agricultural University, Changsha, China
Haimiao Lv International Joint Research Center for Animal Genetics, Breeding and Reproduction (IJRCAGBR), Huazhong Agricultural University, Wuhan, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
Wei Wang International Joint Research Center for Animal Genetics, Breeding and Reproduction (IJRCAGBR), Huazhong Agricultural University, Wuhan, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
Xuhong Zhao International Joint Research Center for Animal Genetics, Breeding and Reproduction (IJRCAGBR), Huazhong Agricultural University, Wuhan, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
Ying Yang International Joint Research Center for Animal Genetics, Breeding and Reproduction (IJRCAGBR), Huazhong Agricultural University, Wuhan, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China
Liguo Yang International Joint Research Center for Animal Genetics, Breeding and Reproduction (IJRCAGBR), Huazhong Agricultural University, Wuhan, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China

Shen J, Li H, Yu X, Bai L, Dong Y, Cao J, Lu K, Tang Z. Efficient feature extraction from highly sparse binary genotype data for cancer prognosis prediction using an auto-encoder. Front Oncol 2023;12:1091767. [PMID: 36703783 PMCID: PMC9872139 DOI: 10.3389/fonc.2022.1091767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 12/19/2022] [Indexed: 01/11/2023]

Affiliation(s)

Junjie Shen Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, China
Huijun Li Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, China
Xinghao Yu Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, China; Center for Genetic Epidemiology and Genomics, School of Public Health, Medical College of Soochow University, Suzhou, China
Lu Bai Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, China
Yongfei Dong Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, China
Jianping Cao School of Radiation Medicine and Protection and Collaborative Innovation Center of Radiation Medicine of Jiangsu Higher Education Institutions, Soochow University, Suzhou, China
Ke Lu Department of Orthopedics, Affiliated Kunshan Hospital of Jiangsu University, Suzhou, China (corresponding author)
Zaixiang Tang Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, China (corresponding author)

Sandhu KS, Shiv A, Kaur G, Meena MR, Raja AK, Vengavasi K, Mall AK, Kumar S, Singh PK, Singh J, Hemaprabha G, Pathak AD, Krishnappa G, Kumar S. Integrated Approach in Genomic Selection to Accelerate Genetic Gain in Sugarcane. Plants (Basel) 2022;11:plants11162139. [PMID: 36015442 PMCID: PMC9412483 DOI: 10.3390/plants11162139] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/10/2022] [Revised: 08/08/2022] [Accepted: 08/08/2022] [Indexed: 11/30/2022]

Islam MS, McCord PH, Olatoye MO, Qin L, Sood S, Lipka AE, Todd JR. Experimental evaluation of genomic selection prediction for rust resistance in sugarcane. Plant Genome 2021;14:e20148. [PMID: 34510803 DOI: 10.1002/tpg2.20148] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/20/2021] [Accepted: 07/22/2021] [Indexed: 06/13/2023]

Selection indexes using principal component analysis for reproductive, beef and milk traits in Simmental cattle. Trop Anim Health Prod 2021;53:378. [PMID: 34185177 DOI: 10.1007/s11250-021-02815-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/15/2021] [Accepted: 06/18/2021] [Indexed: 10/21/2022]

Aono AH, Costa EA, Rody HVS, Nagai JS, Pimenta RJG, Mancini MC, Dos Santos FRC, Pinto LR, Landell MGDA, de Souza AP, Kuroshu RM. Machine learning approaches reveal genomic regions associated with sugarcane brown rust resistance. Sci Rep 2020;10:20057. [PMID: 33208862 PMCID: PMC7676261 DOI: 10.1038/s41598-020-77063-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/18/2020] [Accepted: 08/24/2020] [Indexed: 12/18/2022]

Mohino-Herranz I, Gil-Pita R, García-Gómez J, Rosa-Zurera M, Seoane F. A Wrapper Feature Selection Algorithm: An Emotional Assessment Using Physiological Recordings from Wearable Sensors. Sensors (Basel) 2020;20:s20010309. [PMID: 31935893 PMCID: PMC6983098 DOI: 10.3390/s20010309] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/28/2019] [Revised: 12/29/2019] [Accepted: 01/03/2020] [Indexed: 11/16/2022]

Monteverde E, Gutierrez L, Blanco P, Pérez de Vida F, Rosas JE, Bonnecarrère V, Quero G, McCouch S. Integrating Molecular Markers and Environmental Covariates To Interpret Genotype by Environment Interaction in Rice (Oryza sativa L.) Grown in Subtropical Areas. G3 (Bethesda) 2019;9:1519-1531. [PMID: 30877079 PMCID: PMC6505146 DOI: 10.1534/g3.119.400064] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 02/06/2019] [Accepted: 03/05/2019] [Indexed: 01/11/2023]

Sun S, Miao Z, Ratcliffe B, Campbell P, Pasch B, El-Kassaby YA, Balasundaram B, Chen C. SNP variable selection by generalized graph domination. PLoS One 2019;14:e0203242. [PMID: 30677030 PMCID: PMC6345469 DOI: 10.1371/journal.pone.0203242] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/15/2018] [Accepted: 01/08/2019] [Indexed: 11/19/2022]

Abstract

BACKGROUND

High-throughput sequencing technology has revolutionized both medical and biological research by generating exceedingly large numbers of genetic variants. The resulting datasets share a number of common characteristics that can lead to poor generalization capacity. Concerns include noise accumulated from the large number of predictors, sparse information relative to that number (the p ≫ n problem), and overfitting and model misidentification resulting from spurious collinearity. Additionally, complex correlation patterns are present among variables. As a consequence, reliable variable selection techniques play a pivotal role in predictive analysis, generalization capability, robustness in clustering, and interpretability of the derived models.

METHODS AND FINDINGS

K-dominating set, a parameterized graph-theoretic generalization model, was used to model SNP (single nucleotide polymorphism) data as a similarity network and to search for representative SNP variables. In particular, each SNP was represented as a vertex in the graph, and (dis)similarity measures such as correlation coefficients or pairwise linkage disequilibrium were estimated to describe the relationship between each pair of SNPs; a pair of vertices is adjacent, i.e. joined by an edge, if the pairwise similarity measure exceeds a user-specified threshold. A minimum k-dominating set in the SNP graph was then defined as the smallest subset such that every SNP excluded from the subset has at least k neighbors in the selected subset. The strength of k-dominating set selection in identifying independent variables, and in culling representative variables that are highly correlated with others, was demonstrated on a simulated dataset. The advantages of k-dominating set variable selection were also illustrated in two applications: pedigree reconstruction using SNP profiles of 1,372 Douglas-fir trees, and species delineation for 226 grasshopper mouse samples. C++ source code that implements SNP-SELECT and uses the Gurobi optimization solver for k-dominating set variable selection is available (https://github.com/transgenomicsosu/SNP-SELECT).
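The selection rule above can be illustrated with a small greedy heuristic. This is a sketch, not SNP-SELECT: the authors solve the minimum k-dominating set exactly with the Gurobi solver, whereas the greedy loop below only guarantees a valid (not necessarily minimum) k-dominating set. It assumes genotypes coded 0/1/2 and uses the absolute Pearson correlation between genotype vectors as the similarity measure; all function names and the threshold value are hypothetical.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length genotype vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def build_snp_graph(genotypes, threshold):
    """One vertex per SNP; an edge joins two SNPs whose |r| exceeds the threshold."""
    adj = {snp: set() for snp in genotypes}
    for a, b in combinations(genotypes, 2):
        if abs(pearson(genotypes[a], genotypes[b])) > threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def greedy_k_dominating_set(adj, k):
    """Greedily add SNPs until every unselected SNP has at least k selected neighbours."""
    selected = set()
    hits = {v: 0 for v in adj}  # number of selected neighbours of each vertex

    def unsatisfied(v):
        return v not in selected and hits[v] < k

    def gain(v):
        # Unsatisfied vertices helped by selecting v: v itself plus its unsatisfied neighbours.
        return int(unsatisfied(v)) + sum(unsatisfied(u) for u in adj[v])

    while any(unsatisfied(v) for v in adj):
        # Sorting first makes ties resolve deterministically to the smallest name.
        best = max(sorted(v for v in adj if v not in selected), key=gain)
        selected.add(best)
        for u in adj[best]:
            hits[u] += 1
    return selected


def is_k_dominating(adj, chosen, k):
    """Check the defining property: every SNP outside `chosen` has >= k neighbours in it."""
    return all(v in chosen or len(adj[v] & chosen) >= k for v in adj)


if __name__ == "__main__":
    geno = {
        "snp1": [0, 1, 2, 0, 1, 2],
        "snp2": [0, 1, 2, 0, 1, 2],  # identical to snp1
        "snp3": [2, 1, 0, 2, 1, 0],  # perfectly negatively correlated with snp1
        "snp4": [0, 0, 1, 1, 2, 2],  # |r| = 0.5 with snp1, below the threshold
    }
    graph = build_snp_graph(geno, threshold=0.8)
    print(sorted(greedy_k_dominating_set(graph, k=1)))  # ['snp1', 'snp4']
```

In the toy data, the three mutually correlated SNPs collapse to one representative while the uncorrelated SNP is kept, which mirrors the "independent" versus "representative" behaviour the abstract describes. Raising k keeps more members of each correlated cluster.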

Genomic Prediction of Growth and Stem Quality Traits in Eucalyptus globulus Labill. at Its Southernmost Distribution Limit in Chile. Forests 2018. [DOI: 10.3390/f9120779] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022]

Abstract

The present study examined the ability of different genomic selection (GS) models to predict growth traits (diameter at breast height, tree height and wood volume), stem straightness and branching quality of Eucalyptus globulus Labill. trees using a genome-wide single nucleotide polymorphism (SNP) chip (60 K) in one of the southernmost progeny trials of the species, close to its southern distribution limit in Chile. The GS methods examined were ridge regression-BLUP (RRBLUP), Bayes-A, Bayes-B, Bayesian least absolute shrinkage and selection operator (BLASSO), principal component regression (PCR), supervised PCR and a variant of RRBLUP that involves prior selection of predictor variables (RRBLUP-B). RRBLUP-B and supervised PCR presented the greatest predictive ability (PA), followed by PCR, for most of the traits studied. The highest PA was obtained for branching quality (~0.7). For the growth traits, the maximum PA varied from 0.43 to 0.54, while for stem straightness it reached 0.62 (supervised PCR). The study population presented more extended linkage disequilibrium (LD) than other previously studied populations of E. globulus. Genome-wide LD decayed rapidly within 0.76 Mbp (threshold value of r² = 0.1), and the average LD across all chromosomes was r² = 0.09. In addition, 0.15% of all pairs of linked SNPs were in complete LD (r² = 1), and 3% had r² > 0.5. Genomic prediction based on dimensionality reduction and variable selection may be a promising method, considering the early growth of the trees and the low-to-moderate heritabilities found for the traits evaluated. These findings provide new understanding of how to develop novel breeding strategies for the improvement of E. globulus at its southernmost range limit in Chile, which could open new opportunities for forest planting that benefit the local economy.

Hosseini-Vardanjani SM, Shariati MM, Moradi Shahrebabak H, Tahmoorespur M. Incorporating Prior Knowledge of Principal Components in Genomic Prediction. Front Genet 2018;9:289. [PMID: 30116258 PMCID: PMC6082966 DOI: 10.3389/fgene.2018.00289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/15/2017] [Accepted: 07/11/2018] [Indexed: 12/05/2022]

Abstract

Genomic prediction using a large number of markers is challenging because of the curse of dimensionality and the multicollinearity arising from linkage disequilibrium between markers. Several methods have been proposed to address these problems, such as principal component analysis (PCA), which is commonly used to reduce the dimension of predictor variables by generating orthogonal variables. Usually, the knowledge from PCA is incorporated into genomic prediction by assuming either equal variances for the PCs or variances proportional to the eigenvalues; both approaches treat the variances as fixed. Here, three prior distributions (normal, scaled-t and double exponential) were assumed for PC effects in a Bayesian framework with a subset of PCs. These developed PCR models (dPCRm) were compared with routine genomic prediction models (RGPM), i.e. ridge and Bayesian ridge regression, BayesA, BayesB, and PC regression with a subset of PCs but with PC variances predefined as proportional to the eigenvalues (PCR-Eigen). The methods were compared by simulating a single trait with a heritability of 0.25 on a genome consisting of 3,000 SNPs on three chromosomes, with 15, 60, and 105 QTL. After 500 generations of random mating as the historical population, a population was isolated and mated for another 15 generations. Generations 8 and 9 of the recent population were used as the reference population and the next six generations as validation populations. The accuracy and bias of predictions were evaluated within the reference population and within each of the validation populations. Within the reference population, the accuracies of dPCRm were similar to those of RGPM (0.536 to 0.664 vs. 0.542 to 0.671) and higher than those of PCR-Eigen (0.504 to 0.641) across the different QTL numbers. In the validation populations, accuracies declined from 0.633 to 0.310, 0.639 to 0.313, and 0.617 to 0.298 using dPCRm, RGPM and PCR-Eigen, respectively. Prediction biases of dPCRm and RGPM were similar and always much smaller than those of PCR-Eigen. In conclusion, treating PC variances as random variables via prior specification yielded higher accuracy than PCR-Eigen and the same accuracy as RGPM, while using fewer predictors.

Li B, Zhang N, Wang YG, George AW, Reverter A, Li Y. Genomic Prediction of Breeding Values Using a Subset of SNPs Identified by Three Machine Learning Methods. Front Genet 2018;9:237. [PMID: 30023001 PMCID: PMC6039760 DOI: 10.3389/fgene.2018.00237] [Citation(s) in RCA: 79] [Impact Index Per Article: 13.2] [Received: 03/25/2018] [Accepted: 06/14/2018] [Indexed: 12/22/2022]

Abstract

The analysis of large genomic data is hampered by issues such as a small number of observations and a large number of predictor variables (commonly known as “large P small N”), high dimensionality, and highly correlated data structures. Machine learning methods are renowned for dealing with these problems. To date, machine learning methods have been applied in genome-wide association studies for identification of candidate genes, epistasis detection, gene network pathway analyses and genomic prediction of phenotypic values. However, the utility of two machine learning methods, Gradient Boosting Machine (GBM) and Extreme Gradient Boosting (XgBoost), in identifying a subset of SNP markers for genomic prediction of breeding values had not been explored before. In this study, using 38,082 SNP markers and body weight phenotypes from 2,093 Brahman cattle (1,097 bulls as a discovery population and 996 cows as a validation population), we examined the efficiency of three machine learning methods, namely Random Forests (RF), GBM and XgBoost, in (a) identifying the top 400, 1,000, and 3,000 ranked SNPs and (b) using these subsets of SNPs to construct genomic relationship matrices (GRMs) for the estimation of genomic breeding values (GEBVs). For comparison, we also calculated GEBVs from (1) 400, 1,000, and 3,000 SNPs that were randomly selected and evenly spaced across the genome, and (2) all the SNPs. We found that RF and especially GBM are efficient methods for identifying a subset of SNPs with direct links to candidate genes affecting the growth trait. Compared with the prediction accuracy of GEBVs using all SNPs (0.43), the top 3,000 SNPs identified by RF (0.42) and GBM (0.46) gave similar values. The subsets of SNPs from RF and GBM performed substantially better than evenly spaced subsets across the genome (0.18–0.29). Of the three methods, RF and GBM consistently outperformed XgBoost in genomic prediction accuracy.
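The abstract does not say which genomic relationship matrix was built from each SNP subset. As an assumed illustration, the sketch below uses VanRaden's first method, G = ZZ' / (2 Σ p_i(1 − p_i)), where Z holds genotypes coded 0/1/2 centred by twice the allele frequency. The function name and the `snp_idx` parameter (the column indices of a ranked or evenly spaced SNP subset) are hypothetical.

```python
import numpy as np


def vanraden_grm(genotypes, snp_idx=None):
    """Genomic relationship matrix (VanRaden method 1) from an n x m array of 0/1/2 codes.

    snp_idx optionally restricts the computation to a subset of SNP columns,
    e.g. the top-ranked SNPs from a variable-importance ranking or an evenly spaced panel.
    """
    M = np.asarray(genotypes, dtype=float)
    if snp_idx is not None:
        M = M[:, snp_idx]
    p = M.mean(axis=0) / 2.0               # frequency of the counted allele per SNP
    Z = M - 2.0 * p                        # centre each SNP by its expected allele count
    scale = 2.0 * np.sum(p * (1.0 - p))    # VanRaden's scaling factor
    if scale == 0:
        raise ValueError("all selected SNPs are monomorphic")
    return Z @ Z.T / scale


if __name__ == "__main__":
    # Illustrative genotypes for 3 animals x 4 SNPs (not data from the study).
    geno = np.array([[0, 1, 2, 1], [2, 1, 0, 1], [1, 2, 1, 0]])
    G_all = vanraden_grm(geno)
    G_subset = vanraden_grm(geno, snp_idx=[0, 2])
    print(G_all.shape, G_subset.shape)  # (3, 3) (3, 3)
```

Computing G from different `snp_idx` subsets of the same genotype array is one way to compare, as the study did, a ranked SNP panel against a random or evenly spaced one while keeping everything else fixed.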

Du C, Wei J, Wang S, Jia Z. Genomic selection using principal component regression. Heredity (Edinb) 2018;121:12-23. [PMID: 29713089 DOI: 10.1038/s41437-018-0078-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 10/16/2017] [Revised: 03/17/2018] [Accepted: 03/21/2018] [Indexed: 01/02/2023]

Accuracy of Genomic Prediction in Switchgrass (Panicum virgatum L.) Improved by Accounting for Linkage Disequilibrium. G3 (Bethesda) 2016;6:1049-62. [PMID: 26869619 PMCID: PMC4825640 DOI: 10.1534/g3.115.024950] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Indexed: 12/04/2022]

Borel P, Desmarchelier C, Nowicki M, Bott R. Lycopene bioavailability is associated with a combination of genetic variants. Free Radic Biol Med 2015;83:238-44. [PMID: 25772008 DOI: 10.1016/j.freeradbiomed.2015.02.033] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 01/13/2015] [Revised: 02/25/2015] [Accepted: 02/26/2015] [Indexed: 10/23/2022]

Abstract

The intake of tomatoes and tomato products, which constitute the main dietary source of the red pigment lycopene (LYC), has been associated with a reduced risk of prostate cancer and cardiovascular disease, suggesting a protective role of this carotenoid. However, LYC bioavailability displays high interindividual variability, which may lead to varying biological effects following LYC consumption. Based on recent results obtained with two other carotenoids, we hypothesized that this variability was due, at least in part, to several single nucleotide polymorphisms (SNPs) in genes involved in LYC and lipid metabolism. We therefore aimed to identify a combination of SNPs significantly associated with the variability in LYC bioavailability. In a postprandial study, 33 healthy male volunteers consumed a test meal containing 100 g tomato puree, which provided 9.7 mg all-trans LYC. LYC concentrations were measured in plasma chylomicrons (CM) isolated at regular time intervals over 8 h postprandially. For this study, 1885 SNPs in 49 candidate genes (i.e. genes assumed to play a role in LYC bioavailability) were selected. Multivariate statistical analysis (partial least squares regression) was used to identify and validate the combination of SNPs most closely associated with the postprandial CM LYC response. The postprandial CM LYC response to the meal was notably variable, with a CV of 70%. A significant (P = 0.037) and validated partial least squares regression model, which included 28 SNPs in 16 genes, explained 72% of the variance in the postprandial CM LYC response. The postprandial CM LYC response was also positively correlated with fasting plasma LYC concentrations (r = 0.37, P < 0.05). The ability to respond to LYC is thus explained, at least partly, by a combination of 28 SNPs in 16 genes. Interindividual variability in bioavailability apparently affects long-term blood LYC status, which could ultimately modulate the biological response following LYC supplementation.
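Both the lycopene and lutein studies relate many SNPs to a single postprandial response with partial least squares regression. The abstracts do not describe the algorithm or software used, so the following is only a generic sketch of single-response PLS (PLS1) fitted with the NIPALS algorithm in NumPy. The function names are hypothetical, and the sketch omits the model-validation procedure the authors applied.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model by NIPALS; returns (x_mean, y_mean, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # centred predictors and response, deflated below
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                        # weight: direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = E @ w                          # latent score for each individual
        tt = t @ t
        p = E.T @ t / tt                   # X loading
        q = f @ t / tt                     # y loading
        E = E - np.outer(t, p)             # deflate X
        f = f - q * t                      # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    # Coefficients in the original predictor space: B = W (P'W)^-1 q
    beta = W @ np.linalg.solve(P.T @ W, Q)
    return x_mean, y_mean, beta


def pls1_predict(model, X_new):
    x_mean, y_mean, beta = model
    return y_mean + (np.asarray(X_new, dtype=float) - x_mean) @ beta
```

With as many components as predictors (and full-rank X), PLS1 reduces to ordinary least squares. With few components, it regresses the response on a handful of latent variables built to covary with it, which is why it suits problems with many more SNPs than volunteers.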

Multiple-breed genomic evaluation by principal component analysis in small size populations. Animal 2015;9:738-49. [DOI: 10.1017/s1751731114002973] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/06/2022]

Onogi A, Ideta O, Inoshita Y, Ebana K, Yoshioka T, Yamasaki M, Iwata H. Exploring the areas of applicability of whole-genome prediction methods for Asian rice (Oryza sativa L.). Theor Appl Genet 2015;128:41-53. [PMID: 25341369 DOI: 10.1007/s00122-014-2411-y] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 07/09/2014] [Accepted: 10/03/2014] [Indexed: 05/25/2023]

Dadousis C, Veerkamp RF, Heringstad B, Pszczola M, Calus MPL. A comparison of principal component regression and genomic REML for genomic prediction across populations. Genet Sel Evol 2014;46:60. [PMID: 25370926 PMCID: PMC4220066 DOI: 10.1186/s12711-014-0060-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 10/16/2013] [Accepted: 09/08/2014] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Genomic prediction faces two main statistical problems: multicollinearity and n ≪ p (many fewer observations than predictor variables). Principal component (PC) analysis is a multivariate statistical method that is often used to address these problems. The objective of this study was to compare the performance of PC regression (PCR) for genomic prediction with that of a commonly used REML model with a genomic relationship matrix (GREML) and to investigate the full potential of PCR for genomic prediction.

METHODS

The PCR model used either a common or a semi-supervised approach, in which PCs were selected based either on their eigenvalues (i.e. the proportion of variance in SNP (single nucleotide polymorphism) genotypes that they explain) or on their association with phenotypic variance in the reference population (i.e. their contribution to the regression sum of squares). Cross-validation within the reference population was used to select the optimum PCR model, i.e. the one that minimizes mean squared error. Pre-corrected average daily milk, fat and protein yields of 1609 first-lactation Holstein heifers from Ireland, the UK, the Netherlands and Sweden, which were genotyped with 50 k SNPs, were analysed. Each testing subset included animals from only one country, or from only one selection line for the UK.
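The two PC-selection rules and the cross-validated choice of the number of PCs described above can be sketched as follows. This is an assumed NumPy illustration, not the authors' code. `"eigenvalue"` keeps the PCs with the largest eigenvalues. `"association"` keeps the PCs with the largest regression sum of squares in the reference data, which for orthogonal PCs equals (u_jᵀy)², with u_j the j-th left singular vector of the centred genotype matrix. All function and parameter names are hypothetical.

```python
import numpy as np


def fit_pcr(X, y, n_pc, select="eigenvalue"):
    """Principal component regression of y on genotypes X (n x m).

    select="eigenvalue":  keep the n_pc PCs with the largest eigenvalues.
    select="association": keep the n_pc PCs contributing most to the regression SS of y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Xc = U diag(s) Vt; PC scores are U * s and the eigenvalues of Xc'Xc are s**2.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    keep = s > 1e-10 * s.max()              # drop numerically null components
    U, s, Vt = U[:, keep], s[keep], Vt[keep]
    if select == "eigenvalue":
        order = np.arange(s.size)           # SVD already returns s in descending order
    elif select == "association":
        ss_reg = (U.T @ yc) ** 2            # regression SS explained by each orthogonal PC
        order = np.argsort(ss_reg)[::-1]
    else:
        raise ValueError("select must be 'eigenvalue' or 'association'")
    chosen = order[:n_pc]
    b = (U[:, chosen].T @ yc) / s[chosen]   # least-squares coefficients on the PC scores
    beta = Vt[chosen].T @ b                 # map back to per-SNP effects
    return x_mean, y_mean, beta


def predict_pcr(model, X_new):
    x_mean, y_mean, beta = model
    return y_mean + (np.asarray(X_new, dtype=float) - x_mean) @ beta


def cv_choose_n_pc(X, y, candidates, select="eigenvalue", n_folds=5, seed=0):
    """Pick the number of PCs that minimises k-fold cross-validated mean squared error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    folds = np.array_split(np.random.default_rng(seed).permutation(y.size), n_folds)
    best_n, best_mse = None, np.inf
    for n_pc in candidates:
        sq_err = 0.0
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(y.size), test_idx)
            model = fit_pcr(X[train_idx], y[train_idx], n_pc, select)
            resid = predict_pcr(model, X[test_idx]) - y[test_idx]
            sq_err += float(resid @ resid)
        mse = sq_err / y.size
        if mse < best_mse:
            best_n, best_mse = n_pc, mse
    return best_n, best_mse
```

Because the PCs are orthogonal, the association rule needs only one projection per PC rather than refitting a regression for every candidate subset. This cheapness is also why cross-validating over many numbers of PCs, as in the study, is practical.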

RESULTS

In general, accuracies of GREML and PCR were similar, but GREML slightly outperformed PCR. Including the genotypes of validation animals in model training (semi-supervised PCR) did not result in more accurate genomic predictions. The highest achievable PCR accuracies were obtained across a wide range of numbers of PCs fitted in the regression (from one to more than 1000), across test populations and traits. Using cross-validation within the reference population to derive the number of PCs yielded substantially lower accuracies than the highest achievable accuracies obtained across all possible numbers of PCs.

CONCLUSIONS

On average, PCR performed only slightly less well than GREML. When the optimal number of PCs was determined based on realized accuracy in the testing population, PCR showed a higher potential in terms of achievable accuracy, which was not capitalized on when PC selection was based on cross-validation. A standard approach for selecting the optimal set of PCs in PCR remains a challenge.

Azevedo CF, Silva FF, de Resende MDV, Lopes MS, Duijvesteijn N, Guimarães SEF, Lopes PS, Kelly MJ, Viana JMS, Knol EF. Supervised independent component analysis as an alternative method for genomic selection in pigs. J Anim Breed Genet 2014;131:452-61. [PMID: 25039677 DOI: 10.1111/jbg.12104] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/29/2014] [Accepted: 06/05/2014] [Indexed: 11/28/2022]

Borel P, Desmarchelier C, Nowicki M, Bott R, Morange S, Lesavre N. Interindividual variability of lutein bioavailability in healthy men: characterization, genetic variants involved, and relation with fasting plasma lutein concentration. Am J Clin Nutr 2014;100:168-75. [PMID: 24808487 DOI: 10.3945/ajcn.114.085720] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Indexed: 12/29/2022]

Abstract

BACKGROUND

Lutein accumulates in the macula and brain, where it is assumed to play physiologic roles. The bioavailability of lutein is assumed to display a high interindividual variability that has been hypothesized to be attributable, at least partly, to genetic polymorphisms.

OBJECTIVES

We characterized the interindividual variability in lutein bioavailability in humans, assessed the relation between this variability and the fasting blood lutein concentration, and identified single nucleotide polymorphisms (SNPs) involved in this phenomenon.

DESIGN

In a randomized, 2-way crossover study, 39 healthy men consumed a meal that contained a lutein supplement or the same meal for which lutein was provided through a tomato puree. The lutein concentration was measured in plasma chylomicrons isolated at regular time intervals over 8 h postprandially. Multivariate statistical analyses were used to identify a combination of SNPs associated with the postprandial chylomicron lutein response (0-8-h area under the curve). A total of 1785 SNPs in 51 candidate genes were selected.
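The response analysed here is the 0-8 h area under the chylomicron lutein curve. The abstract does not say how the area was computed; a common choice is the trapezoidal rule over the sampling times, sketched below under that assumption with a hypothetical function name and illustrative values.

```python
def auc_trapezoid(times_h, concentrations):
    """Area under a concentration-time curve by the trapezoidal rule (concentration x h)."""
    if len(times_h) != len(concentrations) or len(times_h) < 2:
        raise ValueError("need at least two paired time points")
    if any(t1 <= t0 for t0, t1 in zip(times_h, times_h[1:])):
        raise ValueError("sampling times must be strictly increasing")
    # Sum of trapezoids between consecutive sampling times.
    return sum(
        (t1 - t0) * (c0 + c1) / 2.0
        for t0, t1, c0, c1 in zip(times_h, times_h[1:], concentrations, concentrations[1:])
    )


if __name__ == "__main__":
    # Illustrative sampling over 0-8 h; not data from the study.
    print(auc_trapezoid([0, 2, 4, 8], [0.0, 10.0, 10.0, 0.0]))  # 50.0
```

Some studies subtract the fasting (time 0) value before integrating to obtain an incremental area; the sketch integrates the raw concentrations as given.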

RESULTS

Postprandial chylomicron lutein responses to meals were very variable (CV of 75% and 137% for the lutein-supplement meal and the meal with tomato-sourced lutein, respectively). Postprandial chylomicron lutein responses measured after the 2 meals were positively correlated with each other (r = 0.68, P < 0.0001) and with the fasting plasma lutein concentration (r = 0.51, P < 0.005 for the lutein-supplement-containing meal). A significant (P = 1.9 × 10⁻⁴) and validated partial least-squares regression model, which included 29 SNPs in 15 genes, explained most of the variance in the postprandial chylomicron lutein response.
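The variability and agreement figures above are a coefficient of variation (standard deviation divided by the mean, in percent) and a Pearson correlation between paired per-subject responses. The sketch below assumes the sample standard deviation was used, which the abstract does not state; the per-subject values are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """Sample CV in percent: sample standard deviation divided by the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two paired lists of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-subject responses to the two meals (not the study's data).
supplement_auc = [1.0, 2.0, 3.0]
tomato_auc = [2.0, 4.0, 6.0]
print(coefficient_of_variation(supplement_auc))  # 50.0
print(pearson_r(supplement_auc, tomato_auc))     # 1.0
```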

CONCLUSIONS

The ability to respond to lutein appears to be, at least in part, genetically determined. The ability is explained, in large part, by a combination of SNPs in 15 genes related to both lutein and chylomicron metabolism. Finally, our results suggest that the ability to respond to lutein and blood lutein status are related. This trial was registered at clinicaltrials.gov as NCT02100774.

Affiliation(s)

Patrick Borel (PB), Charles Desmarchelier (CD), Marion Nowicki (MN), Romain Bott (RB), Sophie Morange (SM), Nathalie Lesavre (NL).
Institut National de la Recherche Agronomique (INRA), Unité Mixte de Recherche (UMR) INRA1260, Marseille, France (PB, CD, MN, and RB); the Institut National de la Recherche Médicale (INSERM), UMR_S 1062, Marseille, France (PB, CD, MN, and RB); Aix Marseille Université, Nutrition Obésité et Risque Thrombotique, Marseille, France (PB, CD, MN, and RB); the Centre d'Investigation Clinique (CIC) Hôpital de la Conception, Marseille, France (SM); and the CIC Hôpital Nord, Marseille, France (NL)

Desmarchelier C, Martin JC, Planells R, Gastaldi M, Nowicki M, Goncalves A, Valéro R, Lairon D, Borel P. The postprandial chylomicron triacylglycerol response to dietary fat in healthy male adults is significantly explained by a combination of single nucleotide polymorphisms in genes involved in triacylglycerol metabolism. J Clin Endocrinol Metab 2014;99:E484-8. [PMID: 24423365 DOI: 10.1210/jc.2013-3962] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 11/19/2022]

Smaragdov MG. Genomic selection of milk cattle. The practical application over five years. Russ J Genet 2013. [DOI: 10.1134/s1022795413100104] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]

Enlarging a training set for genomic selection by imputation of un-genotyped animals in populations of varying genetic architecture. Genet Sel Evol 2013;45:12. [PMID: 23621897 PMCID: PMC3652763 DOI: 10.1186/1297-9686-45-12] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 09/07/2012] [Accepted: 03/24/2013] [Indexed: 02/02/2023]

Abstract

Background

The most common application of imputation is to infer genotypes of a high-density panel of markers on animals that are genotyped for a low-density panel. However, the increase in accuracy of genomic predictions resulting from an increase in the number of markers tends to reach a plateau beyond a certain density. Another application of imputation is to increase the size of the training set with un-genotyped animals. This strategy can be particularly successful when a set of closely related individuals are genotyped.

Methods

Imputation on completely un-genotyped dams was performed using known genotypes from the sire of each dam, one offspring and the offspring’s sire. Two methods were applied based on either allele or haplotype frequencies to infer genotypes at ambiguous loci. Results of these methods and of two available software packages were compared. Quality of imputation under different population structures was assessed. The impact of using imputed dams to enlarge training sets on the accuracy of genomic predictions was evaluated for different populations, heritabilities and sizes of training sets.
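The abstract does not give the formulas of either imputation method. As an assumption, the sketch below shows one way an allele-frequency-based rule could resolve a single locus: the dam's paternal allele comes from her genotyped sire, her maternal allele is drawn from the population with frequency `freq_a`, and the genotyped offspring inherits one allele from the dam and one from its own sire. It is a generic single-locus Bayesian calculation with hypothetical names (`impute_dam`, `freq_a`), not the authors' implementation, and it ignores haplotype and linkage information.

```python
def transmit_a(genotype: int) -> float:
    """Probability that a parent carrying `genotype` copies of allele A transmits A."""
    return genotype / 2


def genotype_prob(child: int, p1: float, p2: float) -> float:
    """P(child carries 0, 1 or 2 copies of A) given each parent's probability of transmitting A."""
    if child == 2:
        return p1 * p2
    if child == 1:
        return p1 * (1 - p2) + (1 - p1) * p2
    if child == 0:
        return (1 - p1) * (1 - p2)
    raise ValueError("genotype must be 0, 1 or 2")


def impute_dam(sire: int, offspring: int, offspring_sire: int, freq_a: float) -> dict[int, float]:
    """Posterior probabilities of an un-genotyped dam's genotype at one locus (hypothetical helper)."""
    weights = {}
    for dam in (0, 1, 2):
        # Prior: paternal allele from the dam's sire, maternal allele from the population.
        prior = genotype_prob(dam, transmit_a(sire), freq_a)
        # Likelihood: the offspring gets one allele from the dam and one from its own sire.
        likelihood = genotype_prob(offspring, transmit_a(dam), transmit_a(offspring_sire))
        weights[dam] = prior * likelihood
    total = sum(weights.values())
    if total == 0:
        raise ValueError("genotypes are inconsistent with Mendelian inheritance")
    return {g: w / total for g, w in weights.items()}


# Hypothetical locus: the dam's sire is AA, her offspring is BB, the offspring's sire is BB.
posterior = impute_dam(sire=2, offspring=0, offspring_sire=0, freq_a=0.3)
print(max(posterior, key=posterior.get))  # 1 (the dam must be AB)
```

When all three relatives are heterozygous the offspring carries no information about the dam, so the posterior falls back to the prior set by the sire and `freq_a`; these are the ambiguous loci where the choice of frequency model matters.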

Results

Imputation accuracy ranged from 0.52 to 0.93 depending on the population structure and the method used. The method that used allele frequencies performed better than the method based on haplotype frequencies. Accuracy of imputation was higher for populations with higher levels of linkage disequilibrium and with larger proportions of markers with more extreme allele frequencies. Inclusion of imputed dams in the training set increased the accuracy of genomic predictions. Gains in accuracy ranged from close to zero to 37.14%, depending on the simulated scenario. Generally, the larger the accuracy already obtained with the genotyped training set, the lower the increase in accuracy achieved by adding imputed dams.

Conclusions

Whenever a reference population resembling the family configuration considered here is available, imputation can be used to achieve an extra increase in accuracy of genomic predictions by enlarging the training set with completely un-genotyped dams. This strategy was shown to be particularly useful for populations with lower levels of linkage disequilibrium, for genomic selection on traits with low heritability, and for species or breeds for which the size of the reference population is limited.

de Los Campos G, Hickey JM, Pong-Wong R, Daetwyler HD, Calus MPL. Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 2013;193:327-45. [PMID: 22745228 PMCID: PMC3567727 DOI: 10.1534/genetics.112.143313] [Citation(s) in RCA: 471] [Impact Index Per Article: 42.8] [Received: 05/18/2012] [Accepted: 06/11/2012] [Indexed: 11/18/2022]

Gaspa G, Pintus MA, Nicolazzi EL, Vicario D, Valentini A, Dimauro C, Macciotta NPP. Use of principal component approach to predict direct genomic breeding values for beef traits in Italian Simmental cattle. J Anim Sci 2013;91:29-37. [DOI: 10.2527/jas.2011-5061] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]

Colombani C, Legarra A, Fritz S, Guillaume F, Croiseau P, Ducrocq V, Robert-Granié C. Application of Bayesian least absolute shrinkage and selection operator (LASSO) and BayesCπ methods for genomic selection in French Holstein and Montbéliarde breeds. J Dairy Sci 2012;96:575-91. [PMID: 23127905 DOI: 10.3168/jds.2011-5225] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Received: 12/05/2011] [Accepted: 09/14/2012] [Indexed: 11/19/2022]

Abstract

Recently, the amount of available single nucleotide polymorphism (SNP) marker data has increased considerably in dairy cattle breeds, both for research purposes and for application in commercial breeding and selection programs. Bayesian methods are currently used in the genomic evaluation of dairy cattle to handle very large sets of explanatory variables with a limited number of observations. In this study, we applied 2 bayesian methods, BayesCπ and bayesian least absolute shrinkage and selection operator (LASSO), to 2 genotyped and phenotyped reference populations consisting of 3,940 Holstein bulls and 1,172 Montbéliarde bulls with approximately 40,000 polymorphic SNP. We compared the accuracy of the bayesian methods for the prediction of 3 traits (milk yield, fat content, and conception rate) with pedigree-based BLUP, genomic BLUP, partial least squares (PLS) regression, and sparse PLS regression, a variable selection PLS variant. The results showed that the correlations between observed and predicted phenotypes were similar for BayesCπ (with or without pedigree information) and bayesian LASSO for most of the traits, regardless of breed. In the Holstein breed, bayesian methods led to higher correlations than other approaches for fat content and were similar to genomic BLUP for milk yield and to genomic BLUP and PLS regression for conception rate. In the Montbéliarde breed, no method dominated the others, except BayesCπ for fat content. The better performance of the bayesian methods for fat content in Holstein and Montbéliarde breeds is probably due to the effect of the DGAT1 gene. The SNP identified by the BayesCπ, bayesian LASSO, and sparse PLS regression methods, based on their effect on the different traits of interest, were located at almost the same position on the genome. As the bayesian methods resulted in regressions of direct genomic values on daughter trait deviations closer to 1 than for the other methods tested in this study, bayesian methods are suggested for genomic evaluations of French dairy cattle.
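The final criterion is the slope of a simple regression between direct genomic values (DGV) and daughter trait deviations (DTD), where a slope near 1 indicates that the predictions are on the right scale. The sketch below computes an ordinary least-squares slope and takes the response and predictor explicitly, so it can follow the abstract's wording (DGV on DTD) or be called with the arguments swapped; the values are hypothetical, not data from the study.

```python
import statistics


def regression_slope(response: list[float], predictor: list[float]) -> float:
    """Ordinary least-squares slope of `response` regressed on `predictor`."""
    if len(response) != len(predictor) or len(response) < 2:
        raise ValueError("need at least two paired observations")
    mr, mp = statistics.mean(response), statistics.mean(predictor)
    sxy = sum((p - mp) * (r - mr) for r, p in zip(response, predictor))
    sxx = sum((p - mp) ** 2 for p in predictor)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    return sxy / sxx


# Hypothetical validation bulls (not the study's data).
dtd = [1.0, 2.0, 3.0, 4.0]  # daughter trait deviations
dgv = [0.5, 1.0, 1.5, 2.0]  # direct genomic values, shrunk to half the DTD scale
print(regression_slope(dgv, dtd))  # 0.5
```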

Pintus MA, Gaspa G, Nicolazzi EL, Vicario D, Rossoni A, Ajmone-Marsan P, Nardone A, Dimauro C, Macciotta NPP. Prediction of genomic breeding values for dairy traits in Italian Brown and Simmental bulls using a principal component approach. J Dairy Sci 2012;95:3390-400. [PMID: 22612973 DOI: 10.3168/jds.2011-4274] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 02/15/2011] [Accepted: 02/13/2012] [Indexed: 01/18/2023]

Colombani C, Croiseau P, Fritz S, Guillaume F, Legarra A, Ducrocq V, Robert-Granié C. A comparison of partial least squares (PLS) and sparse PLS regressions in genomic selection in French dairy cattle. J Dairy Sci 2012;95:2120-31. [PMID: 22459857 DOI: 10.3168/jds.2011-4647] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 06/22/2011] [Accepted: 12/09/2011] [Indexed: 01/25/2023]

Pintus M, Nicolazzi E, Van Kaam J, Biffani S, Stella A, Gaspa G, Dimauro C, Macciotta N. Use of different statistical models to predict direct genomic values for productive and functional traits in Italian Holsteins. J Anim Breed Genet 2012;130:32-40. [DOI: 10.1111/j.1439-0388.2012.01019.x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 02/22/2012] [Accepted: 06/16/2012] [Indexed: 01/02/2023]

[Genomic selection and its application]. Yi Chuan (Hereditas) 2011;33:1308-16. [PMID: 22207376 DOI: 10.3724/sp.j.1005.2011.01308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]

Abstract

Selective breeding is very important in agricultural production, and breeding value estimation is the core of selective breeding. With the development of genetic markers, especially high-throughput genotyping technology, it has become possible to estimate breeding values at the genome level, i.e. genomic selection (GS). In this review, GS methods were divided into two groups: the first predicts the genomic estimated breeding value (GEBV) from allele effects, as in least squares, random regression best linear unbiased prediction (RR-BLUP), Bayesian methods and principal component analysis; the second predicts GEBV with a genetic relationship matrix, which is constructed from high-throughput genetic markers and then used to predict GEBV through a linear mixed model, i.e. GBLUP. The basic principles of the methods in both groups were also introduced. Factors affecting GS accuracy include marker type and density, haplotype length, the size of the reference population, the extent of linkage disequilibrium between markers and QTL, and so on. Among GS methods, Bayesian methods and GBLUP are usually more accurate than the others, and least squares is the least accurate. GBLUP is computationally efficient and can combine pedigree and genotypic information, which makes it superior to the other methods. Although progress has been made in GS, challenges remain, for example joint breeding, long-term genetic gain under GS, and distinguishing markers that contribute to a trait from those that do not. GS has been applied in animal and plant breeding practice and also has the potential to predict genetic predisposition in humans and to study evolutionary dynamics. GS, which is more precise than traditional methods, is a breakthrough in measuring genetic relationships. Therefore, GS will be a revolutionary event in the history of animal and plant breeding.

Feng ZZ, Yang X, Subedi S, McNicholas PD. The LASSO and sparse least square regression methods for SNP selection in predicting quantitative traits. IEEE/ACM Trans Comput Biol Bioinform 2011;9:629-36. [PMID: 22025756 DOI: 10.1109/tcbb.2011.139] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 05/31/2023]