26
Lello L, Avery SG, Tellier L, Vazquez AI, de Los Campos G, Hsu SDH. Accurate Genomic Prediction of Human Height. Genetics 2018; 210:477-497. [PMID: 30150289 PMCID: PMC6216598 DOI: 10.1534/genetics.118.301267] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Received: 03/19/2018] [Accepted: 08/01/2018] [Indexed: 01/08/2023]
Abstract
We construct genomic predictors for heritable but extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). The constructed predictors explain, respectively, ∼40, 20, and 9% of total variance for the three traits, in data not used for training. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few centimeters of the prediction. The proportion of variance explained for height is comparable to the estimated common SNP heritability from genome-wide complex trait analysis (GCTA), and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for SNPs. Thus, our results close the gap between prediction R-squared and common SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common variants. Our primary dataset is the UK Biobank cohort, comprising almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier genome-wide association studies (GWAS) for out-of-sample validation of our results.
27
Toledo-Alvarado H, Vazquez AI, de los Campos G, Tempelman RJ, Gabai G, Cecchinato A, Bittante G. Changes in milk characteristics and fatty acid profile during the estrous cycle in dairy cows. J Dairy Sci 2018; 101:9135-9153. [DOI: 10.3168/jds.2018-14480] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 01/23/2018] [Accepted: 05/31/2018] [Indexed: 11/19/2022]
28
Sun M, Vazquez AI, Reynolds RJ, Singh JA, Reeves M, Merriman TR, Gaffo AL, Los Campos GD. Untangling the complex relationships between incident gout risk, serum urate, and its comorbidities. Arthritis Res Ther 2018; 20:90. [PMID: 29720278 PMCID: PMC5932762 DOI: 10.1186/s13075-018-1558-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 10/06/2017] [Accepted: 03/06/2018] [Indexed: 02/07/2023]
Abstract
BACKGROUND Many gout comorbidities (e.g., hypertension) are correlated with serum urate. In this investigation, we identified risk factors (e.g., systolic blood pressure [SBP]) that (1) are associated with incident gout, (2) have effects on gout risk that cannot be fully explained by correlated differences in serum urate, and (3) may modulate the relationship between gout and serum urate. METHODS Using data from the Atherosclerosis Risk in Communities (ARIC) study, we estimated the unadjusted associations between gout and risk factors by calculating ORs and using chi-square tests. The adjusted associations were analyzed using logistic regression by sequentially adding (1) one risk factor at a time or (2) all risk factors, to a baseline model that includes serum urate only. Stepwise selection was used to select main effects. Two-way interactions of variables from the main effects model were also analyzed. RESULTS Average gout incidence was 2.7 per 1000 people per year. Serum urate was highly associated with incident gout, with odds ratios of 3.16 [95% CI 2.11, 4.76] and 25.9 [95% CI 17.2, 38.4] for moderately high (6-8 mg/dl) and high serum urate (> 8 mg/dl), respectively, relative to normal serum urate (< 6 mg/dl). Ethnicity and SBP were independently and additively associated with gout after accounting for serum urate levels. No significant interactions were found between serum urate and ethnicity or SBP. CONCLUSIONS Ethnicity and hypertension are predictive of gout risk, and the associations cannot be fully explained by serum urate. For serum urate levels near the crystallization threshold (6-8 mg/dl), African Americans and people with hypertension are at two to three times greater risk for developing gout. The gout risk for this group appears to increase before the onset of severe hyperuricemia.
29
Toledo-Alvarado H, Vazquez AI, de los Campos G, Tempelman RJ, Bittante G, Cecchinato A. Diagnosing pregnancy status using infrared spectra and milk composition in dairy cows. J Dairy Sci 2018; 101:2496-2505. [DOI: 10.3168/jds.2017-13647] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 08/08/2017] [Accepted: 11/08/2017] [Indexed: 01/01/2023]
30
Pickens CA, Vazquez AI, Jones AD, Fenton JI. Obesity, adipokines, and C-peptide are associated with distinct plasma phospholipid profiles in adult males, an untargeted lipidomic approach. Sci Rep 2017; 7:6335. [PMID: 28740130 PMCID: PMC5524758 DOI: 10.1038/s41598-017-05785-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 12/07/2016] [Accepted: 06/05/2017] [Indexed: 12/12/2022]
Abstract
Obesity is associated with dysregulated lipid metabolism and adipokine secretion. Our group has previously reported that obesity and adipokines are associated with % total fatty acid (FA) differences in plasma phospholipids. The objective of our current study was to identify in which complex lipid species (e.g., phosphatidylcholines, sphingolipids) these FA differences occur. Plasma lipidomic profiling (n = 126, >95% Caucasian, 48–65 years) was performed using chromatographic separation and high resolution tandem mass spectrometry. The responses used in the statistical analyses were body mass index (BMI), waist circumference (WC), serum adipokines, cytokines, and a glycemic marker. High-dimensional statistical analyses were performed; all models were adjusted for age and smoking, and p-values were adjusted for false discovery. In Bayesian models, the lipidomic profiles (over 1,700 lipids) accounted for >60% of the inter-individual variation of BMI, WC, and leptin in our population. Across statistical analyses, we report that 51 individual plasma lipids were significantly associated with obesity. Obesity was inversely associated with lysophospholipids and ether-linked phosphatidylcholines. In addition, we identify several unreported lipids associated with obesity that are not present in lipid databases. Taken together, these results provide new insights into the underlying biology associated with obesity and reveal new potential pathways for therapeutic targeting.
31
Bray MS, Herring MP, Dishman RK, O’Connor DP, Jackson AS, Vazquez AI. Genome-wide Association For Exercise Tolerance In The TIGER Study. Med Sci Sports Exerc 2017. [DOI: 10.1249/01.mss.0000517065.40263.14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
32
Fahrenkrog AM, Neves LG, Resende MFR, Vazquez AI, de Los Campos G, Dervinis C, Sykes R, Davis M, Davenport R, Barbazuk WB, Kirst M. Genome-wide association study reveals putative regulators of bioenergy traits in Populus deltoides. New Phytol 2017; 213:799-811. [PMID: 27596807 DOI: 10.1111/nph.14154] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Received: 05/17/2016] [Accepted: 07/13/2016] [Indexed: 05/18/2023]
Abstract
Genome-wide association studies (GWAS) have been used extensively to dissect the genetic regulation of complex traits in plants. These studies have focused largely on the analysis of common genetic variants despite the abundance of rare polymorphisms in several species, and their potential role in trait variation. Here, we conducted the first GWAS in Populus deltoides, a genetically diverse keystone forest species in North America and an important short rotation woody crop for the bioenergy industry. We searched for associations between eight growth and wood composition traits, and common and low-frequency single-nucleotide polymorphisms detected by targeted resequencing of 18 153 genes in a population of 391 unrelated individuals. To increase power to detect associations with low-frequency variants, multiple-marker association tests were used in combination with single-marker association tests. Significant associations were discovered for all phenotypes and are indicative that low-frequency polymorphisms contribute to phenotypic variance of several bioenergy traits. Our results suggest that both common and low-frequency variants need to be considered for a comprehensive understanding of the genetic regulation of complex traits, particularly in species that carry large numbers of rare polymorphisms. These polymorphisms may be critical for the development of specialized plant feedstocks for bioenergy.
33
Vazquez AI, Veturi Y, Behring M, Shrestha S, Kirst M, Resende MFR, de Los Campos G. Increased Proportion of Variance Explained and Prediction Accuracy of Survival of Breast Cancer Patients with Use of Whole-Genome Multiomic Profiles. Genetics 2016; 203:1425-38. [PMID: 27129736 PMCID: PMC4937492 DOI: 10.1534/genetics.115.185181] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 11/22/2015] [Accepted: 04/12/2015] [Indexed: 11/18/2022]
Abstract
Whole-genome multiomic profiles hold valuable information for the analysis and prediction of disease risk and progression. However, integrating high-dimensional multilayer omic data into risk-assessment models is statistically and computationally challenging. We describe a statistical framework, the Bayesian generalized additive model (BGAM), and present software for integrating multilayer high-dimensional inputs into risk-assessment models. We used BGAM and data from The Cancer Genome Atlas for the analysis and prediction of survival after diagnosis of breast cancer. We developed a sequence of studies to (1) compare predictions based on single omics with those based on clinical covariates commonly used for the assessment of breast cancer patients (COV), (2) evaluate the benefits of combining COV and omics, (3) compare models based on (a) COV and gene expression profiles from oncogenes with (b) COV and whole-genome gene expression (WGGE) profiles, and (4) evaluate the impacts of combining multiple omics and their interactions. We report that (1) WGGE profiles and whole-genome methylation (METH) profiles offer more predictive power than any of the COV commonly used in clinical practice (e.g., subtype and stage), (2) adding WGGE or METH profiles to COV increases prediction accuracy, (3) the predictive power of WGGE profiles is considerably higher than that based on expression from large-effect oncogenes, and (4) the gain in prediction accuracy when combining multiple omics is consistent. Our results show the feasibility of omic integration and highlight the importance of WGGE and METH profiles in breast cancer, achieving gains of up to 7 points in area under the curve (AUC) over the COV in some cases.
34
Reynolds RJ, Vazquez AI, Srinivasasainagendra V, Klimentidis YC, Bridges SL, Allison DB, Singh JA. Serum urate gene associations with incident gout, measured in the Framingham Heart Study, are modified by renal disease and not by body mass index. Rheumatol Int 2016; 36:263-70. [PMID: 26427508 PMCID: PMC4724568 DOI: 10.1007/s00296-015-3364-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/27/2015] [Accepted: 09/17/2015] [Indexed: 02/04/2023]
Abstract
We hypothesized that serum urate-associated SNPs, individually or collectively, interact with BMI and renal disease to contribute to risk of incident gout. We measured the incidence of gout and associated comorbidities using the original and offspring cohorts of the Framingham Heart Study. We used direct and imputed genotypes for eight validated serum urate loci. We fit binomial regression models of gout incidence as a function of the covariates (age, type 2 diabetes, and sex) and all main and interaction effects of the eight serum urate SNPs with BMI and renal disease. Models were also fit with a genetic risk score for serum urate levels, which corresponds to the sum of risk alleles at the eight SNPs. Model covariates, age (P = 5.95E-06), sex (P = 2.46E-39), diabetes (P = 2.34E-07), BMI (P = 1.14E-11) and the SNPs rs1967017 (P = 9.54E-03), rs13129697 (P = 4.34E-07), rs2199936 (P = 7.28E-03) and rs675209 (P = 4.84E-02) were all associated with incident gout. No BMI by SNP or BMI by serum urate genetic risk score interactions were statistically significant, but the renal disease by rs1106766 interaction was statistically significant (P = 6.12E-03). We demonstrated that minor alleles of rs1106766 (intergenic, INHBC) were negatively associated with the risk of incident gout in subjects without renal disease, but not in individuals with renal disease. These analyses demonstrate that a significant component of the risk of gout may involve complex interplay between genes and environment.
35
de Los Campos G, Veturi Y, Vazquez AI, Lehermeier C, Pérez-Rodríguez P. Incorporating Genetic Heterogeneity in Whole-Genome Regressions Using Interactions. J Agric Biol Environ Stat 2015; 20:467-490. [PMID: 26660276 PMCID: PMC4666286 DOI: 10.1007/s13253-015-0222-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 02/19/2015] [Accepted: 09/16/2015] [Indexed: 11/22/2022]
Abstract
Naturally and artificially selected populations usually exhibit some degree of stratification. In Genome-Wide Association Studies and in Whole-Genome Regressions (WGR) analyses, population stratification has been either ignored or dealt with as a potential confounder. However, systematic differences in allele frequency and in patterns of linkage disequilibrium can induce sub-population-specific effects. From this perspective, structure acts as an effect modifier rather than as a confounder. In this article, we extend WGR models commonly used in plant and animal breeding to allow for sub-population-specific effects. This is achieved by decomposing marker effects into main effects and interaction components that describe group-specific deviations. The model can be used both with variable selection and shrinkage methods and can be implemented using existing software for genomic selection. Using a wheat and a pig breeding data set, we compare parameter estimates and the prediction accuracy of the interaction WGR model with WGR analysis ignoring population stratification (across-group analysis) and with a stratified (i.e., within-sub-population) WGR analysis. The interaction model renders trait-specific estimates of the average correlation of effects between sub-populations; we find that such correlation not only depends on the extent of genetic differentiation in allele frequencies between groups but also varies among traits. The evaluation of prediction accuracy shows a modest superiority of the interaction model relative to the other two approaches. This superiority is the result of better stability in performance of the interaction models across data sets and traits; indeed, in almost all cases, the interaction model was either the best performing model or it performed close to the best performing model.
36
Ferragina A, de los Campos G, Vazquez AI, Cecchinato A, Bittante G. Bayesian regression models outperform partial least squares methods for predicting milk components and technological properties using infrared spectral data. J Dairy Sci 2015; 98:8133-51. [PMID: 26387015 DOI: 10.3168/jds.2014-9143] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.0] [Received: 11/23/2014] [Accepted: 07/06/2015] [Indexed: 11/19/2022]
Abstract
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict "difficult-to-predict" dairy traits, such as milk fatty acid (FA) expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm(-1) were available and averaged before data analysis. Three Bayesian models: Bayesian ridge regression (Bayes RR), Bayes A, and Bayes B, and 2 reference models: PLS and modified PLS (MPLS) procedures, were used to calibrate equations for each of the traits. The Bayesian models used were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy. 
Accuracy increased in moving from calibration to external validation methods, and in moving from PLS and MPLS to Bayesian methods, particularly Bayes A and Bayes B. The maximum validation R(2) was obtained with Bayes B and Bayes A. Among the FA, C10:0 (% of each FA on total FA basis) had the highest R(2) (0.75, achieved with Bayes A and Bayes B), and among the technological traits, fresh cheese yield had an R(2) of 0.82 (achieved with Bayes B). These 2 methods have proven to be useful instruments in shrinking and selecting very informative wavelengths and inferring the structure and functions of the analyzed traits. We conclude that Bayesian models are powerful tools for deriving calibration equations, and, importantly, these equations can be easily developed using existing open-source software. As part of our study, we provide scripts based on the open source R software BGLR, which can be used to train customized prediction equations for other traits or populations.
37
Lebrón-Aldea D, Dhurandhar EJ, Pérez-Rodríguez P, Klimentidis YC, Tiwari HK, Vazquez AI. Integrated genomic and BMI analysis for type 2 diabetes risk assessment. Front Genet 2015; 6:75. [PMID: 25852736 PMCID: PMC4362394 DOI: 10.3389/fgene.2015.00075] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/31/2014] [Accepted: 02/12/2015] [Indexed: 11/23/2022]
Abstract
Type 2 Diabetes (T2D) is a chronic disease arising from the development of insulin absence or resistance within the body, and a complex interplay of environmental and genetic factors. The incidence of T2D has increased throughout the last few decades, together with the occurrence of the obesity epidemic. Incorporating variants identified by Genome Wide Association Studies (GWAS) into risk assessment models for T2D could aid in the identification of at-risk patients who could benefit from preventive medicine. In this study, we build several risk assessment models, evaluated with two different classification approaches (Logistic Regression and Neural Networks), to measure the effect of including genetic information in the prediction of T2D. We used data from the Original and Offspring cohorts of the Framingham Heart Study, which provides phenotypic and genetic information for 5245 subjects (4306 controls and 939 cases). Models were built by using several covariates: gender, exposure time, cohort, body mass index (BMI), and 65 SNPs associated with T2D. We fitted Logistic Regressions and Bayesian Regularized Neural Networks and then assessed their predictive ability by using ten-fold cross validation. We found that the inclusion of genetic information into the risk assessment models increased the predictive ability by 2%, when compared to the baseline model. Furthermore, the models that included BMI at the onset of diabetes as a possible effector gave an improvement of 6% in the area under the curve derived from the ROC analysis. The highest AUC achieved (0.75) belonged to the model that included BMI and a genetic score based on the 65 established T2D-associated SNPs. Finally, the inclusion of SNPs and BMI raised predictive ability in all models as expected; however, Neural Networks and Logistic Regression did not differ significantly in their prediction accuracy (AUC).
38
Shendre A, Wiener HW, Zhi D, Vazquez AI, Portman MA, Shrestha S. High-density genotyping of immune loci in Kawasaki disease and IVIG treatment response in European-American case-parent trio study. Genes Immun 2014; 15:534-42. [PMID: 25101798 PMCID: PMC4257866 DOI: 10.1038/gene.2014.47] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/29/2014] [Revised: 06/24/2014] [Accepted: 06/25/2014] [Indexed: 12/04/2022]
Abstract
Kawasaki disease (KD) is a diffuse and acute small-vessel vasculitis observed in children, and has genetic and autoimmune components. We genotyped 112 case-parent trios of European descent (confirmed by ancestry informative markers) using the immunoChip array, and performed association analyses with susceptibility to KD and intravenous immunoglobulin (IVIG) non-response. KD susceptibility was assessed using the transmission disequilibrium test, whereas IVIG non-response was evaluated using multivariable logistic regression analysis. We replicated single-nucleotide polymorphisms (SNPs) in three gene regions (FCGR, CD40/CDH22 and HLA-DQB2/HLA-DOB) that have been previously associated with KD and provide support for other findings of several novel SNPs in genes with a potential pathway in KD pathogenesis. SNP rs838143 in the 3'-untranslated region of the FUT1 gene (2.7 × 10(-5)) and rs9847915 in the intergenic region of LOC730109 | BRD7P2 (6.81 × 10(-7)) were the top hits for KD susceptibility in additive and dominant models, respectively. The top hits for IVIG responsiveness were rs1200332 in the intergenic region of BAZ1A | C14orf19 (1.4 × 10(-4)) and rs4889606 in the intron of the STX1B gene (6.95 × 10(-5)) in additive and dominant models, respectively. Our study suggests that genes and biological pathways involved in autoimmune diseases have an important role in the pathogenesis of KD and IVIG response mechanism.
39
Dhurandhar EJ, Vazquez AI, Argyropoulos GA, Allison DB. Even modest prediction accuracy of genomic models can have large clinical utility. Front Genet 2014; 5:417. [PMID: 25506355 PMCID: PMC4246888 DOI: 10.3389/fgene.2014.00417] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/19/2014] [Accepted: 11/07/2014] [Indexed: 11/17/2022]
Abstract
Whole Genome Prediction (WGP) jointly fits thousands of SNPs into a regression model to yield estimates for the contribution of markers to the overall variance of a particular trait, and for their associations with that trait. To date, WGP has offered only modest prediction accuracy, but in some cases even modest prediction accuracy may be useful. We provide an illustration of this using a theoretical simulation that used WGP to predict weight loss after bariatric surgery with moderate accuracy (R2 = 0.07) to assess the clinical utility of WGP despite these limitations. Prevention of Type 2 Diabetes (T2DM) post-surgery was considered the major outcome. Treating only patients above a predefined threshold of predicted weight loss in our simulation, in the realistic context of finite resources for the surgery, significantly reduced lifetime risk of T2DM in the treatable population by selecting those most likely to succeed. Thus, our example illustrates how WGP may be clinically useful in some situations, and even with moderate accuracy, may provide a clear path for turning personalized medicine from theory to reality.
40
Klimentidis YC, Wineinger NE, Vazquez AI, de Los Campos G. Multiple metabolic genetic risk scores and type 2 diabetes risk in three racial/ethnic groups. J Clin Endocrinol Metab 2014; 99:E1814-8. [PMID: 24905067 PMCID: PMC4154088 DOI: 10.1210/jc.2014-1818] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 02/01/2023]
Abstract
CONTEXT/RATIONALE Meta-analyses of genome-wide association studies have identified many single-nucleotide polymorphisms associated with various metabolic and cardiovascular traits, offering us the opportunity to learn about and capitalize on the links between cardiometabolic traits and type 2 diabetes (T2D). DESIGN In multiple datasets comprising over 30 000 individuals and 3 ethnic/racial groups, we calculated 17 genetic risk scores (GRSs) for glycemic, anthropometric, lipid, hemodynamic, and other traits, based on the results of recent trait-specific meta-analyses of genome-wide association studies, and examined associations with T2D risk. Using a training-testing procedure, we evaluated whether additional GRSs could contribute to risk prediction. RESULTS In European Americans, we find that GRSs for T2D, fasting glucose, fasting insulin, and body mass index are associated with T2D risk. In African Americans, GRSs for T2D, fasting insulin, and waist-to-hip ratio are associated with T2D. In Hispanic Americans, GRSs for T2D and body mass index are associated with T2D. We observed a trend among European Americans suggesting that genetic risk for hyperlipidemia is inversely associated with T2D risk. The use of additional GRSs resulted in only small changes in prediction accuracy in multiple independent validation datasets. CONCLUSIONS The analysis of multiple GRSs can shed light on T2D etiology and how it varies across ethnic/racial groups. Our findings using multiple GRSs are consistent with what is known about the differences in T2D pathogenesis across racial/ethnic groups. However, further work is needed to understand the putative inverse correlation of genetic risk for hyperlipidemia and T2D risk and to develop ethnic-specific GRSs.
41
Aslibekyan S, Wiener HW, Wu G, Zhi D, Shrestha S, de Los Campos G, Vazquez AI. Estimating proportions of explained variance: a comparison of whole genome subsets. BMC Proc 2014; 8:S102. [PMID: 25519356 PMCID: PMC4143698 DOI: 10.1186/1753-6561-8-s1-s102] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/08/2023]
Abstract
Following the publication of the ENCODE project results, there has been increasing interest in investigating different areas of the chromosome and evaluating the relative contribution of each area to expressed phenotypes. This study aims to evaluate the contribution of variants, classified by minor allele frequency and gene annotation, to the observed interindividual differences. In this study, we fitted Bayesian linear regression models to data from Genetic Analysis Workshop 18 (n = 395) to estimate the variance of standardized and log-transformed systolic blood pressure that can be explained by subsets of genetic markers. Rare and very rare variants explained an overall higher proportion of the variance, as did markers located within a gene rather than in flanking regions. The proportion of variance explained by rare and very rare variants decreased when we controlled for the number of markers, suggesting that the number of contributing rare alleles plays an important role in the genetic architecture of chronic disease traits. Our findings lend support to the "common disease, rare variant" hypothesis for systolic blood pressure and highlight allele frequency and functional annotation of a polymorphism as potentially crucial considerations in whole genome study designs.
42
Shendre A, Irvin MR, Aouizerat BE, Wiener HW, Vazquez AI, Anastos K, Lazar J, Liu C, Karim R, Limdi NA, Cohen MH, Golub ET, Zhi D, Kaplan RC, Shrestha S. RYR3 gene variants in subclinical atherosclerosis among HIV-infected women in the Women's Interagency HIV Study (WIHS). Atherosclerosis 2014; 233:666-672. [PMID: 24561552 PMCID: PMC3965606 DOI: 10.1016/j.atherosclerosis.2014.01.035] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 11/05/2013] [Revised: 01/15/2014] [Accepted: 01/17/2014] [Indexed: 11/20/2022]
Abstract
BACKGROUND Single nucleotide polymorphisms (SNPs) in the Ryanodine receptor 3 (RYR3) gene are associated with common carotid intima media thickness (CCA cIMT) in HIV-infected men. We evaluated SNPs in the RYR3 gene among HIV-infected women participating in the Women's Interagency HIV Study (WIHS). METHODS CCA cIMT was measured using B-mode ultrasound, and the 838 SNPs in the RYR3 gene region were genotyped using the Illumina HumanOmni2.5-quad beadchip. The CCA cIMT genetic association was assessed using linear regression analyses among 1213 women and also separately among White (n=139), Black (n=720) and Hispanic (n=354) women after adjusting for confounders. A summary measure of pooled association was estimated using a meta-analytic approach by combining the effect estimates from the three races. Haploblocks were inferred using Gabriel's method and haplotype association analyses were conducted among the three races separately. RESULTS SNP rs62012610 was associated with CCA cIMT among Hispanics (p=4.41×10(-5)), rs11856930 among Whites (p=5.62×10(-4)), and rs2572204 among Blacks (p=2.45×10(-3)). Meta-analysis revealed several associations of SNPs in the same direction and of similar magnitude, particularly among Blacks and Hispanics. Additionally, several haplotypes within three haploblocks containing SNPs previously related to CCA cIMT were also associated in Whites and Hispanics. DISCUSSION Consistent with previous research among HIV-infected men, SNPs within the RYR3 region were associated with subclinical atherosclerosis among HIV-infected women. Allelic heterogeneity observed across the three races suggests that the contribution of the RYR3 gene to CCA cIMT is complex, and warrants future studies to better understand regional SNP function.
43
Libby EF, Azrad M, Novak L, Vazquez AI, Wilson TR, Demark-Wahnefried W. Obesity is associated with higher 4E-BP1 expression in endometrial cancer. ACTA ACUST UNITED AC 2014; 2014:1-7. [PMID: 24639918 PMCID: PMC3955094 DOI: 10.2147/cbf.s53530] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/01/2023]
Abstract
PURPOSE Obesity is associated with the risk and prognosis of endometrial cancer (EC), and the mammalian target of rapamycin complex 1 (mTORC1) pathway may play an instrumental role. We sought to explore the associations among cellular proliferation, Akt, and 4E binding protein-1 (4E-BP1), a downstream target of mTORC1, in obese and nonobese women with and without EC. METHODS Archival tissue specimens from endometrial biopsies were grouped into two broad categories based on observed disease behavior and similarities in tissue staining patterns: benign/hyperplasia without cytologic atypia (n=18) versus atypia (complex hyperplasia with cytologic atypia)/carcinoma (n=25). Characteristics of the study population, including height and weight for calculating body mass index (BMI, kg/m²), were abstracted from medical records. Immunohistochemistry was used to assess phosphorylated (p) Akt, p4E-BP1, and the Ki67 antigen. RESULTS Cytoplasmic and nuclear pAkt were significantly associated with cytoplasmic p4E-BP1 (ρ=+0.48 and ρ=+0.50, respectively; P<0.05) and with nuclear p4E-BP1 (ρ=+0.40 and ρ=+0.44, respectively; P<0.05); cytoplasmic and nuclear p4E-BP1 were significantly associated with Ki67 (ρ=+0.46 and ρ=+0.59, respectively; P<0.05). Compared with the benign/hyperplasia group, women with atypia/carcinoma had significantly higher cytoplasmic and nuclear p4E-BP1 and Ki67. This staining pattern was similar in obese women; in nonobese women, however, neither cytoplasmic nor nuclear p4E-BP1 staining differed between the benign/hyperplasia and atypia/carcinoma groups. CONCLUSION Activation of 4E-BP1 was higher in obese women with EC. Adiposity may be a key factor to consider in future studies investigating 4E-BP1 as a biomarker and therapeutic target in EC.
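The ρ values above conventionally denote Spearman's rank correlation. Assuming that is what was reported, the minimal sketch below computes ρ as the Pearson correlation of ranks, with tied values sharing their average rank; the helper names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (vx * vy) ** 0.5
```

Ranking makes ρ insensitive to monotone transformations of staining scores, which suits semi-quantitative immunohistochemistry readouts.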
44
Dawson JA, Dhurandhar EJ, Vazquez AI, Peng B, Allison DB. Propagation of obesity across generations: the roles of differential realized fertility and assortative mating by body mass index. Hum Hered 2013; 75:204-12. [PMID: 24081235 DOI: 10.1159/000352007] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 01/08/2023]
Abstract
BACKGROUND/AIMS To quantify the extent to which the increase in obesity observed across recent generations of the American population is associated with the individual or combined effects of assortative mating (AM) for body mass index (BMI) and differential realized fertility by BMI. METHODS A Monte Carlo framework was constructed and informed using data collected from the National Longitudinal Survey of Youth (NLSY). The model has two components: one that generates childbirth events on an annual basis and another that assigns a BMI to each child. Once the model is informed by the data, a reference distribution of offspring BMIs is simulated. We quantify the effects of the factors of interest by removing them from the model and comparing the resulting offspring BMI distributions with that of the baseline scenario. RESULTS The NLSY data show both an association between maternal BMI and number of offspring and the presence of AM. Combined, these two factors are associated with an increased mean BMI (+0.067, 95% CI: 0.056; 0.078), an increased BMI variance (+0.578, 95% CI: 0.418; 0.736), and an increased prevalence of obesity (RR 1.032, 95% CI: 1.023; 1.041) and of BMIs >40 (RR 1.083, 95% CI: 1.053; 1.118) among offspring. CONCLUSION Our investigation suggests that both differential realized fertility and AM by BMI play a role in the increasing prevalence of obesity in America.
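The NLSY-informed model itself is not reproduced here. As a rough illustration of the "remove a factor and compare" logic, the toy simulation below uses made-up parameters (parental BMI mean 27 and SD 5, offspring noise SD 3, a hypothetical fertility slope) and an extreme form of assortative mating (pairing mothers and fathers by BMI rank). None of these values come from the paper.

```python
import random
import statistics


def simulate_offspring_bmi(n_couples=50_000, assortative=True,
                           fertility_slope=0.0, seed=2013):
    """Toy Monte Carlo of offspring BMI; all parameters are hypothetical.

    assortative:     if True, pair mothers and fathers by BMI rank
                     (an extreme form of assortative mating).
    fertility_slope: change in the probability of a second child per unit
                     of maternal BMI (0.0 = no differential fertility).
    Returns (mean, variance) of offspring BMI.
    """
    rng = random.Random(seed)
    mothers = [rng.gauss(27.0, 5.0) for _ in range(n_couples)]
    fathers = [rng.gauss(27.0, 5.0) for _ in range(n_couples)]
    if assortative:
        mothers.sort()
        fathers.sort()
    kids = []
    for m, f in zip(mothers, fathers):
        p_second = min(1.0, max(0.0, 0.5 + fertility_slope * (m - 27.0)))
        n_kids = 1 + int(rng.random() < p_second)
        for _ in range(n_kids):
            # child BMI = mid-parent value + independent noise
            kids.append((m + f) / 2 + rng.gauss(0.0, 3.0))
    return statistics.mean(kids), statistics.variance(kids)


baseline = simulate_offspring_bmi(assortative=False, fertility_slope=0.0)
with_am = simulate_offspring_bmi(assortative=True, fertility_slope=0.0)
with_fertility = simulate_offspring_bmi(assortative=False, fertility_slope=0.02)
```

In this toy, assortative mating mainly inflates offspring variance (spousal correlation raises the variance of the mid-parent value), while a positive fertility slope shifts the offspring mean upward because children of heavier mothers are over-represented.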
45
Klimentidis YC, Vazquez AI, de los Campos G, Allison DB, Dransfield MT, Thannickal VJ. Heritability of pulmonary function estimated from pedigree and whole-genome markers. Front Genet 2013; 4:174. [PMID: 24058366 PMCID: PMC3766834 DOI: 10.3389/fgene.2013.00174] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 05/14/2013] [Accepted: 08/22/2013] [Indexed: 11/13/2022]
Abstract
Asthma and chronic obstructive pulmonary disease (COPD) are major worldwide health problems. Pulmonary function testing is a useful diagnostic tool for these diseases, and is known to be influenced by genetic and environmental factors. Previous studies have demonstrated that a substantial proportion of the variation in pulmonary function phenotypes can be explained by familial relationships. The availability of whole-genome single nucleotide polymorphism (SNP) data enables us to further evaluate the extent to which genetic factors account for variation in pulmonary function and to compare pedigree- to SNP-based estimates of heritability. Here, we employ methods developed in the animal breeding field to estimate the heritability of forced expiratory volume in one second (FEV1), forced vital capacity (FVC), and the ratio of these two measures (FEV1/FVC) among subjects in the Framingham Heart Study dataset. We compare heritability estimates based on pedigree-based relationships to those based on genome-wide SNPs. We find that, in a family-based study, estimates of heritability using SNP data are nearly identical to estimates based on pedigree information, and range from 0.50 for FEV1 to 0.66 for FEV1/FVC. Therefore, we conclude that genetic factors account for a sizable proportion of inter-individual differences in pulmonary function, and that estimates of heritability based on SNP data are nearly identical to estimates based on pedigree data. Finally, our findings suggest a higher heritability for FEV1/FVC compared to either FEV1 or FVC.
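The abstract contrasts pedigree-based relationships with relationships derived from genome-wide SNPs. One widely used SNP-derived relationship matrix is VanRaden's (2008) "method 1"; the abstract does not say whether exactly this form was used, so treat the sketch below (with a hypothetical function name) as an illustration.

```python
def genomic_relationship_matrix(genotypes):
    """VanRaden-style genomic relationship matrix G = Z Z' / (2 * sum p(1-p)).

    genotypes: list of individuals, each a list of 0/1/2 allele counts.
    Z centres each marker by twice its observed allele frequency.
    """
    n, m = len(genotypes), len(genotypes[0])
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    denom = 2 * sum(p * (1 - p) for p in freqs)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    Z = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    return [[sum(a * b for a, b in zip(Z[i], Z[k])) / denom
             for k in range(n)] for i in range(n)]
```

In a variance-component model, G plays the role that the pedigree-based additive relationship matrix A plays in classical analyses, which is what allows SNP-based and pedigree-based heritability estimates to be compared on the same footing.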
46
de Los Campos G, Vazquez AI, Fernando R, Klimentidis YC, Sorensen D. Prediction of complex human traits using the genomic best linear unbiased predictor. PLoS Genet 2013; 9:e1003608. [PMID: 23874214 PMCID: PMC3708840 DOI: 10.1371/journal.pgen.1003608] [Citation(s) in RCA: 221] [Impact Index Per Article: 20.1] [Received: 12/12/2012] [Accepted: 05/20/2013] [Indexed: 01/12/2023]
Abstract
Despite important advances from genome-wide association studies (GWAS), for most complex human traits and diseases a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using whole-genome regression (WGR) models, in which phenotypes are regressed on hundreds of thousands of variants simultaneously. Genomic best linear unbiased prediction (G-BLUP, a ridge-regression-type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations. However, breeding and human populations differ greatly in several factors that can affect the predictive performance of G-BLUP. Using theory, simulations, and real data analysis, we study the performance of G-BLUP when applied to data from related and unrelated human subjects. Under perfect linkage disequilibrium (LD) between markers and QTL, the prediction R-squared (R²) of G-BLUP asymptotically reaches the trait heritability. Under imperfect LD between markers and QTL, however, the prediction R² of G-BLUP has a much lower upper bound. We show that the minimum decrease in prediction accuracy caused by imperfect LD between markers and QTL is given by (1−b)², where b is the regression of marker-derived genomic relationships on those realized at causal loci. For pairs of related individuals, within-family disequilibrium makes the patterns of realized genomic similarity similar across the genome; therefore b is close to one, inducing only a small decrease in R². With distantly related individuals, however, b reaches very low values, imposing a very low upper bound on prediction R². Our simulations suggest that, for the analysis of data from unrelated individuals, the asymptotic upper bound on R² may be of the order of 20% of the trait heritability. We show how PA can be enhanced using variable selection or differential shrinkage of estimates of marker effects.
Despite great advances in genotyping technologies, the ability to predict complex traits and diseases remains limited. Increasing evidence suggests that many of these traits may be affected by a large number of small-effect genes that are difficult to detect in single-variant association studies. Whole-Genome Regression (WGR) methods can be used to confront this challenge and have exhibited good predictive power when applied to animal and plant breeding populations. WGR is receiving increased attention in the field of human genetics. However, human and breeding populations differ greatly in factors that can affect the performance of WGRs. Using theory, simulation and real data analysis, we study the predictive performance of the Genomic Best Linear Unbiased Predictor (G-BLUP), one of the most commonly used WGR methods. We derive upper bounds for the prediction accuracy of G-BLUP under perfect and imperfect LD between markers and genotypes at causal loci and validate such upper bounds using simulation and real data analysis. Imperfect LD between markers and causal loci can impose a very low upper bound on the prediction accuracy of G-BLUP, especially when data involve unrelated individuals. In this context, we propose and evaluate avenues for improving the predictive performance of G-BLUP.
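The abstract describes G-BLUP as a ridge-regression-type method. A standard algebraic fact behind that description is that ridge regression on markers, with fitted values X(X'X + λI)⁻¹X'y, gives the same fitted values as the individual-space form K(K + λI)⁻¹y with K = XX' (a genomic relationship matrix up to scaling). The pure-Python sketch below checks this numerically on a tiny made-up matrix; the data, λ, and helper names are hypothetical, and a real analysis would use a linear-algebra library and derive λ from estimated variance components.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def ridge_fit(X, y, lam):
    """Marker space: beta = (X'X + lam I)^-1 X'y; returns fitted X beta."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    for j in range(len(XtX)):
        XtX[j][j] += lam
    Xty = [sum(a * b for a, b in zip(col, y)) for col in Xt]
    beta = solve(XtX, Xty)
    return [sum(a * b for a, b in zip(row, beta)) for row in X]


def gblup_fit(X, y, lam):
    """Individual space: K (K + lam I)^-1 y with K = X X'."""
    K = matmul(X, transpose(X))
    A = [row[:] for row in K]
    for i in range(len(A)):
        A[i][i] += lam
    alpha = solve(A, y)
    return [sum(a * b for a, b in zip(row, alpha)) for row in K]
```

The individual-space form needs only an n-by-n system, which is why G-BLUP stays tractable when the number of markers far exceeds the number of individuals.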
47
de Los Campos G, Pérez P, Vazquez AI, Crossa J. Genome-enabled prediction using the BLR (Bayesian Linear Regression) R-package. Methods Mol Biol 2013; 1019:299-320. [PMID: 23756896 DOI: 10.1007/978-1-62703-447-0_12] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 12/14/2022]
Abstract
The BLR (Bayesian Linear Regression) R package implements several Bayesian regression models for continuous traits. The package was originally developed to implement the Bayesian LASSO (BL) of Park and Casella (J Am Stat Assoc 103(482):681-686, 2008), extended to accommodate fixed effects and regressions on pedigree using methods described by de los Campos et al. (Genetics 182(1):375-385, 2009). In 2010 we further developed the code into an R package, reprogrammed some internal parts of the algorithm in C to increase computational speed, and further documented the package (Plant Genome J 3(2):106-116, 2010). Since its first release in 2010, the package has been used in multiple publications and is routinely used for genomic evaluations in some animal and plant breeding programs. In this article we review the models implemented by BLR and illustrate the use of the package with examples.
48
Vazquez AI, de los Campos G, Klimentidis YC, Rosa GJM, Gianola D, Yi N, Allison DB. A comprehensive genetic approach for improving prediction of skin cancer risk in humans. Genetics 2012; 192:1493-502. [PMID: 23051645 PMCID: PMC3512154 DOI: 10.1534/genetics.112.141705] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Received: 05/03/2012] [Accepted: 09/07/2012] [Indexed: 01/09/2023]
Abstract
Prediction of genetic risk for disease is needed for preventive and personalized medicine. Genome-wide association studies have found unprecedented numbers of variants associated with complex human traits and diseases. However, these variants explain only a small proportion of genetic risk. Mounting evidence suggests that many traits, relevant to public health, are affected by large numbers of small-effect genes and that prediction of genetic risk to those traits and diseases could be improved by incorporating large numbers of markers into whole-genome prediction (WGP) models. We developed a WGP model incorporating thousands of markers for prediction of skin cancer risk in humans. We also considered other ways of incorporating genetic information into prediction models, such as family history or ancestry (using principal components, PCs, of informative markers). Prediction accuracy was evaluated using the area under the receiver operating characteristic curve (AUC) estimated in a cross-validation. Incorporation of genetic information (i.e., familial relationships, PCs, or WGP) yielded a significant increase in prediction accuracy: from an AUC of 0.53 for a baseline model that accounted for nongenetic covariates to AUCs of 0.58 (pedigree), 0.62 (PCs), and 0.64 (WGP). In summary, prediction of skin cancer risk could be improved by considering genetic information and using a large number of single-nucleotide polymorphisms (SNPs) in a WGP model, which allows for the detection of patterns of genetic risk that are above and beyond those that can be captured using family history. We discuss avenues for improving prediction accuracy and speculate on the possible use of WGP to prospectively identify individuals at high risk.
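AUC can be computed without tracing the ROC curve: it equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control (the Mann-Whitney formulation), with ties counted as one half. The sketch below also includes a hypothetical helper that assigns individuals to k cross-validation folds; the abstract does not describe the fold scheme the paper used.

```python
import random


def auc(scores, labels):
    """Mann-Whitney AUC: P(case score > control score), ties count 1/2.

    labels: 1 for cases (e.g., skin cancer), 0 for controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def assign_folds(n, k, seed=0):
    """Randomly assign n individuals to k folds of near-equal size."""
    folds = [i % k for i in range(n)]
    random.Random(seed).shuffle(folds)
    return folds
```

In a cross-validation, each fold is held out in turn, the model is fitted on the remaining folds, and the held-out individuals are scored; AUC is then computed on those out-of-sample scores, so that an AUC of 0.5 corresponds to no discrimination.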
49
de los Campos G, Klimentidis YC, Vazquez AI, Allison DB. Prediction of expected years of life using whole-genome markers. PLoS One 2012; 7:e40964. [PMID: 22848416 PMCID: PMC3405107 DOI: 10.1371/journal.pone.0040964] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 09/08/2011] [Accepted: 06/15/2012] [Indexed: 01/27/2023]
Abstract
Genetic factors are believed to account for 25% of the interindividual differences in years of life (YL) among humans. However, the genetic loci found so far to be associated with YL explain a very small proportion of the expected genetic variation in this trait, perhaps reflecting the complexity of the trait and the limitations of traditional association studies when applied to traits affected by a large number of small-effect genes. Using data from the Framingham Heart Study and statistical methods borrowed largely from animal genetics (whole-genome prediction, WGP), we developed a WGP model for YL and evaluated the extent to which thousands of genetic variants across the genome, examined simultaneously, can predict interindividual differences in YL. We find that a sizable proportion of the differences in YL not explained by age at entry, sex, smoking, and body mass index (BMI) can be accounted for and predicted using WGP methods. The contribution of genomic information to prediction accuracy was even higher than that of smoking and BMI combined, two predictors considered among the most important life-shortening factors. We evaluated the impact of familial relationships and population structure (as described by the first two marker-derived principal components) and concluded that, in our dataset, population structure explained some, but not all, of the gains in prediction accuracy obtained with WGP. Further inspection of prediction accuracy by age at death indicated that most of the gains in predictive ability achieved with WGP were due to more accurate prediction of early mortality, perhaps reflecting the ability of WGP to capture differences in genetic risk for deadly diseases such as cancer, which are most often responsible for early mortality in our sample.
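The abstract describes population structure using the first two marker-derived principal components. As an illustration only, the sketch below obtains leading PC scores by power iteration with deflation on the individual-by-individual matrix ZZ' of column-centred genotypes; real analyses typically use an SVD routine or dedicated software, and the helper name and toy genotypes are hypothetical.

```python
import math


def top_pcs(genotypes, n_pcs=2, n_iter=500):
    """Leading principal-component scores of a 0/1/2 genotype matrix.

    Uses power iteration with deflation on K = Z Z', where Z holds the
    column-centred genotypes (rows = individuals, columns = markers).
    Returns a list of n_pcs score vectors, one value per individual.
    """
    n, p = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in genotypes]
    K = [[sum(a * b for a, b in zip(Z[i], Z[k])) for k in range(n)]
         for i in range(n)]
    scores = []
    for _ in range(n_pcs):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-constant start
        for _ in range(n_iter):
            w = [sum(K[i][k] * v[k] for k in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # nothing left to extract
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue of K for the unit vector v
        lam = sum(v[i] * sum(K[i][k] * v[k] for k in range(n)) for i in range(n))
        # PC score = singular value times left singular vector of Z
        scores.append([math.sqrt(max(lam, 0.0)) * x for x in v])
        # Deflate: remove the extracted component before the next PC
        K = [[K[i][k] - lam * v[i] * v[k] for k in range(n)] for i in range(n)]
    return scores


toy = [[0, 0, 0, 2, 2, 2], [0, 1, 0, 2, 2, 2], [0, 0, 1, 2, 1, 2],
       [2, 2, 2, 0, 0, 0], [2, 1, 2, 0, 0, 0], [2, 2, 2, 0, 1, 0]]
pc1, pc2 = top_pcs(toy)
```

Including such PC scores as covariates is one way to check how much of a WGP gain is attributable to population structure rather than to finer-scale genetic signal.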
50
Pérez-Cabal MA, Vazquez AI, Gianola D, Rosa GJM, Weigel KA. Accuracy of Genome-Enabled Prediction in a Dairy Cattle Population using Different Cross-Validation Layouts. Front Genet 2012; 3:27. [PMID: 22403583 PMCID: PMC3288819 DOI: 10.3389/fgene.2012.00027] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 11/08/2011] [Accepted: 02/13/2012] [Indexed: 11/26/2022]
Abstract
The impact of the extent of genetic relatedness on the accuracy of genome-enabled predictions was assessed in a dairy cattle population, and alternative cross-validation (CV) strategies were compared. The CV layouts consisted of training and testing sets obtained either by random allocation of individuals (RAN) or by kernel-based clustering of individuals using the additive relationship matrix, to obtain two subsets that were as unrelated as possible (UNREL), as well as a layout based on stratification by generation (GEN). The UNREL layout decreased the average genetic relationship between training and testing animals but produced accuracies similar to those of the RAN design, which were about 15% higher than in the GEN setting. Results indicate that the CV structure can have an important effect on the accuracy of whole-genome predictions. However, the connection between the average genetic relationship across training and testing sets and the estimated predictive ability is not straightforward, and may also depend on the kind of relatedness between the two subsets and on the heritability of the trait. For high-heritability traits, close relatives such as parents and full-sibs make the greatest contributions to accuracy, although half-sibs or grandsires can compensate when close relatives are lacking. For low-heritability traits, however, the inclusion of close relatives is crucial, and including more relatives of various types in the training set tends to lead to greater accuracy. In practice, CV designs should resemble the intended use of the predictive models (e.g., within- or between-family prediction, or within- or across-generation prediction), so that the estimated predictive ability is consistent with the actual application.
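Two of the three layouts are simple to express in code. The sketch below shows a RAN-style random allocation and a GEN-style split that trains on earlier generations and tests on a later one; the function names, the generation labels, and the test fraction are hypothetical, and the kernel-clustering UNREL layout is omitted because the abstract does not give enough detail to reproduce it.

```python
import random


def random_split(ids, test_fraction=0.2, seed=0):
    """RAN-style layout: allocate individuals to training/testing at random."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def generation_split(ids, generation, test_generation):
    """GEN-style layout: train on earlier generations, test on a later one.

    generation: mapping from individual id to generation number.
    """
    train = [i for i in ids if generation[i] < test_generation]
    test = [i for i in ids if generation[i] == test_generation]
    return train, test
```

The GEN split mimics the common breeding use case of predicting young animals from their ancestors, which is one reason it can give lower accuracy than a random split in which close relatives often land on both sides.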