1
Generalization of adiposity genetic loci to US Hispanic women. Nutr Diabetes 2013; 3:e85. [PMID: 23978819 PMCID: PMC3759132 DOI: 10.1038/nutd.2013.26] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 02/11/2013] [Revised: 06/28/2013] [Accepted: 07/22/2013] [Indexed: 01/17/2023]
Abstract
BACKGROUND: Obesity is a public health concern, yet adiposity-related genetic variants among United States (US) Hispanics, the largest US minority group, remain largely unknown. OBJECTIVE: To interrogate, among 3494 US Hispanic women in the Women's Health Initiative SNP Health Association Resource (WHI SHARe), an a priori list of 47 index single-nucleotide polymorphisms (SNPs) (32 for overall body mass and 15 for central adiposity) previously studied in individuals of European descent. DESIGN: Cross-sectional analysis. Measured body mass index (BMI), waist circumference (WC) and waist-to-hip ratio (WHR) were inverse-normally transformed after adjustment for age, smoking, center and global ancestry; WC and WHR models were additionally adjusted for BMI. Genotyping was performed using the Affymetrix 6.0 array. When an a priori selected SNP was not available, a proxy was selected (r² ≥ 0.8 in CEU). RESULTS: Six BMI loci (TMEM18, NUDT3/HMGA1, FAIM2, FTO, MC4R and KCTD15) and two WC/WHR loci (VEGFA and ITPR2-SSPN) were nominally significant (P < 0.05) at the index or proxy SNP in the corresponding BMI and WC/WHR models. To account for distinct linkage disequilibrium patterns in Hispanics and to further assess generalization of genetic effects at each locus, we examined the evidence for association within a 1-Mb region around the index or proxy SNP at each of the 47 loci. Three additional BMI loci (FANCL, TFAP2B and ETV5) and five WC/WHR loci (DNM3-PIGC, GRB14, ADAMTS9, LY86 and MSRA) showed associations with BMI and WC/WHR that were significant after Bonferroni correction. Conditional analyses of each index SNP (or its proxy) and the most significant SNP within the 1-Mb region supported the possible presence of index-independent signals at each of these eight loci as well as at KCTD15. CONCLUSION: This study provides evidence for the generalization of nine BMI and seven central adiposity loci in Hispanic women.
This study expands the current knowledge of common adiposity-related genetic loci to Hispanic women.
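The DESIGN step inverse-normally transforms each covariate-adjusted trait. The abstract does not say which variant of the transform was used; the following is a minimal Python sketch of the common rank-based version, assuming Blom's offset (c = 3/8), average ranks for tied values, and that covariate adjustment has already produced residuals. The function name and residual values are hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8):
    """Rank-based inverse normal transform (Blom offset c = 3/8 by default).

    Tied values receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values starting at position i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # Blom: (r - 3/8) / (n + 1/4); in general (r - c) / (n - 2c + 1)
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Hypothetical BMI residuals after regressing on age, smoking, center and ancestry
residuals = [2.1, -0.4, 0.0, 5.3, -1.8]
print([round(z, 3) for z in rank_inverse_normal(residuals)])
```

Because only ranks enter the transform, the extreme residual (5.3) is pulled in to the same normal score it would get if it were 2.2, which is why the transform is used to limit the influence of outliers on association tests.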

2
Boosting for detection of gene-environment interactions. Stat Med 2012; 32:255-66. [PMID: 22764060 DOI: 10.1002/sim.5444] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 08/05/2011] [Revised: 05/01/2012] [Accepted: 05/02/2012] [Indexed: 11/08/2022]
Abstract
In genetic association studies, genetic variants and environmental variables are typically thought to explain more of the inheritance of a phenotype jointly than either component does separately. Traditional methods to identify gene-environment interactions usually consider only one measured environmental variable at a time. In practice, however, multiple environmental factors may each be imprecise surrogates for the underlying physiological process that actually interacts with the genetic factors. In this paper, we develop a variant of L2 boosting that is specifically designed to identify combinations of environmental variables that jointly modify the effect of a gene on a phenotype. Because the effect modifiers may have a small signal compared with the main effects, working in a space orthogonal to the main predictors allows us to focus on the interaction space. In a simulation study of several plausible underlying models, our method achieved the lowest test error, outperforming the least absolute shrinkage and selection operator (lasso) and model selection by the Akaike and Bayesian Information Criteria. In an application to data from the Women's Health Initiative-Population Architecture using Genomics and Epidemiology study, the dedicated boosting method identified two single-nucleotide polymorphisms for which effect modification appears to be present. Performance was evaluated on an independent test set, and the results are promising.
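The idea of boosting in the space orthogonal to the main effects can be sketched in a few lines of Python. This is not the authors' implementation: it assumes componentwise L2 boosting (each step refits the current residual on the single best candidate column and takes a step shrunk by a factor nu) and plain Gram-Schmidt projection to remove the main effects from both the response and the gene-by-environment columns. The helper names, step size, number of steps and toy data are all hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _project_out(v, ortho):
    """Subtract from v its projection onto each orthonormal vector in ortho."""
    w = list(v)
    for q in ortho:
        c = dot(w, q)
        w = [wi - c * qi for wi, qi in zip(w, q)]
    return w


def orthogonalize(columns, basis):
    """Return columns with their components in span(basis) removed (Gram-Schmidt)."""
    ortho = []
    for v in basis:
        w = _project_out(v, ortho)
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:  # skip numerically redundant basis vectors
            ortho.append([wi / norm for wi in w])
    return [_project_out(v, ortho) for v in columns]


def l2_boost(y, columns, n_steps=100, nu=0.1):
    """Componentwise L2 boosting: each step fits the current residual by least
    squares on the single column giving the largest drop in residual sum of
    squares, then moves that coefficient by a fraction nu of the fit."""
    coef = [0.0] * len(columns)
    resid = list(y)
    for _ in range(n_steps):
        best_j, best_gain, best_b = None, 0.0, 0.0
        for j, x in enumerate(columns):
            xx = dot(x, x)
            if xx < 1e-12:
                continue
            b = dot(x, resid) / xx
            gain = b * b * xx  # RSS reduction from fitting b * x to resid
            if gain > best_gain:
                best_j, best_gain, best_b = j, gain, b
        if best_j is None:
            break
        coef[best_j] += nu * best_b
        resid = [r - nu * best_b * xi for r, xi in zip(resid, columns[best_j])]
    return coef


# Hypothetical toy data: genotype G and two environmental factors;
# only E1 modifies the effect of G on y.
G = [0, 1, 2, 0, 1, 2, 1, 0]
E1 = [1.0, -1.0, 0.5, 2.0, -0.5, 1.5, 0.0, -2.0]
E2 = [0.3, 0.7, -1.2, 0.1, 1.1, -0.4, -0.9, 0.6]
y = [g + e1 + 2 * g * e1 for g, e1 in zip(G, E1)]

main_effects = [[1.0] * len(G), G, E1, E2]
interactions = [[g * e for g, e in zip(G, E)] for E in (E1, E2)]
y_perp = orthogonalize([y], main_effects)[0]
inter_perp = orthogonalize(interactions, main_effects)
coef = l2_boost(y_perp, inter_perp)
print(coef)
```

Projecting out the main effects first means the large G and E1 main effects cannot be picked up by the interaction columns, so the small modifier signal is what the boosting steps see.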

3
The use of phenome-wide association studies (PheWAS) for exploration of novel genotype-phenotype relationships and pleiotropy discovery. Genet Epidemiol 2011; 35:410-22. [PMID: 21594894 DOI: 10.1002/gepi.20589] [Citation(s) in RCA: 125] [Impact Index Per Article: 9.6] [Received: 02/11/2011] [Revised: 04/01/2011] [Accepted: 04/03/2011] [Indexed: 01/09/2023]
Abstract
The field of phenomics has been investigating network structure among large arrays of phenotypes, and genome-wide association studies (GWAS) have been used to investigate the relationship between genetic variation and single diseases/outcomes. A novel approach has emerged that combines the exploration of phenotypic structure and genotypic variation, known as the phenome-wide association study (PheWAS). The Population Architecture using Genomics and Epidemiology (PAGE) network is a National Human Genome Research Institute (NHGRI)-supported collaboration of four groups accessing eight extensively characterized epidemiologic studies. The primary focus of PAGE is deep characterization of well-replicated GWAS variants and their relationships to various phenotypes and traits in diverse epidemiologic studies that include European Americans, African Americans, Mexican Americans/Hispanics, Asians/Pacific Islanders, and Native Americans. The rich phenotypic resources of PAGE studies provide a unique opportunity for PheWAS, as each genotyped variant can be tested for an association with the wide array of phenotypic measurements available within the PAGE studies, including prevalent and incident status for multiple common clinical conditions and risk factors, as well as clinical parameters and intermediate biomarkers. The results of PheWAS can be used to discover novel relationships between single-nucleotide polymorphisms (SNPs), phenotypes, and networks of interrelated phenotypes; identify pleiotropy; provide novel mechanistic insights; and foster hypothesis generation. The PAGE network has developed infrastructure to support and perform PheWAS in a high-throughput manner. Because implementing the PheWAS approach has presented several challenges, the infrastructure and methodology, as well as insights gained in this project, are presented herein to benefit the larger scientific community.
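To make the PheWAS loop concrete, here is a hypothetical Python sketch that tests one SNP against several quantitative phenotypes and ranks them by an approximate p-value. It assumes complete data and no covariate adjustment, and it uses Pearson's r with the Fisher z normal approximation; an actual PAGE analysis would instead fit a regression model appropriate to each phenotype type (for example, logistic regression for binary conditions). The function names and phenotype values are illustrative only.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def phewas(genotype, phenotypes):
    """Test one SNP (allele dosages) against many quantitative phenotypes.

    Returns (name, r, approximate two-sided p) tuples sorted by p, using the
    Fisher z transform: z = atanh(r) * sqrt(n - 3) ~ N(0, 1) under no association.
    """
    n = len(genotype)
    results = []
    for name, values in phenotypes.items():
        r = pearson_r(genotype, values)
        r_clamped = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
        z = math.atanh(r_clamped) * math.sqrt(n - 3)
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        results.append((name, r, p))
    return sorted(results, key=lambda item: item[2])


# Hypothetical data: 10 participants, one SNP, two phenotypes
snp = [0, 1, 2, 1, 0, 2, 1, 0, 2, 1]
noise = [1, -2, 0, 3, -1, 2, -3, 0, 1, -1]
phenotypes = {
    "ldl": [100 + 10 * g + e for g, e in zip(snp, noise)],
    "sbp": [120, 118, 121, 119, 122, 117, 120, 121, 119, 118],
}
for name, r, p in phewas(snp, phenotypes):
    print(f"{name}: r = {r:.3f}, p = {p:.2g}")
```

In a real PheWAS the same loop runs over hundreds of phenotypes per variant, so the resulting p-values would also need a multiple-testing correction.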

4
Response: Re: Calcium Plus Vitamin D Supplementation and the Risk of Breast Cancer. J Natl Cancer Inst 2009. [DOI: 10.1093/jnci/djp066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]

5
Variation in 24 hemostatic genes and associations with non-fatal myocardial infarction and ischemic stroke. J Thromb Haemost 2008; 6:45-53. [PMID: 17927806 DOI: 10.1111/j.1538-7836.2007.02795.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/27/2022]
Abstract
BACKGROUND: Arterial thrombosis involves platelet aggregation and clot formation, yet little is known about the contribution of genetic variation in fibrin-based hemostatic factors to arterial clotting risk. We hypothesized that common variation in 24 coagulation-fibrinolysis genes would contribute to risk of incident myocardial infarction (MI) or ischemic stroke (IS). METHODS: We conducted a population-based case-control study. Subjects were hypertensive adults and postmenopausal women 30-79 years of age who sustained a first MI (n = 856) or IS (n = 368) between 1995 and 2002, and controls (n = 2,689) matched on age, hypertension status, and calendar year. We investigated the risk of MI and IS associated with (i) global variation within each gene, as measured by common haplotypes, and (ii) individual haplotypes and single nucleotide polymorphisms (SNPs). Significance was assessed using a false discovery rate q-value threshold of 0.2 to account for multiple testing. RESULTS: After accounting for multiple testing, global genetic variation in factor (F) VIII was associated with IS risk. Two haplotypes in FVIII and one in FXIIIa1 were significantly associated with increased IS risk (all q-values < 0.2). A plasminogen gene SNP was associated with MI risk. None of these associations has been reported previously. Another 24 tests had P-values < 0.05 but q-values > 0.2 in the MI and IS analyses; 23 of these are new and hypothesis-generating. CONCLUSIONS: Apart from the association of FVIII variation with IS, we found little evidence that common variation in the 24 candidate fibrin-based hemostasis genes strongly influences arterial thrombotic risk, but our results cannot rule out small effects.
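The abstract does not specify how the q-values were computed. As a hedged illustration, the Benjamini-Hochberg step-up adjustment below gives q-values under the conservative assumption that every null hypothesis is true (pi0 = 1); Storey's q-value method would additionally estimate pi0 from the p-value distribution. Tests with q < 0.2 would be declared significant under the threshold used in this study. The function name and p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, i.e. q-values with pi0 = 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p-values from ten association tests
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
qvals = bh_qvalues(pvals)
significant = [i for i, qv in enumerate(qvals) if qv < 0.2]
print(significant)
```

A threshold of 0.2 tolerates a larger expected share of false discoveries than the conventional 0.05, which fits the hypothesis-generating aim of a candidate-gene screen.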

6
The Women’s Health Initiative randomized trial of calcium plus vitamin D: Effects on breast cancer and arthralgias. J Clin Oncol 2006. [DOI: 10.1200/jco.2006.24.18_suppl.lba6] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/20/2022]
Abstract
LBA6 Background: Calcium (Ca) and vitamin D (D) have been associated with reduced breast cancer risk and lower breast density in observational studies, but randomized trials have not evaluated Ca/D supplementation for breast cancer prevention. Methods: We randomized 36,282 postmenopausal women without prior breast cancer from 40 WHI centers to daily 1000 mg of elemental calcium as calcium carbonate plus 400 IU of vitamin D3 (N = 18,176) or matching placebo (N = 18,106). Of these women, 54% had also been randomized one year earlier to hormone therapy (HT), either conjugated equine estrogen (CEE) plus medroxyprogesterone acetate or CEE alone (the latter for women with prior hysterectomy), or placebo. Ca/D effects on hip fracture and colorectal cancer have been reported (NEJM 2006). We report here pathologically confirmed invasive breast cancer as a secondary outcome of the Ca/D trial. Baseline serum 25(OH)D levels (in 1787 women) and serial joint symptoms (pain/stiffness and hand/foot swelling on a 0–3 scale, in a 6% sample) were also assessed. Results: Breast cancer incidence did not differ between the Ca/D and placebo groups (528 and 546 cases, respectively; hazard ratio 0.96; 95% confidence interval (CI), 0.85 to 1.09). Although SEER stage and the frequency of abnormal mammograms were similar between groups, breast cancers were smaller in the Ca/D group (mean (SD) 1.54 (1.23) cm versus 1.71 (1.29) cm; P = 0.05). Total baseline vitamin D intake was associated with lower breast cancer risk in the placebo group. Baseline vitamin D deficiency was common (serum 25(OH)D in nmol per liter: ≥30, sufficient, n = 266; 16 to <30, insufficient, n = 277; <16, deficient, n = 743) but was not related to joint pain (reported by 72.2%, 74.0%, and 74.6% of the sufficient, insufficient, and deficient groups, respectively). Joint symptoms were lower in women randomized to CEE alone (P < 0.01) but did not differ significantly by Ca/D assignment, and no significant interactions were seen between HT and Ca/D.
Conclusion: Among healthy postmenopausal women, Ca/D supplementation did not reduce breast cancer risk, but the cancers in women randomized to Ca/D were somewhat smaller. Exogenous estrogen use, but not Ca/D supplementation, influences arthralgias. [Table: see text]

7
Abstract
Logic Regression is a new adaptive regression methodology that attempts to construct predictors as Boolean combinations of (binary) covariates. In this paper we use this algorithm to analyze single-nucleotide polymorphism (SNP) sequence data. The predictors found are interpretable as risk factors for the disease. The significance of these risk factors is assessed using techniques such as cross-validation, permutation tests, and independent test sets. These model selection techniques remain valid when the data are dependent, as is the case for the family data used here. In our analysis of the Genetic Analysis Workshop 12 data, we identify the exact locations of mutations on gene 1 and gene 6, and a number of mutations on gene 2, that are associated with affected status, without selecting any false positives.
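Logic regression itself searches over logic trees with simulated annealing and scores them inside a regression model, so the toy Python sketch below is not that algorithm. It only illustrates what a "Boolean combination of binary covariates" predictor looks like: it exhaustively scores every two-literal rule (L1 AND L2, L1 OR L2, where a literal is a covariate or its negation) by the number of misclassified subjects. The function names and data are hypothetical.

```python
from itertools import combinations, product


def literal(X, j, negated):
    """Column j of a binary matrix, optionally negated (1 - value)."""
    return [1 - row[j] if negated else row[j] for row in X]


def label(j, negated):
    return ("NOT " if negated else "") + f"X{j}"


def best_two_literal_rule(X, y):
    """Score every rule 'L1 AND L2' / 'L1 OR L2' and return (errors, rule).

    Ties are broken in favor of the first rule found.
    """
    best = None
    for j, k in combinations(range(len(X[0])), 2):
        for neg_j, neg_k in product((False, True), repeat=2):
            a, b = literal(X, j, neg_j), literal(X, k, neg_k)
            for op in ("AND", "OR"):
                pred = [ai & bi if op == "AND" else ai | bi for ai, bi in zip(a, b)]
                errors = sum(p != yi for p, yi in zip(pred, y))
                if best is None or errors < best[0]:
                    best = (errors, f"{label(j, neg_j)} {op} {label(k, neg_k)}")
    return best


# Hypothetical data: all 8 combinations of three binary SNP indicators,
# with affected status generated as X0 AND X2
X = [list(bits) for bits in product((0, 1), repeat=3)]
y = [row[0] & row[2] for row in X]
print(best_two_literal_rule(X, y))
```

Exhaustive search like this grows combinatorially with the number of covariates and the size of the expressions, which is why logic regression relies on a stochastic search over trees instead.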

8
Widespread collaboration of Isw2 and Sin3-Rpd3 chromatin remodeling complexes in transcriptional repression. Mol Cell Biol 2001; 21:6450-60. [PMID: 11533234 PMCID: PMC99792 DOI: 10.1128/mcb.21.19.6450-6460.2001] [Citation(s) in RCA: 177] [Impact Index Per Article: 7.7] [Indexed: 11/20/2022]
Abstract
The yeast Isw2 chromatin remodeling complex functions in parallel with the Sin3-Rpd3 histone deacetylase complex to repress early meiotic genes upon recruitment by Ume6p. For many of these genes, the effect of an isw2 mutation is partially masked by a functional Sin3-Rpd3 complex. To identify the full range of genes repressed or activated by these factors and to uncover hidden targets of Isw2-dependent regulation, we performed genome-wide expression analyses using cDNA microarrays. We find that the Isw2 complex functions mainly in transcriptional repression, in a pathway parallel to the Sin3-Rpd3 complex. In addition to Ume6 target genes, many Ume6-independent genes are derepressed in mutants lacking functional Isw2 and Sin3-Rpd3 complexes. Conversely, ume6 mutants, but not isw2 sin3 or isw2 rpd3 double mutants, have reduced fidelity of mitotic chromosome segregation, suggesting that one or more functions of Ume6p are independent of the Sin3-Rpd3 and Isw2 complexes. Chromatin structure analyses of two nonmeiotic genes reveal increased DNase I sensitivity within their regulatory regions in an isw2 mutant, as seen previously for one meiotic locus. These data suggest that the Isw2 complex functions at Ume6-dependent and -independent loci to create DNase I-inaccessible chromatin structure by regulating the positioning or placement of nucleosomes.

9
Abstract
Experimental and epidemiological evidence suggests that lycopene, a predominant carotenoid found in human serum, may reduce the risk of certain cancers. We examined the association of dietary, physiological, and other factors with serum lycopene concentrations in a subsample of 946 postmenopausal women participating in the Women's Health Initiative. Pearson partial correlation coefficients and linear regression coefficients were calculated after adjustment for age, ethnicity, and serum low-density-lipoprotein (LDL) cholesterol. Serum lycopene was correlated with serum LDL cholesterol (r = 0.23) and dietary lycopene (r = 0.17; both p < 0.001). After adjustment, individual food items correlated with serum lycopene included fresh tomatoes or tomato juice (r = 0.11), cooked tomatoes, tomato sauce, or salsa (r = 0.17), and spaghetti with meat sauce (r = 0.19; all p < 0.01). Age and body mass index were negatively associated with serum lycopene levels (both p < 0.001). Serum lycopene levels were highest in summer and among women living in the northeastern United States. If high serum lycopene levels do reduce cancer risk, studies of lycopene intake have limited ability to detect this association. An understanding of factors associated with serum lycopene levels can be useful for interpreting studies of dietary lycopene and disease risk.

10
Abstract
Bivariate survival data arise, for example, in twin studies and in studies of both eyes or ears of the same individual. Often it is of interest to regress the survival times on a set of predictors. In this paper we extend Wei and Tanner's multiple imputation approach for linear regression with univariate censored data to bivariate censored data. We formulate a class of censored bivariate linear regression methods by iterating between two steps: (1) the data are augmented by imputing survival times for censored observations; (2) a linear model is fit to the imputed complete data. We consider three methods to implement these two steps. The marginal (independence) approach ignores the possible correlation between the two survival times when estimating the regression coefficients. To improve efficiency, we propose two methods that account for this correlation: first, using generalized least squares regression in step 2; second, generating the imputed data in step 1 from a bivariate log-spline density estimate instead of an estimate of the marginal distribution. Simulation studies show that the two methods that account for the dependence perform similarly and are both more efficient than the marginal approach. The methods are applied to a data set from an otitis media clinical trial.
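The two-step iteration can be sketched for a single margin with one covariate, which corresponds to the marginal approach. This Python sketch assumes log survival times as the response and, as a simplification of Wei and Tanner's scheme, draws each imputed residual from the observed residuals that exceed the censored residual rather than from a Kaplan-Meier estimate of the residual distribution. The generalized least squares and bivariate log-spline refinements are not shown. All names and data are hypothetical.

```python
import random


def ols(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def impute_censored_regression(x, t, observed, n_iter=20, n_imputations=10, seed=0):
    """Alternate (1) imputing right-censored responses and (2) refitting OLS.

    t[i] is the log survival time if observed[i], else the log censoring time.
    A censored response is replaced by its fitted value plus a residual drawn
    from the observed residuals that exceed its own censored residual, so the
    imputed value is at least the censoring time (up to rounding). Point
    estimates are averaged over independent imputation chains.
    """
    rng = random.Random(seed)
    obs_x = [xi for xi, d in zip(x, observed) if d]
    obs_t = [ti for ti, d in zip(t, observed) if d]
    estimates = []
    for _ in range(n_imputations):
        a, b = ols(obs_x, obs_t)  # start from the fit to uncensored data only
        y = list(t)
        for _ in range(n_iter):
            resid_obs = [ti - (a + b * xi) for xi, ti in zip(obs_x, obs_t)]
            for i, d in enumerate(observed):
                if not d:  # step 1: impute a value beyond the censoring time
                    e_cens = t[i] - (a + b * x[i])
                    larger = [e for e in resid_obs if e > e_cens]
                    e_new = rng.choice(larger) if larger else e_cens
                    y[i] = a + b * x[i] + e_new
            a, b = ols(x, y)  # step 2: fit the linear model to the completed data
        estimates.append((a, b))
    k = len(estimates)
    return sum(e[0] for e in estimates) / k, sum(e[1] for e in estimates) / k


# Hypothetical data: covariate, log time, and event indicator (False = censored)
x = [0, 1, 2, 3, 4, 5, 6, 7]
log_t = [1.0, 1.4, 2.1, 2.4, 3.2, 3.0, 3.9, 4.1]
observed = [True, True, True, False, True, True, False, True]
intercept, slope = impute_censored_regression(x, log_t, observed)
print(round(intercept, 3), round(slope, 3))
```

With no censored observations the procedure reduces to ordinary least squares, which is a useful sanity check on any implementation of this scheme.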

11
Abstract
We describe the development of a scoring function based on the decomposition P(structure | sequence) ∝ P(sequence | structure) × P(structure), which outperforms previous scoring functions in correctly identifying native-like protein structures in large ensembles of compact decoys. The first term captures sequence-dependent features of protein structures, such as the burial of hydrophobic residues in the core; the second term captures universal, sequence-independent features, such as the assembly of beta-strands into beta-sheets. The efficacies of a wide variety of sequence-dependent and sequence-independent features of protein structures for recognizing native-like structures were systematically evaluated using ensembles of approximately 30,000 compact conformations with fixed secondary structure for each of 17 small protein domains. The best results were obtained using a core scoring function with P(sequence | structure) parameterized similarly to our previous work (Simons et al., J Mol Biol 1997;268:209-225) and P(structure) focused on secondary structure packing preferences; while several additional features had some discriminatory power on their own, they did not add discriminatory power when combined with the core scoring function. Our results, on both the training set and the independent decoy set of Park and Levitt (J Mol Biol 1996;258:367-392), suggest that this scoring function should contribute to the prediction of tertiary structure from knowledge of sequence and secondary structure.
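The decomposition is Bayes' rule. For a fixed sequence, the evidence term P(sequence) is the same for every decoy, so it drops out when decoys are ranked, and a log-scale score can be written as the sum of the sequence-dependent and sequence-independent terms:

```latex
P(\text{structure} \mid \text{sequence})
  = \frac{P(\text{sequence} \mid \text{structure})\, P(\text{structure})}{P(\text{sequence})}
  \;\propto\; P(\text{sequence} \mid \text{structure})\, P(\text{structure}),
\qquad
S(\text{structure}) = \log P(\text{sequence} \mid \text{structure}) + \log P(\text{structure}).
```

Ranking the decoys for one sequence by S is therefore equivalent to ranking them by their posterior probability under the model.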

12
Hazard regression with interval-censored data. Biometrics 1997; 53:1485-94. [PMID: 9423263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
In a recent paper, Kooperberg, Stone, and Truong (1995a) introduced hazard regression (HARE), in which linear splines and their tensor products are used to estimate the conditional log-hazard function based on possibly censored, positive response data and one or more covariates. Model selection is carried out in an adaptive fashion using maximum likelihood estimation of the unknown coefficients, Rao and Wald statistics to carry out stepwise addition and deletion of basis functions, and the Bayesian Information Criterion (BIC) to select the final model. In the present paper, the HARE methodology is extended to accommodate interval-censored data, time-dependent covariates, and cubic splines. The presence of interval-censored data means that the log-likelihood function may no longer be concave, presenting additional numerical challenges. The extended methodology is applied to a data set containing both interval-censoring and time-dependent covariates. The new software will be available in a future release of S-Plus.
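For reference, the BIC used to choose the final model has the standard form below, where ℓ(θ̂) is the maximized log-likelihood of a candidate model, p is its number of basis functions (coefficients), and n is the sample size. Among the models visited during stepwise addition and deletion, the one with the smallest BIC is selected:

```latex
\mathrm{BIC} = -2\,\ell(\hat{\theta}) + p \log n
```

The log n penalty per coefficient grows with the sample size, so BIC tends to favor more parsimonious spline models than AIC, whose penalty is a constant 2 per coefficient.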

13
Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 1997; 268:209-25. [PMID: 9149153 DOI: 10.1006/jmbi.1997.0959] [Citation(s) in RCA: 950] [Impact Index Per Article: 35.2] [Indexed: 02/04/2023]
Abstract
We explore the ability of a simple simulated annealing procedure to assemble native-like structures from fragments of unrelated protein structures with similar local sequences using Bayesian scoring functions. Environment- and residue-pair-specific contributions to the scoring functions appear as the first two terms in a series expansion for the residue probability distributions in the protein database; decoupling the distance and environment dependencies of the distributions resolves the major problems with current database-derived scoring functions noted by Thomas and Dill. The simulated annealing procedure rapidly and frequently generates native-like structures for small helical proteins and better-than-random structures for small proteins containing beta-sheets. Most of the simulated structures have native-like solvent accessibility and secondary structure patterns, so ensembles of these structures provide a particularly challenging set of decoys for evaluating scoring functions. We investigate the effects of multiple sequence information and different types of conformational constraints on the overall performance of the method, as well as the ability of a variety of recently developed scoring functions to recognize the native-like conformations in the ensembles of simulated structures.
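The annealing loop can be sketched generically. In this hypothetical Python sketch a state is a list of fragment choices (one per position), a move replaces the fragment at one random position, and a made-up score table plus an adjacency penalty stands in for the Bayesian scoring function. The geometric cooling schedule and the Metropolis acceptance rule are common choices, not details taken from the paper.

```python
import math
import random


def simulated_annealing(energy, initial, propose, t_start=2.0, t_end=0.01,
                        n_steps=5000, seed=0):
    """Metropolis simulated annealing with a geometric cooling schedule.

    Returns the lowest-energy state visited and its energy.
    """
    rng = random.Random(seed)
    state, e = initial, energy(initial)
    best, best_e = state, e
    cooling = (t_end / t_start) ** (1.0 / max(n_steps - 1, 1))
    temp = t_start
    for _ in range(n_steps):
        candidate = propose(state, rng)
        e_new = energy(candidate)
        # Always accept downhill moves; accept uphill moves with prob exp(-dE/T)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temp):
            state, e = candidate, e_new
            if e < best_e:
                best, best_e = state, e
        temp *= cooling
    return best, best_e


# Hypothetical toy problem: 6 positions, 4 candidate fragments per position
N_POS, N_FRAG = 6, 4
SCORE = [[((i * 7 + f * 3) % 5) - 2.0 for f in range(N_FRAG)] for i in range(N_POS)]


def toy_energy(choice):
    """Per-position fragment scores plus a penalty for repeating a fragment index."""
    local = sum(SCORE[i][f] for i, f in enumerate(choice))
    clash = sum(0.5 for a, b in zip(choice, choice[1:]) if a == b)
    return local + clash


def swap_one_fragment(choice, rng):
    """Propose a new state by replacing the fragment at one random position."""
    new = list(choice)
    new[rng.randrange(N_POS)] = rng.randrange(N_FRAG)
    return new


start = [0] * N_POS
best, best_e = simulated_annealing(toy_energy, start, swap_one_fragment)
print(best, best_e)
```

Accepting occasional uphill moves at high temperature lets the search escape poor local arrangements early, while the falling temperature makes it increasingly greedy toward the end.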

14
Statistical modeling to predict elective surgery time. Comparison with a computer scheduling system and surgeon-provided estimates. Anesthesiology 1996; 85:1235-45. [PMID: 8968169 DOI: 10.1097/00000542-199612000-00003] [Citation(s) in RCA: 106] [Impact Index Per Article: 3.8] [Indexed: 02/03/2023]
Abstract
BACKGROUND: Accurate estimation of operating times is a prerequisite for efficient scheduling of the operating suite. In this study, the authors sought to compare surgeons' time estimates for elective cases with those of commercial scheduling software, and to determine whether regression modeling could improve them. METHODS: The study was conducted at the University of Washington Medical Center in three phases. Phase 1 retrospectively reviewed surgeons' time estimates and the scheduling system's estimates over 1 yr. In phase 2, data were collected prospectively from participating surgeons by means of a data entry form completed at the time elective cases were scheduled. Data included the procedure code, estimated operating time, estimated case difficulty, and potential factors that might affect duration. In phase 3, identical data were collected from five selected surgeons by personal interview. RESULTS: In phase 1, 26 of 43 surgeons provided significantly better estimates than did the scheduling system (P < 0.01), and no surgeon was significantly worse, although the absolute errors were large (34% of the 157-min average case length). In phase 2, modeling improved the accuracy of the surgeons' estimates by 11.5%, compared with the scheduling system. In phase 3, applying the model from phase 2 improved the accuracy of the surgeons' estimates by 18.2%. CONCLUSIONS: Surgeons provide more accurate time estimates than does the scheduling software as used at our institution. Regression modeling yields modest improvements in accuracy. Further improvements would be likely if the hospital information system could provide timely historical data and feedback to the surgeons.

15
Abstract
During the past few years several nonparametric alternatives to the Cox proportional hazards model have appeared in the literature. These methods extend techniques that are well known from regression analysis to the analysis of censored survival data. In this paper we discuss methods based on (partition) trees and (polynomial) splines, analyse two datasets using both Survival Trees and HARE, and compare the strengths and weaknesses of the two methods. One of the strengths of HARE is that its model fitting procedure has an implicit check for proportionality of the underlying hazards model. It also provides an explicit model for the conditional hazards function, which makes it very convenient to obtain graphical summaries. On the other hand, the tree-based methods automatically partition a dataset into groups of cases that are similar in survival history. Results obtained by survival trees and HARE are often complementary. Trees and splines in survival analysis should provide the data analyst with two useful tools when analysing survival data.

16
Using logistic regression to estimate the adjusted attributable risk of low birthweight in an unmatched case-control study. Epidemiology 1991; 2:363-6. [PMID: 1742386 DOI: 10.1097/00001648-199109000-00009] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.3] [Indexed: 12/28/2022]
Abstract
Other authors have shown how to estimate attributable risk based on stratification. In this paper, we show how to estimate adjusted attributable risks, standard errors, and confidence intervals from an unmatched case-control study that has population-based controls and uses the logistic regression model to estimate relative risk. We apply the method to data from a case-control study of low birthweight. The method is conceptually simple, has no assumptions beyond those of the logistic model, makes use of computer-intensive statistical techniques (the bootstrap), and extends to interactions. A Fortran computer program to carry out the computations is available from the authors upon request.
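A minimal Python sketch for the simplest case, one binary exposure, assuming the case-based formula AR = 1 - (1/n_cases) Σ_cases 1/RR_i with each RR_i approximated by the odds ratio exp(βx_i) (a formula often attributed to Bruzzi and colleagues). With a single binary covariate the logistic-regression odds ratio equals the 2×2 cross-product ratio, so no model fitting is needed here; a covariate-adjusted analysis would plug in fitted exp(βx_i) from a multivariable logistic model. Resampling cases and controls separately in the bootstrap is an assumption about the scheme, and the function names and data are hypothetical.

```python
import random


def attributable_risk(exposed, is_case):
    """Case-based attributable risk for one binary exposure (0/1 lists).

    AR = 1 - mean over cases of 1 / OR**x_i, with OR the 2x2 cross-product
    ratio (equal to the logistic-regression odds ratio for one binary covariate).
    """
    a = sum(1 for e, c in zip(exposed, is_case) if e and c)          # exposed cases
    b = sum(1 for e, c in zip(exposed, is_case) if not e and c)      # unexposed cases
    c0 = sum(1 for e, c in zip(exposed, is_case) if e and not c)     # exposed controls
    d = sum(1 for e, c in zip(exposed, is_case) if not e and not c)  # unexposed controls
    odds_ratio = (a * d) / (b * c0)  # may raise ZeroDivisionError if a cell is empty
    inverse_rr = [1 / odds_ratio if e else 1.0 for e, c in zip(exposed, is_case) if c]
    return 1 - sum(inverse_rr) / len(inverse_rr)


def bootstrap_se(exposed, is_case, n_boot=500, seed=1):
    """Bootstrap standard error, resampling cases and controls separately."""
    rng = random.Random(seed)
    cases = [e for e, c in zip(exposed, is_case) if c]
    controls = [e for e, c in zip(exposed, is_case) if not c]
    estimates = []
    for _ in range(n_boot):
        boot_cases = [rng.choice(cases) for _ in cases]
        boot_controls = [rng.choice(controls) for _ in controls]
        labels = [1] * len(boot_cases) + [0] * len(boot_controls)
        try:
            estimates.append(attributable_risk(boot_cases + boot_controls, labels))
        except ZeroDivisionError:
            continue  # skip resamples with an empty 2x2 cell
    mean = sum(estimates) / len(estimates)
    variance = sum((v - mean) ** 2 for v in estimates) / (len(estimates) - 1)
    return variance ** 0.5


# Hypothetical data: 100 cases (40 exposed) and 200 controls (40 exposed)
exposed = [1] * 40 + [0] * 60 + [1] * 40 + [0] * 160
is_case = [1] * 100 + [0] * 200
ar = attributable_risk(exposed, is_case)
se = bootstrap_se(exposed, is_case)
print(f"AR = {ar:.3f}, bootstrap SE = {se:.3f}")
```

For these counts OR = 8/3 and AR = 0.4 × (1 - 3/8) = 0.25. A normal-approximation 95% interval would be AR ± 1.96 × SE; percentiles of the bootstrap estimates are an alternative.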