1
Abstract
Statistical models are at the core of the genome-wide association study (GWAS). In this chapter, we provide an overview of single- and multilocus statistical models and of Bayesian and machine learning approaches for association studies in plants. These models are discussed in terms of their basic methodology, the cofactor adjustments they account for, their statistical power and their computational efficiency. New statistical models and machine learning algorithms both show improved performance in detecting missed signals and rare mutations and in prioritizing causal genetic variants; nevertheless, further optimization and validation studies are required to maximize the power of GWAS.

2
Penalized regression approaches to testing for quantitative trait-rare variant association. Front Genet 2014; 5:121. [PMID: 24860593 PMCID: PMC4026747 DOI: 10.3389/fgene.2014.00121] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/01/2014] [Accepted: 04/18/2014] [Indexed: 11/13/2022]
Abstract
In statistical data analysis, penalized regression is considered an attractive approach because it performs variable selection and parameter estimation simultaneously. Although penalized regression methods have shown many advantages in variable selection and outcome prediction over other approaches for high-dimensional data, there is a relative paucity of literature on their application to hypothesis testing, e.g., in genetic association analysis. In this study, we apply several new penalized regression methods with a novel penalty, called the truncated L1-penalty (TLP) (Shen et al., 2012), for either variable selection, or both variable selection and parameter grouping, in a data-adaptive way to test for association between a quantitative trait and a group of rare variants. The performance of the new methods is compared with that of some existing tests, including some recently proposed global tests and penalized regression-based methods, via simulations and an application to the real sequence data of the Genetic Analysis Workshop 17 (GAW17). Although our proposed penalized methods can improve on some existing penalized methods, they often do not outperform some existing global association tests. Some possible problems with using penalized regression methods in genetic hypothesis testing are discussed. Given the capability of penalized regression to select causal variants and its sometimes promising performance, further studies are warranted.

3
A rapid gene-based genome-wide association test with multivariate traits. Hum Hered 2013; 76:53-63. [PMID: 24247328 DOI: 10.1159/000356016] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 05/31/2013] [Accepted: 09/26/2013] [Indexed: 11/19/2022]
Abstract
OBJECTIVES: A gene-based genome-wide association study (GWAS) provides a powerful alternative to the traditional analysis of one single nucleotide polymorphism (SNP) at a time, owing to its substantial reduction in the multiple testing burden and a possible gain in power from modeling multiple SNPs within a gene. A gene-based association analysis of multivariate traits is often of interest, but it poses substantial analytical as well as computational challenges at a genome-wide level. METHODS: We propose a rapid implementation of the multivariate multiple linear regression (RMMLR) approach in unrelated individuals as well as in families. Our approach allows for covariates. Moreover, the asymptotic distribution of the test statistic is not heavily influenced by the linkage disequilibrium (LD) among the SNPs and hence can be used efficiently to perform a gene-based GWAS. We have developed a corresponding R package to implement such multivariate gene-based GWAS with this RMMLR approach. RESULTS: Through extensive simulation, we compared several approaches for both single and multivariate traits. Our RMMLR approach maintained a correct type I error level even for sets of SNPs in strong LD. It also demonstrated a substantial gain in power to detect a gene when it is associated with a subset of the traits. We also studied the performance of the approaches on the Minnesota Center for Twin Family Research dataset. CONCLUSIONS: In our overall comparison, our RMMLR approach provides an efficient and powerful tool to perform a gene-based GWAS with single or multivariate traits and maintains the type I error appropriately.

4
Abstract
Genome-wide association studies have usually been analyzed in a univariate manner. The commonly used univariate tests have one degree of freedom and assume an additive mode of inheritance. The experiment-wise significance of these univariate statistics is obtained by adjusting for multiple testing. Next-generation sequencing studies, which assay 10-20 million variants, are beginning to come online. For these studies, the strategy of additive univariate testing and multiple testing adjustment is likely to result in a loss of power due to (1) the substantial multiple testing burden and (2) the possibility of a non-additive causal mode of inheritance. To reduce the power loss, we propose a new method that (1) summarizes in a single statistic the strength of the association signals coming from all not-very-rare variants in a linkage disequilibrium block and (2) incorporates, in any linkage disequilibrium block statistic, the strength of the association signals under multiple modes of inheritance. The proposed linkage disequilibrium block test consists of the sum of squares of nominally significant univariate statistics. We compare the performance of this method with that of existing linkage disequilibrium block/gene-based methods. Simulations show that (1) extending methods to combine testing for multiple modes of inheritance leads to substantial power gains, especially for a recessive mode of inheritance, and (2) the proposed method has good overall performance. Based on the simulation results, we provide practical advice on choosing suitable methods for applied analyses.
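The block statistic described in this abstract (a sum of squares of nominally significant univariate statistics) can be illustrated with a minimal sketch. The function names, the |z| >= 1.96 nominal threshold, and the Monte Carlo null that treats the statistics as independent standard normals are all assumptions made for illustration; within a real linkage disequilibrium block the statistics are correlated, so the reference distribution would have to respect that correlation (for example by permuting phenotypes).

```python
import random


def block_statistic(z_scores, z_threshold=1.96):
    """Sum of squared univariate statistics that pass a nominal threshold.

    z_threshold=1.96 (two-sided nominal 5%) is an illustrative assumption.
    """
    return sum(z * z for z in z_scores if abs(z) >= z_threshold)


def independent_null_p_value(observed, n_variants, z_threshold=1.96,
                             n_draws=2000, seed=0):
    """Monte Carlo P-value under an ASSUMED null of independent N(0, 1) statistics.

    This ignores LD between variants and is for illustration only.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        null_z = [rng.gauss(0.0, 1.0) for _ in range(n_variants)]
        if block_statistic(null_z, z_threshold) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_draws + 1)


# Hypothetical z-scores for five variants in one block.
example_z = [0.4, -2.3, 2.8, 1.1, -0.2]
example_stat = block_statistic(example_z)  # only -2.3 and 2.8 contribute
example_p = independent_null_p_value(example_stat, len(example_z))
```

Only the two statistics exceeding the threshold enter the sum, which is what lets the statistic ignore variants that carry no signal.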

5
Haplotype kernel association test as a powerful method to identify chromosomal regions harboring uncommon causal variants. Genet Epidemiol 2013; 37:560-70. [PMID: 23740760 DOI: 10.1002/gepi.21740] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 12/13/2012] [Revised: 05/01/2013] [Accepted: 05/06/2013] [Indexed: 01/09/2023]
Abstract
For most complex diseases, the fraction of heritability that can be explained by the variants discovered from genome-wide association studies is minor. Although the so-called "rare variants" (minor allele frequency [MAF] < 1%) have attracted increasing attention, they are unlikely to account for much of the "missing heritability" because very few people may carry these rare variants. The genetic variants that are likely to fill in the "missing heritability" include uncommon causal variants (MAF < 5%), which are generally untyped in association studies using tagging single-nucleotide polymorphisms (SNPs) or commercial SNP arrays. Developing powerful statistical methods can help to identify chromosomal regions harboring uncommon causal variants, while bypassing the genome-wide or exome-wide next-generation sequencing. In this work, we propose a haplotype kernel association test (HKAT) that is equivalent to testing the variance component of random effects for distinct haplotypes. With an appropriate weighting scheme given to haplotypes, we can further enhance the ability of HKAT to detect uncommon causal variants. With scenarios simulated according to the population genetics theory, HKAT is shown to be a powerful method for detecting chromosomal regions harboring uncommon causal variants.
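The frequency classes used in this abstract can be made concrete with a small sketch. It assumes genotypes coded as 0, 1 or 2 copies of one allele and a hypothetical helper name `classify_variant`; the cut-offs are the 1% and 5% values quoted above, and treating "uncommon" as the band between them is an assumption about how the boundaries are handled.

```python
def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as 0, 1 or 2 copies of one allele."""
    n_alleles = 2 * len(genotypes)
    if n_alleles == 0:
        raise ValueError("no genotypes supplied")
    freq = sum(genotypes) / n_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq, 1.0 - freq)


def classify_variant(maf, rare_cutoff=0.01, uncommon_cutoff=0.05):
    """Label a variant by MAF; the boundary handling is an assumption."""
    if maf < rare_cutoff:
        return "rare"
    if maf < uncommon_cutoff:
        return "uncommon"
    return "common"


# Hypothetical sample: 100 individuals, 3 heterozygous carriers.
sample = [0] * 97 + [1] * 3
sample_maf = minor_allele_frequency(sample)
sample_class = classify_variant(sample_maf)
```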

6
Projection regression models for multivariate imaging phenotype. Genet Epidemiol 2012; 36:631-41. [PMID: 22807230 DOI: 10.1002/gepi.21658] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 01/30/2012] [Revised: 04/02/2012] [Accepted: 05/29/2012] [Indexed: 11/09/2022]
Abstract
This paper presents a projection regression model (PRM) to assess the relationship between a multivariate phenotype and a set of covariates, such as a genetic marker, age, and gender. In the existing literature, a standard statistical approach to this problem is to fit a multivariate linear model to the multivariate phenotype and then use Hotelling's T² to test hypotheses of interest. An alternative approach is to fit a simple linear model and test hypotheses for each individual phenotype and then correct for multiplicity. However, even when the dimension of the multivariate phenotype is relatively small, say 5, such standard approaches can suffer from low statistical power to detect the association between the multivariate phenotype and the covariates. The PRM generalizes a statistical method based on the principal component of heritability for association analysis in genetic studies of complex multivariate phenotypes. The key components of the PRM include an estimation procedure for extracting several principal directions of multivariate phenotypes relating to covariates and a test procedure, based on a wild-bootstrap method, for testing the association between the weighted multivariate phenotype and explanatory variables. Simulation studies and an imaging genetic dataset are used to examine the finite sample performance of the PRM.

7
Power of single- vs. multi-marker tests of association. Genet Epidemiol 2012; 36:480-7. [PMID: 22648939 PMCID: PMC3708310 DOI: 10.1002/gepi.21642] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/21/2011] [Revised: 03/23/2012] [Accepted: 04/23/2012] [Indexed: 01/15/2023]
Abstract
Current genome-wide association studies still rely heavily on a single-marker strategy, in which each single nucleotide polymorphism (SNP) is tested individually for association with a phenotype. Although methods and software packages that consider multimarker models have become available, they have been slow to be widely adopted and their efficacy in real data analysis is often questioned. Using extensive simulations, we endeavor to provide more insight into the performance of simple multimarker association tests compared with single-marker tests. The results reveal both the power advantages and the disadvantages of the two-marker test relative to the single-marker test. Power differentials depend on the correlation structure among tag SNPs, as well as that between tag SNPs and causal variants. A two-marker test performs relatively better than single-marker tests when the correlation between the two adjacent markers is high. However, using HapMap data, two-marker tests tended to have a greater chance of being less powerful than single-marker tests, due to constraints on the number of actual possible haplotypes in the HapMap data. Yet the average power difference was small whenever the one-marker test was more powerful, while there were many situations where the two-marker test was much more powerful. These findings can be useful to guide analyses of future studies.

8
Abstract
Testing multiple markers simultaneously not only can capture the linkage disequilibrium patterns but also can decrease the number of tests and thus alleviate the multiple-testing penalty. If a gene is associated with a phenotype, subjects with similar genotypes in this gene should also have similar phenotypes. Based on this concept, we have developed a general framework that is applicable to continuous traits. Two similarity-based tests (namely, the SIMc and SIMp tests) were derived as special cases of the general framework. In our simulation study, we compared the power of the two tests with that of the single-marker analysis, a standard haplotype regression, and a popular and powerful kernel machine regression. Our SIMc test outperforms the other tests when the average R² (a measure of linkage disequilibrium) between the causal variant and the surrounding markers is larger than 0.3 or when the causal allele is common (say, frequency = 0.3). Our SIMp test outperforms the other tests when the causal variant is introduced at common haplotypes (the maximum frequency of risk haplotypes > 0.4). We also applied our two tests to an adiposity data set to show their utility.

9
Haplotype-based methods for detecting uncommon causal variants with common SNPs. Genet Epidemiol 2012; 36:572-82. [PMID: 22706849 DOI: 10.1002/gepi.21650] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 01/29/2012] [Revised: 04/19/2012] [Accepted: 05/09/2012] [Indexed: 01/01/2023]
Abstract
Detecting uncommon causal variants (minor allele frequency [MAF] < 5%) is difficult with commercial single-nucleotide polymorphism (SNP) arrays that are designed to capture common variants (MAF > 5%). Haplotypes can provide insights into underlying linkage disequilibrium (LD) structure and can tag uncommon variants that are not well tagged by common variants. In this work, we propose a wei-SIMc-matching test that inversely weights haplotype similarities with the estimated standard deviation of haplotype counts to boost the power of similarity-based approaches for detecting uncommon causal variants. We then compare the power of the wei-SIMc-matching test with that of several popular haplotype-based tests, including four other similarity-based tests, a global score test for haplotypes (global), a test based on the maximum score statistic over all haplotypes (max), and two newly proposed haplotype-based tests for rare variant detection. With systematic simulations under a wide range of LD patterns, the results show that wei-SIMc-matching and global are the two most powerful tests. Among these two tests, wei-SIMc-matching has reliable asymptotic P-values, whereas global needs permutations to obtain reliable P-values when the frequencies of some haplotype categories are low or when the trait is skewed. Therefore, we recommend wei-SIMc-matching for detecting uncommon causal variants with surrounding common SNPs, in light of its power and computational feasibility.

10
To stratify or not to stratify: power considerations for population-based genome-wide association studies of quantitative traits. Genet Epidemiol 2012; 35:867-79. [PMID: 22125224 DOI: 10.1002/gepi.20637] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 11/08/2022]
Abstract
Meta-analyses of genome-wide association studies require numerous study partners to conduct pre-defined analyses and thus call for simple but efficient analysis plans. Potential differences between strata (e.g. men and women) are usually ignored, but the question often arises whether stratified analyses help to unravel the genetics of a phenotype or whether they unnecessarily increase the burden of analyses. To decide whether or not to stratify, we compare general analytical power computations for the overall analysis with those for stratified analyses, considering quantitative trait analyses and two strata. We also relate the stratification problem to interaction modeling and exemplify theoretical considerations on obesity and renal function genetics. We demonstrate that the overall analyses have better power than stratified analyses as long as the signals are pronounced in both strata with a consistent effect direction. Stratified analyses are advantageous for signals with zero (or very small) effect in one stratum and for signals with opposite effect directions in the two strata. Applying the joint test for a main SNP effect and SNP-stratum interaction beats both overall and stratified analyses with regard to power, but involves more complex models. In summary, we recommend employing stratified analyses or the joint test to better understand the potential of strata-specific signals with opposite effect directions. Only after systematic genome-wide searches for loci with opposite effect directions have been conducted will we know whether such signals exist and to what extent stratified analyses can reveal loci that are otherwise missed.
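The contrast between overall, stratified and joint testing can be sketched from per-stratum summary statistics. This is an illustration under stated assumptions, not the analytical power computation of the abstract: the two strata are taken to be independent, the overall analysis is approximated by inverse-variance pooling, the joint test is the 2-degree-of-freedom statistic z1² + z2², and the stratified analysis is summarized by a Bonferroni-corrected minimum P-value. The function name and the input effect sizes are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided normal P-value for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def compare_strategies(b1, se1, b2, se2):
    """Overall, stratified and joint tests from two independent strata."""
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2
    # Overall analysis, approximated by inverse-variance pooling.
    b_pooled = (w1 * b1 + w2 * b2) / (w1 + w2)
    z_overall = b_pooled * math.sqrt(w1 + w2)
    # Per-stratum tests and the test for a difference between strata.
    z1, z2 = b1 / se1, b2 / se2
    z_diff = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Joint 2-df test; a chi-square with 2 df has survival exp(-x / 2).
    joint_chi2 = z1 ** 2 + z2 ** 2
    return {
        "z_overall": z_overall,
        "z_diff": z_diff,
        "joint_chi2": joint_chi2,
        "p_overall": two_sided_p(z_overall),
        # Bonferroni over the two strata (an illustrative choice).
        "p_stratified": min(1.0, 2.0 * min(two_sided_p(z1), two_sided_p(z2))),
        "p_joint": math.exp(-joint_chi2 / 2.0),
    }


# Hypothetical signal with opposite effect directions in the two strata.
result = compare_strategies(0.10, 0.03, -0.10, 0.03)
```

With inverse-variance weights the joint statistic decomposes exactly as z_overall² + z_diff², which shows why the overall test alone has no power when the effects cancel while the joint test still does.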

11
LAMC1 gene is associated with premature ovarian failure. Maturitas 2012; 71:402-6. [PMID: 22321639 DOI: 10.1016/j.maturitas.2012.01.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 11/16/2011] [Revised: 01/12/2012] [Accepted: 01/13/2012] [Indexed: 01/18/2023]
Abstract
OBJECTIVES: Common variations with modest effects in complex, polygenic diseases such as premature ovarian failure (POF) can be detected by a genome-wide association study. We performed a genome-wide association study to identify predisposing genes associated with an increased risk of POF. STUDY DESIGN: In stage I, a genome-wide association study was performed using 24 POF patients and 24 matched controls. A strongly associated region was re-tested in stage II, using 98 patients and 218 matched controls, to confirm the association with POF. RESULTS: In stage I, we found a strongly associated region that was located on chromosome 1q31 and encoded the laminin gamma 1 (LAMC1) gene. All 22 single nucleotide polymorphisms (SNPs) in LAMC1 formed a linkage disequilibrium block, and two haplotypes were significantly associated with POF. In stage II, 14 SNPs, the majority of which were located in the coding region or were tagging SNPs, were genotyped. The distributions of 9 of these SNPs, including one nonsynonymous SNP (rs20558), and of one haplotype (HT1, C-C-T-G-C-C-A-T-T-C) were significantly higher in POF patients than in the control group (86.6% and 74.5%, respectively, OR=2.209, CI: 1.139-4.284, P=0.017). CONCLUSIONS: We showed for the first time that LAMC1 is significantly associated with POF and, specifically, that possession of at least one HT1 was associated with susceptibility to POF. This result means that HT1 may be in linkage disequilibrium with a causative variant for susceptibility to POF and that LAMC1 may be involved in POF pathogenesis.
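An odds ratio with a confidence interval, as reported in the results above, is commonly obtained from a 2x2 carrier table. The sketch below uses the standard log-odds (Woolf) interval; the abstract does not say which method the authors used, and the counts are hypothetical placeholders rather than the study's data.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    z=1.96 gives an approximate 95% interval. All four cells must be > 0.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT taken from the study.
example_or, example_lower, example_upper = odds_ratio_ci(20, 10, 10, 20)
```

An interval that excludes 1 corresponds to a nominally significant association at the matching level.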

12
Permutation-based approaches do not adequately allow for linkage disequilibrium in gene-wide multi-locus association analysis. Eur J Hum Genet 2012; 20:890-6. [PMID: 22317971 DOI: 10.1038/ejhg.2012.8] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 11/08/2022]
Abstract
Additional information about risk genes or risk pathways for diseases can be extracted from genome-wide association studies through analyses of groups of markers. The most commonly employed approaches involve combining individual marker data by adding the test statistics, or summing the logarithms of their P-values, and then using permutation testing to derive empirical P-values that allow for the statistical dependence of single-marker tests arising from linkage disequilibrium (LD). In the present study, we use simulated data to show that these approaches fail to reflect the structure of the sampling error, and the effect of this is to give undue weight to correlated markers. We show that the results obtained are internally inconsistent in the presence of strong LD, and are externally inconsistent with the results derived from multi-locus analysis. We also show that the results obtained from regression and multivariate Hotelling T² (H-T2) testing, but not those obtained from permutations, are consistent with the theoretically expected distributions, and that the H-T2 test has greater power to detect gene-wide associations in real datasets. Finally, we show that while the results from permutation testing can be made to approximate those from regression and multivariate Hotelling T² testing through aggressive LD pruning of markers, this comes at the cost of loss of information. We conclude that when conducting multi-locus analyses of sets of single-nucleotide polymorphisms, regression or multivariate Hotelling T² testing, which give equivalent results, are preferable to the other more commonly applied approaches.

13
A two-stage association study identifies methyl-CpG-binding domain protein 2 gene polymorphisms as candidates for breast cancer susceptibility. Eur J Hum Genet 2012; 20:682-9. [PMID: 22258532 PMCID: PMC3355265 DOI: 10.1038/ejhg.2011.273] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 12/01/2022]
Abstract
Genome-wide association studies for breast cancer have identified over 40 single-nucleotide polymorphisms (SNPs), a subset of which remains statistically significant after genome-wide correction. Improved strategies for mining genome-wide association data have been suggested to address the heritable component of genetic risk in breast cancer. In this study, we attempted a two-stage association design using markers from a genome-wide study (stage 1, Affymetrix Human SNP 6.0 array, cases=302, controls=321). We restricted our analysis to polymorphisms in genes related to DNA repair, modification and metabolism pathways, given their role in carcinogenesis in general and their known protein–protein interactions and, by extension, potential epistatic effects. We selected 22 SNPs based on linkage disequilibrium patterns and high statistical significance. Genotyping assays in an independent replication study of 1178 cases and 1314 controls were attempted using the Sequenom iPLEX Gold platform (stage 2). Six SNPs (rs8094493, rs4041245, rs7614, rs13250873, rs1556459 and rs2297381) showed consistent and statistically significant associations with breast cancer risk in both stages, with allelic odds ratios (and P-values) of 0.85 (0.0021), 0.86 (0.0026), 0.86 (0.0041), 1.17 (0.0043), 1.20 (0.0103) and 1.13 (0.0154), respectively, in the combined analysis (N=3115). Of these, three polymorphisms were located in methyl-CpG-binding domain protein 2 gene regions and were in strong linkage disequilibrium. The remaining three SNPs were in proximity to RAD21 homolog (S. pombe), O-6-methylguanine-DNA methyltransferase and RNA polymerase II-associated protein 1. The identified markers may be relevant to breast cancer susceptibility in populations if these findings are confirmed in independent cohorts.

14
Copy number variants for schizophrenia and related psychotic disorders in Oceanic Palau: risk and transmission in extended pedigrees. Biol Psychiatry 2011; 70:1115-21. [PMID: 21982423 PMCID: PMC3224197 DOI: 10.1016/j.biopsych.2011.08.009] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 01/26/2011] [Revised: 07/08/2011] [Accepted: 08/02/2011] [Indexed: 12/31/2022]
Abstract
BACKGROUND: We report on copy number variants (CNVs) found in Palauan subjects ascertained for schizophrenia and related psychotic disorders in extended pedigrees in Palau. We compare CNVs found in this Oceanic population with those seen in other samples, typically of European ancestry. Assessing CNVs in Palauan extended pedigrees yields insight into the evolution of risk CNVs, such as how they arise, are transmitted, and are lost from populations by stochastic or selective processes, none of which are easily measured from case-control samples. METHODS: DNA samples from 197 subjects affected with schizophrenia and related psychotic disorders, 185 of their relatives, and 159 control subjects were successfully characterized for CNVs using Affymetrix Genomewide Human SNP Array 5.0. RESULTS: Copy number variants thought to be associated with risk for schizophrenia and related disorders also occur in affected individuals in Palau, specifically 15q11.2 and 1q21.1 deletions, partial duplication of IL1RAPL1 (Xp21.3), and chromosome X duplications (Klinefelter's syndrome). Partial duplication within A2BP1 appears to convey an eightfold increased risk in male subjects (95% confidence interval, .8-84.4) but not female subjects (odds ratio = .4, 95% confidence interval, .03-4.9). Affected-only linkage analysis using this variant yields a logarithm of the odds score of 3.5. CONCLUSIONS: This study reveals CNVs that confer risk to schizophrenia and related psychotic disorders in Palau, most of which have been previously observed in samples of European ancestry. Only a few of these CNVs show evidence that they have existed for many generations, consistent with risk variants diminishing reproductive success.

15
Evaluation of an approximation method for assessment of overall significance of multiple-dependent tests in a genomewide association study. Genet Epidemiol 2011; 35:861-6. [PMID: 22006681 DOI: 10.1002/gepi.20636] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Received: 07/22/2011] [Revised: 09/07/2011] [Accepted: 09/07/2011] [Indexed: 11/07/2022]
Abstract
We describe the implementation of a set-based method to assess the significance of findings from genomewide association study data. Our method, implemented in PLINK, is based on a theoretical approximation of Fisher's statistic such that the combination of P-values at a gene or across a pathway is carried out in a manner that accounts for the correlation structure, or linkage disequilibrium, between single nucleotide polymorphisms. We compare our method to a permutation-based product of P-values approach and show a typical correlation in excess of 0.98 for a number of comparisons. The method gives Type I error rates that are less than or equal to the corresponding nominal significance levels, making it robust to the effects of false positives. We show that in broadly similar populations, reference data sets of markers are an appropriate substrate for deriving marker-marker linkage disequilibrium (LD), negating the need to access individual-level genotypes and greatly facilitating its generic applicability. We show that the method is thus robust to LD-associated bias and has equivalent performance to permutation-based methods, with a significantly shorter runtime. This is particularly relevant at a time of increasing public availability of significantly larger genetic data sets and should go a long way to assist in the rapid analysis of these data sets.
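For orientation, the classical Fisher combination that this method builds on can be written in a few lines. The sketch below covers only the textbook case of independent P-values, where the statistic follows a chi-square distribution with 2k degrees of freedom; the LD-aware approximation described in the abstract is not reproduced here, and the function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's combined P-value, ASSUMING the k input P-values are independent.

    The statistic X = -2 * sum(ln p_i) is chi-square with 2k degrees of
    freedom under the null. For even degrees of freedom the survival
    function has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    if not p_values or any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical SNP-level P-values within one gene.
gene_p = fisher_combined_p([0.01, 0.02, 0.03])
```

When SNPs are in LD the independence assumption fails and this reference distribution becomes anti-conservative, which is the problem the approximation in the abstract is designed to address.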

16
Kernel machine SNP-set analysis for censored survival outcomes in genome-wide association studies. Genet Epidemiol 2011; 35:620-31. [PMID: 21818772 DOI: 10.1002/gepi.20610] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Received: 09/10/2010] [Revised: 05/06/2011] [Accepted: 06/03/2011] [Indexed: 02/01/2023]
Abstract
In this article, we develop a powerful test for identifying single nucleotide polymorphism (SNP)-sets that are predictive of survival with data from genome-wide association studies. We first group typed SNPs into SNP-sets based on genomic features and then apply a score test to assess the overall effect of each SNP-set on the survival outcome through a kernel machine Cox regression framework. This approach uses genetic information from all SNPs in the SNP-set simultaneously and accounts for linkage disequilibrium (LD), leading to a powerful test with reduced degrees of freedom when the typed SNPs are in LD with each other. This type of test also has the advantage of capturing the potentially nonlinear effects of the SNPs, SNP-SNP interactions (epistasis), and the joint effects of multiple causal variants. By simulating SNP data based on the LD structure of real genes from the HapMap project, we demonstrate that our proposed test is more powerful than the standard single SNP minimum P-value-based test for association studies with censored survival outcomes. We illustrate the proposed test with a real data application.

17
Genetic analysis of vertebral trabecular bone density and cross-sectional area in older men. Osteoporos Int 2011; 22:1079-90. [PMID: 21153022 PMCID: PMC3691107 DOI: 10.1007/s00198-010-1296-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 11/06/2009] [Accepted: 04/13/2010] [Indexed: 12/21/2022]
Abstract
We investigated 383 bone candidate genes for associations between single nucleotide polymorphisms and vertebral trabecular volumetric bone mineral density (vBMD) and cross-sectional area (CSA) in 2,018 Caucasian men aged ≥ 65 years. SNPs in TGFBR3, SOST, KL, CALCR, LEP, CSF1R, PTN, GNRH2, FGFR2, and MEPE were associated with vBMD and SNPs in CYP11B1, DVL2, DLX5, WNT4, and PAX7 were associated with CSA in independent study samples (p < 0.005). INTRODUCTION: Vertebral bone mineral density and cross-sectional area are important determinants of vertebral bone strength. Little is known about the specific genetic variants that influence these phenotypes in humans. METHODS: We investigated the potential genetic variants associated with vertebral trabecular volumetric BMD and CSA measured by quantitative computed tomography. We initially tested for association between these phenotypes and 4,608 tagging and potentially functional single nucleotide polymorphisms (SNPs) in 383 candidate genes in 862 community-dwelling Caucasian men aged ≥ 65 years in the Osteoporotic Fractures in Men Study. RESULTS: SNP associations were then validated by genotyping an additional 1,156 randomly sampled men from the same cohort. We identified 11 SNPs in 10 genes (TGFBR3, SOST, KL, CALCR, LEP, CSF1R, PTN, GNRH2, FGFR2, and MEPE) that were consistently associated with trabecular vBMD and five SNPs in five genes (CYP11B1, DVL2, DLX5, WNT4, and PAX7) that were consistently associated with CSA in both samples (p < 0.005). CONCLUSION: None of the SNPs associated with trabecular vBMD were associated with CSA. Our findings raise the possibility that at least some of the loci for vertebral trabecular BMD and bone size may be distinct.

18
Multilocus association testing of quantitative traits based on partial least-squares analysis. PLoS One 2011; 6:e16739. [PMID: 21304821 PMCID: PMC3033421 DOI: 10.1371/journal.pone.0016739] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 09/25/2010] [Accepted: 01/05/2011] [Indexed: 01/28/2023]
Abstract
Because they combine the genetic information of multiple loci, multilocus association studies (MLAS) are expected to be more powerful than single-locus association studies (SLAS) in disease gene mapping. However, some researchers have found that MLAS had similar or reduced power relative to SLAS, which was partly attributed to the increased degrees of freedom (dfs) in MLAS. Based on partial least-squares (PLS) analysis, we develop an MLAS approach that avoids large dfs. In this approach, genotypes are first decomposed into PLS components that not only capture the majority of the genetic information of multiple loci, but also are relevant for target traits. The extracted PLS components are then regressed on target traits to detect association under multilinear regression. Simulation studies based on real data from the HapMap project were used to assess the performance of our PLS-based MLAS as well as other popular multilinear regression-based MLAS approaches under various scenarios, considering genetic effects and the linkage disequilibrium structure of candidate genetic regions. Using the PLS-based MLAS approach, we conducted a genome-wide MLAS of lean body mass and compared it with our previous genome-wide SLAS of lean body mass. The results of simulations and real data analyses support the improved power of our PLS-based MLAS in disease gene mapping relative to the other three MLAS approaches investigated in this study. We aim to provide an effective and powerful MLAS approach, which may help to overcome the limitations of SLAS in disease gene mapping.
|
19
|
Powerful multi-marker association tests: unifying genomic distance-based regression and logistic regression. Genet Epidemiol 2011; 34:680-8. [PMID: 20976795 DOI: 10.1002/gepi.20529] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]
Abstract
To detect genetic association with common and complex diseases, many statistical tests have been proposed for candidate gene or genome-wide association studies with the case-control design. Due to linkage disequilibrium (LD), multi-marker association tests can gain power over single-marker tests with a Bonferroni multiple testing adjustment. Among many existing multi-marker association tests, most aim to detect only one of many possible aspects of distributional differences between the genotypes of cases and controls, such as allele frequency differences, while a few new ones target two or three aspects, all of which can be implemented in logistic regression. In contrast to logistic regression, a genomic distance-based regression (GDBR) approach aims to detect some high-order genotypic differences between cases and controls. A recent study has confirmed the high power of GDBR tests. At this moment, the popular logistic regression and the emerging GDBR approaches are completely unrelated; for example, one has to choose between the two. In this article, we reformulate GDBR as logistic regression, opening an avenue for constructing other powerful tests while overcoming some limitations of GDBR. For example, asymptotic distributions can replace time-consuming permutations for deriving P-values, and covariates, including gene-gene interactions, can be easily incorporated. Importantly, this reformulation facilitates combining GDBR with other existing methods in a unified framework of logistic regression. In particular, we show that Fisher's P-value combining method can boost statistical power by incorporating information from allele frequencies, Hardy-Weinberg disequilibrium, LD patterns, and other higher-order interactions among multi-markers as captured by GDBR.
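Fisher's P-value combining method, mentioned at the end of this abstract, is simple enough to sketch. The version below is the classical form, which assumes the component p-values are independent; how the authors handled dependence between the components is not stated in the abstract, and the function name is illustrative only.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method."""
    if not pvalues or any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    # Fisher's statistic is -2 * sum(log p), chi-square with 2k df under the null.
    # We work with half of it, which is what the closed form below needs.
    half_stat = -sum(math.log(p) for p in pvalues)
    # For even df = 2k the chi-square survival function has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total
```

With a single input the function returns that p-value unchanged, which is a quick sanity check on the closed form.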
|
20
|
No association of psychosis in Alzheimer disease with neurodegenerative pathway genes. Neurobiol Aging 2010; 32:555.e9-11. [PMID: 21093110 DOI: 10.1016/j.neurobiolaging.2010.10.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 08/11/2010] [Revised: 09/24/2010] [Accepted: 10/03/2010] [Indexed: 01/21/2023]
Abstract
Psychotic symptoms occur in approximately 40% of subjects with Alzheimer disease (AD with psychosis; AD + P) and identify a subgroup with more rapid cognitive decline. We evaluated in 867 AD subjects the association of AD + P with genes which may modify the pathological process via effects on the accumulation of amyloid beta (Aβ) protein and/or hyperphosphorylated microtubule-associated protein tau (MAPT): amyloid precursor protein (APP), beta-site amyloid precursor protein cleaving enzyme (BACE1), sortilin-related receptor (SORL1), and MAPT. Each gene was thoroughly interrogated with tag single-nucleotide polymorphisms (SNPs), and gene-based tests were used to enhance power. We found no association of these genes with AD + P.
|
21
|
Discovering joint associations between disease and gene pairs with a novel similarity test. BMC Genet 2010; 11:86. [PMID: 20920333 PMCID: PMC2959050 DOI: 10.1186/1471-2156-11-86] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 04/14/2010] [Accepted: 10/04/2010] [Indexed: 11/13/2022]
Abstract
Background: Genes in a functional pathway can have complex interactions. A gene might activate or suppress another gene, so it is of interest to test joint associations of gene pairs. To simultaneously detect the joint association between disease and two genes (or two chromosomal regions), we propose a new test with the use of genomic similarities. Our test is designed to detect epistasis in the absence of main effects, main effects in the absence of epistasis, or the presence of both main effects and epistasis. Results: The simulation results show that our similarity test with the matching measure is more powerful than Pearson's χ² test when the disease mutants were introduced at common haplotypes, but is less powerful when the disease mutants were introduced at rare haplotypes. Our similarity tests with the counting measures are more sensitive to marker informativity and linkage disequilibrium patterns, and thus are often inferior to the similarity test with the matching measure and Pearson's χ² test. Conclusions: In detecting joint associations between disease and gene pairs, our similarity test is a complementary method to Pearson's χ² test.
|
22
|
Pathway analysis comparison using Crohn's disease genome wide association studies. BMC Med Genomics 2010; 3:25. [PMID: 20584322 PMCID: PMC2908056 DOI: 10.1186/1755-8794-3-25] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 12/22/2009] [Accepted: 06/28/2010] [Indexed: 02/03/2023]
Abstract
Background: The use of biological annotation such as genes and pathways in the analysis of gene expression data has aided the identification of genes for follow-up studies and suggested functional information for uncharacterized genes. Several studies have applied similar methods to genome-wide association studies and identified a number of disease-related pathways. However, many questions remain on how best to approach this problem, such as whether there is a need to obtain a score to summarize association evidence at the gene level, and whether a pathway dominated by just a few highly significant genes is of interest. Methods: We evaluated the performance of two pathway-based methods (Random Set, and the binomial approximation to the hypergeometric test) based on their applications to three data sets of Crohn's disease. We consider as phenotypes both the disease status and the residuals after conditioning on IL23R, a known Crohn's-related gene. Results: Our results show that the Random Set method has the most power to identify disease-related pathways. We confirm previously reported disease-related pathways and provide evidence for IL-2 Receptor Beta Chain in T cell Activation and IL-9 signaling as Crohn's disease-associated pathways. Conclusions: Our results highlight the need to apply powerful gene score methods prior to pathway enrichment tests, and show that controlling for genes that attain genome-wide significance enables further biological insight.
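To make the second method named in this abstract concrete, the sketch below computes an enrichment p-value for one pathway with the exact hypergeometric tail and with its binomial approximation. The argument names (`total_genes`, `pathway_genes`, `selected_genes`, `overlap`) are hypothetical, and the abstract does not describe how the authors defined the set of significant genes, so that step is assumed to have been done already.

```python
import math

def hypergeom_tail(total_genes, pathway_genes, selected_genes, overlap):
    """P(X >= overlap) when selected_genes are drawn without replacement."""
    upper = min(pathway_genes, selected_genes)
    denom = math.comb(total_genes, selected_genes)
    numer = sum(
        math.comb(pathway_genes, k)
        * math.comb(total_genes - pathway_genes, selected_genes - k)
        for k in range(overlap, upper + 1)
    )
    return numer / denom

def binomial_tail(total_genes, pathway_genes, selected_genes, overlap):
    """Binomial approximation: each selected gene falls in the pathway
    independently with probability pathway_genes / total_genes."""
    p = pathway_genes / total_genes
    return sum(
        math.comb(selected_genes, k) * p**k * (1.0 - p) ** (selected_genes - k)
        for k in range(overlap, selected_genes + 1)
    )
```

The two tails agree closely when the number of selected genes is small relative to the gene universe, which is the regime where sampling with and without replacement are nearly the same.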
|
23
|
Association tests using kernel-based measures of multi-locus genotype similarity between individuals. Genet Epidemiol 2010; 34:213-21. [PMID: 19697357 DOI: 10.1002/gepi.20451] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Indexed: 01/16/2023]
Abstract
In a genetic association study, it is often desirable to perform an overall test of whether any or all single-nucleotide polymorphisms (SNPs) in a gene are associated with a phenotype. Several such tests exist, but most of them are powerful only under very specific assumptions about the genetic effects of the individual SNPs. In addition, some of the existing tests assume that the direction of the effect of each SNP is known, which is a highly unlikely scenario. Here, we propose a new kernel-based association test of joint association of several SNPs. Our test is non-parametric and robust, and does not make any assumption about the directions of individual SNP effects. It can be used to test multiple correlated SNPs within a gene and can also be used to test independent SNPs or genes in a biological pathway. Our test uses an analysis of variance paradigm to compare variation between cases and controls to the variation within the groups. The variation is measured using kernel functions for each marker, and then a composite statistic is constructed to combine the markers into a single test. We present simulation results comparing our statistic to the U-statistic-based method by Schaid et al. ([2005] Am. J. Hum. Genet. 76:780-793) and another statistic by Wessel and Schork ([2006] Am. J. Hum. Genet. 79:792-806). We consider a variety of different disease models and assumptions about how many SNPs within the gene are actually associated with disease. Our results indicate that our statistic has higher power than other statistics under most realistic conditions.
|
24
|
Abstract
GWAS have emerged as popular tools for identifying genetic variants that are associated with disease risk. Standard analysis of a case-control GWAS involves assessing the association between each individual genotyped SNP and disease risk. However, this approach suffers from limited reproducibility and difficulties in detecting multi-SNP and epistatic effects. As an alternative analytical strategy, we propose grouping SNPs together into SNP sets on the basis of proximity to genomic features such as genes or haplotype blocks, then testing the joint effect of each SNP set. Testing of each SNP set proceeds via the logistic kernel-machine-based test, which is based on a statistical framework that allows for flexible modeling of epistatic and nonlinear SNP effects. This flexibility and the ability to naturally adjust for covariate effects are important features of our test that make it appealing in comparison to individual SNP tests and existing multimarker tests. Using simulated data based on the International HapMap Project, we show that SNP-set testing can have improved power over standard individual-SNP analysis under a wide range of settings. In particular, we find that our approach has higher power than individual-SNP analysis when the median correlation between the disease-susceptibility variant and the genotyped SNPs is moderate to high. When the correlation is low, both individual-SNP analysis and the SNP-set analysis tend to have low power. We apply SNP-set analysis to analyze the Cancer Genetic Markers of Susceptibility (CGEMS) breast cancer GWAS discovery-phase data.
|
25
|
Maternal serum 25-hydroxyvitamin D concentrations are associated with small-for-gestational age births in white women. J Nutr 2010; 140:999-1006. [PMID: 20200114 PMCID: PMC2855265 DOI: 10.3945/jn.109.119636] [Citation(s) in RCA: 216] [Impact Index Per Article: 15.4] [Indexed: 11/14/2022]
Abstract
Maternal vitamin D deficiency has been associated with numerous adverse health outcomes, but its association with fetal growth restriction remains uncertain. We sought to elucidate the association between maternal serum 25-hydroxyvitamin D [25(OH)D] concentrations in early pregnancy and the risk of small-for-gestational age birth (SGA) and explore the association between maternal single nucleotide polymorphisms (SNP) in the vitamin D receptor (VDR) gene and the risk of SGA. We conducted a nested case-control study of nulliparous pregnant women with singleton pregnancies who delivered SGA infants (n = 77 white and n = 34 black) or non-SGA infants (n = 196 white and n = 105 black). Women were followed from <16 wk gestation to delivery. Women's banked sera at <22 wk were newly measured for 25(OH)D and DNA extracted for VDR genotyping. SGA was defined as live-born infants that were <10th percentile of birth weight according to nomograms based on gender and gestational age. After confounder adjustment, there was a U-shaped relation between serum 25(OH)D and risk of SGA among white mothers, with the lowest risk from 60 to 80 nmol/L. Compared with serum 25(OH)D 37.5-75 nmol/L, SGA odds ratios (95% CI) for levels <37.5 and >75 nmol/L were 7.5 (1.8, 31.9) and 2.1 (1.2, 3.8), respectively. There was no relation between 25(OH)D and SGA risk among black mothers. One SNP in the VDR gene among white women and 3 SNP in black women were significantly associated with SGA. Our results suggest that vitamin D has a complex relation with fetal growth that may vary by race.
|
26
|
A data-adaptive sum test for disease association with multiple common or rare variants. Hum Hered 2010; 70:42-54. [PMID: 20413981 PMCID: PMC2912645 DOI: 10.1159/000288704] [Citation(s) in RCA: 241] [Impact Index Per Article: 17.2] [Received: 11/09/2009] [Accepted: 02/05/2010] [Indexed: 12/14/2022]
Abstract
Since associations between complex diseases and common variants are typically weak, and approaches to genotyping rare variants (e.g. by next-generation resequencing) multiply, there is an urgent demand to develop powerful association tests that are able to detect disease associations with both common and rare variants. In this article we present such a test. It is based on data-adaptive modifications to a so-called Sum test originally proposed for common variants, which aims to strike a balance between utilizing information on multiple markers in linkage disequilibrium and reducing the cost of large degrees of freedom or of multiple testing adjustment. When applied to multiple common or rare variants in a candidate region, the proposed test is easy to use with 1 degree of freedom and without the need for multiple testing adjustment. We show that the proposed test has high power across a wide range of scenarios with either common or rare variants, or both. In particular, in some situations the proposed test performs better than several commonly used methods.
|
27
|
Comparisons of multi-marker association methods to detect association between a candidate region and disease. Genet Epidemiol 2010; 34:201-12. [PMID: 19810024 PMCID: PMC3158797 DOI: 10.1002/gepi.20448] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Indexed: 02/02/2023]
Abstract
The joint use of information from multiple markers may be more effective to reveal association between a genomic region and a trait than single marker analysis. In this article, we compare the performance of seven multi-marker methods. These methods include (1) single marker analysis (either the best-scoring single nucleotide polymorphism in a candidate region or a combined test based on Fisher's method); (2) fixed effects regression models where the predictors are either the observed genotypes in the region, principal components that explain a proportion of the genetic variation, or predictors based on Fourier transformation for the genotypes; and (3) variance components analysis. In our simulation studies, we consider genetic models where the association is due to one, two, or three markers, and the disease-causing markers have varying allele frequencies. We use information from either all the markers in a region or information only from tagging markers. Our simulation results suggest that when there is one disease-causing variant, the best-scoring marker method is preferred whereas the variance components method and the principal components method work well for more common disease-causing variants. When there is more than one disease-causing variant, the principal components method seems to perform well over all the scenarios studied. When these methods are applied to analyze associations between all the markers in or near a gene and disease status for an inflammatory bowel disease data set, the analysis based on the principal components method leads to biologically more consistent discoveries than other methods.
|
28
|
Abstract
In contrast to conventional dual-energy X-ray absorptiometry, quantitative computed tomography separately measures trabecular and cortical volumetric bone mineral density (vBMD). Little is known about the genetic variants associated with trabecular and cortical vBMD in humans, although both may be important for determining bone strength and osteoporotic risk. In the current analysis, we tested the hypothesis that there are genetic variants associated with trabecular and cortical vBMD at the femoral neck by genotyping 4608 tagging and potentially functional single-nucleotide polymorphisms (SNPs) in 383 bone metabolism candidate genes in 822 Caucasian men aged 65 years or older from the Osteoporotic Fractures in Men Study (MrOS). Promising SNP associations then were tested for replication in an additional 1155 men from the same study. We identified SNPs in five genes (IFNAR2, NFATC1, SMAD1, HOXA, and KLF10) that were robustly associated with cortical vBMD and SNPs in nine genes (APC, ATF2, BMP3, BMP7, FGF18, FLT1, TGFB3, THRB, and RUNX1) that were robustly associated with trabecular vBMD. There was no overlap between genes associated with cortical vBMD and trabecular vBMD. These findings identify novel genetic variants for cortical and trabecular vBMD and raise the possibility that some genetic loci may be unique for each bone compartment.
|
29
|
Power analysis of principal components regression in genetic association studies. J Zhejiang Univ Sci B 2010; 10:721-30. [PMID: 19816996 DOI: 10.1631/jzus.b0830866] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
Abstract
Association analysis provides an opportunity to find genetic variants underlying complex traits. A principal components regression (PCR)-based approach was shown to outperform some competing approaches. However, a limitation of this method is that the principal components (PCs) selected from single nucleotide polymorphisms (SNPs) may be unrelated to the phenotype. In this article, we investigate the theoretical properties of such a method in more detail. We first derive the exact power function of the test based on PCR, and hence clarify the relationship between the test power and the degrees of freedom (DF). Next, we extend the PCR test to a general weighted PCs test, which provides a unified framework for understanding the properties of some related statistics. We then compare the performance of these tests. We also introduce several data-driven adaptive alternatives to overcome difficulties in the PCR approach. Finally, we illustrate our results using simulations based on real genotype data. The simulation study shows the risk of using the unsupervised rule to determine the number of PCs, and demonstrates that there is no single uniformly most powerful method for detecting genetic variants.
|
30
|
Test selection with application to detecting disease association with multiple SNPs. Hum Hered 2009; 69:120-30. [PMID: 19996609 DOI: 10.1159/000264449] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 04/27/2009] [Accepted: 07/08/2009] [Indexed: 11/19/2022]
Abstract
We consider the motivating problem of testing for association between a phenotype and multiple single nucleotide polymorphisms (SNPs) within a candidate gene or region. Various statistical approaches have been proposed, including those based on either (combining univariate) single-locus analyses or (multivariate) multilocus analyses. However, it is known in theory that there is no single uniformly most powerful test to detect association with multiple SNPs. On the other hand, several tests have been shown to be among frequent winners across a range of practical situations, but the identity of the most powerful one changes with the situation in an unknown way. Here we propose a novel test selection procedure to select from five such tests: a so-called UminP test that combines multiple univariate/single-locus score tests by taking the minimum of their p values as its test statistic, a multivariate score test and its two modifications, and a so-called sum test. We also illustrate its application to selecting genotype codings for the sum test since the performance of the sum test depends on its genotype coding in an unknown way. Our major contributions include the methodology of estimating the power of a given test with a given dataset and the idea of using the estimated power as the criterion for test selection. We also propose a fast simulation-based method to calculate p values for the test selection procedure and for any method of combining p values. Our numerical results indicated that the proposed test selection procedure always yielded power close to the most powerful test among the candidate tests at any given situation, and in particular, our proposed test selection performed either better than or as well as the popular combining method of taking the minimum p value of the candidate tests.
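The UminP test described in this abstract takes the minimum of the single-SNP p-values as its statistic. A minimal sketch follows, under several assumptions that are not in the source: a quantitative trait, a linear-model score statistic per SNP, and an ordinary permutation null in place of the fast simulation-based method the authors propose. The names `umin_p_permutation` and `_score_stat` are illustrative only.

```python
import random

def _score_stat(x, y):
    """Single-SNP score statistic for a quantitative trait (n * r^2)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a monomorphic SNP or constant trait carries no signal
    return n * sxy * sxy / (sxx * syy)

def umin_p_permutation(genotypes, trait, n_perm=999, seed=0):
    """Permutation p-value for the minimum single-SNP p-value.

    Every SNP statistic has the same 1-df null distribution, so the smallest
    p-value corresponds to the largest statistic and we can permute on that.
    """
    columns = list(zip(*genotypes))

    def max_stat(y):
        return max(_score_stat(col, y) for col in columns)

    observed = max_stat(trait)
    rng = random.Random(seed)
    shuffled = list(trait)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-trait link, keep LD intact
        if max_stat(shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Permuting the trait rather than the genotypes preserves the correlation among SNPs, so the null distribution of the maximum accounts for linkage disequilibrium without a Bonferroni-style penalty.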
|
31
|
Statistical tests of genetic association in the presence of gene-gene and gene-environment interactions. Hum Hered 2009; 69:131-42. [PMID: 19996610 DOI: 10.1159/000264450] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 05/12/2009] [Accepted: 07/27/2009] [Indexed: 12/14/2022]
Abstract
BACKGROUND: While its importance is well recognized, it remains challenging to test genetic association in the presence of gene-gene (or gene-environment) interactions. A major technical difficulty lies in the fact that a general model of gene-gene interactions often calls for a large number of parameters, possibly leading to reduced statistical power. An emerging theme of some recent work is to reduce the number of such parameters through dimension reduction. Wang et al. [2009] proposed such an approach based on partial least squares (PLS) for dimension reduction. They compared their method with several others using simulated data, establishing that their PLS test performed best. Unfortunately, Wang et al. did not include in their evaluations several powerful tests just recently discovered for analyzing multiple SNPs in a candidate gene or region. METHODS: In this paper, we first extend these tests to the current context to detect gene-gene interactions in the presence of nuisance parameters, then compare these tests with the PLS test using the simulated data of Wang et al. [2009]. RESULTS: It is confirmed that some other tests can be more powerful than the PLS test, though there is no uniform winner. Some interesting, albeit not new, observations are also made: some of the new tests are more robust to the large number of parameters in a model and may thus perform well; on the other hand, even for a purely epistatic genetic model, some of the tests applied to a logistic main-effects model without any interaction terms may be superior to that based on a full model that explicitly accounts for gene-gene interactions. CONCLUSION: The proposed statistical tests are potentially useful in practice.
|
32
|
High-density association study of 383 candidate genes for volumetric BMD at the femoral neck and lumbar spine among older men. J Bone Miner Res 2009; 24:2039-49. [PMID: 19453261 PMCID: PMC2791518 DOI: 10.1359/jbmr.090524] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Indexed: 12/22/2022]
Abstract
Genetics is a well-established but poorly understood determinant of BMD. Whereas some genetic variants may influence BMD throughout the body, others may be skeletal site specific. We initially screened for associations between 4608 tagging and potentially functional single nucleotide polymorphisms (SNPs) in 383 candidate genes and femoral neck and lumbar spine volumetric BMD (vBMD) measured from QCT scans among 862 community-dwelling white men ≥65 yr of age in the Osteoporotic Fractures in Men Study (MrOS). The most promising SNP associations (p < 0.01) were validated by genotyping an additional 1156 white men from MrOS. This analysis identified 8 SNPs in 6 genes (APC, DMP1, FGFR2, FLT1, HOXA, and PTN) that were associated with femoral neck vBMD and 13 SNPs in 7 genes (APC, BMPR1B, FOXC2, HOXA, IGFBP2, NFATC1, and SOST) that were associated with lumbar spine vBMD in both genotyping samples (p < 0.05). Although most associations were specific to one skeletal site, SNPs in the APC and HOXA gene regions were associated with both femoral neck and lumbar spine BMD. This analysis identifies several novel and robust genetic associations for volumetric BMD, and these findings in combination with other data suggest the presence of genetic loci for volumetric BMD that are at least to some extent skeletal-site specific.
|
33
|
Molecular variation in neuropeptide Y and bone mineral density among men of African ancestry. Calcif Tissue Int 2009; 85:507-13. [PMID: 19865784 PMCID: PMC4905686 DOI: 10.1007/s00223-009-9307-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 06/02/2009] [Accepted: 10/04/2009] [Indexed: 01/09/2023]
Abstract
Neuropeptide Y (NPY) is a physiological candidate gene for the regulation of body weight and has more recently been implicated in regulating bone mass. The current study sought to test if inherited variation in NPY might influence BMD in a population of African-ancestry men who have high bone mineral density (BMD). We genotyped 17 tagging single-nucleotide polymorphisms (SNPs) across the NPY gene region in 1,113 randomly selected men of African ancestry aged ≥40 years and tested for association with anthropometric characteristics and proximal femur BMD. The homozygous rare genotype of four SNPs was associated with a 0.92-1.59% decrease in stature (corrected P < 0.05). No SNP was associated with body mass index or body weight. Two SNPs in a 5-kb linkage disequilibrium block encompassing exons 3 and 4 were associated with proximal femur BMD, adjusted for age, body weight, and height (corrected P < 0.05). These results suggest that genetic variation at the NPY locus may contribute to bone density, independently of body weight.
|
34
|
Abstract
OBJECTIVE: Published studies suggest associations between circadian gene polymorphisms and bipolar I disorder (BPI), as well as schizoaffective disorder (SZA) and schizophrenia (SZ). The results are plausible, based on prior studies of circadian abnormalities. As replications have not been attempted uniformly, we evaluated representative, common polymorphisms in all three disorders. METHODS: We assayed 276 publicly available 'tag' single nucleotide polymorphisms (SNPs) at 21 circadian genes among 523 patients with BPI, 527 patients with SZ/SZA, and 477 screened adult controls. Detected associations were evaluated in relation to two published genome-wide association studies (GWAS). RESULTS: Using gene-based tests, suggestive associations were noted between EGR3 and BPI (p = 0.017), and between NPAS2 and SZ/SZA (p = 0.034). Three SNPs were associated with both sets of disorders (NPAS2: rs13025524 and rs11123857; RORB: rs10491929; p < 0.05). None of the associations remained significant following corrections for multiple comparisons. Approximately 15% of the analyzed SNPs overlapped with an independent study that conducted GWAS for BPI; suggestive overlap between the GWAS analyses and ours was noted at ARNTL. CONCLUSIONS: Several suggestive, novel associations were detected with circadian genes and BPI and SZ/SZA, but the present analyses do not support associations with common polymorphisms that confer risk with odds ratios greater than 1.5. Additional analyses using adequately powered samples are warranted to further evaluate these results.
|
35
|
Abstract
We consider detecting associations between a trait and multiple single nucleotide polymorphisms (SNPs) in linkage disequilibrium (LD). To maximize the use of information contained in multiple SNPs while minimizing the cost of large degrees of freedom (DF) in testing multiple parameters, we first theoretically explore the sum test derived under a working assumption of a common association strength between the trait and each SNP, testing on the corresponding parameter with only one DF. Under the scenarios that the association strengths between the trait and the SNPs are close to each other (and in the same direction), as considered by Wang and Elston [Am. J. Hum. Genet. [2007] 80:353-360], we show with simulated data that the sum test was powerful as compared to several existing tests; otherwise, the sum test might have much reduced power. To overcome the limitation of the sum test, based on our theoretical analysis of the sum test, we propose five new tests that are closely related to each other and are shown to consistently perform similarly well across a wide range of scenarios. We point out the close connection of the proposed tests to the Goeman test. Furthermore, we derive the asymptotic distributions of the proposed tests so that P-values can be easily calculated, in contrast to the use of computationally demanding permutations or simulations for the Goeman test. A distinguishing feature of the five new tests is their use of a diagonal working covariance matrix, rather than a full covariance matrix as used in the usual Wald or score test. We recommend the routine use of two of the new tests, along with several other tests, to detect disease associations with multiple linked SNPs.
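The sum test discussed in this abstract replaces one parameter per SNP with a single common effect, so the test has one degree of freedom. The sketch below illustrates that idea for a quantitative trait with a linear-model score test; the choice of trait type, the additive 0/1/2 coding, and the function name `sum_test` are assumptions made for illustration, not details taken from the source.

```python
import math

def sum_test(genotypes, trait):
    """1-df score test of a quantitative trait on the per-individual SNP sum.

    genotypes: list of rows, one per individual, each a list of 0/1/2 counts.
    trait: list of quantitative trait values in the same order.
    Returns (statistic, p_value).
    """
    n = len(trait)
    # Working assumption of a common effect: collapse the SNPs into one score.
    burden = [sum(row) for row in genotypes]
    mean_b = sum(burden) / n
    mean_y = sum(trait) / n
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in zip(burden, trait))
    sxx = sum((b - mean_b) ** 2 for b in burden)
    syy = sum((y - mean_y) ** 2 for y in trait)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("summed genotype score and trait must both vary")
    # Score statistic U^2 / Var(U) with null variance syy / n; equals n * r^2.
    stat = n * sxy * sxy / (sxx * syy)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Because the SNP codes are simply added, effects in opposite directions cancel in the summed score, which matches the abstract's remark that the sum test can lose much of its power when association strengths differ in size or sign.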
|
36
|
Regression-based approach for testing the association between multi-region haplotype configuration and complex trait. BMC Genet 2009; 10:56. [PMID: 19761592 PMCID: PMC2760580 DOI: 10.1186/1471-2156-10-56] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 01/20/2009] [Accepted: 09/17/2009] [Indexed: 11/10/2022]
Abstract
Background: It is quite common that the genetic architecture of complex traits involves many genes and their interactions. Therefore, dealing with multiple unlinked genomic regions simultaneously is desirable. Results: In this paper we develop a regression-based approach to assess the interactions of haplotypes that belong to different unlinked regions, and we use score statistics to test the null hypothesis of no genetic association. Additionally, multiple marker combinations at each unlinked region are considered. Multiple testing is handled via the minP approach. The P value of the "best" multi-region multi-marker configuration is corrected via Monte Carlo simulations. Through simulation studies, we assess the performance of the proposed approach and demonstrate its validity and power in testing for haplotype interaction association. Conclusion: Our simulations showed that, for a binary trait without covariates, our proposed methods are as powerful as, or even more powerful than, htr and hapcc, which are part of the FAMHAP program. Additionally, our model can be applied to a wider variety of traits and allows adjustment for other covariates. To test the validity, our methods are applied to analyze the association between four unlinked candidate genes and pig meat quality.
|
37
|
Improving power in genetic-association studies via wavelet transformation. BMC Genet 2009; 10:53. [PMID: 19747393 PMCID: PMC2759953 DOI: 10.1186/1471-2156-10-53] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 04/29/2009] [Accepted: 09/11/2009] [Indexed: 12/02/2022]
Abstract
Background: A key to increasing the power of multilocus association tests is to reduce the number of degrees of freedom by suppressing noise in the data. One of the difficulties is deciding how much noise to suppress. An often overlooked problem is that commonly used association tests based on genotype data cannot utilize the genetic information contained in the spatial ordering of SNPs (see proof in the Appendix), which may prevent them from achieving higher power. Results: We develop a score test based on the wavelet transform with empirical Bayesian thresholding. Extensive simulation studies are carried out under various LD structures as well as using HapMap data from many different chromosomes for both qualitative and quantitative traits. Simulation results show that the proposed test automatically adjusts the level of noise suppression according to LD structure, and that it consistently achieves power higher than or similar to that of many commonly used association tests, including the principal component regression method (PCReg). Conclusion: The wavelet-based score test automatically suppresses the right amount of noise and uses the information contained in the spatial ordering of SNPs to achieve higher power.
|
38
|
SHARE: an adaptive algorithm to select the most informative set of SNPs for candidate genetic association. Biostatistics 2009; 10:680-93. [PMID: 19605740 PMCID: PMC2742496 DOI: 10.1093/biostatistics/kxp023] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
Association studies have been widely used to identify genetic liability variants for complex diseases. While scanning the chromosomal region one single nucleotide polymorphism (SNP) at a time may not fully exploit linkage disequilibrium, haplotype analyses tend to require a fairly large number of parameters, thus potentially losing power. Clustering algorithms, such as the cladistic approach, have been proposed to reduce the dimensionality, yet they have important limitations. We propose a SNP-Haplotype Adaptive REgression (SHARE) algorithm that seeks the most informative set of SNPs for genetic association in a targeted candidate region by growing and shrinking haplotypes by one SNP at a time in a stepwise fashion, and comparing the prediction errors of different models via cross-validation. Depending on the evolutionary history of the disease mutations and the markers, this set may contain a single SNP or several SNPs that lay a foundation for haplotype analyses. Haplotype phase ambiguity is effectively accounted for by treating haplotype reconstruction as a part of the learning procedure. Simulations and a data application show that our method has improved power over existing methodologies and that the results are informative in the search for disease-causal loci.
|
39
|
Association of the cannabinoid receptor gene (CNR1) with ADHD and post-traumatic stress disorder. Am J Med Genet B Neuropsychiatr Genet 2008; 147B:1488-94. [PMID: 18213623 PMCID: PMC2685476 DOI: 10.1002/ajmg.b.30693] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.6] [Indexed: 12/23/2022]
Abstract
Attention deficit hyperactivity disorder (ADHD) is a highly heritable disorder affecting some 5-10% of children and 4-5% of adults. The cannabinoid receptor gene (CNR1) is a positional candidate gene due to its location near an identified ADHD linkage peak on chromosome 6, its role in stress and dopamine regulation, its association with other psychiatric disorders that co-occur with ADHD, and its function in learning and memory. We tested SNP variants at the CNR1 gene in two independent samples: an unselected adolescent sample from Northern Finland, and a family-based sample of trios (an ADHD child and their parents). In addition to using the trios for the association study, the parents (with and without ADHD) were used as an additional case/control sample of adults for association tests. ADHD and its co-morbid psychiatric disorders were examined. A significant association was detected for a SNP haplotype (C-G) with ADHD (P = 0.008). A sex by genotype interaction was observed as well, with this haplotype posing a greater risk in males than in females. An association of an alternative SNP haplotype in this gene was found for post-traumatic stress disorder (PTSD) (P = 0.04 for C-A, and P = 0.01 for C-G). These observations require replication; however, they suggest that the CNR1 gene may be a risk factor for ADHD and possibly PTSD, and that this gene warrants further investigation for a role in neuropsychiatric disorders.
|
40
|
|
41
|
Network-based model weighting to detect multiple loci influencing complex diseases. Hum Genet 2008; 124:225-34. [PMID: 18719944 DOI: 10.1007/s00439-008-0545-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 07/04/2008] [Accepted: 08/12/2008] [Indexed: 01/20/2023]
Abstract
For genome-wide association studies, it has been increasingly recognized that the popular locus-by-locus search for DNA variants associated with disease susceptibility may not be effective, especially when there are interactions between or among multiple loci, for which a multi-loci search strategy may be more productive. However, even if computationally feasible, a genome-wide search over all possible multiple loci requires exploring a huge model space and making costly adjustment for multiple testing, leading to reduced statistical power. On the other hand, there are accumulating data suggesting that protein products of many disease-causing genes tend to interact with each other, or cluster in the same biological pathway. To incorporate this prior knowledge and existing data on gene networks, we propose a gene network-based method to improve statistical power over that of the exhaustive search by giving higher weights to models involving genes nearby in a network. We use simulated data under realistic scenarios, including a large-scale human protein-protein interaction network and 23 known ataxia-causing genes, to demonstrate potential gain by our proposed method when disease-genes are clustered in a network.
|
42
|
Association of candidate genes with antisocial drug dependence in adolescents. Drug Alcohol Depend 2008; 96:90-8. [PMID: 18384978 PMCID: PMC2574676 DOI: 10.1016/j.drugalcdep.2008.02.004] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Received: 07/07/2007] [Revised: 02/01/2008] [Accepted: 02/05/2008] [Indexed: 11/28/2022]
Abstract
The Colorado Center For Antisocial Drug Dependence (CADD) is using several research designs and strategies in its study of the genetic basis for antisocial drug dependence in adolescents. This study reports single nucleotide polymorphism (SNP) association results from a targeted gene assay (SNP chip) of 231 primarily Caucasian male probands in treatment with antisocial drug dependence and a matched set of community controls. The SNP chip was designed to assay 1500 SNPs distributed across 50 candidate genes that have had associations with substance use disorders and conduct disorder. There was an average gene-wide inter-SNP interval of 3000 base pairs. After eliminating SNPs with poor signals and low minor allele frequencies, 60 nominally significant associations were found among the remaining 1073 SNPs in 18 of 49 candidate genes. Although none of the SNPs achieved genome-wide association significance levels (defined as p<.000001), two genes probed with multiple SNPs (OPRM1 and CHRNA2) emerged as plausible candidates for a role in antisocial drug dependence after gene-based permutation tests. The custom-designed SNP chip served as an effective and flexible platform for rapid interrogation of a large number of plausible candidate genes.
|
43
|
Abstract
With the rapid development of modern genotyping technology, it is becoming commonplace to genotype densely spaced genetic markers such as single nucleotide polymorphisms (SNPs) along the genome. This development has inspired a strong interest in using multiple markers located in the target region for the detection of association. We introduce a principal components (PCs) regression method for candidate gene association studies where multiple SNPs from the candidate region tend to be correlated. In this approach, the total variance in the original genotype scores is decomposed into parts that correspond to uncorrelated PCs. The PCs with the largest variances are then used as regressors in a multiple regression. Simulation studies suggest that this approach can have higher power than some popular methods. An application to CHI3L2 gene expression data confirms a significant association between CHI3L2 gene expression level and SNPs from this gene that has been previously reported by others.
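The principal components regression procedure described here (decompose genotype variance into uncorrelated PCs, keep the largest, regress the trait on them) can be sketched as follows. This is an illustrative sketch only: the 80% variance cutoff is an assumed default, the function name `pc_regression` is hypothetical, and only the overall F statistic and its degrees of freedom are returned, leaving the p-value lookup to the reader's statistics library.

```python
import numpy as np

def pc_regression(genotypes, trait, var_explained=0.8):
    """Regress the trait on the leading genotype PCs (sketch).

    Returns (k, f_stat, (df1, df2)), where k is the number of PCs kept.
    """
    G = np.asarray(genotypes, dtype=float)   # shape (n_individuals, n_snps)
    y = np.asarray(trait, dtype=float)
    n = len(y)
    G_c = G - G.mean(axis=0)                 # center each SNP
    y_c = y - y.mean()
    # SVD of the centered genotypes: squared singular values are
    # proportional to the variances of the PCs
    U, s, _ = np.linalg.svd(G_c, full_matrices=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest k whose PCs reach the requested share of total variance
    k = min(int(np.searchsorted(frac, var_explained)) + 1, len(s))
    scores = U[:, :k] * s[:k]                # uncorrelated PC scores
    beta, *_ = np.linalg.lstsq(scores, y_c, rcond=None)
    fitted = scores @ beta
    ess = float(fitted @ fitted)             # explained sum of squares
    rss = float((y_c - fitted) @ (y_c - fitted))
    df1, df2 = k, n - k - 1
    f_stat = (ess / df1) / (rss / df2)
    return k, f_stat, (df1, df2)
```

When SNPs in the region are strongly correlated, k is usually much smaller than the number of SNPs, so the test spends fewer degrees of freedom than a regression on all SNPs at once.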
|
44
|
Comprehensive evaluation of positional candidates in the IL-18 pathway reveals suggestive associations with schizophrenia and herpes virus seropositivity. Am J Med Genet B Neuropsychiatr Genet 2008; 147:343-50. [PMID: 18092318 DOI: 10.1002/ajmg.b.30603] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 01/30/2023]
Abstract
Interactions between genetic variation and environmental factors have been invoked in schizophrenia genesis, but pathways linking them are uncertain. We used a pathway-oriented approach to evaluate six genes mediating IL18 function (IL-18, IL18BP, IL18R1, IL18RAP, IL12B, and IL12A). The first five are also localized to regions previously linked with schizophrenia. Fifty-four representative tag SNPs were selected from comprehensive sequence data and genotyped in 478 patients with schizophrenia/schizoaffective disorder (DSM IV criteria) and 501 unscreened control individuals. Exposure to three herpes viruses previously suggested as risk factors for schizophrenia was estimated simultaneously among the cases. Five SNPs in four genes were associated with schizophrenia, most prominently rs2272127 at IL18RAP (P = 0.0007, odds ratio for C allele 1.49, 95% CI: 1.18-1.87; P = 0.03 following correction for multiple comparisons). Exploratory analysis revealed that rs2272127 was also associated with herpes simplex virus 1 (HSV1) seropositivity in cases (P = 0.04, OR for G allele 1.58, 95% CI: 1.04-2.39). Similar patterns were observed at another correlated SNP (rs11465702, P = 0.005 and 0.006, respectively for associations with schizophrenia and HSV1 seropositivity). We suggest plausible, testable hypotheses linking IL-18 signaling and HSV1 in schizophrenia pathogenesis.
|
45
|
FBAT-SNP-PC: an approach for multiple markers and single trait in family-based association tests. Hum Hered 2008; 66:122-6. [PMID: 18382091 DOI: 10.1159/000119111] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/22/2022]
Abstract
OBJECTIVE: Develop a new test for family-based association studies and continuous traits that incorporates power-enhancing techniques from two existing testing strategies. METHODS: The new procedure begins by extracting the relevant information from the variability of the genotypes and assessing the approximate individual marker effects and their directions. This information is incorporated in the construction of the actual test statistic through the selection of a data-determined number of optimal linear combinations of the offspring genotypes which, in a power-enhancing step, are subsequently combined into a single degree of freedom test. We conduct a comparative simulation study in which the performance of the new test is contrasted with the test that is currently known to offer the highest overall power, FBAT-LC. RESULTS: The new test has an overall performance very similar to that of FBAT-LC but attains higher power in candidate genes with lower average pairwise correlations and moderate to high allele frequencies, with large gains (up to 80%) for some of the analyzed genes possessing these characteristics. CONCLUSION: The new test is a promising tool for candidate gene studies, with substantial power gains for genes that are characterized by SNPs with low mean pairwise correlation.
|
46
|
A powerful and flexible multilocus association test for quantitative traits. Am J Hum Genet 2008; 82:386-97. [PMID: 18252219 DOI: 10.1016/j.ajhg.2007.10.010] [Citation(s) in RCA: 195] [Impact Index Per Article: 12.2] [Received: 07/02/2007] [Revised: 10/04/2007] [Accepted: 10/16/2007] [Indexed: 01/01/2023]
Abstract
Association mapping of complex traits typically employs tagSNP genotype data to identify a trait locus within a region of interest. However, considerable debate exists regarding the most powerful strategy for utilizing such tagSNP data for inference. A popular approach tests each tagSNP within the region individually, but such tests could lose power as a result of incomplete linkage disequilibrium between the genotyped tagSNP and the trait locus. Alternatively, one can jointly test all tagSNPs simultaneously within the region (by using genotypes or haplotypes), but such multivariate tests have large degrees of freedom that can also compromise power. Here, we consider a semiparametric model for quantitative-trait mapping that uses genetic information from multiple tagSNPs simultaneously in analysis but produces a test statistic with reduced degrees of freedom compared to existing multivariate approaches. We fit this model by using a dimension-reducing technique called least-squares kernel machines, which we show is identical to analysis using a specific linear mixed model (which we can fit by using standard software packages like SAS and R). Using simulated SNP data based on real data from the International HapMap Project, we demonstrate that our approach often has superior performance for association mapping of quantitative traits compared to the popular approach of single-tagSNP testing. Our approach is also flexible, because it allows easy modeling of covariates and, if interest exists, high-dimensional interactions among tagSNPs and environmental predictors.
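A kernel-based score statistic of the kind this abstract describes can be sketched as a quadratic form in the trait residuals. The sketch below rests on assumptions that are not taken from the paper: an identity-by-state (IBS) kernel on additive 0/1/2 genotype codes, no covariates beyond an intercept, and a permutation p-value in place of an analytic null distribution. The names `ibs_kernel` and `kernel_score_test` are hypothetical.

```python
import numpy as np

def ibs_kernel(genotypes):
    """IBS similarity: K[i, j] = mean over SNPs of (2 - |g_i - g_j|) / 2."""
    G = np.asarray(genotypes, dtype=float)        # shape (n, n_snps)
    diff = np.abs(G[:, None, :] - G[None, :, :])  # pairwise allele differences
    return (2.0 - diff).sum(axis=2) / (2.0 * G.shape[1])

def kernel_score_test(genotypes, trait, n_perm=499, seed=0):
    """Return (Q, permutation p-value) with Q = r' K r and r the centered trait."""
    rng = np.random.default_rng(seed)
    K = ibs_kernel(genotypes)
    y = np.asarray(trait, dtype=float)
    r = y - y.mean()                              # residuals under the null
    observed = float(r @ K @ r)
    hits = 0
    for _ in range(n_perm):
        r_perm = rng.permutation(r)               # shuffle trait across individuals
        if float(r_perm @ K @ r_perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The statistic is large when individuals with similar multi-SNP genotypes also have similar trait residuals, so all tagSNPs contribute to a single test rather than to one test per SNP.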
|
47
|
RGS4 polymorphisms predict clinical manifestations and responses to risperidone treatment in patients with schizophrenia. J Clin Psychopharmacol 2008; 28:64-8. [PMID: 18204343 DOI: 10.1097/jcp.0b013e3181603f5a] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 11/25/2022]
Abstract
OBJECTIVE: Polymorphisms of the gene encoding the regulator of G-protein signaling subtype 4 (RGS4) are associated with schizophrenia. This study aims to investigate the association of 4 RGS4 polymorphisms (single nucleotide polymorphisms [SNPs] 1, 4, 7, and 18), implicated in previous studies, with baseline symptoms and treatment response to risperidone in patients with schizophrenia. METHODS: One hundred twenty patients with acutely exacerbated schizophrenia who had never been treated with atypical antipsychotics were recruited. They received optimal treatment with risperidone for up to 42 days in the inpatient research unit. Patients' social function was monitored with the Nurses' Observation Scale for Inpatients Evaluation, and clinical manifestations with the Positive and Negative Syndrome Scale. RESULTS: At baseline, the A/A genotype at SNP7 of RGS4 was associated with poorer social function when compared with the G/G genotype. After risperidone treatment, the A/A genotype at SNP1 was associated with greater improvement in social function, and the A/A genotype at SNP18 was associated with greater improvement in social function, Positive and Negative Syndrome Scale total score, and the positive- and negative-symptom subscales. CONCLUSIONS: These findings suggest that RGS4 variants influence clinical manifestations of schizophrenia as well as the treatment response to risperidone, suggesting that RGS4 plays a role in the fundamental process of disease pathophysiology.
|
48
|
An association study of RGS4 polymorphisms with clinical phenotypes of schizophrenia in a Chinese population. Am J Med Genet B Neuropsychiatr Genet 2008; 147B:77-85. [PMID: 17722013 DOI: 10.1002/ajmg.b.30577] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 11/11/2022]
Abstract
The regulator of G-protein signaling 4 (RGS4) has been suggested as a candidate gene for schizophrenia. However, following an initial positive report, subsequent association studies between RGS4 and schizophrenia have yielded inconclusive results. Also, few studies have investigated the association of RGS4 polymorphisms with the phenotypic subgroups of schizophrenia. To further clarify the role of RGS4 in this disease, we performed a case-control study (504 cases and 531 controls of Han Chinese descent) to examine the association of RGS4 with schizophrenia and with clinical and neurocognitive profiles. The four markers (SNPs 1, 4, 7, and 18) implicated in the original association study were genotyped. We detected significant association of four-marker haplotypes with schizophrenia (UNPHASED: global P = 0.037; PHASE: global P = 0.048). The haplotype G-G-G-G, which was implicated in at least three previous studies, was the major risk haplotype (UNPHASED: P = 0.019; PHASE: P = 0.010). Regarding the clinical phenotypes, the Wechsler Adult Intelligence Test (WAIS) information subtest score was associated with SNP4 genotypes (P = 0.001). PANSS total and global psychopathology scores were also associated with SNP4, but may not reliably reflect the general severity of disease as the scores may be affected by confounders like medication response. Our study provides further support for a role of RGS4 in the pathogenesis of schizophrenia. We identified G-G-G-G as the risk haplotype in our Chinese sample. The association with information subtest score suggests an effect of RGS4 on premorbid functioning, which may be related to neurodevelopmental processes. Further independent studies are required to verify our findings.
|
49
|
Abstract
Association methods based on linkage disequilibrium (LD) offer a promising approach for detecting genetic variations that are responsible for complex human diseases. Although methods based on individual single nucleotide polymorphisms (SNPs) may lead to significant findings, methods based on haplotypes comprising multiple SNPs on the same inherited chromosome may provide additional power for mapping disease genes and also provide insight on factors influencing the dependency among genetic markers. Such insights may provide information essential for understanding human evolution and also for identifying cis-interactions between two or more causal variants. Because obtaining haplotype information directly from experiments can be cost prohibitive in most studies, especially in large scale studies, haplotype analysis presents many unique challenges. In this chapter, we focus on two main issues: haplotype inference and haplotype-association analysis. We first provide a detailed review of methods for haplotype inference using unrelated individuals as well as related individuals from pedigrees. We then cover a number of statistical methods that employ haplotype information in association analysis. In addition, we discuss the advantages and limitations of different methods.
|
50
|
Haplotype-based association analysis via variance-components score test. Am J Hum Genet 2007; 81:927-38. [PMID: 17924336 PMCID: PMC2265651 DOI: 10.1086/521558] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Received: 04/30/2007] [Accepted: 07/11/2007] [Indexed: 12/14/2022]
Abstract
Haplotypes provide a more informative format of polymorphisms for genetic association analysis than do individual single-nucleotide polymorphisms. However, the practical efficacy of haplotype-based association analysis is challenged by a trade-off between the benefits of modeling abundant variation and the cost of the extra degrees of freedom. To reduce the degrees of freedom, several strategies have been considered in the literature. They include (1) clustering evolutionarily close haplotypes, (2) modeling the level of haplotype sharing, and (3) smoothing haplotype effects by introducing a correlation structure for haplotype effects and studying the variance components (VC) for association. Although the first two strategies enjoy a fair extent of power gain, empirical evidence showed that VC methods may exhibit only similar or less power than the standard haplotype regression method, even in cases of many haplotypes. In this study, we report possible reasons that cause the underpowered phenomenon and show how the power of the VC strategy can be improved. We construct a score test based on the restricted maximum likelihood or the marginal likelihood function of the VC and identify its nontypical limiting distribution. Through simulation, we demonstrate the validity of the test and investigate the power performance of the VC approach and that of the standard haplotype regression approach. With suitable choices for the correlation structure, the proposed method can be directly applied to unphased genotypic data. Our method is applicable to a wide-ranging class of models and is computationally efficient and easy to implement. The broad coverage and the fast and easy implementation of this method make the VC strategy an effective tool for haplotype analysis, even in modern genomewide association studies.
|