1
Im C, Sapkota Y, Moon W, Kawashima M, Nakamura M, Tokunaga K, Yasui Y. Genome-wide haplotype association analysis of primary biliary cholangitis risk in Japanese. Sci Rep 2018; 8:7806. [PMID: 29773854 PMCID: PMC5958065 DOI: 10.1038/s41598-018-26112-1] [Received: 01/16/2018] [Accepted: 04/30/2018]
Abstract
Primary biliary cholangitis (PBC) susceptibility loci have largely been discovered through single-SNP association testing. In this study, we report genic haplotype patterns associated with PBC risk genome-wide in two Japanese cohorts. Of the 74 genic PBC risk haplotype candidates we detected with a novel methodological approach in a discovery cohort of 1,937 Japanese individuals, nearly two-thirds were replicated (49 haplotypes, Bonferroni-corrected P < 6.8 × 10⁻⁴) in an independent Japanese cohort (N = 949). In addition to corroborating known PBC-associated loci (TNFSF15, HLA-DRA), risk haplotypes may model cis-interactions that regulate gene expression. For example, one replicated haplotype association (9q32-9q33.1, OR = 1.7, P = 3.0 × 10⁻²¹) consists of intergenic SNPs outside the human leukocyte antigen (HLA) region that overlap regulatory histone mark peaks in liver and blood cells and are significantly associated with TNFSF8 expression in whole blood. We also replicated a novel haplotype association involving non-HLA SNPs mapped to UMAD1 (7p21.3; OR = 15.2, P = 3.9 × 10⁻⁹) that overlap enhancer peaks in liver and memory Th cells. Our analysis demonstrates the utility of haplotype association analyses in discovering and characterizing PBC susceptibility loci.
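As a quick arithmetic check, the replication threshold above appears consistent with a Bonferroni correction of a family-wise α of 0.05 over the 74 candidate haplotypes (0.05/74 ≈ 6.8 × 10⁻⁴); the abstract does not state α, so 0.05 is an assumption. The sketch also shows how a haplotype odds ratio is formed from a 2 × 2 table of haplotype copies (target haplotype versus all others). The function names and counts are hypothetical and are not taken from the study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def haplotype_odds_ratio(case_with: int, case_without: int,
                         ctrl_with: int, ctrl_without: int) -> float:
    """Odds ratio from a 2x2 table of haplotype copies (target vs all other haplotypes)."""
    return (case_with * ctrl_without) / (case_without * ctrl_with)


# Assumed family-wise alpha of 0.05 spread over the 74 candidates
print(f"{bonferroni_threshold(0.05, 74):.1e}")  # 6.8e-04

# Hypothetical haplotype-copy counts (not data from the study)
print(round(haplotype_odds_ratio(300, 700, 200, 800), 2))  # 1.71
```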
Affiliation(s)
- Cindy Im
- School of Public Health, University of Alberta, Edmonton, Alberta, T6G 1C9, Canada.
- Yadav Sapkota
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Wonjong Moon
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Minae Kawashima
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-0033, Japan
- Minoru Nakamura
- Department of Hepatology, Nagasaki University Graduate School of Biomedical Sciences and Clinical Research Center, National Hospital Organization Nagasaki Medical Center, Omura, Nagasaki, 856-8562, Japan
- Katsushi Tokunaga
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-0033, Japan
- Yutaka Yasui
- School of Public Health, University of Alberta, Edmonton, Alberta, T6G 1C9, Canada; Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA.
2
Abstract
Human genetic research in the past decade has generated a wealth of data from the genome-wide association scan era, much of which is catalogued and freely available. These data typically describe the relationship between a single nucleotide variant or polymorphism (SNP) and some outcome, disease, or trait. Ongoing investigations will yield a similar wealth of data on epigenetic phenomena, typically describing the relationship between DNA methylation at a single genomic location or region and some outcome. Most of these findings will come from cross-sectional investigations, typically using ascertained cases and controls; consequently, most methodological work focuses on methods appropriate for simple case-control comparisons. A growing number of investigators with longitudinal experimental prevention or intervention cohorts are expected to measure genetic and epigenetic indicators as well, harvesting the wealth of information generated by the genome-wide association study (GWAS) era to allow targeted hypothesis testing in the next generation of prevention and intervention trials. Herein, we discuss appropriate quality control and statistical modelling of genetic, polygenic, and epigenetic measures in longitudinal models. We specifically discuss quality control, population stratification, genotype imputation, pathway approaches, and proper modelling of an interaction between a specific genetic variant and an environmental variable (GxE interaction).
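A minimal sketch of the GxE interaction term mentioned above, assuming a continuous outcome, a binary carrier indicator G, a binary environment E, and a single cross-sectional measurement per person (a longitudinal model would add time and within-person correlation, which this sketch omits). With both factors binary, the model y = b0 + b1·G + b2·E + b3·G·E is saturated, so the least-squares estimate of the interaction b3 equals the difference-in-differences of the four cell means. The function name and data are hypothetical.

```python
from statistics import mean


def gxe_interaction(records):
    """Estimate b3 in y = b0 + b1*G + b2*E + b3*G*E for binary G and E.

    records: iterable of (g, e, y) with g, e in {0, 1}. Because the model is
    saturated, the least-squares b3 equals the difference-in-differences of
    the four cell means. All four (g, e) cells must be non-empty.
    """
    cells = {}
    for g, e, y in records:
        cells.setdefault((g, e), []).append(y)
    m = {cell: mean(values) for cell, values in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


# Hypothetical data: (carrier status G, exposure E, outcome y)
records = [
    (0, 0, 1.0), (0, 0, 3.0), (0, 1, 2.0), (0, 1, 4.0),
    (1, 0, 2.0), (1, 0, 2.0), (1, 1, 6.0), (1, 1, 8.0),
]
print(gxe_interaction(records))  # 4.0
```

In practice one would fit the full regression (with covariates and, for longitudinal data, a mixed or GEE model); the cell-mean form is shown only because it makes the meaning of the interaction coefficient explicit.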
Affiliation(s)
- Shawn J Latendresse
- Department of Psychology and Neuroscience, Baylor University, One Bear Place #97334, Waco, TX, 76798, USA.
- Rashelle Musci
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, 624 N. Broadway Ave, Baltimore, MD, 21205, USA
- Brion S Maher
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, 624 N. Broadway Ave, Baltimore, MD, 21205, USA.
3
Abstract
Haplotype analysis forms the basis of much genetic association analysis using both related and unrelated individuals (we concentrate on unrelated individuals). For example, haplotype analysis indirectly underlies the SNP imputation methods used to test trait associations with known but unmeasured variants and to perform collaborative post-GWAS meta-analysis. This chapter focuses on the direct use of haplotypes in association testing. It reviews the rationale for haplotype-based association testing, discusses statistical issues related to haplotype uncertainty that affect the analysis, and then gives practical guidance for testing associations between haplotypes and a phenotype or outcome trait, first for candidate gene regions and then for the genome as a whole. Haplotypes are interesting for two reasons. First, they may be in closer LD with a causal variant than any single measured SNP, and therefore may enhance the coverage value of the genotypes relative to single-SNP analysis. Second, haplotypes may themselves be the causal variants of interest, and some solid examples of this have appeared in the literature. This chapter discusses three possible approaches to incorporating SNP haplotype analysis into generalized linear regression models: (1) a simple substitution method involving imputed haplotypes, (2) simultaneous maximum likelihood (ML) estimation of all parameters, including haplotype frequencies and regression parameters, and (3) a simplified approximation to full ML for case-control data. Examples of the various approaches are provided for a haplotype analysis of a candidate gene. We compare the behavior of the approximation-based methods and argue that in most instances the simpler methods hold up well in practice. We also describe the practical implementation of haplotype risk estimation genome-wide and discuss several shortcuts that can be used to reduce otherwise potentially very intensive computational requirements.
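Haplotype uncertainty is easiest to see with two SNPs: an individual heterozygous at both SNPs could carry either the 00/11 or the 01/10 pair of haplotypes. The sketch below estimates two-SNP haplotype frequencies from unphased genotypes with the standard expectation-maximization (EM) algorithm. It is a generic illustration, not the chapter's code, and assumes genotypes coded as 0/1/2 copies of allele 1 at each SNP, Hardy-Weinberg equilibrium, and no missing data. Frequencies of this kind are what a substitution approach such as (1) above would use to impute haplotypes.

```python
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (allele at SNP1, allele at SNP2)


def compatible_pairs(g1, g2):
    """Haplotype pairs consistent with an unphased genotype (0/1/2 copies of allele 1)."""
    if g1 == 1 and g2 == 1:
        # Double heterozygote: phase is ambiguous
        return [((0, 0), (1, 1)), ((0, 1), (1, 0))]
    a = {0: (0, 0), 1: (0, 1), 2: (1, 1)}[g1]
    b = {0: (0, 0), 1: (0, 1), 2: (1, 1)}[g2]
    return [((a[0], b[0]), (a[1], b[1]))]


def em_haplotype_freqs(genotypes, n_iter=50):
    """EM estimates of two-SNP haplotype frequencies, assuming Hardy-Weinberg equilibrium."""
    freq = {h: 0.25 for h in HAPLOTYPES}
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPLOTYPES}
        for g1, g2 in genotypes:
            pairs = compatible_pairs(g1, g2)
            # E-step: weight of each compatible pair under current frequencies
            # (the factor 2 for heterozygous pairs is common to both and cancels)
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by 2N chromosomes
        freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
    return freq


# Hypothetical sample: SNPs in strong LD, with two ambiguous double heterozygotes
genotypes = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
for hap, f in em_haplotype_freqs(genotypes).items():
    print(hap, round(f, 3))
```

Because the unambiguous individuals carry only 00 and 11, EM assigns the double heterozygotes almost entirely to the 00/11 phase.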
Affiliation(s)
- Daniel O Stram
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, 1540 Alcazar Street, Los Angeles, CA, 90032, USA.
4
Datta AS, Biswas S. Comparison of haplotype-based statistical tests for disease association with rare and common variants. Brief Bioinform 2015; 17:657-71. [PMID: 26338417 DOI: 10.1093/bib/bbv072] [Received: 05/26/2015]
Abstract
Recent literature has highlighted the advantages of haplotype association methods for detecting rare variants associated with common diseases. As several new haplotype association methods have been proposed in the past few years, a comparison of new and standard methods is important and timely as guidance for practitioners. We consider nine methods: Haplo.score, Haplo.glm, Hapassoc, Bayesian hierarchical Generalized Linear Model (BhGLM), Logistic Bayesian LASSO (LBL), regularized GLM (rGLM), Haplotype Kernel Association Test, wei-SIMc-matching and Weighted Haplotype and Imputation-based Tests. These can be divided into two types, global tests and individual haplotype-specific tests, depending on whether there is just one overall test for a haplotype region (global) or an individual test for each haplotype in the region. Haplo.score is the only method that provides both; Haplo.glm, Hapassoc, BhGLM and LBL are individual haplotype-specific, while the rest are global tests. For comparison, we also apply a popular collapsing method, the Sequence Kernel Association Test (SKAT), and its two variants, SKAT-O (Optimal) and SKAT-C (Combined). We carry out an extensive comparison on our simulated data sets as well as on the Genetic Analysis Workshop (GAW) 18 simulated data. Further, we apply the methods to GAW18 real hypertension data and Dallas Heart Study sequence data. We find that LBL, Haplo.score (global test) and rGLM perform well in the scenarios considered here. Also, haplotype methods are more powerful (albeit more computationally intensive) than SKAT and its variants in scenarios where multiple causal variants act interactively to produce haplotype effects.
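For orientation, the simplest form of a global test for a haplotype region compares haplotype counts between cases and controls in a k × 2 contingency table. The sketch below computes the Pearson chi-square statistic and its k - 1 degrees of freedom. It assumes haplotypes are already known (phase-resolved), so it is not Haplo.score or any of the nine methods above, which handle phase uncertainty or rare haplotypes in their own ways. When expected counts are not small, a p-value could be read from the chi-square distribution (for example with `scipy.stats.chi2.sf`).

```python
def global_haplotype_chi2(case_counts, control_counts):
    """Pearson chi-square for a k x 2 table of haplotype counts (cases vs controls).

    case_counts[i] and control_counts[i] are the copies of haplotype i in each group.
    Returns (statistic, degrees_of_freedom) with df = k - 1.
    """
    total_case = sum(case_counts)
    total_ctrl = sum(control_counts)
    grand = total_case + total_ctrl
    stat = 0.0
    for a, b in zip(case_counts, control_counts):
        row = a + b
        if row == 0:
            continue  # a haplotype absent from both groups contributes nothing
        for observed, col_total in ((a, total_case), (b, total_ctrl)):
            expected = row * col_total / grand
            stat += (observed - expected) ** 2 / expected
    return stat, len(case_counts) - 1


# Hypothetical counts for three haplotypes in 100 case and 100 control chromosomes
stat, df = global_haplotype_chi2([30, 50, 20], [50, 40, 10])
print(round(stat, 3), df)  # 9.444 2
```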
5
Saravani R, Hasanian-Langroudi F, Validad MH, Yari D, Bahari G, Faramarzi M, Khateri M, Bahadoram S. Evaluation of possible relationship between COL4A4 gene polymorphisms and risk of keratoconus. Cornea 2015; 34:318-22. [PMID: 25651396 DOI: 10.1097/ico.0000000000000356]
Abstract
PURPOSE Keratoconus (KC) is a genetically heterogeneous corneal dystrophy of unknown etiology that causes loss of visual acuity. Evidence has shown that corneas from patients with KC contain reduced amounts of total collagen proteins, and collagen type IV has been suggested as a candidate gene in KC pathogenesis. This study aimed to evaluate possible associations between collagen type IV alpha-4 chain (COL4A4) polymorphisms (rs2229813 G/A, M1327V and rs2228555 A/G, V1516V) and susceptibility to KC. METHODS A total of 262 Iranian subjects, including 112 patients with KC and 150 healthy controls, were recruited in this case-control study. Diagnosis was based on clinical examination, electronic refractometry, and keratometry. Genotyping for the COL4A4 rs2229813 and rs2228555 variants was performed using allele-specific polymerase chain reaction and Tetra-ARMS polymerase chain reaction, respectively. RESULTS A significant difference was found between the 2 groups in the allelic and genotypic distribution of the COL4A4 rs2229813 G>A polymorphism. The COL4A4 rs2229813 AA and GA+AA genotypes were risk factors for developing KC (odds ratio [OR] = 2.1, P = 0.036 and OR = 1.7, P = 0.042, for the AA and GA+AA genotypes, respectively). The COL4A4 rs2229813 A allele was also associated with an increased risk of KC (OR = 1.5, 95% confidence interval: 1.1-2.2, P = 0.018). However, we found no association between the COL4A4 rs2228555 polymorphism and the risk of KC. CONCLUSIONS We suggest that the COL4A4 rs2229813 AA and GA+AA genotypes, as well as the A allele, act as risk factors for developing KC in our population.
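Allelic odds ratios and 95% confidence intervals like those reported above are commonly computed from a 2 × 2 table of allele counts using Woolf's log-based interval. The sketch assumes that method (the abstract does not say which interval was used) and uses hypothetical counts chosen only so that allele totals match 112 cases and 150 controls (224 and 300 alleles); they are not the study's data.

```python
import math


def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Allelic OR with a Woolf (log-scale) confidence interval from allele counts."""
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR): square root of the summed reciprocal cell counts
    se_log_or = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts: cases 90 A / 134 G, controls 90 A / 210 G
or_, lo, hi = allelic_odds_ratio(90, 134, 90, 210)
print(f"OR = {or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```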
6
Neely ML, Bondell HD, Tzeng JY. A penalized likelihood approach for investigating gene-drug interactions in pharmacogenetic studies. Biometrics 2015; 71:529-37. [PMID: 25604216 DOI: 10.1111/biom.12259] [Received: 01/01/2013] [Revised: 09/01/2014] [Accepted: 09/01/2014]
Abstract
Pharmacogenetics investigates the relationship between heritable genetic variation and the variation in how individuals respond to drug therapies. Gene-drug interactions often play a primary role in this response, and identifying them can aid in the development of individualized treatment regimens. Haplotypes can hold key information for understanding the association between genetic variation and drug response. However, the standard approach to haplotype-based association analysis does not directly address the research questions posed by individualized medicine. A complementary post-hoc analysis is required, and this post-hoc analysis is usually underpowered after adjusting for multiple comparisons and may lead to seemingly contradictory conclusions. In this work, we propose a penalized likelihood approach that overcomes the drawbacks of the standard approach and yields the desired personalized output. We demonstrate the utility of our method by applying it to the Scottish Randomized Trial in Ovarian Cancer. We also conducted simulation studies showing that the proposed penalized method has comparable or greater power than the standard approach and maintains low Type I error rates for both binary and quantitative drug responses. The largest performance gains are seen when the haplotype frequency is low, the differences in effect sizes are small, or the true relationship among the drugs is more complex.
Affiliation(s)
- Megan L Neely
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, 27705, U.S.A
- Howard D Bondell
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, 27695, U.S.A
- Jung-Ying Tzeng
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, 27695, U.S.A.; Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, 27695, U.S.A
7
Abstract
Rapidly improving sequencing technologies provide unprecedented opportunities for analyzing genome-wide patterns of polymorphisms. In particular, they have great potential for linkage-disequilibrium analyses on both global and local genetic scales, which will substantially improve our ability to derive evolutionary inferences. However, there are some difficulties with analyzing high-throughput sequencing data, including high error rates associated with base reads and complications from the random sampling of sequenced chromosomes in diploid organisms. To overcome these difficulties, we developed a maximum-likelihood estimator of linkage disequilibrium for use with error-prone sampling data. Computer simulations indicate that the estimator is nearly unbiased with a sampling variance at high coverage asymptotically approaching the value expected when all relevant information is accurately estimated. The estimator does not require phasing of haplotypes and enables the estimation of linkage disequilibrium even when all individual reads cover just single polymorphic sites.
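Linkage disequilibrium between two biallelic sites is usually summarized by D, D′, and r², all functions of one haplotype frequency and the two allele frequencies. The sketch below assumes these frequencies are already known; it does not implement the maximum-likelihood estimator from error-prone reads described above, only the standard quantities such an analysis reports.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) from haplotype frequency p_AB and allele frequencies p_A, p_B.

    D' is returned with its sign; |D'| is what is usually reported.
    Allele frequencies must lie strictly between 0 and 1.
    """
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


print([round(x, 3) for x in ld_measures(0.4, 0.5, 0.5)])  # [0.15, 0.6, 0.36]
```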
8
Hasanian-Langroudi F, Saravani R, Validad MH, Bahari G, Yari D. Association of Lysyl oxidase (LOX) Polymorphisms with the Risk of Keratoconus in an Iranian Population. Ophthalmic Genet 2014; 36:309-14. [PMID: 24502826 DOI: 10.3109/13816810.2014.881507]
Abstract
BACKGROUND Keratoconus is a connective tissue-related eye disease of unknown etiology that causes loss of visual acuity. Lysyl oxidase (LOX) is an amine oxidase that catalyzes the covalent cross-linking of collagens and elastin in the extracellular environment, thus determining the mechanical properties of connective tissue. The current study aimed to investigate possible associations between two LOX polymorphisms, rs1800449 and rs2288393, and susceptibility to keratoconus. METHODS A total of 262 Iranian subjects, including 112 patients with keratoconus and 150 healthy controls, were recruited. Genotyping for the LOX variants was performed using allele-specific PCR. RESULTS A significant difference was found between the two groups in the allelic and genotypic distribution of the LOX rs1800449 G>A polymorphism. The frequencies of the AA and GA + AA genotypes were higher in patients than in controls (17% versus 8% and 62.5% versus 50%, respectively), a statistically significant difference (OR = 2.827, 95% CI: 1.251-6.391, p = 0.012). The A allele was associated with an increased risk of keratoconus, with frequencies of 39.9% and 29% in patients and controls, respectively (OR = 1.614, 95% CI: 1.119-2.326, p = 0.011). Furthermore, haplotype analysis revealed that the rs1800449G/rs2288393C haplotype is a protective factor against keratoconus (OR = 0.425, 95% CI = 0.296-0.609, p = 0.001). Conversely, the +473A/rs2288393C (OR = 3.703, 95% CI = 2.230-6.149, p = 0.001) and +473G/rs2288393G (OR = 15.48, 95% CI = 3.805-63.03, p = 0.001) haplotypes (where +473 refers to rs1800449) were identified as risk factors for keratoconus. CONCLUSION Our study demonstrated that the LOX rs1800449 genotypes (AA and GA + AA) and allele (A) appear to confer risk for susceptibility to keratoconus.
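Genotype-based odds ratios such as the "GA + AA" comparison above collapse the three genotypes into a 2 × 2 table under a dominant model (carriers of A versus GG), whereas "AA" versus the rest corresponds to a recessive model. The sketch assumes genotype counts per group are available; the function name is illustrative and the counts are hypothetical (summing to 112 cases and 150 controls), not the study's data.

```python
RISK_GENOTYPES = {
    "dominant": {"GA", "AA"},  # carriers of A vs GG
    "recessive": {"AA"},       # AA vs GG + GA
}


def genotype_model_or(case_counts, control_counts, model="dominant"):
    """Odds ratio after collapsing genotype counts into a 2x2 table under a genetic model."""
    risk = RISK_GENOTYPES[model]

    def split(counts):
        exposed = sum(n for genotype, n in counts.items() if genotype in risk)
        return exposed, sum(counts.values()) - exposed

    (a, b), (c, d) = split(case_counts), split(control_counts)
    return (a * d) / (b * c)


# Hypothetical genotype counts (112 cases, 150 controls), not the study's data
cases = {"GG": 42, "GA": 51, "AA": 19}
controls = {"GG": 75, "GA": 63, "AA": 12}
print(round(genotype_model_or(cases, controls, "dominant"), 2))   # 1.67
print(round(genotype_model_or(cases, controls, "recessive"), 2))  # 2.35
```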
Affiliation(s)
- Ramin Saravani
- Cellular and Molecular Research Center; Department of Clinical Biochemistry, School of Medicine, and
- Mohammad-Hosein Validad
- Department of Ophthalmology, Alzahra Eye Hospital, Zahedan University of Medical Sciences, Zahedan, Iran
- Davood Yari
- Department of Clinical Biochemistry, School of Medicine, and
9
Burkett KM, Greenwood CMT, McNeney B, Graham J. Gene genealogies for genetic association mapping, with application to Crohn's disease. Front Genet 2013; 4:260. [PMID: 24348515 PMCID: PMC3845011 DOI: 10.3389/fgene.2013.00260] [Received: 08/23/2013] [Accepted: 11/12/2013]
Abstract
A gene genealogy describes relationships among haplotypes sampled from a population. Knowledge of the gene genealogy for a set of haplotypes is useful for estimating population genetic parameters, and it also has potential application in finding disease-predisposing genetic variants. As the true gene genealogy is unknown, Markov chain Monte Carlo (MCMC) approaches have been used to sample genealogies conditional on data at multiple genetic markers. We previously implemented an MCMC algorithm to sample from an approximation to the distribution of the gene genealogy conditional on haplotype data. Our approach samples ancestral trees, recombination rates, and mutation rates at a genomic focal point. In this work, we describe how our sampler can be used to find disease-predisposing genetic variants in samples of cases and controls. We use a tree-based association statistic that quantifies the degree to which case haplotypes are more closely related to each other around the focal point than control haplotypes, without relying on a disease model. As the ancestral tree is a latent variable, so is the tree-based association statistic. We show how the sampler can be used to estimate the posterior distribution of the latent test statistic and the corresponding latent p-values, which together comprise a fuzzy p-value. We illustrate the approach on a publicly available dataset from a study of Crohn's disease that consists of genotypes at multiple SNP markers in a small genomic region. We estimate the posterior distribution of the tree-based association statistic and the recombination rate at multiple focal points in the region. Reassuringly, the posterior mean recombination rates estimated at the different focal points are consistent with previously published estimates. The tree-based association approach finds multiple sub-regions where the case haplotypes are more genetically related than the control haplotypes, suggesting that there may be one or multiple disease-predisposing loci.
Affiliation(s)
- Kelly M Burkett
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada; Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Celia M T Greenwood
- Department of Oncology, Department of Epidemiology, Biostatistics and Occupational Health, and Division of Cancer Epidemiology, McGill University, Montreal, QC, Canada; Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
- Brad McNeney
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada
- Jinko Graham
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada
10
Burkett KM, McNeney B, Graham J. Markov chain Monte Carlo sampling of gene genealogies conditional on unphased SNP genotype data. Stat Appl Genet Mol Biol 2013; 12:559-81. [PMID: 23962961 DOI: 10.1515/sagmb-2012-0011]
Abstract
The gene genealogy is a tree describing the ancestral relationships among genes sampled from unrelated individuals. Knowledge of the tree is useful for inference of population-genetic parameters and has potential application in gene mapping. Markov chain Monte Carlo approaches that sample genealogies conditional on observed genetic data typically assume that haplotype data are observed, even though commonly used genotyping technologies provide only unphased genotype data. We have extended our haplotype-based genealogy sampler, sampletrees, to handle unphased genotype data. We use the sampled haplotype configurations as a diagnostic for adequate sampling of the tree space, reasoning that if haplotype sampling is restricted, sampling from the tree space will also be restricted. We compare the distributions of sampled haplotypes across multiple runs of sampletrees, and with those estimated by the phase inference program PHASE. Performance was excellent for the majority of individuals, as shown by the consistency of results across multiple runs. However, for some individuals in some datasets, sampletrees had problems sampling haplotype configurations; longer run lengths would be required for these datasets. For many datasets, though, we expect that sampletrees will be useful for sampling from the posterior distribution of gene genealogies given unphased genotype data.
11
Eskandari-Nasab E, Moghadampour M, Asadi-Saghandi A, Kharazi-Nejad E, Rezaeifar A, Pourmasoumi H. Levels of interleukin-(IL)-12p40 are markedly increased in Brucellosis among patients with specific IL-12B genotypes. Scand J Immunol 2013; 78:85-91. [PMID: 23578145 DOI: 10.1111/sji.12054] [Received: 10/23/2012] [Accepted: 04/02/2013]
Abstract
Brucellosis remains a major zoonosis worldwide. Brucella antigens induce the production of T-helper 1 (Th1) cytokines such as interleukin-12 (IL-12) in humans. We aimed to investigate the association of two single nucleotide polymorphisms (SNPs) in the gene encoding the IL-12p40 cytokine (IL-12B) with brucellosis and to examine the functionality of these SNPs by measuring serum levels of IL-12p40. We genotyped the IL-12B rs3212227 A>C and rs6887695 G>C polymorphisms by RFLP in a case-control study of 281 subjects, including 153 patients with active brucellosis and 128 healthy controls, and assessed serum IL-12p40 levels by ELISA. The rs3212227 minor allele (C) and homozygous genotype (CC) were more frequent in controls than in patients with brucellosis (P = 0.006, OR = 0.608, 95% CI = 0.429-0.861 for the C allele; P = 0.024, OR = 0.443, 95% CI: 0.218-0.900 for the CC genotype). Comparison of IL-12B genotypes with serum IL-12p40 levels revealed that the rs3212227 AA genotype, which was more frequent in patients than in controls, was associated with increased levels of the cytokine (P = 0.0001). Furthermore, the distribution of haplotypes and genotype combinations in our study suggested that the rs3212227C/rs6887695C haplotype or the CC/GC or CC/CC genotype combinations may protect against Brucella infection by contributing to functional downregulation of serum IL-12p40 production in vivo, as shown by ELISA (P < 0.05). Overall, our study demonstrated that the rs3212227 A variant was associated with higher levels of serum IL-12p40 and could possibly contribute to an inherited predisposition to brucellosis.
Affiliation(s)
- E Eskandari-Nasab
- Infectious Diseases and Tropical Medicine Research Center, Zahedan University of Medical Sciences, Zahedan, Iran
12
Abstract
Haplotypes contain genealogical information and play a prominent part in population genetic and evolutionary studies. However, haplotype inference is a complex statistical problem, showing considerable within-algorithm variability and among-algorithm discordance. Thus, haplotypes inferred by statistical algorithms often contain hidden uncertainties, which may complicate and even mislead downstream analysis. A consensus strategy is one effective means of increasing confidence in inferred haplotypes. Here, we present a consensus tool, the CVhaplot package, that automates consensus techniques for haplotype inference. It generates consensus haplotypes from the inferences of competing algorithms to increase the confidence of haplotype inference results, while improving the performance of individual algorithms by considering their internal variability. It can effectively identify uncertain haplotypes potentially associated with inference errors. In addition, this tool allows file format conversion for several popular algorithms and extends the applicability of some algorithms to complex data containing triallelic polymorphic sites. CVhaplot is written in Perl and freely available at http://www.ioz.ac.cn/department/agripest/group/zhangdx/CVhaplot.htm.
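One simple consensus rule is a vote across algorithms for each individual, treating each haplotype pair as unordered and flagging individuals without sufficient agreement as uncertain. The Python sketch below assumes that rule, a hypothetical input layout (algorithm name, then individual, then haplotype pair), and a hypothetical `min_agreement` parameter; it is not CVhaplot's Perl implementation, whose exact consensus criteria are not described here.

```python
from collections import Counter


def consensus_haplotypes(inferences, min_agreement=1.0):
    """Consensus haplotype pairs across algorithms.

    inferences: dict algorithm -> dict individual -> (hap1, hap2).
    An individual gets a consensus pair if the most common (unordered) pair is
    supported by at least min_agreement of the algorithms; otherwise it is
    reported as uncertain.
    """
    individuals = set.intersection(*(set(calls) for calls in inferences.values()))
    consensus, uncertain = {}, []
    for ind in sorted(individuals):
        # Sort each pair so that (h1, h2) and (h2, h1) count as the same call
        votes = Counter(tuple(sorted(calls[ind])) for calls in inferences.values())
        pair, n_votes = votes.most_common(1)[0]
        if n_votes / len(inferences) >= min_agreement:
            consensus[ind] = pair
        else:
            uncertain.append(ind)
    return consensus, uncertain


# Hypothetical output of three phasing algorithms for two individuals
runs = {
    "algA": {"s1": ("ACG", "TTA"), "s2": ("ACA", "TCG")},
    "algB": {"s1": ("TTA", "ACG"), "s2": ("ACG", "TCA")},
    "algC": {"s1": ("ACG", "TTA"), "s2": ("ACA", "TCG")},
}
print(consensus_haplotypes(runs))
```

With the default unanimity rule, s1 is resolved (all three algorithms agree up to ordering) while s2 is flagged as uncertain; a majority rule such as `min_agreement=0.6` would accept the ACA/TCG call for s2.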
Affiliation(s)
- Zu-Shi Huang
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; Center for Computational and Evolutionary Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
13
Rezaeifar A, Eskandari-Nasab E, Moghadampour M, Kharazi-Nejad E, Hasani SSA, Asadi-Saghandi A, Hadadi-Fishani M, Sepanjnia A, Sadeghi-Kalani B. The association of interleukin-18 promoter polymorphisms and serum levels with duodenal ulcer, and their correlations with bacterial CagA and VacA virulence factors. ACTA ACUST UNITED AC 2013; 45:584-92. [PMID: 23746337 DOI: 10.3109/00365548.2013.794301]
Abstract
BACKGROUND We analyzed the impact of interleukin (IL)-18 promoter polymorphisms on IL-18 serum levels in Helicobacter pylori-infected duodenal ulcer (DU) patients and healthy asymptomatic (AS) carriers. We also aimed to determine the association of antibodies to the H. pylori virulence factors CagA and VacA with serum concentrations of IL-18, to elucidate any correlation between them. METHODS Three groups were enrolled: DU patients (67 individuals), AS carriers (48 individuals), and H. pylori-negative subjects (26 individuals). Serum concentrations of IL-18 were determined by ELISA. Patient sera were tested by Western blot to determine the presence of serum antibodies to bacterial CagA and VacA. Genotyping of IL-18 promoter polymorphisms at positions -137G/C and -607C/A was performed by an allele-specific primer PCR protocol. RESULTS Our study revealed that serum IL-18 levels are positively influenced by CagA-positive H. pylori strains: the highest IL-18 levels were detected in DU patients with the CagA(+) phenotype, regardless of the presence of the anti-VacA antibody. Regarding IL-18 promoter polymorphisms, the frequencies of the AA genotype and A allele at position -607C/A were significantly lower in DU patients than in AS carriers and H. pylori-negative subjects (p = 0.032 and 0.043, respectively). CONCLUSIONS The IL-18 -607C variant was associated with higher levels of serum IL-18 and an increased risk of DU. Moreover, our findings indicated that serum concentrations of IL-18 were influenced by the CagA factor, irrespective of VacA status, suggesting that high levels of IL-18 in CagA-positive subjects predispose to DU.
Affiliation(s)
- Alireza Rezaeifar
- Department of Clinical Biochemistry, School of Medicine, Zabol University of Medical Sciences, Zabol, Iran
14
Lin WY, Yi N, Lou XY, Zhi D, Zhang K, Gao G, Tiwari HK, Liu N. Haplotype kernel association test as a powerful method to identify chromosomal regions harboring uncommon causal variants. Genet Epidemiol 2013; 37:560-70. [PMID: 23740760 DOI: 10.1002/gepi.21740] [Received: 12/13/2012] [Revised: 05/01/2013] [Accepted: 05/06/2013]
Abstract
For most complex diseases, the fraction of heritability explained by variants discovered through genome-wide association studies is small. Although so-called "rare variants" (minor allele frequency [MAF] < 1%) have attracted increasing attention, they are unlikely to account for much of the "missing heritability" because very few people may carry them. The genetic variants likely to fill in the "missing heritability" include uncommon causal variants (MAF < 5%), which are generally untyped in association studies using tagging single-nucleotide polymorphisms (SNPs) or commercial SNP arrays. Powerful statistical methods can help identify chromosomal regions harboring uncommon causal variants while bypassing genome-wide or exome-wide next-generation sequencing. In this work, we propose a haplotype kernel association test (HKAT) that is equivalent to testing the variance component of random effects for distinct haplotypes. With an appropriate weighting scheme for haplotypes, we can further enhance the ability of HKAT to detect uncommon causal variants. In scenarios simulated according to population genetics theory, HKAT is shown to be a powerful method for detecting chromosomal regions harboring uncommon causal variants.
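Testing the variance component of random effects for distinct haplotypes leads to a score-type statistic Q = rᵀZWZᵀr, where r holds the phenotype residuals under the null model, Z[i][h] is the number of copies of haplotype h carried by individual i, and W is a diagonal matrix of haplotype weights. The sketch below computes Q under stated assumptions: known (phased) haplotypes, an intercept-only null model, and user-supplied weights (unit weights by default). It omits the p-value, which kernel tests usually take from a mixture of chi-square distributions, and it does not reproduce the published HKAT weighting scheme; the function name is hypothetical.

```python
from collections import Counter


def haplotype_variance_component_q(diplotypes, y, weights=None):
    """Score-type statistic Q = r' Z W Z' r for haplotype random effects.

    diplotypes: one (hap1, hap2) pair per individual (phase assumed known).
    y: phenotypes; residuals r come from an intercept-only null model.
    Z[i][h] = copies of haplotype h in individual i; W = diag(weights), default 1.
    """
    y_bar = sum(y) / len(y)
    residuals = [value - y_bar for value in y]
    # Z' r: for each distinct haplotype, residuals summed over every copy carried
    score = Counter()
    for (h1, h2), r in zip(diplotypes, residuals):
        score[h1] += r
        score[h2] += r
    weights = weights or {}
    # With diagonal W, r' Z W Z' r reduces to a weighted sum of squared scores
    return sum(weights.get(h, 1.0) * s ** 2 for h, s in score.items())


# Hypothetical phased data for four individuals
diplotypes = [("A", "B"), ("A", "A"), ("B", "B"), ("A", "B")]
y = [1.0, 3.0, 0.0, 2.0]
print(haplotype_variance_component_q(diplotypes, y))  # 18.0
```

Up-weighting rare haplotypes through `weights` is one way such a statistic can be tuned toward uncommon variants, which is the role the weighting scheme plays in the abstract.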
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
15
Metzger BPH, Gelembiuk GW, Lee CE. Direct sequencing of haplotypes from diploid individuals through a modified emulsion PCR-based single-molecule sequencing approach. Mol Ecol Resour 2013; 13:135-43. [PMID: 23231626 DOI: 10.1111/1755-0998.12034] [Received: 08/07/2012] [Revised: 10/08/2012] [Accepted: 10/11/2012]
Abstract
While standard DNA-sequencing approaches readily yield genotypic sequence data, haplotype information is often of greater utility for population genetic analyses. However, obtaining individual haplotype sequences can be costly and time-consuming and sometimes requires statistical reconstruction approaches that are subject to bias and error. Advancements have recently been made in determining individual chromosomal sequences in large-scale genomic studies, yet few options exist for obtaining this information from large numbers of highly polymorphic individuals in a cost-effective manner. As a solution, we developed a simple PCR-based method for obtaining sequence information from individual DNA strands using standard laboratory equipment. The method employs a water-in-oil emulsion to separate the PCR mixture into thousands of individual microreactors. PCR within these small vesicles results in amplification from only a single starting DNA template molecule and thus a single haplotype. We improved upon previous approaches by including SYBR Green I and a melted agarose solution in the PCR, allowing easy identification and separation of individually amplified DNA molecules. We demonstrate the use of this method on a highly polymorphic estuarine population of the copepod Eurytemora affinis for which current molecular and computational methods for haplotype determination have been inadequate.
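The single-template property depends on loading the emulsion sparsely enough that most droplets are empty. Assuming templates are distributed among droplets at random (a Poisson model, which the abstract does not state explicitly), the short calculation below shows the trade-off between λ, the mean number of templates per droplet, and the fraction of DNA-containing droplets that are clonal, P(1)/(1 − P(0)) = λe^(−λ)/(1 − e^(−λ)). The function name and inputs are illustrative.

```python
import math


def droplet_occupancy(templates: float, droplets: float) -> dict:
    """Poisson occupancy of emulsion droplets given total templates and droplet count."""
    if templates <= 0 or droplets <= 0:
        raise ValueError("templates and droplets must be positive")
    lam = templates / droplets  # mean number of templates per droplet
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    return {
        "lambda": lam,
        "empty": p_empty,
        "single": p_single,
        "multiple": 1.0 - p_empty - p_single,
        # among droplets containing any DNA, the share seeded by exactly one molecule
        "clonal_fraction": p_single / (1.0 - p_empty),
    }


if __name__ == "__main__":
    for templates in (1_000, 10_000, 100_000):
        occ = droplet_occupancy(templates, droplets=10_000)
        print(f"lambda={occ['lambda']:.1f}  clonal among occupied={occ['clonal_fraction']:.3f}")
```

Under this model, about 95% of occupied droplets are clonal at λ = 0.1, falling to about 58% at λ = 1, which is why sparse loading matters for single-haplotype amplification.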
16
Lin WY, Yi N, Zhi D, Zhang K, Gao G, Tiwari HK, Liu N. Haplotype-based methods for detecting uncommon causal variants with common SNPs. Genet Epidemiol 2012; 36:572-82. [PMID: 22706849 DOI: 10.1002/gepi.21650] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 01/29/2012] [Revised: 04/19/2012] [Accepted: 05/09/2012] [Indexed: 01/01/2023]
Abstract
Detecting uncommon causal variants (minor allele frequency [MAF] < 5%) is difficult with commercial single-nucleotide polymorphism (SNP) arrays that are designed to capture common variants (MAF > 5%). Haplotypes can provide insights into underlying linkage disequilibrium (LD) structure and can tag uncommon variants that are not well tagged by common variants. In this work, we propose a wei-SIMc-matching test that inversely weights haplotype similarities with the estimated standard deviation of haplotype counts to boost the power of similarity-based approaches for detecting uncommon causal variants. We then compare the power of the wei-SIMc-matching test with that of several popular haplotype-based tests, including four other similarity-based tests, a global score test for haplotypes (global), a test based on the maximum score statistic over all haplotypes (max), and two newly proposed haplotype-based tests for rare variant detection. With systematic simulations under a wide range of LD patterns, the results show that wei-SIMc-matching and global are the two most powerful tests. Among these two tests, wei-SIMc-matching has reliable asymptotic P-values, whereas global needs permutations to obtain reliable P-values when the frequencies of some haplotype categories are low or when the trait is skewed. Therefore, we recommend wei-SIMc-matching for detecting uncommon causal variants with surrounding common SNPs, in light of its power and computational feasibility.
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
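The central idea of wei-SIMc-matching is to weight haplotype matches by the inverse of the estimated standard deviation of haplotype counts, so that matches on rarer haplotypes, which may tag uncommon causal variants, count for more. The sketch below illustrates only that weighting step, under assumptions of our own: the per-person count of a haplotype with frequency p is taken to have standard deviation sqrt(2p(1 − p)) (binomial, Hardy-Weinberg), and similarity between two people is the weighted sum of products of their haplotype counts. The published statistic and its asymptotic test are not reproduced, and all names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Diplotype = Tuple[str, str]  # an individual's two haplotypes, e.g. ("0110", "0010")


def inverse_sd_weights(people: Sequence[Diplotype]) -> Dict[str, float]:
    """Weight each haplotype by 1 / sqrt(2 p (1 - p)), p = its sample frequency."""
    counts = Counter(h for person in people for h in person)
    total = sum(counts.values())
    weights = {}
    for hap, c in counts.items():
        p = c / total
        sd = math.sqrt(2 * p * (1 - p))
        weights[hap] = 1 / sd if sd > 0 else 0.0  # a fixed haplotype carries no information
    return weights


def weighted_similarity(a: Diplotype, b: Diplotype, weights: Dict[str, float]) -> float:
    """Sum over shared haplotypes of weight * (copies in a) * (copies in b)."""
    ca, cb = Counter(a), Counter(b)
    return sum(weights.get(h, 0.0) * ca[h] * cb[h] for h in ca if h in cb)


if __name__ == "__main__":
    people = [("00", "00"), ("00", "11"), ("11", "01"), ("00", "01")]
    w = inverse_sd_weights(people)
    print(w)  # rarer haplotypes receive larger weights
    print(weighted_similarity(people[0], people[1], w))
```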
17
Wang Z, Liu J, Wang J, Wang Y, Wang N, Li Y, Li R, Wu R. Dynamic modeling of genes controlling cancer stem cell proliferation. Front Genet 2012; 3:84. [PMID: 22661984 PMCID: PMC3357477 DOI: 10.3389/fgene.2012.00084] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/11/2012] [Accepted: 04/26/2012] [Indexed: 12/18/2022]
Abstract
The growing evidence that cancer originates from stem cells (SC) holds great promise for eliminating this disease through drug therapies designed specifically to remove cancer SC. Translation of this knowledge into predictive tests for the clinic is hampered by the lack of methods to discriminate cancer SC from non-cancer SC. Here, we address this issue by describing a conceptual strategy for identifying the genetic origins of cancer SC. The strategy incorporates a high-dimensional group of differential equations that characterizes the proliferation, differentiation, and reprogramming of cancer SC in a dynamic cellular and molecular system. The deployment of robust mathematical models will help uncover and explain many still unknown aspects of cell behavior, tissue function, and network organization related to the formation and division of cancer SC. The statistical method developed allows biologically meaningful hypotheses about the genetic control mechanisms of carcinogenesis and metastasis to be tested in a quantitative manner.
Affiliation(s)
- Zhong Wang
- Center for Statistical Genetics, The Pennsylvania State University Hershey, PA, USA
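The strategy rests on systems of differential equations for proliferation, differentiation and reprogramming. As a deliberately small, hypothetical illustration (not the authors' high-dimensional model, and with invented rate names), the sketch below integrates a two-compartment system for stem cells S and differentiated cells D with forward Euler steps: dS/dt = (ρ − δ)S + γD and dD/dt = δS − (γ + μ)D, where ρ is proliferation, δ differentiation, γ reprogramming back to the stem state, and μ loss of differentiated cells. Genotype-specific values of such rates are one way genetic control could enter a model like this; that link is an assumption here.

```python
from typing import List, Tuple


def simulate_sc(s0: float, d0: float, prolif: float, diff: float, reprog: float,
                death: float, dt: float = 0.01, steps: int = 1000
                ) -> List[Tuple[float, float, float]]:
    """Forward-Euler trajectory (t, S, D) of the toy two-compartment model."""
    s, d = s0, d0
    trajectory = [(0.0, s, d)]
    for k in range(1, steps + 1):
        ds = (prolif - diff) * s + reprog * d   # self-renewal minus differentiation, plus reprogramming
        dd = diff * s - (reprog + death) * d    # gain from differentiation, loss to reprogramming and death
        s, d = s + dt * ds, d + dt * dd
        trajectory.append((k * dt, s, d))
    return trajectory


if __name__ == "__main__":
    traj = simulate_sc(s0=100.0, d0=0.0, prolif=0.5, diff=0.4, reprog=0.05, death=0.2)
    t, s, d = traj[-1]
    print(f"t={t:.1f}  stem={s:.1f}  differentiated={d:.1f}")
```

Forward Euler keeps the sketch dependency-free; a production model would use an adaptive ODE solver and fit the rates to data.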
18
Abstract
This chapter reviews the rationale for the use of haplotypes in association-based testing, discusses statistical issues related to haplotype uncertainty that complicate the analysis, then gives practical guidance for testing haplotype-based associations with phenotype or outcome trait, first of candidate gene regions and then for the genome as a whole. Haplotypes are interesting for two reasons: First, they may be in closer LD with a causal variant than any single measured SNP, and therefore may enhance the coverage value of the genotypes over single SNP analysis. Second, haplotypes may themselves be the causal variants of interest and some solid examples of this have appeared in the literature. This chapter discusses three possible approaches to incorporation of SNP haplotype analysis into generalized linear regression models: (1) a simple substitution method involving imputed haplotypes; (2) simultaneous maximum likelihood (ML) estimation of all parameters, including haplotype frequencies and regression parameters; and (3) a simplified approximation to full ML for case-control data. Examples of the various approaches for a haplotype analysis of a candidate gene are provided. We compare the behavior of the approximation-based methods and show that in most instances the simpler methods hold up well in practice. We also describe the practical implementation of genome-wide haplotype risk estimation and discuss several shortcuts that can be used to speed up otherwise potentially very intensive computational requirements.
Affiliation(s)
- Daniel O Stram
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
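The simple substitution approach, (1) above, replaces each person's unobserved haplotype counts with their expected values given the genotypes and estimated haplotype frequencies, then enters these expected counts into an ordinary regression. A minimal two-SNP sketch, assuming haplotype frequencies are already known (in practice they are estimated, for example by EM) and Hardy-Weinberg equilibrium: the only phase-ambiguous genotype is the double heterozygote, whose two resolutions, 00/11 and 01/10, have probabilities proportional to 2p₀₀p₁₁ and 2p₀₁p₁₀. Function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Tuple

# ordered ways to split a genotype (count of allele "1") into two alleles
SPLITS = {0: [("0", "0")], 1: [("0", "1"), ("1", "0")], 2: [("1", "1")]}


def phase_posteriors(g1: int, g2: int, freqs: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Posterior probability of each unordered haplotype pair given genotypes at two SNPs."""
    weights: Dict[Tuple[str, str], float] = defaultdict(float)
    for a1, b1 in SPLITS[g1]:
        for a2, b2 in SPLITS[g2]:
            h1, h2 = a1 + a2, b1 + b2
            # summing over ordered pairs gives 2*p1*p2 for heterozygous pairs, p^2 otherwise
            weights[tuple(sorted((h1, h2)))] += freqs[h1] * freqs[h2]
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def expected_count(g1: int, g2: int, freqs: Dict[str, float], hap: str) -> float:
    """Expected number of copies of `hap` carried by a person (0 to 2)."""
    return sum(p * pair.count(hap) for pair, p in phase_posteriors(g1, g2, freqs).items())


if __name__ == "__main__":
    freqs = {"00": 0.4, "01": 0.1, "10": 0.1, "11": 0.4}
    print(expected_count(1, 1, freqs, "11"))  # double heterozygote: mostly resolved as 00/11
```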
19
Hill-Burns EM, Factor SA, Zabetian CP, Thomson G, Payami H. Evidence for more than one Parkinson's disease-associated variant within the HLA region. PLoS One 2011; 6:e27109. [PMID: 22096524 PMCID: PMC3212531 DOI: 10.1371/journal.pone.0027109] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Received: 07/08/2011] [Accepted: 10/10/2011] [Indexed: 11/18/2022]
Abstract
Parkinson's disease (PD) was recently found to be associated with HLA in a genome-wide association study (GWAS). Follow-up GWASs replicated the PD-HLA association, but their top hits differ. Do the different hits tag the same locus, or is there more than one PD-associated variant within HLA? We show that the top GWAS hits are not correlated with each other (0.00 ≤ r² ≤ 0.15). Using our GWAS (2000 cases, 1986 controls), we conducted stepwise conditional analysis on 107 SNPs with P < 10⁻³ for PD association; 103 dropped out and four remained significant. Each SNP, when conditioned on the other three, yielded P(SNP1) = 5×10⁻⁴, P(SNP2) = 5×10⁻⁴, P(SNP3) = 4×10⁻³ and P(SNP4) = 0.025. The four SNPs were not correlated (0.01 ≤ r² ≤ 0.20). Haplotype analysis (excluding the rare SNP2) revealed increasing PD risk with an increasing number of risk alleles, from OR = 1.27, P = 5×10⁻³ for one risk allele to OR = 1.65, P = 4×10⁻⁸ for three. Using an additional 843 cases and 856 controls, we replicated the independent effects of SNP1 (P conditioned on SNP4 = 0.04) and SNP4 (P conditioned on SNP1 = 0.04); SNP2 and SNP3 could not be replicated. In the pooled GWAS and replication data, SNP1 conditioned on SNP4 had OR = 1.23, P = 6×10⁻⁷; SNP4 conditioned on SNP1 had OR = 1.18, P = 3×10⁻³; and the haplotype with both risk alleles had OR = 1.48, P = 2×10⁻¹². Genotypic OR increased with the number of risk alleles an individual possessed, up to OR = 1.94, P = 2×10⁻¹¹ for individuals who were homozygous for the risk allele at both SNP1 and SNP4. SNP1 is a variant in HLA-DRA and is associated with HLA-DRA, DRB5 and DQA2 gene expression. SNP4 is correlated (r² = 0.95) with variants that are associated with HLA-DQA2 expression, and with the top HLA SNP from the IPDGC GWAS (r² = 0.60). Our findings suggest more than one PD-HLA association: either different alleles of the same gene, or separate loci.
Affiliation(s)
- Erin M. Hill-Burns
- New York State Department of Health Wadsworth Center, Albany, New York, United States of America
- Stewart A. Factor
- Department of Neurology, Emory University School of Medicine, Atlanta, Georgia, United States of America
- Cyrus P. Zabetian
- Veterans Affairs Puget Sound Health Care System and Department of Neurology, University of Washington, Seattle, Washington, United States of America
- Glenys Thomson
- Department of Integrative Biology, University of California, Berkeley, California, United States of America
- Haydeh Payami
- New York State Department of Health Wadsworth Center, Albany, New York, United States of America
20
Hamza TH, Chen H, Hill-Burns EM, Rhodes SL, Montimurro J, Kay DM, Tenesa A, Kusel VI, Sheehan P, Eaaswarkhanth M, Yearout D, Samii A, Roberts JW, Agarwal P, Bordelon Y, Park Y, Wang L, Gao J, Vance JM, Kendler KS, Bacanu SA, Scott WK, Ritz B, Nutt J, Factor SA, Zabetian CP, Payami H. Genome-wide gene-environment study identifies glutamate receptor gene GRIN2A as a Parkinson's disease modifier gene via interaction with coffee. PLoS Genet 2011; 7:e1002237. [PMID: 21876681 PMCID: PMC3158052 DOI: 10.1371/journal.pgen.1002237] [Citation(s) in RCA: 157] [Impact Index Per Article: 12.1] [Received: 04/09/2011] [Accepted: 06/24/2011] [Indexed: 11/18/2022]
Abstract
Our aim was to identify genes that influence the inverse association of coffee with the risk of developing Parkinson's disease (PD). We used genome-wide genotype data and lifetime caffeinated-coffee-consumption data on 1,458 persons with PD and 931 without PD from the NeuroGenetics Research Consortium (NGRC), and we performed a genome-wide association and interaction study (GWAIS), testing each SNP's main effect plus its interaction with coffee, adjusting for sex, age, and two principal components. We then stratified subjects as heavy or light coffee-drinkers and performed a genome-wide association study (GWAS) in each group. We replicated the most significant SNP. Finally, we imputed the NGRC dataset, increasing genomic coverage to examine the region of interest in detail. The primary analyses (GWAIS, GWAS, Replication) were performed using genotyped data. In GWAIS, the most significant signal came from rs4998386 and the neighboring SNPs in GRIN2A. GRIN2A encodes an NMDA-glutamate-receptor subunit and regulates excitatory neurotransmission in the brain. Achieving P(2df) = 10⁻⁶, GRIN2A surpassed all known PD susceptibility genes in significance in the GWAIS. In stratified GWAS, the GRIN2A signal was present in heavy coffee-drinkers (OR = 0.43; P = 6×10⁻⁷) but not in light coffee-drinkers. The a priori Replication hypothesis that “Among heavy coffee-drinkers, rs4998386_T carriers have lower PD risk than rs4998386_CC carriers” was confirmed: OR(Replication) = 0.59, P(Replication) = 10⁻³; OR(Pooled) = 0.51, P(Pooled) = 7×10⁻⁸. Compared to light coffee-drinkers with the rs4998386_CC genotype, heavy coffee-drinkers with the rs4998386_CC genotype had 18% lower risk (P = 3×10⁻³), whereas heavy coffee-drinkers with the rs4998386_TC genotype had 59% lower risk (P = 6×10⁻¹³). Imputation revealed a block of SNPs that achieved P(2df) < 5×10⁻⁸ in GWAIS, and OR = 0.41, P = 3×10⁻⁸ in heavy coffee-drinkers.
This study is proof of concept that inclusion of environmental factors can help identify genes that are missed in GWAS. Both adenosine antagonists (caffeine-like) and glutamate antagonists (GRIN2A-related) are being tested in clinical trials for treatment of PD. GRIN2A may be a useful pharmacogenetic marker for subdividing individuals in clinical trials to determine which medications might work best for which patients. Parkinson's disease (PD), like most common disorders, involves interactions between genetic make-up and environmental exposures that are unique to each individual. Caffeinated-coffee consumption may protect some people from developing PD, although not all benefit equally. In a genome-wide search, we discovered that variations in the glutamate-receptor gene GRIN2A modulate the risk of developing PD in heavy coffee drinkers. The study was hypothesis-free, that is, we cast a net across the entire genome allowing statistical significance to point us to a genetic variant, regardless of whether it fell in a genomic desert or an important gene. Fortuitously, the most significant finding was in a well-known gene, GRIN2A, which regulates brain signals that control movement and behavior. Our finding is important for three reasons: First, it is a proof of concept that studying genes and environment on the whole-genome scale is feasible, and this approach can identify important genes that are missed when environmental exposures are ignored. Second, the knowledge of interaction between GRIN2A, which is involved in neurotransmission in the brain, and caffeine, which is an adenosine-A2A-receptor antagonist, will stimulate new research towards understanding the cause and progression of PD. Third, the results may lead to personalized prevention of and treatment for PD.
Affiliation(s)
- Taye H Hamza
- New York State Department of Health Wadsworth Center, Albany, New York, USA
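The GWAIS statistic tests the SNP main effect and the SNP × coffee interaction jointly, so under the null hypothesis its likelihood-ratio (or score) statistic is approximately χ² with 2 degrees of freedom. For 2 df the upper tail has the closed form P = exp(−x/2), which makes the link between the statistic and P-values such as 10⁻⁶ or 5×10⁻⁸ easy to check. A minimal sketch; the function names are ours.

```python
import math


def chi2_2df_pvalue(stat: float) -> float:
    """Upper-tail P-value of a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


def chi2_2df_critical(p: float) -> float:
    """Statistic needed to reach a given P-value with 2 degrees of freedom."""
    return -2.0 * math.log(p)


if __name__ == "__main__":
    for p in (1e-6, 5e-8):
        print(f"P = {p:g} requires a 2-df statistic of {chi2_2df_critical(p):.2f}")
```

By this formula, P = 10⁻⁶ corresponds to a 2-df statistic of about 27.6, and the genome-wide threshold 5×10⁻⁸ to about 33.6.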
21
Jiao S, Hsu L, Hutter CM, Peters U. The use of imputed values in the meta-analysis of genome-wide association studies. Genet Epidemiol 2011; 35:597-605. [PMID: 21769935 DOI: 10.1002/gepi.20608] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 04/06/2011] [Revised: 06/02/2011] [Accepted: 06/03/2011] [Indexed: 11/09/2022]
Abstract
In genome-wide association studies (GWAS), it is common practice to impute the genotypes of untyped single nucleotide polymorphisms (SNPs) by exploiting the linkage disequilibrium structure among SNPs. The use of imputed genotypes improves genome coverage and makes it possible to perform meta-analysis combining results from studies genotyped on different platforms. A popular way of using imputed data is the "expectation-substitution" method, which treats the imputed dosage as if it were the true genotype. In current practice, the estimates given by the expectation-substitution method are usually combined in meta-analysis using an inverse variance weighting (IVM) scheme. However, IVM is not optimal because the estimates given by the expectation-substitution method are generally biased. The optimal weight is, in fact, proportional to the inverse variance and the expected value of the effect size estimates. We show both theoretically and numerically that the bias of the estimates is very small under practical conditions of low effect sizes in GWAS. This finding validates the use of the expectation-substitution method and shows that the inverse variance is a good approximation of the optimal weight. Through simulation, we compared the power of the IVM method with several methods, including the optimal weight, the regular z-score meta-analysis and a recently proposed "imputation aware" meta-analysis method (Zaitlen and Eskin [2010] Genet Epidemiol 34:537-542). Our results show that the performance of the inverse variance weight is always indistinguishable from the optimal weight and similar to or better than the other two methods.
Affiliation(s)
- Shuo Jiao
- Cancer Prevention Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.
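The two ingredients discussed above are simple to write down. The expectation-substitution dosage is the expected allele count, P(heterozygous) + 2·P(homozygous for the coded allele), and a fixed-effect inverse-variance-weighted meta-analysis pools per-study estimates b_k with weights 1/SE_k². A minimal sketch with hypothetical function names; the "optimal" weight discussed in the paper, which also involves the expected value of the estimates, is not implemented.

```python
import math
from typing import Sequence, Tuple


def dosage(p_het: float, p_hom_coded: float) -> float:
    """Expected count of the coded allele from imputed genotype probabilities."""
    return p_het + 2.0 * p_hom_coded


def ivw_meta(estimates: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance-weighted estimate and SE from (beta, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


if __name__ == "__main__":
    print(dosage(p_het=0.30, p_hom_coded=0.65))  # expected copies of the coded allele
    print(ivw_meta([(0.10, 1.0), (0.40, 0.5)]))  # the more precise study dominates
```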
22
Xu H, George V. A Monte Carlo test of linkage disequilibrium for single nucleotide polymorphisms. BMC Res Notes 2011; 4:124. [PMID: 21492446 DOI: 10.1186/1756-0500-4-124] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/10/2011] [Accepted: 04/14/2011] [Indexed: 11/13/2022]
Abstract
Background: Genetic association studies, especially genome-wide studies, make use of linkage disequilibrium (LD) information between single nucleotide polymorphisms (SNPs). LD is also used for studying genome structure and has been valuable for evolutionary studies. The strength of LD is commonly measured by r², a statistic closely related to Pearson's χ² statistic. However, computing and testing LD using r² requires known haplotype counts for the SNP pair, which is a problem for most population-based studies, where haplotype phase is unknown. Most statistical genetics packages use likelihood-based methods to infer haplotypes, but the variability of haplotype estimation needs to be accounted for in the test for LD. Findings: We develop a Monte Carlo-based test for LD based on the null distribution of the r² statistic. Our test is based on r² and can be reported together with r². Simulation studies show that it offers slightly better power than existing methods. Conclusions: Our approach provides an alternative test for LD and has been implemented as an R program for ease of use. It also provides a general framework to account for other haplotype inference methods in LD testing.
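For two biallelic SNPs with allele frequencies p_A and p_B and haplotype frequency p_AB, D = p_AB − p_A·p_B and r² = D²/[p_A(1 − p_A)p_B(1 − p_B)]. The sketch below computes r² from phased haplotypes and obtains a Monte Carlo P-value by randomly re-pairing alleles at the second SNP with those at the first, which keeps both allele frequencies fixed while destroying LD. It assumes phase is known; handling unknown phase and the variability of estimated haplotypes, the main point of the method described above, is deliberately left out, so this is a simplified stand-in rather than the published test.

```python
import random
from typing import Sequence, Tuple

Haplotype = Tuple[int, int]  # (allele at SNP 1, allele at SNP 2), each coded 0/1


def r_squared(haplotypes: Sequence[Haplotype]) -> float:
    """r^2 = D^2 / [pA(1 - pA) pB(1 - pB)] computed from phased haplotypes."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0  # monomorphic SNP: r^2 undefined, report 0


def monte_carlo_ld_test(haplotypes: Sequence[Haplotype], n_sim: int = 999,
                        seed: int = 7) -> Tuple[float, float]:
    """Observed r^2 and a Monte Carlo P-value under no LD (alleles randomly re-paired)."""
    rng = random.Random(seed)
    observed = r_squared(haplotypes)
    first = [a for a, _ in haplotypes]
    second = [b for _, b in haplotypes]
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(second)  # keeps both allele frequencies, breaks the pairing
        if r_squared(list(zip(first, second))) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    haps = [(0, 0)] * 6 + [(1, 1)] * 5 + [(0, 1)] * 2 + [(1, 0)]
    print(monte_carlo_ld_test(haps))
```

For large samples, n·r² is approximately χ² with 1 df under no LD, which gives a quick analytic check on the Monte Carlo result.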
23
Koehler ML, Bondell HD, Tzeng JY. Evaluating haplotype effects in case-control studies via penalized-likelihood approaches: prospective or retrospective analysis? Genet Epidemiol 2011; 34:892-911. [PMID: 21104891 DOI: 10.1002/gepi.20545] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/09/2023]
Abstract
Penalized likelihood methods have become increasingly popular in recent years for evaluating haplotype-phenotype association in case-control studies. Although a retrospective likelihood is dictated by the sampling scheme, these penalized methods are typically built on prospective likelihoods because of their modeling simplicity and computational feasibility. It has been well documented that, for unpenalized methods, prospective analyses of case-control data can be valid but less efficient than their retrospective counterparts when testing for association, and can result in substantial bias when estimating haplotype effects. For penalized methods, which combine effect estimation and testing in one step, the impact of using a prospective likelihood is not clear. In this work, we examine the consequences of ignoring the sampling scheme for haplotype-based penalized likelihood methods. Our results suggest that the impact of prospective analyses depends on (1) the underlying genetic mode and (2) the genetic model adopted in the analysis. When the correct genetic model is used, the difference between the two analyses is negligible for additive and slight for dominant haplotype effects. For recessive haplotype effects, the more appropriate retrospective likelihood clearly outperforms the prospective likelihood. If an additive model is used incorrectly (the true underlying genetic mode being unknown a priori), both retrospective and prospective penalized methods suffer from a sizeable power loss and an increase in bias. The impact of using the incorrect genetic model is much bigger on retrospective analyses than on prospective analyses, and results in comparable performance for both methods. An application of these methods to the Genetic Analysis Workshop 15 rheumatoid arthritis data is provided.
Affiliation(s)
- Megan L Koehler
- Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, USA
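The distinction between the true genetic mode and the model adopted in the analysis comes down to how a person's haplotype copy number (0, 1 or 2) is coded as a covariate. A minimal sketch of the three standard codings (the function name is ours):

```python
def code_haplotype(copies: int, model: str) -> int:
    """Covariate for a haplotype carried `copies` times under a given genetic model."""
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1 or 2")
    if model == "additive":      # each copy adds the same effect
        return copies
    if model == "dominant":      # one copy is enough
        return int(copies >= 1)
    if model == "recessive":     # two copies are required
        return int(copies == 2)
    raise ValueError(f"unknown genetic model: {model!r}")


if __name__ == "__main__":
    for model in ("additive", "dominant", "recessive"):
        print(model, [code_haplotype(c, model) for c in (0, 1, 2)])
```

Under a recessive truth, additive coding credits heterozygous carriers with half of the homozygote effect although they carry none, which is one way to see why fitting the wrong model costs power and introduces bias.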
24
Bagos PG. Meta-analysis of haplotype-association studies: comparison of methods and empirical evaluation of the literature. BMC Genet 2011; 12:8. [PMID: 21247440 PMCID: PMC3087509 DOI: 10.1186/1471-2156-12-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/01/2010] [Accepted: 01/19/2011] [Indexed: 01/05/2023]
Abstract
Background: Meta-analysis is a popular methodology in several fields of medical research, including genetic association studies. However, the methods used for meta-analysis of association studies that report haplotypes have not been studied in detail. In this work, methods for performing meta-analysis of haplotype association studies are summarized, compared and presented in a unified framework, along with an empirical evaluation of the literature. Results: We present multivariate methods that use summary-based data as well as methods that use binary and count data in a generalized linear mixed model framework (logistic regression, multinomial regression and Poisson regression). The methods presented here avoid the inflation of the type I error rate that can result from the traditional approach of comparing one haplotype against the remaining ones, and they can be fitted using standard software. Moreover, formal global tests are presented for assessing the statistical significance of the overall association. Although the methods presented here assume that the haplotypes are directly observed, they can easily be extended to allow for haplotype uncertainty by weighting the haplotypes by their probability. Conclusions: An empirical evaluation of the published literature and a comparison against meta-analyses that use single nucleotide polymorphisms suggest that the studies reporting meta-analysis of haplotypes contain approximately half of the included studies and produce significant results twice as often. We show that this excess of statistically significant results stems from the sub-optimal method of analysis used and that, in approximately half of the cases, the statistical significance is refuted if the data are properly re-analyzed. Illustrative examples of code are given in Stata, and it is anticipated that the methods developed in this work will be widely applied in the meta-analysis of haplotype association studies.
Affiliation(s)
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Central Greece, Lamia, Greece.
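A global test asks whether the whole haplotype distribution differs between cases and controls, rather than contrasting one haplotype against all others, the practice the abstract links to inflated type I error. The sketch below is a classical single-study version, not the generalized linear mixed models of the paper: a Pearson χ² statistic on the 2 × k table of haplotype counts, with k − 1 degrees of freedom. It assumes haplotype counts are known (fractional expected counts also work) and treats chromosomes as independent, which presumes Hardy-Weinberg equilibrium. The example counts are hypothetical.

```python
from typing import Dict, Tuple


def global_haplotype_chi2(cases: Dict[str, float], controls: Dict[str, float]) -> Tuple[float, int]:
    """Pearson chi-square for a 2 x k table of haplotype counts; returns (statistic, df)."""
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("both groups need haplotype counts")
    total = n_cases + n_controls
    haps = [h for h in sorted(set(cases) | set(controls))
            if cases.get(h, 0) + controls.get(h, 0) > 0]
    stat = 0.0
    for h in haps:
        column = cases.get(h, 0) + controls.get(h, 0)
        for observed, row_total in ((cases.get(h, 0), n_cases), (controls.get(h, 0), n_controls)):
            expected = row_total * column / total
            stat += (observed - expected) ** 2 / expected
    return stat, len(haps) - 1


if __name__ == "__main__":
    cases = {"ACG": 120, "ATG": 60, "GCA": 20}      # hypothetical counts
    controls = {"ACG": 150, "ATG": 40, "GCA": 10}
    print(global_haplotype_chi2(cases, controls))
```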
25
Hooton H, Dubern B, Henegar C, Paternoster L, Nohr EA, Alili R, Rousseau F, Pelloux V, Galan P, Hercberg S, Arner P, Sørensen TIA, Clément K. Association between CST3 rs2424577 polymorphism and corpulence related phenotypes during lifetime in populations of European ancestry. Obes Facts 2011; 4:131-44. [PMID: 21577020 PMCID: PMC6444514 DOI: 10.1159/000327797] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/14/2022]
Abstract
OBJECTIVE: Cystatin C, a protein encoded by the CST3 gene, is implicated in adipose tissue biology. Our hypothesis is that common variants in the CST3 gene could play a role in the development of corpulence over the lifetime. METHODS: Two tag SNPs were selected to capture all SNPs in the CST3 region. We first investigated the association of the two tag SNPs, individually and combined into haplotypes, with corpulence-related phenotypes in 4,288 French subjects (BMI = 24.31 ± 3.74 kg/m²). Significant findings were replicated in five independent populations: 790 Danish lean men (BMI = 24.63 ± 2.30 kg/m²), 672 Danish obese men (BMI = 33.23 ± 2.34 kg/m²), 763 Swedish women (BMI = 21.73 ± 2.87 kg/m²), 1,848 Danish lean women (BMI = 22.66 ± 2.85 kg/m²) and 2,061 Danish obese women (BMI = 37.01 ± 3.59 kg/m²). RESULTS: rs2424577 was associated with BMI in three independent populations: G/G carriers were less corpulent than A/A carriers in the French individuals (p = 0.045) and in the Danish lean men (p = 0.021), but more corpulent in the group of Swedish women (p = 0.004). This has been described as a flip-flop phenomenon, probably caused by a multilocus effect. CONCLUSION: CST3 rs2424577 is associated with BMI in a complex fashion. This association is probably caused by the interaction between several functional variants.
Affiliation(s)
- Henri Hooton
- INSERM U872 Équipe 7, Centre de Recherche des Cordeliers, 15 Rue de l’École de Médecine, 75006 Paris, France.
26
van Winkel R, Rutten BP, Peerbooms O, Peuskens J, van Os J, De Hert M. MTHFR and risk of metabolic syndrome in patients with schizophrenia. Schizophr Res 2010; 121:193-8. [PMID: 20547447 DOI: 10.1016/j.schres.2010.05.030] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Received: 12/23/2009] [Revised: 05/18/2010] [Accepted: 05/25/2010] [Indexed: 10/19/2022]
Abstract
OBJECTIVE: Meta-analyses have implicated polymorphisms in MTHFR, which encodes a critical enzyme in folate and homocysteine metabolism, in both schizophrenia and cardiovascular disease (CVD). METHOD: A possible association between the C677T and A1298C polymorphisms of the MTHFR gene, on the one hand, and metabolic syndrome, on the other, was examined in a naturalistic cohort of 518 patients with a schizophrenia spectrum disorder screened for metabolic disturbances at the Catholic University of Louvain, Belgium. RESULTS: MTHFR A1298C, but not C677T, was associated with the metabolic syndrome, with C/C genotypes having a 2.4 times higher risk than A/A genotypes (95% CI 1.25-4.76, p = 0.009). Haplotype analysis revealed similar findings, showing greater risk for metabolic syndrome associated with the 677C/1298C haplotype compared with the reference 677C/1298A haplotype (OR 1.72, 95% CI 1.24-2.39, p = 0.001). These associations were not explained by circulating folate levels. Differences between A1298C genotype groups were considerably greater in the subsample treated with clozapine or olanzapine (OR C/C versus A/A 3.87, 95% CI 1.51-9.96) than in the subsample treated with any of the other antipsychotics (OR C/C versus A/A 1.30, 95% CI 0.47-3.74), although this difference did not formally reach statistical significance in the current cross-sectional study (gene-by-group interaction χ² = 3.0, df = 1, p = 0.08). CONCLUSION: These data provide evidence supporting an association between MTHFR and risk of metabolic syndrome in patients with schizophrenia. Prospective studies evaluating the course of metabolic outcomes after initiation of antipsychotic medication are needed to evaluate possible gene-by-treatment interaction more specifically.
Affiliation(s)
- Ruud van Winkel
- Department of Psychiatry and Neuropsychology, EURON, South Limburg Mental Health Research and Teaching Network, Maastricht University Medical Centre, PO box 616, 6200 MD Maastricht, The Netherlands.
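Odds ratios such as "2.4 times higher risk (95% CI 1.25-4.76)" are summarized on the log scale. As a reminder of the arithmetic, the sketch below computes a crude odds ratio from a 2 × 2 table with a Woolf (log-based) confidence interval, exp(ln OR ± 1.96·sqrt(1/a + 1/b + 1/c + 1/d)). The counts in the example are hypothetical, and the study's own estimates may come from regression models rather than from this crude calculation.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Crude OR and Woolf CI for a 2 x 2 table.

    a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or an exact method")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # hypothetical counts: metabolic syndrome among C/C (exposed) vs A/A (unexposed) carriers
    print(odds_ratio_ci(a=30, b=40, c=60, d=190))
```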
27
Genovese G, Leibon G, Pollak MR, Rockmore DN. Improved IBD detection using incomplete haplotype information. BMC Genet 2010; 11:58. [PMID: 20591167 PMCID: PMC2914765 DOI: 10.1186/1471-2156-11-58] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 09/11/2009] [Accepted: 06/30/2010] [Indexed: 11/10/2022]
Abstract
Background: The availability of high-density genetic maps and genotyping platforms has transformed human genetic studies. The use of these platforms has enabled population-based genome-wide association studies. However, in inheritance-based studies, current methods do not take full advantage of the information present in such genotyping analyses. Results: In this paper we describe an improved method for identifying genetic regions shared identical-by-descent (IBD) from recent common ancestors. This method improves on existing methods by taking advantage of phase information even when it is less than fully accurate or missing. We present an analysis of how using phase information increases the accuracy of IBD detection compared with using only genotype information. Conclusions: Our algorithm should have utility in a wide range of genetic studies that rely on identification of shared genetic material in large families or small populations.
Affiliation(s)
- Giulio Genovese
- Department of Mathematics, Dartmouth College, Hanover NH 03755, USA.
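A useful reference point for what phase adds is the genotype-only constraint that IBD detection starts from: two people who share at least one haplotype IBD at a locus must share an allele there, so opposite homozygotes (0 vs 2 copies of an allele) rule out IBD at that SNP, barring genotyping error. The sketch below finds the longest run of SNPs free of opposite homozygotes. It is a simple baseline of our own, not the authors' algorithm, which additionally exploits partial or imperfect phase information.

```python
from typing import List, Optional, Tuple


def longest_compatible_run(g1: List[Optional[int]], g2: List[Optional[int]]) -> Tuple[int, int]:
    """Longest run of consecutive SNPs with no opposite homozygotes between two people.

    Genotypes are counts (0, 1, 2) of one allele; None marks a missing call, which is
    treated as compatible. Returns (start, end) with end exclusive.
    """
    best = (0, 0)
    start = 0
    for i, (a, b) in enumerate(zip(g1, g2)):
        if a is not None and b is not None and abs(a - b) == 2:  # 0 vs 2: no shared allele
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = i + 1
    n = min(len(g1), len(g2))
    if n - start > best[1] - best[0]:
        best = (start, n)
    return best


if __name__ == "__main__":
    person1 = [0, 1, 2, 2, 0, 1, 2, 0]
    person2 = [2, 1, 2, 1, 0, 0, 1, None]
    print(longest_compatible_run(person1, person2))  # (1, 8)
```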
28
Abstract
Missing data arise in genetic association studies when genotypes are unknown or when haplotypes are of direct interest. We provide a general likelihood-based framework for making inference on genetic effects and gene-environment interactions with such missing data. We allow genetic and environmental variables to be correlated while leaving the distribution of environmental variables completely unspecified. We consider three major study designs (cross-sectional, case-control, and cohort) and construct appropriate likelihood functions for all common phenotypes (e.g. case-control status, quantitative traits, and potentially censored ages at onset of disease). The likelihood functions involve both finite- and infinite-dimensional parameters. The maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Expectation-Maximization (EM) algorithms are developed to implement the corresponding inference procedures. Extensive simulation studies demonstrate that the proposed inferential and numerical methods perform well in practical settings. Illustration with a genome-wide association study of lung cancer is provided.
Affiliation(s)
- Y J Hu
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599-7420, USA
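Unknown haplotype phase is one of the missing-data settings named above. The sketch below shows the classic EM algorithm for estimating haplotype frequencies from unphased genotypes under Hardy-Weinberg equilibrium. It illustrates the E-step/M-step idea only and is not the semiparametric estimator described in the abstract; the function names are hypothetical, and enumerating all 2^k haplotypes is practical only for a handful of SNPs.

```python
import itertools
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Hap = Tuple[int, ...]


def compatible_pairs(genotype: Sequence[int]) -> List[Tuple[Hap, Hap]]:
    """All unordered haplotype pairs whose allele sums equal the 0/1/2 genotype."""
    haps = list(itertools.product((0, 1), repeat=len(genotype)))
    pairs = []
    for i, h1 in enumerate(haps):
        for h2 in haps[i:]:
            if all(a + b == g for a, b, g in zip(h1, h2, genotype)):
                pairs.append((h1, h2))
    return pairs


def em_haplotype_freqs(genotypes: List[Sequence[int]], n_iter: int = 200) -> Dict[Hap, float]:
    """Classic EM estimate of haplotype frequencies, assuming Hardy-Weinberg equilibrium."""
    k = len(genotypes[0])
    haps = list(itertools.product((0, 1), repeat=k))
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    pairs_per_person = [compatible_pairs(g) for g in genotypes]
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts: Dict[Hap, float] = defaultdict(float)
        for pairs in pairs_per_person:
            # E-step: posterior probability of each phase resolution for this person
            weights = [freqs[h1] * freqs[h2] * (1 if h1 == h2 else 2) for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        freqs = {h: counts[h] / n_chrom for h in haps}
    return freqs
```

Only individuals heterozygous at two or more sites are phase-ambiguous; everyone else contributes fixed counts, which is why the estimates for fully resolved samples do not change between iterations.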
29
Abstract
Haplotypes can hold key information for understanding the role of candidate genes in disease etiology. However, standard haplotype analysis has not yet been able to fully reveal the information retained by haplotypes. In most analyses, haplotype inference focuses on relative effects compared with an arbitrarily chosen baseline haplotype. It does not depict the effect structure unless an additional inference procedure is used in a secondary post hoc analysis, and such analyses tend to lack power. In this study, we propose a penalized regression approach to systematically evaluate the pattern and structure of the haplotype effects. By specifying an L1 penalty on the pairwise differences of the haplotype effects, we present a model-based haplotype analysis to detect and characterize haplotypic association signals. The proposed method avoids the need to choose a baseline haplotype; it simultaneously carries out the effect estimation and effect comparison of all haplotypes, and outputs the haplotype group structure based on effect size. Finally, our penalty weights are theoretically designed to balance the likelihood and the penalty term in an appropriate manner. The proposed method can be used as a tool to interpret candidate regions identified from a genome or chromosomal scan. Simulation studies show that the proposed method identifies the haplotype effect structure better than traditional haplotype association methods, demonstrating its informativeness and power.
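The penalized estimator described above can be written as follows, assuming H haplotypes with effects β_1, …, β_H, a log-likelihood ℓ(β), a tuning parameter λ ≥ 0, and pairwise weights w_jk. The abstract does not give the weight formula, so the w_jk are left generic; this is a sketch of the general form, not the authors' exact objective.

```latex
\[
\hat{\boldsymbol\beta}
  = \arg\max_{\boldsymbol\beta}
    \Big\{ \ell(\boldsymbol\beta)
      - \lambda \sum_{1 \le j < k \le H} w_{jk}\, \lvert \beta_j - \beta_k \rvert \Big\}
\]
```

Because the L1 penalty acts on differences rather than on the effects themselves, it can set β̂_j = β̂_k exactly; haplotypes with coinciding estimates form one effect group, so the group structure is read off directly without choosing a baseline haplotype.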
30
Bardel C, Danjean V, Morange P, Génin E, Darlu P. On the use of phylogeny-based tests to detect association between quantitative traits and haplotypes. Genet Epidemiol 2010; 33:729-39. [PMID: 19399905 DOI: 10.1002/gepi.20425] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3]
Abstract
With the increasing availability of genetic data, several SNPs in a candidate gene can be combined into haplotypes to test for association with a quantitative trait. When the number of SNPs increases, the number of haplotypes can become very large and there is a need to group them together. The use of the phylogenetic relationships between haplotypes provides a natural and efficient way of grouping. Moreover, it allows us to identify disease or quantitative trait-related loci. In this article, we describe ALTree-q, a phylogeny-based approach to test for association between quantitative traits and haplotypes and to identify putative quantitative trait nucleotides (QTN). This study focuses on the ALTree-q association test, which is based on one-way analyses of variance (ANOVA) performed at the different levels of the tree. The statistical properties (type I error and power) were estimated through simulations under different genetic models and were compared to another phylogeny-based test, TreeScan (Templeton, 2005), and to a haplotypic omnibus test consisting of a one-way ANOVA between all haplotypes. For dominant and additive models, ALTree-q is usually the most powerful test, whereas TreeScan performs better under a recessive model. However, power depends strongly on the recurrence rate of the QTN, on the QTN allele frequency, and on the linkage disequilibrium between the QTN and other markers. An application of the method to Thrombin Activatable Fibrinolysis Inhibitor Antigen levels in European and African samples confirms a possible association with polymorphisms of the CPB2 gene and identifies several QTNs.
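Both the haplotypic omnibus test and each tree-level test in ALTree-q rest on a one-way ANOVA. A minimal sketch of the F statistic follows; how trait values are assigned to haplotypes or to tree-level haplotype groups is left to the caller as an assumption, and the function name is hypothetical.

```python
from typing import Dict, List


def one_way_anova_f(groups: Dict[str, List[float]]) -> float:
    """F statistic of a one-way ANOVA comparing trait means across groups
    (e.g., haplotypes, or haplotype clusters at one level of a tree).
    Requires at least two groups and more observations than groups."""
    values = [x for g in groups.values() for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    # within-group sum of squares: spread of observations around their group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups.values())
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

With SciPy available, `scipy.stats.f.sf(F, k - 1, n - k)` converts the statistic into a p-value. Grouping haplotypes by tree level reduces k, which is the intuition behind the power gain of phylogeny-based grouping when there are many rare haplotypes.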
31
Jones B, Walsh D, Werner L, Fiumera A. Using blocks of linked single nucleotide polymorphisms as highly polymorphic genetic markers for parentage analysis. Mol Ecol Resour 2008; 9:487-97. [PMID: 21564678 DOI: 10.1111/j.1755-0998.2008.02444.x] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1]
Abstract
Single nucleotide polymorphisms (SNPs) are plentiful in most genomes and amenable to high throughput genotyping, but they are not yet popular for parentage or paternity analysis. The markers are bi-allelic, so individually they contain little information about parentage, and in nonmodel organisms the process of identifying large numbers of unlinked SNPs can be daunting. We explore the possibility of using blocks of between three and 26 linked SNPs as highly polymorphic molecular markers for reconstructing male genotypes in polyandrous organisms with moderate (five offspring) to large (25 offspring) clutches of offspring. Haplotypes are inferred for each block of linked SNPs using the programs Haplore and Phase 2.1. Each multi-SNP haplotype is then treated as a separate allele, producing a highly polymorphic, 'microsatellite-like' marker. A simulation study is performed using haplotype frequencies derived from empirical data sets from Drosophila melanogaster and Mus musculus populations. We find that the markers produced are competitive with microsatellite loci in terms of single parent exclusion probabilities, particularly when using six or more linked SNPs to form a haplotype. These markers contain only modest rates of missing data and genotyping or phasing errors and thus should be seriously considered as molecular markers for parentage analysis, particularly when the study is interested in the functional significance of polymorphisms across the genome.
Affiliation(s)
- Beatrix Jones
- Centre for Mathematical Biology, Massey University, Private Bag 102-904, North Shore Mail Centre, Auckland 0745, New Zealand, Institute of Information and Mathematical Sciences, Massey University, Private Bag 102-904, North Shore Mail Centre, Auckland 0745, New Zealand, Dana Farber Cancer Research Center, Boston, MA 02115, USA, Department of Biological Sciences, Binghamton University, PO Box 6000, Binghamton, NY 13902, USA
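The key recoding step above treats each distinct multi-SNP haplotype as one allele of a single multi-allelic marker. The sketch below performs that recoding and reports the number of alleles and the expected heterozygosity, 1 - Σ p_i². Heterozygosity is used here as a standard informativeness stand-in; it is not the single-parent exclusion probability reported in the study, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def haplotype_allele_stats(haplotypes: Iterable[Tuple[int, ...]]) -> Tuple[int, float]:
    """Recode each distinct multi-SNP haplotype as one allele and return
    (number of distinct alleles, expected heterozygosity 1 - sum(p_i ** 2))."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    expected_het = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return len(counts), expected_het
```

A single biallelic SNP cannot exceed an expected heterozygosity of 0.5, whereas a block with many common haplotypes can approach 1, which is what makes such blocks "microsatellite-like".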
32
Abstract
The analysis of genomewide association studies requires methods that are both computationally feasible and statistically powerful. Given the large-scale collection of single nucleotide polymorphisms (SNPs), it is desirable to explore the information contained in their interrelationships. In particular, utilizing haplotypes rather than individual SNPs and accounting for correlations of polymorphisms in adjustment for multiple testing can lead to increased power. We present a statistically powerful and numerically efficient method based on sliding windows of adjacent SNPs to detect haplotype-disease association in genomewide studies. This method consists of an efficient algorithm to calculate a proper likelihood-ratio statistic for any given window of SNPs, along with an accurate and efficient Monte Carlo procedure to adjust for multiple testing. Simulation studies using the HapMap data showed that the proposed method performs well in realistic situations. We applied the new method to a case-control study on rheumatoid arthritis and identified several loci worthy of further investigations.
Affiliation(s)
- B E Huang
- Department of Biostatistics, University of North Carolina, North Carolina 27599-7420, USA
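The sliding-window scheme can be sketched as follows. The window statistic is a hypothetical placeholder (the absolute summed case-control difference in mean allele counts), and multiple testing is handled with a label-permutation max-statistic adjustment. Both are swapped-in simplifications, not the haplotype likelihood-ratio statistic or the Monte Carlo procedure the authors describe; all names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[int]]  # rows = individuals, columns = SNP allele counts (0/1/2)


def windows(n_snps: int, width: int) -> List[range]:
    """All windows of `width` adjacent SNPs."""
    return [range(s, s + width) for s in range(n_snps - width + 1)]


def window_stat(geno: Matrix, status: Sequence[int], win: range) -> float:
    """Placeholder statistic: absolute case-control difference in mean allele
    count, summed over the SNPs in the window (not a likelihood ratio)."""
    cases = [row for row, s in zip(geno, status) if s == 1]
    ctrls = [row for row, s in zip(geno, status) if s == 0]
    diff = 0.0
    for j in win:
        diff += sum(r[j] for r in cases) / len(cases) - sum(r[j] for r in ctrls) / len(ctrls)
    return abs(diff)


def max_stat_pvalue(geno: Matrix, status: Sequence[int], width: int,
                    n_perm: int = 200, seed: int = 1) -> Tuple[float, float]:
    """Genome-wide (family-wise) p-value for the largest window statistic,
    obtained by permuting case-control labels."""
    rng = random.Random(seed)
    wins = windows(len(geno[0]), width)
    observed = max(window_stat(geno, status, w) for w in wins)
    labels = list(status)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # preserves the number of cases and controls
        if max(window_stat(geno, labels, w) for w in wins) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Comparing the observed maximum against the permutation distribution of the maximum accounts for the strong correlation between overlapping windows, which a Bonferroni correction would ignore.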
33
Lin D, Hu Y, Huang B. Simple and efficient analysis of disease association with missing genotype data. Am J Hum Genet 2008; 82:444-52. [PMID: 18252224 DOI: 10.1016/j.ajhg.2007.11.004] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Received: 10/08/2007] [Revised: 11/06/2007] [Accepted: 11/19/2007]
Abstract
Missing genotype data arise in association studies when the single-nucleotide polymorphisms (SNPs) on the genotyping platform are not assayed successfully, when the SNPs of interest are not on the platform, or when total sequence variation is determined only on a small fraction of individuals. We present a simple and flexible likelihood framework to study SNP-disease associations with such missing genotype data. Our likelihood makes full use of all available data in case-control studies and reference panels (e.g., the HapMap), and it properly accounts for the biased nature of the case-control sampling as well as the uncertainty in inferring unknown variants. The corresponding maximum-likelihood estimators for genetic effects and gene-environment interactions are unbiased and statistically efficient. We developed fast and stable numerical algorithms to calculate the maximum-likelihood estimators and their variances, and we implemented these algorithms in a freely available computer program. Simulation studies demonstrated that the new approach is more powerful than existing methods while providing accurate control of the type I error. An application to a case-control study on rheumatoid arthritis revealed several loci that deserve further investigations.
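In generic form, a likelihood that uses all available data sums each subject's contribution over the complete genotypes compatible with what was observed. The expression below is a simplified prospective version written under stated assumptions: the notation is ours, the genotype is taken to be independent of covariates, and there is no correction for case-control sampling, which the authors' likelihood does account for.

```latex
\[
L(\beta, \pi) = \prod_{i=1}^{n} \sum_{g \in \mathcal{S}(O_i)} P_\beta\!\left(Y_i \mid g, X_i\right) P_\pi(g)
\]
```

Here Y_i is the phenotype, X_i the covariates, O_i the observed (possibly incomplete) genotype data, 𝒮(O_i) the set of complete genotypes consistent with O_i, and π the genotype or haplotype frequencies that a reference panel such as HapMap can help estimate.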
34
Abstract
Association methods based on linkage disequilibrium (LD) offer a promising approach for detecting genetic variations that are responsible for complex human diseases. Although methods based on individual single nucleotide polymorphisms (SNPs) may lead to significant findings, methods based on haplotypes comprising multiple SNPs on the same inherited chromosome may provide additional power for mapping disease genes and also provide insight into factors influencing the dependency among genetic markers. Such insights may provide information essential for understanding human evolution and also for identifying cis-interactions between two or more causal variants. Because obtaining haplotype information directly from experiments can be cost prohibitive in most studies, especially in large-scale studies, haplotype analysis presents many unique challenges. In this chapter, we focus on two main issues: haplotype inference and haplotype-association analysis. We first provide a detailed review of methods for haplotype inference using unrelated individuals as well as related individuals from pedigrees. We then cover a number of statistical methods that employ haplotype information in association analysis. In addition, we discuss the advantages and limitations of different methods.
Affiliation(s)
- Nianjun Liu
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
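One concrete view of why haplotype inference is hard: an unphased genotype with h heterozygous sites is compatible with 2^(h-1) distinct haplotype pairs (one pair when h = 0). The sketch below enumerates them for a genotype coded as 0/1/2 alternate-allele counts; the function name and the coding are assumptions for illustration.

```python
import itertools
from typing import List, Sequence, Tuple

HapPair = Tuple[Tuple[int, ...], Tuple[int, ...]]


def phase_resolutions(genotype: Sequence[int]) -> List[HapPair]:
    """List the distinct unordered haplotype pairs compatible with an
    unphased genotype coded as 0/1/2 copies of the alternate allele."""
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]  # homozygous sites are fixed
    if not het_sites:
        return [(tuple(base), tuple(base))]
    pairs: List[HapPair] = []
    # fix the first heterozygous site to 0 on h1 so each pair is listed once
    for bits in itertools.product((0, 1), repeat=len(het_sites) - 1):
        h1, h2 = list(base), list(base)
        for site, b in zip(het_sites, (0,) + bits):
            h1[site] = b
            h2[site] = 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs
```

The exponential growth in candidate pairs is what population-frequency methods such as EM or localized haplotype clustering exploit: they weight each resolution by how common its haplotypes are in the sample.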
35
Browning SR, Browning BL. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet 2007; 81:1084-97. [PMID: 17924348 DOI: 10.1086/521987] [Citation(s) in RCA: 2123] [Impact Index Per Article: 124.9] [Received: 05/29/2007] [Accepted: 07/30/2007]
Abstract
Whole-genome association studies present many new statistical and computational challenges due to the large quantity of data obtained. One of these challenges is haplotype inference; methods for haplotype inference designed for small data sets from candidate-gene studies do not scale well to the large number of individuals genotyped in whole-genome association studies. We present a new method and software for inference of haplotype phase and missing data that can accurately phase data from whole-genome association studies, and we present the first comparison of haplotype-inference methods for real and simulated data sets with thousands of genotyped individuals. We find that our method outperforms existing methods in terms of both speed and accuracy for large data sets with thousands of individuals and densely spaced genetic markers, and we use our method to phase a real data set of 3,002 individuals genotyped for 490,032 markers in 3.1 days of computing time, with 99% of masked alleles imputed correctly. Our method is implemented in the Beagle software package, which is freely available.
Affiliation(s)
- Sharon R Browning
- Department of Statistics, The University of Auckland, Auckland, New Zealand.
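The "99% of masked alleles imputed correctly" figure comes from a masking experiment: known genotypes are hidden, re-inferred, and compared with the truth. A minimal sketch of an allele-level concordance follows, assuming genotypes coded as 0/1/2 allele counts. The abstract does not define the exact metric, so this formula is an assumption, and the function name is hypothetical.

```python
from typing import Sequence


def masked_allele_concordance(truth: Sequence[int], imputed: Sequence[int],
                              masked: Sequence[int]) -> float:
    """Share of masked alleles recovered correctly. For 0/1/2 allele counts,
    |truth - imputed| is the number of alleles that disagree at a position."""
    if not masked:
        raise ValueError("at least one masked position is required")
    wrong = sum(abs(truth[i] - imputed[i]) for i in masked)
    return 1.0 - wrong / (2 * len(masked))  # two alleles per masked genotype
```

Scoring alleles rather than whole genotypes gives partial credit when one of the two alleles is recovered, so it yields a higher number than genotype-level concordance on the same data.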
36
37