1
|
Nazarian A, Philipp I, Culminskaya I, He L, Kulminski AM. Inter- and intra-chromosomal modulators of the APOE ε2 and ε4 effects on the Alzheimer's disease risk. GeroScience 2023; 45:233-247. [PMID: 35809216 PMCID: PMC9886755 DOI: 10.1007/s11357-022-00617-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/15/2022] [Accepted: 06/24/2022] [Indexed: 02/03/2023] Open
Abstract
The mechanisms of incomplete penetrance of risk-modifying impacts of apolipoprotein E (APOE) ε2 and ε4 alleles on Alzheimer's disease (AD) have not been fully understood. We performed genome-wide analysis of differences in linkage disequilibrium (LD) patterns between 6,136 AD-affected and 10,555 AD-unaffected subjects from five independent studies to explore whether the association of the APOE ε2 allele (encoded by rs7412 polymorphism) and ε4 allele (encoded by rs429358 polymorphism) with AD was modulated by autosomal polymorphisms. The LD analysis identified 24 (mostly inter-chromosomal) and 57 (primarily intra-chromosomal) autosomal polymorphisms with significant differences in LD with either rs7412 or rs429358, respectively, between AD-affected and AD-unaffected subjects, indicating their potential modulatory roles. Our Cox regression analysis showed that minor alleles of four inter-chromosomal and ten intra-chromosomal polymorphisms exerted significant modulating effects on the ε2- and ε4-associated AD risks, respectively, and identified ε2-independent (rs2884183 polymorphism, 11q22.3) and ε4-independent (rs483082 polymorphism, 19q13.32) associations with AD. Our functional analysis highlighted ε2- and/or ε4-linked processes affecting the lipid and lipoprotein metabolism and cell junction organization which may contribute to AD pathogenesis. These findings provide insights into the ε2- and ε4-associated mechanisms of AD pathogenesis, underlying their incomplete penetrance.
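The comparison of LD between AD-affected and AD-unaffected subjects described in this abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: LD is approximated by the Pearson correlation of allele dosages (0, 1, 2) at two polymorphisms, and the two correlations are compared with a Fisher z-transform. The function names `pearson_r` and `ld_contrast_z` are hypothetical, and the authors' actual test may differ.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length dosage vectors (a composite LD proxy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ld_contrast_z(r_cases, n_cases, r_controls, n_controls):
    """Z-score for the difference of two correlations (Fisher z-transform).

    Assumption: the groups are independent and each has more than 3 subjects.
    """
    z_cases = math.atanh(r_cases)
    z_controls = math.atanh(r_controls)
    se = math.sqrt(1.0 / (n_cases - 3) + 1.0 / (n_controls - 3))
    return (z_cases - z_controls) / se
```

With hypothetical inputs, `ld_contrast_z(0.5, 103, 0.0, 103)` gives a z-score of about 3.88, while equal correlations in both groups give 0.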
Affiliation(s)
- Alireza Nazarian
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Erwin Mill Building, 2024 W. Main St, Durham, NC, 27705, USA
- Ian Philipp
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Erwin Mill Building, 2024 W. Main St, Durham, NC, 27705, USA
- Irina Culminskaya
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Erwin Mill Building, 2024 W. Main St, Durham, NC, 27705, USA
- Liang He
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Erwin Mill Building, 2024 W. Main St, Durham, NC, 27705, USA
- Alexander M Kulminski
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Erwin Mill Building, 2024 W. Main St, Durham, NC, 27705, USA
|
2
|
Kulminski AM, Jain-Washburn E, Loiko E, Loika Y, Feng F, Culminskaya I, for the Alzheimer’s Disease Neuroimaging Initiative. Associations of the APOE ε2 and ε4 alleles and polygenic profiles comprising APOE-TOMM40-APOC1 variants with Alzheimer's disease biomarkers. Aging (Albany NY) 2022; 14:9782-9804. [PMID: 36399096 PMCID: PMC9831745 DOI: 10.18632/aging.204384] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/04/2022] [Accepted: 10/31/2022] [Indexed: 11/19/2022]
Abstract
Capturing the genetic architecture of Alzheimer's disease (AD) is challenging because of the complex interplay of genetic and non-genetic factors in its etiology. It has been suggested that AD biomarkers may improve the characterization of AD pathology and its genetic architecture. Most studies have focused on connections of individual genetic variants with AD biomarkers, whereas the role of combinations of genetic variants is substantially underexplored. We examined the associations of the APOE ε2 and ε4 alleles and polygenic profiles comprising the ε4-encoding rs429358, TOMM40 rs2075650, and APOC1 rs12721046 polymorphisms with cerebrospinal fluid (CSF) and plasma amyloid β (Aβ40 and Aβ42) and tau biomarkers. Our findings support associations of the ε4 alleles with both plasma and CSF Aβ42 and CSF tau, and the ε2 alleles with baseline, but not longitudinal, CSF Aβ42 measurements. We found that the ε4-bearing polygenic profiles conferring higher and lower AD risks are differentially associated with tau but not Aβ42. Modulation of the effect of the ε4 alleles by TOMM40 and APOC1 variants indicates the potential genetic mechanism of differential roles of Aβ and tau in AD pathogenesis.
Affiliation(s)
- Alexander M. Kulminski
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC 27705, USA
- Ethan Jain-Washburn
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC 27705, USA
- Elena Loiko
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC 27705, USA
- Yury Loika
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC 27705, USA
- Fan Feng
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC 27705, USA
- Irina Culminskaya
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC 27705, USA
|
3
|
Kulminski AM, Philipp I, Shu L, Culminskaya I. Definitive roles of TOMM40-APOE-APOC1 variants in the Alzheimer's risk. Neurobiol Aging 2022; 110:122-131. [PMID: 34625307 PMCID: PMC8758518 DOI: 10.1016/j.neurobiolaging.2021.09.009] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 06/09/2021] [Revised: 09/06/2021] [Accepted: 09/07/2021] [Indexed: 02/03/2023]
Abstract
Despite advances, the roles of genetic variants from the APOE-harboring 19q13.32 region in Alzheimer's disease (AD) remain controversial. We leverage a comprehensive approach to gain insights into a more homogeneous genetic architecture of AD in this region. We use a sample of 2,673 AD-affected and 16,246 unaffected subjects from 4 studies and validate our main findings in the landmark Alzheimer's Disease Genetics Consortium cohort (3,662 AD-cases and 1,541 controls). We report the remarkably high excesses of the AD risk for carriers of the ε4 allele who also carry minor alleles of rs2075650 (TOMM40) and rs12721046 (APOC1) polymorphisms compared to carriers of their major alleles. The exceptionally high 4.37-fold (p = 1.34 × 10⁻³) excess was particularly identified for the minor allele homozygotes. The beneficial and adverse variants were significantly depleted and enriched, respectively, in the AD-affected families. This study provides compelling evidence for the definitive roles of the APOE-TOMM40-APOC1 variants in the AD risk.
Affiliation(s)
- Alexander M. Kulminski
  - Corresponding Author: Alexander M. Kulminski, Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC 27708, USA
|
4
|
Xue Y, Ding J, Wang J, Zhang S, Pan D. Two-phase SSU and SKAT in genetic association studies. J Genet 2020. [DOI: 10.1007/s12041-019-1166-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
5
|
Kulminski AM, Philipp I, Loika Y, He L, Culminskaya I. Haplotype architecture of the Alzheimer's risk in the APOE region via co-skewness. Alzheimer's & Dementia (Amsterdam, Netherlands) 2020; 12:e12129. [PMID: 33204816 PMCID: PMC7656174 DOI: 10.1002/dad2.12129] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/12/2020] [Revised: 10/07/2020] [Accepted: 10/08/2020] [Indexed: 12/30/2022]
Abstract
INTRODUCTION As a multifactorial polygenic disorder, Alzheimer's disease (AD) can be associated with complex haplotypes or compound genotypes. METHODS We examined associations of 4960 single nucleotide polymorphism (SNP) triples, comprising 32 SNPs from five genes in the apolipoprotein E gene (APOE) region with AD in a sample of 2789 AD-affected and 16,334 unaffected subjects. RESULTS We identified a large number of 1127 AD-associated triples, comprising SNPs from all five genes, in support of definitive roles of complex haplotypes in predisposition to AD. These haplotypes may not include the APOE ε4 and ε2 alleles. For triples with rs429358 or rs7412, which encode these alleles, AD is characterized mainly by strengthening connections of the ε4 allele and weakening connections of the ε2 allele with the other alleles in this region. DISCUSSION Dissecting heterogeneity attributed to AD-associated complex haplotypes in the APOE region will target more homogeneous polygenic profiles of people at high risk of AD.
Affiliation(s)
- Alexander M. Kulminski
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina, USA
- Ian Philipp
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina, USA
- Yury Loika
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina, USA
- Liang He
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina, USA
- Irina Culminskaya
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina, USA
|
6
|
Kulminski AM, Shu L, Loika Y, Nazarian A, Arbeev K, Ukraintseva S, Yashin A, Culminskaya I. APOE region molecular signatures of Alzheimer's disease across races/ethnicities. Neurobiol Aging 2020; 87:141.e1-141.e8. [PMID: 31813627 PMCID: PMC7064423 DOI: 10.1016/j.neurobiolaging.2019.11.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 12/21/2018] [Revised: 08/19/2019] [Accepted: 11/06/2019] [Indexed: 11/20/2022]
Abstract
The role of even the strongest genetic risk factor for Alzheimer's disease (AD), the apolipoprotein E (APOE) ε4 allele, in its etiology remains poorly understood. We examined molecular signatures of AD defined as differences in linkage disequilibrium patterns between AD-affected and -unaffected whites (2673/16,246), Hispanics (392/867), and African Americans (285/1789), separately. We focused on 29 polymorphisms from 5 genes in the APOE region emphasizing beneficial and adverse effects of the APOE ε2- and ε4-coding single-nucleotide polymorphisms, respectively, and the differences in the linkage disequilibrium structures involving these alleles between AD-affected and -unaffected subjects. Susceptibility to AD is likely the result of complex interactions of the ε2 and ε4 alleles with other polymorphisms in the APOE region, and these interactions differ across races/ethnicities corroborating differences in the adverse and beneficial effects of the ε4 and ε2 alleles. Our findings support complex race/ethnicity-specific haplotypes promoting and protecting against AD in this region. They contribute to better understanding of polygenic and resilient mechanisms, which can explain why even homozygous ε4 carriers may not develop AD.
Affiliation(s)
- Alexander M Kulminski
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Leonardo Shu
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Yury Loika
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Alireza Nazarian
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Konstantin Arbeev
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Svetlana Ukraintseva
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Anatoliy Yashin
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
- Irina Culminskaya
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA
|
7
|
Kulminski AM, Shu L, Loika Y, He L, Nazarian A, Arbeev K, Ukraintseva S, Yashin A, Culminskaya I. Genetic and regulatory architecture of Alzheimer's disease in the APOE region. Alzheimer's & Dementia (Amsterdam, Netherlands) 2020; 12:e12008. [PMID: 32211503 PMCID: PMC7085286 DOI: 10.1002/dad2.12008] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/01/2019] [Revised: 11/07/2019] [Accepted: 11/20/2019] [Indexed: 12/29/2022]
Abstract
INTRODUCTION Apolipoprotein E (APOE) ε2 and ε4 alleles encoded by rs7412 and rs429358 polymorphisms, respectively, are landmark contra and pro "risk" factors for Alzheimer's disease (AD). METHODS We examined differences in linkage disequilibrium (LD) structures between (1) AD-affected and unaffected subjects and (2) older AD-unaffected and younger subjects in the 19q13.3 region harboring rs7412 and rs429358. RESULTS AD is associated with sex-nonspecific heterogeneous patterns of decreased and increased LD of rs7412 and rs429358, respectively, with other polymorphisms from five genes in this region in AD-affected subjects. The LD patterns in older AD-unaffected subjects resembled those in younger individuals. Polarization of the ε4- and ε2 allele-related heterogeneous LD clusters differentiated cell types and implicated specific tissues in AD pathogenesis. DISCUSSION Protection and predisposition to AD is characterized by an interplay of rs7412 and rs429358, with multiple polymorphisms in the 19q13.3 region in a tissue-specific manner, which is not driven by common evolutionary forces.
Affiliation(s)
- Alexander M. Kulminski
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
- Leonardo Shu
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
- Yury Loika
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
- Liang He
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
- Alireza Nazarian
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
- Konstantin Arbeev
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
- Svetlana Ukraintseva
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
- Anatoliy Yashin
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
- Irina Culminskaya
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
|
8
|
Xue Y, Ding J, Wang J, Zhang S, Pan D. Two-phase SSU and SKAT in genetic association studies. J Genet 2020; 99:9. [PMID: 32089528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
The sum of squared score (SSU) and sequence kernel association test (SKAT) are the two good alternative tests for genetic association studies in case-control data. Both SSU and SKAT are derived through assuming a dose-response model between the risk of disease and genotypes. However, in practice, the real genetic mode of inheritance is impossible to know. Thus, these two tests might lose power substantially as shown in simulation results when the genetic model is misspecified. Here, to make both the tests suitable in broad situations, we propose two-phase SSU (tpSSU) and two-phase SKAT (tpSKAT), where the Hardy-Weinberg equilibrium test is adopted to choose the genetic model in the first phase and the SSU and SKAT are constructed corresponding to the selected genetic model in the second phase. We found that both tpSSU and tpSKAT outperformed the original SSU and SKAT in most of our simulation scenarios. By applying tpSSU and tpSKAT to the study of type 2 diabetes data, we successfully identified some genes that have direct effects on obesity. Besides, we also detected the significant chromosomal region 10q21.22 in GAW16 rheumatoid arthritis dataset, with P < 10⁻⁶. These findings suggest that tpSSU and tpSKAT can be effective in identifying genetic variants for complex diseases in case-control association studies.
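The first phase mentioned in this abstract relies on a Hardy-Weinberg equilibrium (HWE) test. The sketch below shows only the textbook one-degree-of-freedom chi-square HWE statistic computed from genotype counts; it is an illustrative assumption, not the tpSSU/tpSKAT procedure itself. How the statistic is mapped to a genetic model is not stated in the abstract and is not shown here. The function name `hwe_chisq` is hypothetical.

```python
def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are observed counts of the three genotypes at one
    biallelic site. Expected counts follow p^2, 2pq, q^2 with p estimated
    from the sample.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expectation (monomorphic site) to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

Counts of 25, 50, and 25 match Hardy-Weinberg proportions exactly and give a statistic of 0, whereas 50, 0, and 50 (no heterozygotes) give 100.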
Affiliation(s)
- Yuan Xue
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
|
9
|
Zhang S, Jiang W, Ma RC, Yu W. Region-based interaction detection in genome-wide case-control studies. BMC Med Genomics 2019; 12:133. [PMID: 31888606 PMCID: PMC6936067 DOI: 10.1186/s12920-019-0583-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 08/27/2019] [Accepted: 09/10/2019] [Indexed: 01/14/2023] Open
Abstract
Background In genome-wide association study (GWAS), conventional interaction detection methods such as BOOST are mostly based on SNP-SNP interactions. Although single nucleotides are the building blocks of human genome, single nucleotide polymorphisms (SNPs) are not necessarily the smallest functional unit for complex phenotypes. Region-based strategies have been proved to be successful in studies aiming at marginal effects. Methods We propose a novel region-region interaction detection method named RRIntCC (region-region interaction detection for case-control studies). RRIntCC uses the correlations between individual SNP-SNP interactions based on linkage disequilibrium (LD) contrast test. Results Simulation experiments showed that our method can achieve a higher power than conventional SNP-based methods with similar type-I-error rates. When applied to two real datasets, RRIntCC was able to find several significant regions, while BOOST failed to identify any significant results. The source code and the sample data of RRIntCC are available at http://bioinformatics.ust.hk/RRIntCC.html. Conclusion In this paper, a new region-based interaction detection method with better performance than SNP-based interaction detection methods has been proposed.
Affiliation(s)
- Sen Zhang
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong, China
- Wei Jiang
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong, China
- Ronald Cw Ma
  - Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Shatin, Hong Kong, China
- Weichuan Yu
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong, China
|
10
|
Wang M, Greenberg DA, Stewart WCL. In Response: ME2 association analysis in adolescent-onset genetic generalized epilepsy. Epilepsia 2019; 60:2001-2002. [PMID: 31353459 DOI: 10.1111/epi.16303] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/05/2019] [Accepted: 07/08/2019] [Indexed: 11/27/2022]
Affiliation(s)
- Meng Wang
  - The Research Institute at Nationwide Children's Hospital, Nationwide Children's Hospital, Columbus, Ohio
- William C L Stewart
  - The Research Institute at Nationwide Children's Hospital, Nationwide Children's Hospital, Columbus, Ohio; Department of Pediatrics, The Ohio State University, Columbus, Ohio; Departments of Statistics, The Ohio State University, Columbus, Ohio
|
11
|
Kulminski AM, Huang J, Wang J, He L, Loika Y, Culminskaya I. Apolipoprotein E region molecular signatures of Alzheimer's disease. Aging Cell 2018; 17:e12779. [PMID: 29797398 PMCID: PMC6052488 DOI: 10.1111/acel.12779] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Accepted: 04/10/2018] [Indexed: 01/01/2023] Open
Abstract
Although the APOE region is the strongest genetic risk factor for Alzheimer's diseases (ADs), its pathogenic role remains poorly understood. Elucidating genetic predisposition to ADs, a subset of age-related diseases characteristic for postreproductive period, is hampered by the undefined role of evolution in establishing molecular mechanisms of such diseases. This uncertainty is inevitable source of natural-selection-free genetic heterogeneity in predisposition to ADs. We performed first large-scale analysis of linkage disequilibrium (LD) structures characterized by 30 polymorphisms from five genes in the APOE 19q13.3 region (BCAM, NECTIN2, TOMM40, APOE, and APOC1) in 2,673 AD-affected and 16,246 unaffected individuals from five cohorts. Consistent with the undefined role of evolution in age-related diseases, we found that these structures, being highly heterogeneous, are significantly different in subjects with and without ADs. The pattern of the difference represents molecular signature of AD comprised of single nucleotide polymorphisms (SNPs) from all five genes in the APOE region. Significant differences in LD in subjects with and without ADs indicate SNPs from different genes likely involved in AD pathogenesis. Significant and highly heterogeneous molecular signatures of ADs provide unprecedented insight into complex polygenetic predisposition to ADs in the APOE region. These findings are more consistent with a complex haplotype than with a single genetic variant origin of ADs in this region.
Affiliation(s)
- Alexander M Kulminski
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
- Jian Huang
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
- Jiayi Wang
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
- Liang He
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
- Yury Loika
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
- Irina Culminskaya
  - Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
|
12
|
Zhou X. A unified framework for variance component estimation with summary statistics in genome-wide association studies. Ann Appl Stat 2017; 11:2027-2051. [PMID: 29515717 PMCID: PMC5836736 DOI: 10.1214/17-aoas1052] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Indexed: 01/08/2023]
Abstract
Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs, the restricted maximum likelihood estimation method (REML), suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods, the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC), into the same unified statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We provide an exact estimation form of LDSC to yield unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z-scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while capable of producing estimates that can be almost as accurate as if both quantities are computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, and makes use of summary statistics, while it is computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.
|
13
|
Maadooliat M, Bansal NK, Upadhya J, Farazi MR, Li X, He MM, Hebbring SJ, Ye Z, Schrodi SJ. The Decay of Disease Association with Declining Linkage Disequilibrium: A Fine Mapping Theorem. Front Genet 2016; 7:217. [PMID: 28018425 PMCID: PMC5149547 DOI: 10.3389/fgene.2016.00217] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/13/2016] [Accepted: 11/28/2016] [Indexed: 11/13/2022] Open
Abstract
Several important and fundamental aspects of disease genetics models have yet to be described. One such property is the relationship of disease association statistics at a marker site closely linked to a disease causing site. A complete description of this two-locus system is of particular importance to experimental efforts to fine map association signals for complex diseases. Here, we present a simple relationship between disease association statistics and the decline of linkage disequilibrium from a causal site. Specifically, the ratio of Chi-square disease association statistics at a marker site and causal site is equivalent to the standard measure of pairwise linkage disequilibrium, r². A complete derivation of this relationship from a general disease model is shown. Quite interestingly, this relationship holds across all modes of inheritance. Extensive Monte Carlo simulations using a disease genetics model applied to chromosomes subjected to a standard model of recombination are employed to better understand the variation around this fine mapping theorem due to sampling effects. We also use this relationship to provide a framework for estimating properties of a non-interrogated causal site using data at closely linked markers. Lastly, we apply this way of examining association data from high-density genotyping in a large, publicly-available data set investigating extreme BMI. We anticipate that understanding the patterns of disease association decay with declining linkage disequilibrium from a causal site will enable more powerful fine mapping methods and provide new avenues for identifying causal sites/genes from fine-mapping studies.
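The relationship stated in this abstract, that the ratio of the Chi-square statistics at a marker site and a causal site equals r², can be made concrete with a small sketch. It uses the standard definition of r² from haplotype and allele frequencies; the function names `ld_r2` and `expected_marker_chisq` are hypothetical, the input values in the usage note are made up, and sampling variation around the expected value is ignored.

```python
def ld_r2(p_ab, p_a, p_b):
    """Pairwise LD r^2 from the AB haplotype frequency and allele frequencies.

    D = p_ab - p_a * p_b and r^2 = D^2 / (p_a (1 - p_a) p_b (1 - p_b)).
    """
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom <= 0.0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b
    return d * d / denom


def expected_marker_chisq(causal_chisq, r2):
    """Expected association statistic at a marker, given the causal-site statistic.

    Applies the ratio stated in the abstract: chi2_marker / chi2_causal = r^2.
    """
    return r2 * causal_chisq
```

For instance, with a hypothetical haplotype frequency of 0.4 and allele frequencies of 0.5 at both sites, r² is about 0.36, so a causal-site statistic of 25 would correspond to an expected marker statistic of about 9.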
Affiliation(s)
- Mehdi Maadooliat
  - Department of Mathematics, Statistics and Computer Science, Marquette University, Milwaukee, WI, USA; Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, USA
- Naveen K Bansal
  - Department of Mathematics, Statistics and Computer Science, Marquette University, Milwaukee, WI, USA
- Jiblal Upadhya
  - Department of Mathematics, Statistics and Computer Science, Marquette University, Milwaukee, WI, USA
- Manzur R Farazi
  - Department of Mathematics, Statistics and Computer Science, Marquette University, Milwaukee, WI, USA
- Xiang Li
  - Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
- Max M He
  - Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, USA; Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA; Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, USA
- Scott J Hebbring
  - Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, USA; Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, USA
- Zhan Ye
  - Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
- Steven J Schrodi
  - Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, USA; Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, USA
|
14
|
Schrodi SJ. Reflections on the Field of Human Genetics: A Call for Increased Disease Genetics Theory. Front Genet 2016; 7:106. [PMID: 27375680 PMCID: PMC4896932 DOI: 10.3389/fgene.2016.00106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/26/2016] [Accepted: 05/25/2016] [Indexed: 12/29/2022] Open
Abstract
Development of human genetics theoretical models and the integration of those models with experiment and statistical evaluation are critical for scientific progress. This perspective argues that increased effort in disease genetics theory, complementing experimental, and statistical efforts, will escalate the unraveling of molecular etiologies of complex diseases. In particular, the development of new, realistic disease genetics models will help elucidate complex disease pathogenesis, and the predicted patterns in genetic data made by these models will enable the concurrent, more comprehensive statistical testing of multiple aspects of disease genetics predictions, thereby better identifying disease loci. By theoretical human genetics, I intend to encompass all investigations devoted to modeling the heritable architecture underlying disease traits and studies of the resulting principles and dynamics of such models. Hence, the scope of theoretical disease genetics work includes construction and analysis of models describing how disease-predisposing alleles (1) arise, (2) are transmitted across families and populations, and (3) interact with other risk and protective alleles across both the genome and environmental factors to produce disease states. Theoretical work improves insight into viable genetic models of diseases consistent with empirical results from linkage, transmission, and association studies as well as population genetics. Furthermore, understanding the patterns of genetic data expected under realistic disease models will enable more powerful approaches to discover disease-predisposing alleles and additional heritable factors important in common diseases. In spite of the pivotal role of disease genetics theory, such investigation is not particularly vibrant.
Affiliation(s)
- Steven J Schrodi
- Marshfield Clinic Research Foundation, Center for Human Genetics, Marshfield, WI, USA; Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, USA
15
Wang YT, Sung PY, Lin PL, Yu YW, Chung RH. A multi-SNP association test for complex diseases incorporating an optimal P-value threshold algorithm in nuclear families. BMC Genomics 2015; 16:381. [PMID: 25975968 PMCID: PMC4433014 DOI: 10.1186/s12864-015-1620-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 11/13/2014] [Accepted: 05/05/2015] [Indexed: 01/22/2023] Open
Abstract
Background Genome-wide association studies (GWAS) have become a common approach to identifying single nucleotide polymorphisms (SNPs) associated with complex diseases. As complex diseases are caused by the joint effects of multiple genes, while the effect of an individual gene or SNP is modest, a method considering the joint effects of multiple SNPs can be more powerful than testing individual SNPs. The multi-SNP analysis aims to test association based on a SNP set, usually defined based on biological knowledge such as gene or pathway, which may contain only a portion of SNPs with effects on the disease. Therefore, a challenge for the multi-SNP analysis is how to effectively select a subset of SNPs with promising association signals from the SNP set. Results We developed the Optimal P-value Threshold Pedigree Disequilibrium Test (OPTPDT). The OPTPDT uses general nuclear families. A variable p-value threshold algorithm is used to determine an optimal p-value threshold for selecting a subset of SNPs. A permutation procedure is used to assess the significance of the test. We used simulations to verify that the OPTPDT has correct type I error rates. Our power studies showed that the OPTPDT can be more powerful than the set-based test in PLINK, the multi-SNP FBAT test, and the p-value based test GATES. We applied the OPTPDT to a family-based autism GWAS dataset for gene-based association analysis and identified MACROD2-AS1 with genome-wide significance (p-value = 2.5 × 10^-6). Conclusions Our simulation results suggested that the OPTPDT is a valid and powerful test. The OPTPDT will be helpful for gene-based or pathway association analysis. The method is ideal for the secondary analysis of existing GWAS datasets, which may identify a set of SNPs with joint effects on the disease.
Affiliation(s)
- Yi-Ting Wang
- Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan.
- Pei-Yuan Sung
- Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan.
- Peng-Lin Lin
- Department of Medical Science, National Tsing Hua University, Hsin-Chu, Taiwan.
- Ya-Wen Yu
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan.
- Ren-Hua Chung
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan.
16
Koh SP, Yip SP, Lee KK, Chan CC, Lau SM, Kho CS, Lau CK, Lin SY, Lau YM, Wong LG, Au KL, Wong KF, Chu RW, Yu PH, Chow EYD, Leung KFS, Tsoi WC, Yung BYM. Genetic association between germline JAK2 polymorphisms and myeloproliferative neoplasms in Hong Kong Chinese population: a case-control study. BMC Genet 2014; 15:147. [PMID: 25526816 PMCID: PMC4293821 DOI: 10.1186/s12863-014-0147-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/06/2014] [Accepted: 12/08/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Myeloproliferative neoplasms (MPNs) are a group of haematological malignancies that can be characterised by a somatic mutation (JAK2V617F). This mutation causes the bone marrow to produce excessive blood cells and is found in polycythaemia vera (~95%), essential thrombocythaemia and primary myelofibrosis (both ~50%). It is considered as a major genetic factor contributing to the development of these MPNs. No genetic association study of MPN in the Hong Kong population has so far been reported. Here, we investigated the relationship between germline JAK2 polymorphisms and MPNs in Hong Kong Chinese to find causal variants that contribute to MPN development. We analysed 19 tag single nucleotide polymorphisms (SNPs) within the JAK2 locus in 172 MPN patients and 470 healthy controls. Three of these 19 SNPs defined the reported JAK2 46/1 haplotype: rs10974944, rs12343867 and rs12340895. Allele and haplotype frequencies were compared between patients and controls by logistic regression adjusted for sex and age. Permutation test was used to correct for multiple comparisons. With significant findings from the 19 SNPs, we then examined 76 additional SNPs across the 148.7-kb region of JAK2 via imputation with the SNP data from the 1000 Genomes Project. RESULTS In single-marker analysis, 15 SNPs showed association with JAK2V617F-positive MPNs (n = 128), and 8 of these were novel MPN-associated SNPs not previously reported. Exhaustive variable-sized sliding-window haplotype analysis identified 184 haplotypes showing significant differences (P < 0.05) in frequencies between patients and controls even after multiple-testing correction. However, single-marker alleles exhibited the strongest association with V617F-positive MPNs. In local Hong Kong Chinese, rs12342421 showed the strongest association signal: asymptotic P = 3.76 × 10^-15, empirical P = 2.00 × 10^-5 for 50,000 permutations, OR = 3.55 for the minor allele C, and 95% CI, 2.59-4.87. Conditional logistic regression also signified an independent effect of rs12342421 in significant haplotype windows, and this independent effect remained unchanged even with the imputation of additional 76 SNPs. No significant association was found between V617F-negative MPNs and JAK2 SNPs. CONCLUSION With a large sample size, we reported the association between JAK2V617F-positive MPNs and 15 tag JAK2 SNPs and the association of rs12342421 being independent of the JAK2 46/1 haplotype in Hong Kong Chinese population.
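The empirical P value quoted in this abstract for 50,000 permutations can be related to the resolution limit of a permutation test. The sketch below uses the common add-one estimator, (b + 1) / (N + 1), where b counts permuted statistics at least as extreme as the observed one; whether the authors used this exact estimator is an assumption, and the function name `empirical_p` is hypothetical.

```python
def empirical_p(n_exceed: int, n_perm: int) -> float:
    """Permutation p-value with the add-one correction: (b + 1) / (N + 1).

    n_exceed: number of permuted statistics >= the observed statistic (b)
    n_perm:   total number of permutations (N)
    """
    if n_perm <= 0 or not 0 <= n_exceed <= n_perm:
        raise ValueError("need n_perm > 0 and 0 <= n_exceed <= n_perm")
    return (n_exceed + 1) / (n_perm + 1)


# Smallest p-value attainable with 50,000 permutations, reached when
# no permuted statistic matches or exceeds the observed one.
p_floor = empirical_p(0, 50_000)
print(f"{p_floor:.2e}")
```

Under this assumption, an empirical P of about 2 × 10^-5 is the floor for 50,000 permutations, so it bounds the true P value from above and does not estimate it precisely.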
17
He T, Zhong PS, Cui Y. A set-based association test identifies sex-specific gene sets associated with type 2 diabetes. Front Genet 2014; 5:395. [PMID: 25429300 PMCID: PMC4228910 DOI: 10.3389/fgene.2014.00395] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/19/2014] [Accepted: 10/27/2014] [Indexed: 01/28/2023] Open
Abstract
Single variant analysis in genome-wide association studies (GWAS) has been proven to be successful in identifying thousands of genetic variants associated with hundreds of complex diseases. However, these identified variants only explain a small fraction of inheritable variability in many diseases, suggesting that other resources, such as multilevel genetic variations, may contribute to disease susceptibility. In this work, we proposed to combine genetic variants that belong to a gene set, such as at gene- and pathway-level to form an integrated signal aimed to identify major players that function in a coordinated manner conferring disease risk. The integrated analysis provides novel insight into disease etiology while individual signals could be easily missed by single variant analysis. We applied our approach to a genome-wide association study of type 2 diabetes (T2D) with male and female data analyzed separately. Novel sex-specific genes and pathways were identified to increase the risk of T2D. We also demonstrated the performance of signal integration through simulation studies.
Affiliation(s)
- Tao He
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, USA
- Ping-Shou Zhong
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, USA
- Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, USA; Division of Medical Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
18
Won S, Kwon MS, Mattheisen M, Park S, Park C, Kihara D, Cichon S, Ophoff R, Nöthen MM, Rietschel M, Baur M, Uitterlinden AG, Hofmann A, Lange C. Efficient Strategy for Detecting Gene × Gene Joint Action and Its Application in Schizophrenia. Genet Epidemiol 2013; 38:60-71. [DOI: 10.1002/gepi.21779] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/19/2013] [Revised: 08/21/2013] [Accepted: 10/21/2013] [Indexed: 01/21/2023]
Affiliation(s)
- Sungho Won
- Department of Applied Statistics; Chung-Ang University; Seoul Korea
- Research Center for Data Science; Chung-Ang University; Seoul Korea
- Min-Seok Kwon
- Bioinformatics Program; Seoul National University; Seoul Korea
- Manuel Mattheisen
- Institute of Human Genetics; University of Bonn; Bonn Germany
- Department of Genomics; Life and Brain Center; University of Bonn; Bonn Germany
- Suyeon Park
- Department of Applied Statistics; Chung-Ang University; Seoul Korea
- Changsoon Park
- Department of Applied Statistics; Chung-Ang University; Seoul Korea
- Research Center for Data Science; Chung-Ang University; Seoul Korea
- Daisuke Kihara
- Department of Computer Sciences; Purdue University; West Lafayette Indiana United States of America
- Sven Cichon
- Institute of Human Genetics; University of Bonn; Bonn Germany
- Department of Genomics; Life and Brain Center; University of Bonn; Bonn Germany
- Institute of Neuroscience and Medicine (INM-1); Research Center Juelich; Juelich Germany
- Roel Ophoff
- Department of Medical Genetics, Rudolf Magnus Institute of Neuroscience; University Medical Center Utrecht; Utrecht The Netherlands
- Markus M. Nöthen
- Institute of Human Genetics; University of Bonn; Bonn Germany
- Department of Genomics; Life and Brain Center; University of Bonn; Bonn Germany
- Marcella Rietschel
- Department of Genetic Epidemiology in Psychiatry; Central Institute of Mental Health, University of Mannheim; Mannheim Germany
- Max Baur
- Institute for Medical Biometry, Informatics, and Epidemiology; University of Bonn; Bonn Germany
- German Center for Neurodegenerative Diseases (DZNE); Bonn Germany
- Andre G. Uitterlinden
- Department of Internal Medicine; Genetics Laboratory; Erasmus Medical Center; Rotterdam The Netherlands
- Department of Epidemiology, Erasmus University Medical Center; Rotterdam The Netherlands
- A. Hofmann
- Department of Epidemiology, Erasmus University Medical Center; Rotterdam The Netherlands
- Christoph Lange
- German Center for Neurodegenerative Diseases (DZNE); Bonn Germany
- Institute for Genomic Mathematics; University of Bonn; Bonn Germany
- Department of Biostatistics; Harvard School of Public Health; Boston Massachusetts United States of America
- Center for Genomic Medicine; Brigham and Women's Hospital; Boston Massachusetts United States of America
19
A fast multilocus test with adaptive SNP selection for large-scale genetic-association studies. Eur J Hum Genet 2013; 22:696-702. [PMID: 24022295 DOI: 10.1038/ejhg.2013.201] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 03/22/2013] [Revised: 07/02/2013] [Accepted: 08/07/2013] [Indexed: 12/20/2022] Open
Abstract
As increasing evidence suggests that multiple correlated genetic variants could jointly influence the outcome, a multilocus test that aggregates association evidence across multiple genetic markers in a considered gene or a genomic region may be more powerful than a single-marker test for detecting susceptibility loci. We propose a multilocus test, AdaJoint, which adopts a variable selection procedure to identify a subset of genetic markers that jointly show the strongest association signal, and defines the test statistic based on the selected genetic markers. The P-value from the AdaJoint test is evaluated by a computationally efficient algorithm that effectively adjusts for multiple-comparison, and is hundreds of times faster than the standard permutation method. Simulation studies demonstrate that AdaJoint has the most robust performance among several commonly used multilocus tests. We perform multilocus analysis of over 26,000 genes/regions on two genome-wide association studies of pancreatic cancer. Compared with its competitors, AdaJoint identifies a much stronger association between the gene CLPTM1L and pancreatic cancer risk (6.0 × 10^-8), with the signal optimally captured by two correlated single-nucleotide polymorphisms (SNPs). Finally, we show AdaJoint as a powerful tool for mapping cis-regulating methylation quantitative trait loci on normal breast tissues, and find many CpG sites whose methylation levels are jointly regulated by multiple SNPs nearby.
20
Bochdanovits Z, Simón-Sánchez J, Jonker M, Hoogendijk WJ, van der Vaart A, Heutink P. Accurate prediction of a minimal region around a genetic association signal that contains the causal variant. Eur J Hum Genet 2013; 22:238-42. [PMID: 23736218 DOI: 10.1038/ejhg.2013.115] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/21/2011] [Revised: 04/19/2013] [Accepted: 04/23/2013] [Indexed: 11/09/2022] Open
Abstract
In recent years, genome-wide association studies have been very successful in identifying loci for complex traits. However, typically these findings involve noncoding and/or intergenic SNPs without a clear functional effect that do not directly point to a gene. Hence, the challenge is to identify the causal variant responsible for the association signal. Typically, the first step is to identify all genetic variation in the locus region, usually by resequencing a large number of case chromosomes. Among all variants, the causal one needs to be identified in further functional studies. Because the experimental follow up can be very laborious, restricting the number of variants to be scrutinized can yield a great advantage. An objective method for choosing the size of the region to be followed up would be highly valuable. Here, we propose a simple method to call the minimal region around a significant association peak that is very likely to contain the causal variant. We model linkage disequilibrium (LD) in cases from the observed single SNP association signals, and predict the location of the causal variant by quantifying how well this relationship fits the data. Simulations showed that our approach identifies genomic regions of on average ∼50 kb with up to 90% probability to contain the causal variant. We apply our method to two genome-wide association data sets and localize both the functional variant REP1 in the α-synuclein gene that conveys susceptibility to Parkinson's disease and the APOE gene responsible for the association signal in the Alzheimer's disease data set.
Affiliation(s)
- Zoltán Bochdanovits
- Department of Clinical Genetics, VU University Medical Center, Amsterdam, The Netherlands
- Javier Simón-Sánchez
- Department of Clinical Genetics, VU University Medical Center, Amsterdam, The Netherlands
- Marianne Jonker
- Section Stochastics, Department of Mathematics, Faculty of Sciences, Vrije Universiteit, Amsterdam, The Netherlands
- Witte J Hoogendijk
- [1] Department of Psychiatry, VU University Medical Center, Amsterdam, The Netherlands [2] Department of Psychiatry, Erasmus Medical Center, Rotterdam, The Netherlands
- Aad van der Vaart
- Section Stochastics, Department of Mathematics, Faculty of Sciences, Vrije Universiteit, Amsterdam, The Netherlands
- Peter Heutink
- Department of Clinical Genetics, VU University Medical Center, Amsterdam, The Netherlands
21
Popa OM, Kriegova E, Popa L, Schneiderova P, Dutescu MI, Bojinca M, Bara C, Petrek M. Association study in Romanians confirms IL23A gene haplotype block rs2066808/rs11171806 as conferring risk to psoriatic arthritis. Cytokine 2013; 63:67-73. [PMID: 23673284 DOI: 10.1016/j.cyto.2013.04.013] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 10/20/2012] [Revised: 03/09/2013] [Accepted: 04/11/2013] [Indexed: 02/01/2023]
Abstract
BACKGROUND The cytokines IL12 and IL23 have been recently implicated in the pathogenesis of psoriatic arthritis (PsA). In this study we investigated the genetic variations in the genes coding for IL12, IL23 and IL23 receptor as a plausible source of susceptibility and modification of clinical symptoms of PsA in Romanian population. METHODS Twenty five SNPs mapping to IL12A, IL12B, IL23A, IL23R and IL12RB1 genes were genotyped in 94 PsA patients and 161 healthy controls of Romanian ethnicity using the Sequenom genotyping platform. RESULTS The exonic SNP rs11171806 from IL23A gene was significantly underrepresented in patients versus controls (p=0.03, OR 0.391) and the carriers of rs11171806/rs2066808 AC haplotype had decreased risk for PsA (p=0.03). The two SNPs of the highly conserved gene IL23A are in complete LD in our population. Genetic variants of IL12B gene were associated with polyarticular subtype of PsA. No associations were found between SNPs from IL12A, IL23R and IL12RB1 genes and susceptibility to PsA and its phenotypes. CONCLUSION We confirm the previously described association of rs2066808 variant with psoriasis and PsA and we show evidence of an extended genomic region inside IL23A gene as carrier of true disease susceptibility factors. These data suggest a role for IL23 in the PsA pathogenesis in Romanians.
Affiliation(s)
- Olivia Mihaela Popa
- Department of Immunology and Pathophysiology, Faculty of Medicine, University "Carol Davila", Bucharest, Romania; Laboratory of Immunogenomics and Immunoproteomics, Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic.
- Eva Kriegova
- Laboratory of Immunogenomics and Immunoproteomics, Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic.
- Luis Popa
- Molecular Biology Department, Grigore Antipa National Museum of Natural History, Bucharest, Romania.
- Petra Schneiderova
- Laboratory of Immunogenomics and Immunoproteomics, Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic.
- Mihai Bojinca
- Department of Rheumatology, Faculty of Medicine, University "Carol Davila", "I.C. Cantacuzino" Hospital, Bucharest, Romania.
- Constantin Bara
- Department of Immunology and Pathophysiology, Faculty of Medicine, University "Carol Davila", Bucharest, Romania.
- Martin Petrek
- Laboratory of Immunogenomics and Immunoproteomics, Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic.
22
Piriyapongsa J, Ngamphiw C, Intarapanich A, Kulawonganunchai S, Assawamakin A, Bootchai C, Shaw PJ, Tongsima S. iLOCi: a SNP interaction prioritization technique for detecting epistasis in genome-wide association studies. BMC Genomics 2012; 13 Suppl 7:S2. [PMID: 23281813 PMCID: PMC3521387 DOI: 10.1186/1471-2164-13-s7-s2] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 01/06/2023] Open
Abstract
Background Genome-wide association studies (GWAS) do not provide a full account of the heritability of genetic diseases since gene-gene interactions, also known as epistasis are not considered in single locus GWAS. To address this problem, a considerable number of methods have been developed for identifying disease-associated gene-gene interactions. However, these methods typically fail to identify interacting markers explaining more of the disease heritability over single locus GWAS, since many of the interactions significant for disease are obscured by uninformative marker interactions e.g., linkage disequilibrium (LD). Results In this study, we present a novel SNP interaction prioritization algorithm, named iLOCi (Interacting Loci). This algorithm accounts for marker dependencies separately in case and control groups. Disease-associated interactions are then prioritized according to a novel ranking score calculated from the difference in marker dependencies for every possible pair between case and control groups. The analysis of a typical GWAS dataset can be completed in less than a day on a standard workstation with parallel processing capability. The proposed framework was validated using simulated data and applied to real GWAS datasets using the Wellcome Trust Case Control Consortium (WTCCC) data. The results from simulated data showed the ability of iLOCi to identify various types of gene-gene interactions, especially for high-order interaction. From the WTCCC data, we found that among the top ranked interacting SNP pairs, several mapped to genes previously known to be associated with disease, and interestingly, other previously unreported genes with biologically related roles. Conclusion iLOCi is a powerful tool for uncovering true disease interacting markers and thus can provide a more complete understanding of the genetic basis underlying complex disease. The program is available for download at http://www4a.biotec.or.th/GI/tools/iloci.
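The idea of ranking SNP pairs by how differently they co-vary in cases and in controls can be illustrated with a minimal sketch. It is not the iLOCi score: the dependency measure (Pearson correlation of 0/1/2 genotype codes), the absolute difference used as a contrast, the toy genotypes, and all function names are assumptions made for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic marker: no dependency measurable
        return 0.0
    return sxy / sqrt(sxx * syy)


def dependency_contrast(cases, controls, i, j):
    """Absolute difference in correlation of SNPs i and j between groups."""
    r_case = pearson_r([g[i] for g in cases], [g[j] for g in cases])
    r_ctrl = pearson_r([g[i] for g in controls], [g[j] for g in controls])
    return abs(r_case - r_ctrl)


def rank_pairs(cases, controls):
    """Score every SNP pair and sort from largest to smallest contrast."""
    n_snps = len(cases[0])
    scores = [((i, j), dependency_contrast(cases, controls, i, j))
              for i in range(n_snps) for j in range(i + 1, n_snps)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy genotypes (rows = subjects, columns = SNPs coded 0/1/2).
# SNP 0 and SNP 1 move together in cases but not in controls.
cases = [(0, 0, 1), (1, 1, 0), (2, 2, 1), (0, 0, 0), (2, 2, 2), (1, 1, 1)]
controls = [(0, 2, 1), (1, 1, 0), (2, 0, 1), (0, 0, 0), (2, 2, 2), (1, 1, 1)]

ranked = rank_pairs(cases, controls)
print(ranked[0])  # pair with the largest case/control contrast
```

Scoring all pairs is quadratic in the number of SNPs, which is why a genome-wide run of any such method depends on parallel processing, as the abstract notes.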
Affiliation(s)
- Jittima Piriyapongsa
- National Center for Genetic Engineering and Biotechnology, Pathumthani, 12120, Thailand
23
Rafiq S, Venkata KKM, Gupta V, Vinay DG, Spurgeon CJ, Parameshwaran S, Madana SN, Kinra S, Bowen L, Timpson NJ, Smith GD, Dudbridge F, Prabhakaran D, Ben-Shlomo Y, Reddy KS, Ebrahim S, Chandak GR. Evaluation of seven common lipid associated loci in a large Indian sib pair study. Lipids Health Dis 2012; 11:155. [PMID: 23150898 PMCID: PMC3598237 DOI: 10.1186/1476-511x-11-155] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/09/2012] [Accepted: 10/27/2012] [Indexed: 01/20/2023] Open
Abstract
Background Genome wide association studies (GWAS), mostly in Europeans, have identified several common variants as associated with key lipid traits. Replication of these genetic effects in South Asian populations is important since it would suggest wider relevance for these findings. Given the rising prevalence of metabolic disorders and heart disease in the Indian sub-continent, these studies could be of future clinical relevance. Methods We studied seven common variants associated with a variety of lipid traits in previous GWASs. The study sample comprised 3178 sib-pairs recruited as participants for the Indian Migration Study (IMS). Associations with various lipid parameters and quantitative traits were analyzed using the Fulker genetic association model. Results We replicated five of the 7 main effect associations with p-values ranging from 0.03 to 1.97 × 10^-7. We identified particularly strong association signals at rs662799 in APOA5 (beta = 0.18 s.d., p = 1.97 × 10^-7), rs10503669 in LPL (beta = −0.18 s.d., p = 1.0 × 10^-4) and rs780094 in GCKR (beta = 0.11 s.d., p = 0.001) loci in relation to triglycerides. In addition, the GCKR variant was also associated with total cholesterol (beta = 0.11 s.d., p = 3.9 × 10^-4). We also replicated the association of rs562338 in APOB (p = 0.03) and rs4775041 in LIPC (p = 0.007) with LDL-cholesterol and HDL-cholesterol respectively. Conclusions We report associations of five loci with various lipid traits with the effect size consistent with the same reported in Europeans. These results indicate an overlap of genetic effects pertaining to lipid traits across the European and Indian populations.
Affiliation(s)
- Sajjad Rafiq
- Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK.
24
Rajapakse I, Perlman MD, Martin PJ, Hansen JA, Kooperberg C. Multivariate detection of gene-gene interactions. Genet Epidemiol 2012; 36:622-30. [PMID: 22782518 DOI: 10.1002/gepi.21656] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 03/19/2012] [Revised: 04/27/2012] [Accepted: 05/29/2012] [Indexed: 12/18/2022]
Abstract
Unraveling the nature of genetic interactions is crucial to obtaining a more complete picture of complex diseases. It is thought that gene-gene interactions play an important role in the etiology of cancer, cardiovascular, and immune-mediated disease. Interactions among genes are defined as phenotypic effects that differ from those observed for independent contributions of each gene, usually detected by univariate logistic regression methods. Using a multivariate extension of linkage disequilibrium (LD), we have developed a new method, based on distances between sample covariance matrices for groups of single nucleotide polymorphisms (SNPs), to test for interaction effects of two groups of genes associated with a disease phenotype. Since a disease-associated interacting locus will often be in LD with more than one marker in the region, a method that examines a set of markers in a region collectively can offer greater power than traditional methods. Our method effectively identifies interaction effects in simulated data, as well as in data on the genetic contributions to the risk for graft-versus-host disease following hematopoietic stem cell transplantation.
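A distance between sample covariance matrices for two groups of SNPs can be sketched as follows. This is an illustration under stated assumptions, not the authors' statistic: it compares the cross-covariance block between SNP group A and SNP group B in cases against the same block in controls using the Frobenius norm, and assesses it by permuting case/control labels. The toy genotypes and all function names are hypothetical.

```python
import random
from math import sqrt


def cross_cov(rows, group_a, group_b):
    """Sample cross-covariance block between SNP columns in A and in B."""
    n = len(rows)
    means = [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in group_b] for a in group_a]


def frobenius_distance(m1, m2):
    """Frobenius norm of the element-wise difference of two matrices."""
    return sqrt(sum((x - y) ** 2
                    for row1, row2 in zip(m1, m2)
                    for x, y in zip(row1, row2)))


def interaction_stat(cases, controls, group_a, group_b):
    return frobenius_distance(cross_cov(cases, group_a, group_b),
                              cross_cov(controls, group_a, group_b))


def permutation_test(cases, controls, group_a, group_b, n_perm=999, seed=0):
    """Return (observed distance, add-one permutation p-value)."""
    rng = random.Random(seed)
    observed = interaction_stat(cases, controls, group_a, group_b)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        stat = interaction_stat(pooled[:n_cases], pooled[n_cases:],
                                group_a, group_b)
        if stat >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy genotypes (rows = subjects, columns = SNPs coded 0/1/2).
cases = [(0, 0, 1), (1, 1, 0), (2, 2, 1), (0, 0, 0), (2, 2, 2), (1, 1, 1)]
controls = [(0, 2, 1), (1, 1, 0), (2, 0, 1), (0, 0, 0), (2, 2, 2), (1, 1, 1)]

observed, p_value = permutation_test(cases, controls, group_a=[0], group_b=[1, 2])
print(observed, p_value)
```

Treating a region's markers as a block, as in this sketch, reflects the abstract's point that an interacting locus is often in LD with more than one marker in the region.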
Affiliation(s)
- Indika Rajapakse
- Fred Hutchinson Cancer Research Center, Seattle, Washington 98109-1024, USA
25
Wang X, Morris NJ, Schaid DJ, Elston RC. Power of single- vs. multi-marker tests of association. Genet Epidemiol 2012; 36:480-7. [PMID: 22648939 PMCID: PMC3708310 DOI: 10.1002/gepi.21642] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/21/2011] [Revised: 03/23/2012] [Accepted: 04/23/2012] [Indexed: 01/15/2023]
Abstract
Current genome-wide association studies still heavily rely on a single-marker strategy, in which each single nucleotide polymorphism (SNP) is tested individually for association with a phenotype. Although methods and software packages that consider multimarker models have become available, they have been slow to become widely adopted and their efficacy in real data analysis is often questioned. Based on conducting extensive simulations, here we endeavor to provide more insights into the performance of simple multimarker association tests as compared to single-marker tests. The results reveal the power advantage as well as disadvantage of the two- vs. the single-marker test. Power differentials depend on the correlation structure among tag SNPs, as well as that between tag SNPs and causal variants. A two-marker test has relatively better performance than single-marker tests when the correlation of the two adjacent markers is high. However, using HapMap data, two-marker tests tended to have a greater chance of being less powerful than single-marker tests, due to constraints on the number of actual possible haplotypes in the HapMap data. Yet, the average power difference was small whenever the one-marker test is more powerful, while there were many situations where the two-marker test can be much more powerful. These findings can be useful to guide analyses of future studies.
Affiliation(s)
- Xuefeng Wang
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio
- Nathan J. Morris
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio
- Daniel J. Schaid
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Robert C. Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio
26
Ueki M, Cordell HJ. Improved statistics for genome-wide interaction analysis. PLoS Genet 2012; 8:e1002625. [PMID: 22496670 PMCID: PMC3320596 DOI: 10.1371/journal.pgen.1002625] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.2] [Received: 08/18/2011] [Accepted: 02/13/2012] [Indexed: 12/15/2022] Open
Abstract
Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new “joint effects” statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally-proposed statistics, on account of the inflated error rate that can result.
Gene–gene interactions are a topic of great interest to geneticists carrying out studies of how genetic factors influence the development of common, complex diseases. Genes that interact may not only make important biological contributions to underlying disease processes, but also be more difficult to detect when using standard statistical methods in which we examine the effects of genetic factors one at a time. Recently a method was proposed by Wu and colleagues [1] for detecting pairwise interactions when carrying out genome-wide association studies (in which a large number of genetic variants across the genome are examined). Wu and colleagues carried out theoretical work and computer simulations that suggested their method outperformed other previously proposed approaches for detecting interactions. Here we show that, in fact, the method proposed by Wu and colleagues can result in an over-preponderance of false positive findings. We propose an adjusted version of their method that reduces the false positive rate while maintaining high power. We also propose a new method for detecting pairs of genetic effects that shows similarly high power but has some conceptual advantages over both Wu's method and also other previously proposed approaches.
Affiliation(s)
- Masao Ueki
- Faculty of Medicine, Yamagata University, Yamagata, Japan
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom
- Heather J. Cordell
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom
|
27
|
Cleveland MA, Deeb N. Selecting markers and evaluating coverage. Methods Mol Biol 2012; 871:55-71. [PMID: 22565833 DOI: 10.1007/978-1-61779-785-9_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
The availability of genetic markers in many species has enabled the analysis of marker-trait associations ranging from small genomic regions to genome-wide scale. An appropriate set of markers must be identified to meet the objectives of any research, using a custom discovery and selection approach or by using a commercial product. The key considerations in selecting markers are the quantity and the distribution across the genome. Though decisions about how many markers to use are often pragmatic, influenced by costs and available technology, an evaluation of the marker coverage is important in understanding how to design an effective genomic research study with reasonable expectations about the power to obtain desired results. An important parameter to evaluate coverage is linkage disequilibrium, which can be used to determine the appropriate number of markers for a particular analysis and is related to the proportion of variance that can be explained by a given marker, or power. Finally, the type of analysis used to identify marker-trait associations may depend on marker coverage as the optimal approach, from a statistical or computational standpoint, may differ with changes in marker number and distribution.
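As a small illustration of the linkage disequilibrium parameter mentioned in this abstract, the sketch below computes D, |D'| and r² for two biallelic markers from phased haplotype counts. The function name, the argument names and the assumption that phased haplotype counts are available are all illustrative and are not taken from the chapter.

```python
def ld_from_haplotype_counts(n_ab, n_aB, n_Ab, n_AB):
    """Return (D, |D'|, r2) for two biallelic markers.

    Hypothetical helper: arguments are phased haplotype counts, where
    n_AB is the number of haplotypes carrying allele A at marker 1 and
    allele B at marker 2, and so on.
    """
    total = n_ab + n_aB + n_Ab + n_AB
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at marker 1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at marker 2

    # Raw disequilibrium coefficient.
    d = p_AB - p_A * p_B

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one marker is monomorphic")
    r2 = d * d / denom

    # D' rescales D by its largest attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Two perfectly correlated markers, then two independent markers.
    print(ld_from_haplotype_counts(50, 0, 0, 50))
    print(ld_from_haplotype_counts(25, 25, 25, 25))
```

With unphased genotypes, haplotype frequencies would first have to be estimated, which this sketch does not attempt.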
Affiliation(s)
- Matthew A Cleveland
- Genus plc, 100 Bluegrass Commons Boulevard, Suite 2200, Hendersonville, TN 37075, USA.
|
28
|
The effect of genotyping errors on the robustness of composite linkage disequilibrium measures. J Genet 2011; 90:453-7. [DOI: 10.1007/s12041-011-0110-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
29
|
Tejedor MT, Cenarro A, Tejedor D, Stef M, Palacios L, de Castro I, García-Otín ÁL, Monteagudo LV, Civeira F, Pocovi M. New contributions to the study of common double mutants in the human LDL receptor gene. THE SCIENCE OF NATURE - NATURWISSENSCHAFTEN 2011; 98:943-9. [DOI: 10.1007/s00114-011-0845-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/02/2011] [Revised: 09/02/2011] [Accepted: 09/06/2011] [Indexed: 03/08/2023]
|
30
|
Uh HW, Eilers PHC. Haplotype estimation from fuzzy genotypes using penalized likelihood. PLoS One 2011; 6:e24219. [PMID: 21931662 PMCID: PMC3169573 DOI: 10.1371/journal.pone.0024219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2010] [Accepted: 08/08/2011] [Indexed: 11/30/2022] Open
Abstract
The Composite Link Model is a generalization of the generalized linear model in which expected values of observed counts are constructed as a sum of generalized linear components. When combined with penalized likelihood, it provides a powerful and elegant way to estimate haplotype probabilities from observed genotypes. Uncertain ("fuzzy") genotypes, like those resulting from AFLP scores, can be handled by adding an extra layer to the model. We describe the model and the estimation algorithm. We apply it to a data set of accurate human single nucleotide polymorphism (SNP) and to a data set of fuzzy tomato AFLP scores.
Affiliation(s)
- Hae-Won Uh
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands.
|
31
|
Chen Z, Liu Q. A new approach to account for the correlations among single nucleotide polymorphisms in genome-wide association studies. Hum Hered 2011; 72:1-9. [PMID: 21849789 DOI: 10.1159/000330135] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2011] [Accepted: 05/31/2011] [Indexed: 12/19/2022] Open
Abstract
In genetic association studies, such as genome-wide association studies (GWAS), the number of single nucleotide polymorphisms (SNPs) can be as large as hundreds of thousands. Due to linkage disequilibrium, many SNPs are highly correlated; assuming they are independent is not valid. The commonly used multiple comparison methods, such as Bonferroni correction, are not appropriate and are too conservative when applied to GWAS. To overcome these limitations, many approaches have been proposed to estimate the so-called effective number of independent tests to account for the correlations among SNPs. However, many current effective number estimation methods are based on eigenvalues of the correlation matrix. When the dimension of the matrix is large, the numeric results may be unreliable or even unobtainable. To circumvent this obstacle and provide better estimates, we propose a new effective number estimation approach which is not based on the eigenvalues. We compare the new method with others through simulated and real data. The comparison results show that the proposed method has very good performance.
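To make the idea of an "effective number of independent tests" concrete, here is a minimal sketch of one widely cited estimate from the eigenvalue-based family that the abstract contrasts with: M_eff = 1 + (M - 1)(1 - Var(λ)/M), where λ are the eigenvalues of the SNP correlation matrix. This is not the method proposed by Chen and Liu, whose details are not given here. The function names are hypothetical. The sketch relies on the identities Σλ = M and Σλ² = Σ r_ij², which hold for any symmetric correlation matrix, so the sample variance of the eigenvalues can be obtained without an explicit eigendecomposition.

```python
def effective_number_of_tests(corr):
    """Eigenvalue-variance estimate of the effective number of tests.

    corr: square SNP correlation matrix as a list of lists (assumed
    symmetric with ones on the diagonal). Hypothetical helper.
    """
    m = len(corr)
    if m < 2:
        return float(m)
    # Sum of squared eigenvalues equals the squared Frobenius norm.
    sum_sq = sum(r * r for row in corr for r in row)
    # Sample variance of eigenvalues; their mean is 1 because trace = m.
    var_lambda = (sum_sq - m) / (m - 1)
    return 1 + (m - 1) * (1 - var_lambda / m)


def adjusted_threshold(alpha, corr):
    """Bonferroni-style per-test threshold using the effective number of tests."""
    return alpha / effective_number_of_tests(corr)


if __name__ == "__main__":
    independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    redundant = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
    print(effective_number_of_tests(independent))  # three independent SNPs
    print(effective_number_of_tests(redundant))    # three copies of one SNP
```

For uncorrelated SNPs the estimate equals the number of SNPs, and for perfectly correlated SNPs it collapses to one test, which is why a threshold based on it is less conservative than plain Bonferroni correction when LD is present.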
Affiliation(s)
- Zhongxue Chen
- Biostatistics Epidemiology Research Design Core, Center for Clinical and Translational Sciences, University of Texas Health Science Center at Houston, USA. zhongxue.chen @ uth.tmc.edu
|
32
|
Yu Z, Wang S. Contrasting linkage disequilibrium as a multilocus family-based association test. Genet Epidemiol 2011; 35:487-98. [PMID: 21769928 DOI: 10.1002/gepi.20598] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2010] [Revised: 04/20/2011] [Accepted: 04/24/2011] [Indexed: 02/04/2023]
Abstract
Linkage disequilibrium (LD) of genetic loci is routinely estimated and graphically illustrated in genetic association studies. It has been suggested that the information in LD is also useful for association mapping and genetic association can be detected by comparing LD patterns between cases and controls. Here, we extend this idea to analyze case-parents data by comparing LD patterns between transmitted and nontransmitted genotypes. We provide the condition when contrasting LD is valid for testing gene-gene interactions. A permutation procedure is given to assess statistical significance. One advantage of our proposed methods is that haplotype information is not required. Thus, the implementation of our methods is straightforward and the resulting tests are free from potential bias caused by assumptions made to estimate haplotypes in silico. Since our test statistics use pairwise LD measurements, they are less affected by missing data than many other multilocus methods. With simulated data, we demonstrate that examining LD patterns of case-parents data is a useful multilocus association mapping strategy and it complements existing association mapping methods. The application of our methods to a Crohn's disease data set shows that our methods can detect multilocus association that might be missed by other association methods. Our permutation procedure can also be modified to allow multiple offspring from a family to be analyzed.
Affiliation(s)
- Zhaoxia Yu
- Department of Statistics, University of California-Irvine, CA 92697, USA.
|
33
|
Sorice R, Bione S, Sansanelli S, Ulivi S, Athanasakis E, Lanzara C, Nutile T, Sala C, Camaschella C, D'Adamo P, Gasparini P, Ciullo M, Toniolo D. Association of a variant in the CHRNA5-A3-B4 gene cluster region to heavy smoking in the Italian population. Eur J Hum Genet 2011; 19:593-6. [PMID: 21248747 PMCID: PMC3083621 DOI: 10.1038/ejhg.2010.240] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2009] [Revised: 11/25/2010] [Accepted: 12/03/2010] [Indexed: 01/22/2023] Open
Abstract
Large-scale population studies have established that genetic factors contribute to individual differences in smoking behavior. Linkage and genome-wide association studies have shown many chromosomal regions and genes associated with different smoking behaviors. One such finding was the association of single-nucleotide polymorphisms (SNPs) in the CHRNA5-A3-B4 gene cluster with nicotine addiction. Here, we report a replication of this association in the Italian population represented by three genetically isolated populations. One, the Val Borbera, is a genetic isolate from North-Western Italy; the Cilento population is located in South-Western Italy; and the Carlantino village is located in South-Eastern Italy. Owing to their position and their isolation, the three populations have different environments, histories and genetic structures. The variant A of the rs1051730 SNP was significantly associated with smoking quantity in two populations, Val Borbera and Cilento; no association was found in the Carlantino population, probably because of a difference in LD pattern in the variant region.
Affiliation(s)
- Rossella Sorice
- Institute of Genetics and Biophysics ‘Adriano Buzzati-Traverso’, CNR, Napoli, Italy
- Silvia Bione
- Division of Genetics and Cell Biology, San Raffaele Scientific Institute, Milano, Italy
- Institute of Molecular Genetics, CNR, Pavia, Italy
- Serena Sansanelli
- Division of Genetics and Cell Biology, San Raffaele Scientific Institute, Milano, Italy
- Sheila Ulivi
- Department of Laboratory Medicine, Medical Genetics, Institute for Maternal and Child Health IRCCS-Burlo Garofolo, Trieste, Italy
- Carmela Lanzara
- Department of Reproductive Sciences and Development, Medical Genetics, University of Trieste, Trieste, Italy
- LATEMAR Unit, Trieste, Italy
- Teresa Nutile
- Institute of Genetics and Biophysics ‘Adriano Buzzati-Traverso’, CNR, Napoli, Italy
- Cinzia Sala
- Division of Genetics and Cell Biology, San Raffaele Scientific Institute, Milano, Italy
- Clara Camaschella
- Division of Genetics and Cell Biology, San Raffaele Scientific Institute, Milano, Italy
- Vita-Salute San Raffaele University, Milano, Italy
- Pio D'Adamo
- Department of Laboratory Medicine, Medical Genetics, Institute for Maternal and Child Health IRCCS-Burlo Garofolo, Trieste, Italy
- Paolo Gasparini
- Department of Laboratory Medicine, Medical Genetics, Institute for Maternal and Child Health IRCCS-Burlo Garofolo, Trieste, Italy
- Department of Reproductive Sciences and Development, Medical Genetics, University of Trieste, Trieste, Italy
- Marina Ciullo
- Institute of Genetics and Biophysics ‘Adriano Buzzati-Traverso’, CNR, Napoli, Italy
- Daniela Toniolo
- Division of Genetics and Cell Biology, San Raffaele Scientific Institute, Milano, Italy
- Institute of Molecular Genetics, CNR, Pavia, Italy
|
34
|
Sha Q, Zhang Z, Zhang S. An improved score test for genetic association studies. Genet Epidemiol 2011; 35:350-9. [DOI: 10.1002/gepi.20583] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2010] [Revised: 02/16/2011] [Accepted: 03/01/2011] [Indexed: 11/06/2022]
|
35
|
Ruggiero D, Dalmasso C, Nutile T, Sorice R, Dionisi L, Aversano M, Bröet P, Leutenegger AL, Bourgain C, Ciullo M. Genetics of VEGF serum variation in human isolated populations of cilento: importance of VEGF polymorphisms. PLoS One 2011; 6:e16982. [PMID: 21347390 PMCID: PMC3036731 DOI: 10.1371/journal.pone.0016982] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2010] [Accepted: 01/19/2011] [Indexed: 11/18/2022] Open
Abstract
Vascular Endothelial Growth Factor (VEGF) is the main player in angiogenesis. Because of its crucial role in this process, the study of the genetic factors controlling VEGF variability may be of particular interest for many angiogenesis-associated diseases. Although some polymorphisms in the VEGF gene have been associated with a susceptibility to several disorders, no genome-wide search on VEGF serum levels has been reported so far. We carried out a genome-wide linkage analysis in three isolated populations and we detected a strong linkage between VEGF serum levels and the 6p21.1 VEGF region in all samples. A new locus on chromosome 3p26.3 significantly linked to VEGF serum levels was also detected in a combined population sample. A sequencing of the gene followed by an association study identified three common single nucleotide polymorphisms (SNPs) influencing VEGF serum levels in one population (Campora), two already reported in the literature (rs3025039, rs25648) and one new signal (rs3025020). A fourth SNP (rs41282644) was found to affect VEGF serum levels in another population (Cardile). All the identified SNPs contribute to the related population linkages (35% of the linkage explained in Campora and 15% in Cardile). Interestingly, none of the SNPs influencing VEGF serum levels in one population was found to be associated in the two other populations. These results allow us to exclude the hypothesis that the common variants located in the exons, intron-exon junctions, promoter and regulative regions of the VEGF gene may have a causal effect on the VEGF variation. The data support the alternative hypothesis of a multiple rare variant model, possibly consisting in distinct variants in different populations, influencing VEGF serum levels.
Affiliation(s)
- Daniela Ruggiero
- Institute of Genetics and Biophysics A. Buzzati-Traverso, CNR, Naples, Italy
- Teresa Nutile
- Institute of Genetics and Biophysics A. Buzzati-Traverso, CNR, Naples, Italy
- Rossella Sorice
- Institute of Genetics and Biophysics A. Buzzati-Traverso, CNR, Naples, Italy
- Laura Dionisi
- Institute of Genetics and Biophysics A. Buzzati-Traverso, CNR, Naples, Italy
- Mario Aversano
- Institute of Genetics and Biophysics A. Buzzati-Traverso, CNR, Naples, Italy
- Marina Ciullo
- Institute of Genetics and Biophysics A. Buzzati-Traverso, CNR, Naples, Italy
|
36
|
EPIBLASTER-fast exhaustive two-locus epistasis detection strategy using graphical processing units. Eur J Hum Genet 2010; 19:465-71. [PMID: 21150885 DOI: 10.1038/ejhg.2010.196] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022] Open
Abstract
Detection of epistatic interaction between loci has been postulated to provide a more in-depth understanding of the complex biological and biochemical pathways underlying human diseases. Studying the interaction between two loci is the natural progression following traditional and well-established single locus analysis. However, the added costs and time duration required for the computation involved have thus far deterred researchers from pursuing a genome-wide analysis of epistasis. In this paper, we propose a method allowing such analysis to be conducted very rapidly. The method, dubbed EPIBLASTER, is applicable to case-control studies and consists of a two-step process in which the difference in Pearson's correlation coefficients is computed between controls and cases across all possible SNP pairs as an indication of significant interaction warranting further analysis. For the subset of interactions deemed potentially significant, a second-stage analysis is performed using the likelihood ratio test from the logistic regression to obtain the P-value for the estimated coefficients of the individual effects and the interaction term. The algorithm is implemented using the parallel computational capability of commercially available graphical processing units to greatly reduce the computation time involved. In the current setup and example data sets (211 cases, 222 controls, 299468 SNPs; and 601 cases, 825 controls, 291095 SNPs), this coefficient evaluation stage can be completed in roughly 1 day. Our method allows for exhaustive and rapid detection of significant SNP pair interactions without imposing significant marginal effects of the single loci involved in the pair.
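The first-stage screen described in this abstract, a difference in Pearson's correlation coefficients between cases and controls for a SNP pair, can be sketched as follows. The abstract does not give the exact statistic, so the Fisher z-transformation used to standardize the difference is an assumption of this sketch and not necessarily what EPIBLASTER computes. All function names are hypothetical, genotypes are assumed to be coded 0/1/2, and no GPU parallelism is attempted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("monomorphic SNP: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def correlation_contrast_z(case_a, case_b, ctrl_a, ctrl_b):
    """Approximate z-score for r(cases) - r(controls) at one SNP pair.

    Assumption: Fisher z-transform with variance 1/(n - 3) per group,
    so each group needs more than three individuals.
    """
    def fisher_z(r):
        # Clamp to keep atanh finite when the sample correlation is +/-1.
        r = max(min(r, 0.999999), -0.999999)
        return math.atanh(r)

    z_case = fisher_z(pearson(case_a, case_b))
    z_ctrl = fisher_z(pearson(ctrl_a, ctrl_b))
    se = math.sqrt(1 / (len(case_a) - 3) + 1 / (len(ctrl_a) - 3))
    return (z_case - z_ctrl) / se


if __name__ == "__main__":
    snp_a = [0, 1, 2, 0, 1, 2]
    cases_b = [0, 1, 2, 1, 1, 2]     # positively correlated with snp_a
    controls_b = [2, 1, 0, 1, 1, 0]  # negatively correlated with snp_a
    print(correlation_contrast_z(snp_a, cases_b, snp_a, controls_b))
```

Pairs with a large absolute z-score would then be passed to the second-stage logistic regression likelihood ratio test that the abstract describes.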
|
37
|
Lee SH, Nyholt DR, Macgregor S, Henders AK, Zondervan KT, Montgomery GW, Visscher PM. A simple and fast two-locus quality control test to detect false positives due to batch effects in genome-wide association studies. Genet Epidemiol 2010; 34:854-62. [PMID: 21104888 PMCID: PMC3674525 DOI: 10.1002/gepi.20541] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2010] [Revised: 07/28/2010] [Accepted: 09/09/2010] [Indexed: 12/04/2022]
Abstract
The impact of erroneous genotypes having passed standard quality control (QC) can be severe in genome-wide association studies, genotype imputation, and estimation of heritability and prediction of genetic risk based on single nucleotide polymorphisms (SNP). To detect such genotyping errors, a simple two-locus QC method, based on the difference in test statistic of association between single SNPs and pairs of SNPs, was developed and applied. The proposed approach could detect many problematic SNPs with statistical significance even when standard single SNP QC analyses fail to detect them in real data. Depending on the data set used, the number of erroneous SNPs that were not filtered out by standard single SNP QC but detected by the proposed approach varied from a few hundred to thousands. Using simulated data, it was shown that the proposed method was powerful and performed better than other tested existing methods. The power of the proposed approach to detect erroneous genotypes was ∼80% for a 3% error rate per SNP. This novel QC approach is easy to implement and computationally efficient, and can lead to a better quality of genotypes for subsequent genotype-phenotype investigations.
Affiliation(s)
- Sang Hong Lee
- Queensland Institute of Medical Research, Herston, Queensland, Australia.
|
38
|
Han F, Pan W. Powerful multi-marker association tests: unifying genomic distance-based regression and logistic regression. Genet Epidemiol 2010; 34:680-8. [PMID: 20976795 PMCID: PMC3345567 DOI: 10.1002/gepi.20529] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
To detect genetic association with common and complex diseases, many statistical tests have been proposed for candidate gene or genome-wide association studies with the case-control design. Due to linkage disequilibrium (LD), multi-marker association tests can gain power over single-marker tests with a Bonferroni multiple testing adjustment. Among many existing multi-marker association tests, most target to detect only one of many possible aspects in distributional differences between the genotypes of cases and controls, such as allele frequency differences, while a few new ones aim to target two or three aspects, all of which can be implemented in logistic regression. In contrast to logistic regression, a genomic distance-based regression (GDBR) approach aims to detect some high-order genotypic differences between cases and controls. A recent study has confirmed the high power of GDBR tests. At this moment, the popular logistic regression and the emerging GDBR approaches are completely unrelated; for example, one has to choose between the two. In this article, we reformulate GDBR as logistic regression, opening a venue to constructing other powerful tests while overcoming some limitations of GDBR. For example, asymptotic distributions can replace time-consuming permutations for deriving P-values and covariates, including gene-gene interactions, can be easily incorporated. Importantly, this reformulation facilitates combining GDBR with other existing methods in a unified framework of logistic regression. In particular, we show that Fisher's P-value combining method can boost statistical power by incorporating information from allele frequencies, Hardy-Weinberg disequilibrium, LD patterns, and other higher-order interactions among multi-markers as captured by GDBR.
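Fisher's P-value combining method, mentioned at the end of this abstract, can be sketched as below. For k independent P-values the statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom the tail probability has a closed form, so only the standard library is needed. The function name is hypothetical, and the independence of the component tests is an assumption of this sketch; tests built from allele frequencies, Hardy-Weinberg disequilibrium and LD in the same sample need not be independent.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method.

    Returns P(chi2 with 2k df >= X), where X = -2 * sum(ln p_i), using
    the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    # Two moderately small P-values reinforce each other.
    print(fisher_combined_p([0.05, 0.05]))
```

A single P-value is returned unchanged, which is a quick sanity check on the closed form.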
Affiliation(s)
- Fang Han
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota 55455–0392, USA
|
39
|
Crosslin DR, Qin X, Hauser ER. Assessment of LD matrix measures for the analysis of biological pathway association. Stat Appl Genet Mol Biol 2010; 9:Article35. [PMID: 20887274 PMCID: PMC2979315 DOI: 10.2202/1544-6115.1561] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Complex diseases will have multiple functional sites, and it will be invaluable to understand the cross-locus interaction in terms of linkage disequilibrium (LD) between those sites (epistasis) in addition to the haplotype-LD effects. We investigated the statistical properties of a class of matrix-based statistics to assess this epistasis. These statistical methods include two LD contrast tests (Zaykin et al., 2006) and partial least squares regression (Wang et al., 2008). To estimate Type 1 error rates and power, we simulated multiple two-variant disease models using the SIMLA software package. SIMLA allows for the joint action of up to two disease genes in the simulated data with all possible multiplicative interaction effects between them. Our goal was to detect an interaction between multiple disease-causing variants by means of their linkage disequilibrium (LD) patterns with other markers. We measured the effects that marginal disease effect size, haplotype LD, disease prevalence and minor allele frequency have on cross-locus interaction (epistasis). In the setting of strong allele effects and strong interaction, the correlation between the two disease genes was weak (r=0.2). In a complex system with multiple correlations (both marginal and interaction), it was difficult to determine the source of a significant result. Despite these complications, the partial least squares and modified LD contrast methods maintained adequate power to detect the epistatic effects; however, for many of the analyses we often could not separate interaction from a strong marginal effect. While we did not exhaust the entire parameter space of possible models, we do provide guidance on the effects that population parameters have on cross-locus interaction.
|
40
|
Wan X, Yang C, Yang Q, Xue H, Fan X, Tang NL, Yu W. BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies. Am J Hum Genet 2010; 87:325-40. [PMID: 20817139 DOI: 10.1016/j.ajhg.2010.07.021] [Citation(s) in RCA: 307] [Impact Index Per Article: 20.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2010] [Revised: 07/09/2010] [Accepted: 07/29/2010] [Indexed: 12/30/2022] Open
Abstract
Gene-gene interactions have long been recognized to be fundamentally important for understanding genetic causes of complex disease traits. At present, identifying gene-gene interactions from genome-wide case-control studies is computationally and methodologically challenging. In this paper, we introduce a simple but powerful method, named "BOolean Operation-based Screening and Testing" (BOOST). For the discovery of unknown gene-gene interactions that underlie complex diseases, BOOST allows examination of all pairwise interactions in genome-wide case-control studies in a remarkably fast manner. We have carried out interaction analyses on seven data sets from the Wellcome Trust Case Control Consortium (WTCCC). Each analysis took less than 60 hr to completely evaluate all pairs of roughly 360,000 SNPs on a standard 3.0 GHz desktop with 4G memory running the Windows XP system. The interaction patterns identified from the type 1 diabetes data set display significant difference from those identified from the rheumatoid arthritis data set, although both data sets share a very similar hit region in the WTCCC report. BOOST has also identified some disease-associated interactions between genes in the major histocompatibility complex region in the type 1 diabetes data set. We believe that our method can serve as a computationally and statistically useful tool in the coming era of large-scale interaction mapping in genome-wide case-control studies.
|
41
|
Clark TG, Campino SG, Anastasi E, Auburn S, Teo YY, Small K, Rockett KA, Kwiatkowski DP, Holmes CC. A Bayesian approach using covariance of single nucleotide polymorphism data to detect differences in linkage disequilibrium patterns between groups of individuals. Bioinformatics 2010; 26:1999-2003. [PMID: 20554688 PMCID: PMC2916719 DOI: 10.1093/bioinformatics/btq327] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2010] [Revised: 06/10/2010] [Accepted: 06/11/2010] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Quantifying differences in linkage disequilibrium (LD) between sub-groups can highlight genetic regions or sites under selection and/or associated with disease, and may have utility in trans-ethnic mapping studies. RESULTS: We present a novel pseudo Bayes factor (PBF) approach that assesses differences in covariance of genotype frequencies from single nucleotide polymorphism (SNP) data from a genome-wide study. The magnitude of the PBF reflects the strength of evidence for a difference, while accounting for the sample size and number of SNPs, without the requirement for permutation testing to establish statistical significance. Application of the PBF to HapMap and Gambian malaria SNP data reveals regional LD differences, some known to be under selection. AVAILABILITY AND IMPLEMENTATION: The PBF approach has been implemented in the BALD (Bayesian analysis of LD differences) C++ software, and is available from http://homepages.lshtm.ac.uk/tgclark/downloads.
Affiliation(s)
- Taane G Clark
- Department of Epidemiology and Public Health, London School of Hygiene and Tropical Medicine, London, UK.
|
42
|
Hrafnkelsson B, Helgason A, Jonsson GF, Gudbjartsson DF, Jonsson T, Thorvaldsson S, Stefansson H, Steinthorsdottir V, Vidarsdottir N, Middleton D, Petersen HS, Martinez C, Snaedal J, Jonsson PV, Bjornsson S, Gulcher JR, Stefansson K. Evaluating differences in linkage disequilibrium between populations. Ann Hum Genet 2010; 74:233-47. [PMID: 20529015 DOI: 10.1111/j.1469-1809.2010.00571.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Abstract
We propose two methods to evaluate the statistical significance of differences in linkage disequilibrium (LD) between populations, where LD is measured by the standardised parameter D'. The first method is based on bootstrapping individuals within populations in order to test LD differences for each pair of loci. Using this approach we propose a solution to the problem of testing multiple locus-pairs by means of a single test for the number of pairs that exhibit significant LD differences among populations. The second method provides the Bayesian posterior probability that one population has greater LD than the other for each locus pair. Both methods can handle genotypes with unknown phase, and are demonstrated using two data sets. For the purpose of demonstration, we apply the methods to two different sets of data from humans. First, we explore the issue of LD differences between reproductively isolated populations using a new data set of twelve Xq25 microsatellites, typed in four European populations. Second, we examine evidence for LD differences between Alzheimer cases and controls from the Icelandic population using 19 single nucleotide polymorphisms (SNPs) from a 97 kb region flanking the Apolipoprotein E (APOE) gene on chromosome 19.
|
43
|
Wang K, Dickson SP, Stolle CA, Krantz ID, Goldstein DB, Hakonarson H. Interpretation of association signals and identification of causal variants from genome-wide association studies. Am J Hum Genet 2010; 86:730-42. [PMID: 20434130 PMCID: PMC2869011 DOI: 10.1016/j.ajhg.2010.04.003] [Citation(s) in RCA: 115] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2010] [Revised: 02/19/2010] [Accepted: 04/05/2010] [Indexed: 12/21/2022] Open
Abstract
GWAS have been successful in identifying disease susceptibility loci, but it remains a challenge to pinpoint the causal variants in subsequent fine-mapping studies. A conventional fine-mapping effort starts by sequencing dozens of randomly selected samples at susceptibility loci to discover candidate variants, which are then placed on custom arrays or used in imputation algorithms to find the causal variants. We propose that one or several rare or low-frequency causal variants can hitchhike the same common tag SNP, so causal variants may not be easily unveiled by conventional efforts. Here, we first demonstrate that the true effect size and proportion of variance explained by a collection of rare causal variants can be underestimated by a common tag SNP, thereby accounting for some of the "missing heritability" in GWAS. We then describe a case-selection approach based on phasing long-range haplotypes and sequencing cases predicted to harbor causal variants. We compare this approach with conventional strategies on a simulated data set, and we demonstrate its advantages when multiple causal variants are present. We also evaluate this approach in a GWAS on hearing loss, where the most common causal variant has a minor allele frequency (MAF) of 1.3% in the general population and 8.2% in 329 cases. With our case-selection approach, it is present in 88% of the 32 selected cases (MAF = 66%), so sequencing a subset of these cases can readily reveal the causal allele. Our results suggest that thinking beyond common variants is essential in interpreting GWAS signals and identifying causal variants.
Affiliation(s)
- Kai Wang
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Samuel P. Dickson
- Institute for Genome Sciences & Policy, Center for Human Genome Variation, Duke University, Durham, NC 27708, USA
- Catherine A. Stolle
- Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Ian D. Krantz
- Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19104, USA
- David B. Goldstein
- Institute for Genome Sciences & Policy, Center for Human Genome Variation, Duke University, Durham, NC 27708, USA
- Hakon Hakonarson
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19104, USA
|
44
|
Haplotype analyses, mechanism and evolution of common double mutants in the human LDL receptor gene. Mol Genet Genomics 2010; 283:565-74. [PMID: 20428891 DOI: 10.1007/s00438-010-0541-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/12/2009] [Accepted: 04/07/2010] [Indexed: 12/15/2022]
Abstract
Familial hypercholesterolemia (FH), an autosomal dominant inherited disorder resulting in increased levels of circulating plasma low-density lipoprotein (LDL), tendon xanthomas and premature coronary artery disease (CAD), is caused by defects in the LDL receptor gene (LDLR). Three widespread LDLR alterations not causing FH (c.1061-8T>C, c.2177C>T and c.829G>A) and one mutation (c.12G>A) with narrow geographical distribution and thought to cause disease were investigated. In an attempt to improve knowledge on their origin, spread and possible selective effects, estimations of the ages of these variants (t generations) and haplotype analysis were performed by genotyping 86 healthy individuals and 98 FH patients in Spain for five LDLR SNPs: c.81T>C, c.1413G>A, c.1725C>T, c.1959T>C, and c.2232G>A; most patients carried two of these LDLR variants simultaneously. It was found that both the c.1061-8T>C (t = 54) and c.2177C>T alterations (t = 62) arose at about the same time (54 and 62 generations ago, respectively) in the CGCTG haplotype, while the c.12G>A mutation (t = 70) appeared in a CGCCG haplotype carrying an earlier c.829G>A alteration (t = 83). The estimated ages of selectively neutral alterations could explain their distribution by migrations. The origin of the c.12G>A mutation could be in the Iberian Peninsula; despite its estimated age, a low selective pressure could explain its conservation in Spain from where it could have spread to China and Mexico, since the sixteenth century through the Spanish/Portuguese colonial expeditions.
45
A recombination hotspot in a schizophrenia-associated region of GABRB2. PLoS One 2010; 5:e9547. [PMID: 20221451 PMCID: PMC2833194 DOI: 10.1371/journal.pone.0009547] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 09/09/2009] [Accepted: 01/28/2010] [Indexed: 12/01/2022]
Abstract
Background: Schizophrenia is a major disorder with complex genetic mechanisms. Earlier, population genetic studies revealed the occurrence of strong positive selection in the GABRB2 gene encoding the β2 subunit of GABAA receptors, within a segment of 3,551 bp harboring twenty-nine single nucleotide polymorphisms (SNPs) and containing schizophrenia-associated SNPs and haplotypes. Methodology/Principal Findings: In the present study, the possible occurrence of recombination in this ‘S1–S29’ segment was assessed. The occurrence of hotspot recombination was indicated by high resolution recombination rate estimation, haplotype diversity, abundance of rare haplotypes, recurrent mutations and torsos in haplotype networks, and experimental haplotyping of somatic and sperm DNA. The sub-segment distribution of relative recombination strength, measured by the ratio of haplotype diversity (Hd) over mutation rate (θ), was indicative of a human specific Alu-Yi6 insertion serving as a central recombining sequence facilitating homologous recombination. Local anomalous DNA conformation attributable to the Alu-Yi6 element, as suggested by enhanced DNase I sensitivity and obstruction to DNA sequencing, could be a contributing factor of the increased sequence diversity. Linkage disequilibrium (LD) analysis yielded prominent low LD points that supported ongoing recombination. LD contrast revealed significant dissimilarity between control and schizophrenic cohorts. Among the large array of inferred haplotypes, H26 and H73 were identified to be protective, and H19 and H81 risk-conferring, toward the development of schizophrenia. Conclusions/Significance: The co-occurrence of hotspot recombination and positive selection in the S1–S29 segment of GABRB2 has provided a plausible contribution to the molecular genetics mechanisms for schizophrenia. The present findings therefore suggest that genome regions characterized by the co-occurrence of positive selection and hotspot recombination, two interacting factors both affecting genetic diversity, merit close scrutiny with respect to the etiology of common complex disorders.
46
Kim S, Morris NJ, Won S, Elston RC. Single-marker and two-marker association tests for unphased case-control genotype data, with a power comparison. Genet Epidemiol 2010; 34:67-77. [PMID: 19557751 PMCID: PMC2796706 DOI: 10.1002/gepi.20436] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 01/09/2023]
Abstract
In case-control single nucleotide polymorphism (SNP) data, the allele frequency, Hardy-Weinberg disequilibrium, and linkage disequilibrium (LD) contrast tests are three distinct sources of information about genetic association. While all three tests are typically developed in a retrospective context, we show that prospective logistic regression models may be developed that correspond conceptually to the retrospective tests. This approach provides a flexible framework for conducting a systematic series of association analyses using unphased genotype data and any number of covariates. For a single-stage study, two single-marker tests and four two-marker tests are discussed. The true association models are derived and they allow us to understand why a model with only a linear term will generally fit well for a SNP in weak LD with a causal SNP, whatever the disease model, but not for a SNP in high LD with a non-additive disease SNP. We investigate the power of the association tests using real LD parameters from chromosome 11 in the HapMap CEU population data. Among the single-marker tests, the allelic test has on average the most power in the case of an additive disease, but for dominant, recessive, and heterozygote disadvantage diseases, the genotypic test has the most power. Among the four two-marker tests, the Allelic-LD contrast test, which incorporates linear terms for two markers and their interaction term, provides the most reliable power overall for the cases studied. Therefore, our result supports incorporating an interaction term as well as linear terms in multi-marker tests.
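The two-marker model with linear terms and an interaction term, as the abstract describes it, can be written out as a prospective logistic regression. The notation is an assumption made for illustration: $Y$ is case status, $g_1$ and $g_2$ are minor-allele counts (0, 1, or 2) at the two markers, and the $\beta$ symbols are regression coefficients. The abstract does not state the genotype coding or which coefficients the test evaluates jointly, so this is only a sketch of the model form.

```latex
\operatorname{logit}\,\Pr(Y = 1 \mid g_1, g_2)
  = \beta_0 + \beta_1 g_1 + \beta_2 g_2 + \beta_{12}\, g_1 g_2
```

Dropping the $\beta_{12}$ term leaves a model with linear terms only, which is the comparison the abstract draws when it recommends including the interaction.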
Affiliation(s)
- Sulgi Kim
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio
- Nathan J. Morris
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio
- Sungho Won
  - Department of Biostatistics, Harvard University, Boston, Massachusetts
- Robert C. Elston
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio
47
|
Riley B, Kuo PH, Maher BS, Fanous AH, Sun J, Wormley B, O’Neill FA, Walsh D, Zhao Z, Kendler KS. The dystrobrevin binding protein 1 (DTNBP1) gene is associated with schizophrenia in the Irish Case Control Study of Schizophrenia (ICCSS) sample. Schizophr Res 2009; 115:245-53. [PMID: 19800201 PMCID: PMC2783814 DOI: 10.1016/j.schres.2009.09.008] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Received: 03/31/2009] [Revised: 09/01/2009] [Accepted: 09/07/2009] [Indexed: 10/20/2022]
Abstract
BACKGROUND: DTNBP1 is associated with schizophrenia in many studies, but the associated alleles and haplotypes vary between samples. METHOD: We assessed nine single nucleotide polymorphisms (SNPs) in this gene for association with schizophrenia in a new sample of 1021 cases and 626 controls from Ireland. RESULTS: Four SNPs give evidence of association (0.000018<p<0.045), most strongly with the common allele at rs760761. A haplotype of the common alleles of five markers (including rs760761) and the minor allele of rs2619538 overlapping the 5' end of the DTNBP1 gene also gives evidence for association (p=0.0002). Secondary analyses showed no difference in the association signal based on sex or family history. These results are in agreement with the most consistently observed association with common alleles and common-allele haplotypes, reported in a previous study of Irish cases and controls but not in an Irish high-density family sample. Our results do not support the prior report from a Swedish sample of increased association in cases with a family history of psychotic illness. Comparison of human, chimpanzee and rhesus sequences suggests that rs760761 is a particularly variable position in the primate lineage. CONCLUSION: This study provides further evidence from a large case/control sample for association of common DTNBP1 alleles and haplotypes with schizophrenia.
Affiliation(s)
- Brien Riley
  - Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
- Po-Hsiu Kuo
  - Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA; Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
- Brion S. Maher
  - Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA; Department of Human & Molecular Genetics, Virginia Commonwealth University, Richmond, VA, USA; Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
- Ayman H. Fanous
  - Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA; Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA; Department of Psychiatry, Georgetown University School of Medicine, Washington DC, USA; Mental Health Service Line, Washington VA Medical Center, Washington DC, USA
- Jingchun Sun
  - Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA; Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
- Brandon Wormley
  - Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA; Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
- Zhongming Zhao
  - Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA; Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
- Kenneth S. Kendler
  - Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA; Department of Human & Molecular Genetics, Virginia Commonwealth University, Richmond, VA, USA; Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
48
|
Yu K, Li Q, Bergen AW, Pfeiffer RM, Rosenberg PS, Caporaso N, Kraft P, Chatterjee N. Pathway analysis by adaptive combination of P-values. Genet Epidemiol 2009; 33:700-9. [PMID: 19333968 PMCID: PMC2790032 DOI: 10.1002/gepi.20422] [Citation(s) in RCA: 219] [Impact Index Per Article: 13.7] [Indexed: 11/08/2022]
Abstract
It is increasingly recognized that pathway analysis, a joint test of association between the outcome and a group of single nucleotide polymorphisms (SNPs) within a biological pathway, could potentially complement single-SNP analysis and provide additional insights into the genetic architecture of complex diseases. Building upon existing P-value combining methods, we propose a class of highly flexible pathway analysis approaches based on an adaptive rank truncated product statistic that can effectively combine evidence of associations over different SNPs and genes within a pathway. The statistical significance of the pathway-level test statistics is evaluated using a highly efficient permutation algorithm that remains computationally feasible irrespective of the size of the pathway and complexity of the underlying test statistics for summarizing SNP- and gene-level associations. We demonstrate through simulation studies that a gene-based analysis that treats the underlying genes, as opposed to the underlying SNPs, as the basic units for hypothesis testing, is a very robust and powerful approach to pathway-based association testing. We also illustrate the advantage of the proposed methods using a study of the association between the nicotinic receptor pathway and cigarette smoking behaviors.
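The general idea of an adaptive rank truncated product can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `artp_pvalue`, its arguments, and the candidate truncation points are hypothetical, and the null p-values below are uniform draws standing in for p-values that would normally come from permuting the phenotype. For each truncation point K, the K smallest p-values are multiplied, each product is ranked against the same set of permutations, and the best per-K result is then calibrated against the permutation distribution of that minimum.

```python
import bisect
import math
import random


def artp_pvalue(observed, null_pvalues, truncation_points):
    """Adaptive rank truncated product p-value (illustrative sketch).

    observed:          list of m p-values (one per SNP or gene) from the real data
    null_pvalues:      list of B lists, each holding m p-values from one permutation
    truncation_points: candidate numbers K of smallest p-values to combine
    """
    datasets = [observed] + list(null_pvalues)  # index 0 is the real data
    n = len(datasets)

    def log_product(pvals, k):
        # Log of the product of the k smallest p-values (more negative = stronger).
        return sum(math.log(p) for p in sorted(pvals)[:k])

    # For every dataset, keep the smallest empirical p-value over all K.
    min_p = [1.0] * n
    for k in truncation_points:
        stats = [log_product(d, k) for d in datasets]
        ordered = sorted(stats)
        for j, s in enumerate(stats):
            # Share of datasets (itself included) with a statistic at least as extreme.
            p_k = bisect.bisect_right(ordered, s) / n
            min_p[j] = min(min_p[j], p_k)

    # Calibrate the observed minimum against the minima of all datasets.
    return sum(1 for v in min_p if v <= min_p[0]) / n


# Toy usage: uniform draws on (0, 1] stand in for permutation p-values (assumption).
rng = random.Random(0)
observed = [1e-6, 1e-5, 0.5, 0.9]
null = [[1.0 - rng.random() for _ in observed] for _ in range(999)]
p = artp_pvalue(observed, null, truncation_points=[1, 2, 4])
```

Reusing one set of permutations for both the per-K p-values and the final calibration keeps the cost at a single pass over the permuted data, which is in the spirit of the efficient permutation scheme the abstract mentions.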
Affiliation(s)
- Kai Yu
  - Division of Cancer Epidemiology and Genetics, NCI, Rockville, Maryland 20892, USA
49
|
Teo YY, Fry AE, Bhattacharya K, Small KS, Kwiatkowski DP, Clark TG. Genome-wide comparisons of variation in linkage disequilibrium. Genome Res 2009; 19:1849-60. [PMID: 19541915 PMCID: PMC2765270 DOI: 10.1101/gr.092189.109] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Received: 02/03/2009] [Accepted: 06/04/2009] [Indexed: 11/24/2022]
Abstract
Current genome-wide surveys of common diseases and complex traits fundamentally aim to detect indirect associations where the single nucleotide polymorphisms (SNPs) carrying the association signals are not biologically active but are in linkage disequilibrium (LD) with some unknown functional polymorphisms. Reproducing any novel discoveries from these genome-wide scans in independent studies is now a prerequisite for the putative findings to be accepted. Significant differences in patterns of LD between populations can affect the portability of phenotypic associations when the replication effort or meta-analyses are attempted in populations that are distinct from the original population in which the genome-wide study is performed. Here, we introduce a novel method for genome-wide analyses of LD variations between populations that allow the identification of candidate regions with different patterns of LD. The evidence of LD variation provided by the introduced method correlated with the degree of differences in the frequencies of the most common haplotype across the populations. Identified regions also resulted in greater variation in the success of replication attempts compared with random regions in the genome. A separate permutation strategy introduced for assessing LD variation in the absence of genome-wide data also correctly identified the expected variation in LD patterns in two well-established regions undergoing strong population-specific evolutionary pressure. Importantly, this method addresses whether a failure to reproduce a disease association in a disparate population is due to underlying differences in LD structure with an unknown functional polymorphism, which is vital in the current climate of replicating and fine-mapping established findings from genome-wide association studies.
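To make the notion of an LD difference between two groups concrete, here is a deliberately simple pairwise sketch. It is not the method introduced in this paper: it swaps in the genotype correlation between two SNPs as an approximate LD measure for unphased data and compares two groups with Fisher's z-transformation. The function names and toy genotypes are hypothetical, and genotypes are assumed to be coded as 0/1/2 allele counts.

```python
import math


def genotype_r(g1, g2):
    """Pearson correlation between two genotype vectors coded as 0/1/2 allele counts."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    return cov / math.sqrt(v1 * v2)


def ld_contrast_z(r_a, n_a, r_b, n_b):
    """Fisher z statistic for the difference between two independent correlations."""
    return (math.atanh(r_a) - math.atanh(r_b)) / math.sqrt(1 / (n_a - 3) + 1 / (n_b - 3))


# Toy genotypes (assumption): the two SNPs are tightly correlated in group A only.
snp1 = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
snp2_group_a = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
snp2_group_b = [2, 0, 1, 0, 1, 2, 1, 0, 1, 2]

r_a = genotype_r(snp1, snp2_group_a)
r_b = genotype_r(snp1, snp2_group_b)
z = ld_contrast_z(r_a, len(snp1), r_b, len(snp1))
```

A large absolute `z` flags a SNP pair whose correlation differs between the groups. The statistic is undefined when a correlation is exactly ±1, and real analyses would use far larger samples than this ten-person toy example.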
Affiliation(s)
- Yik Y Teo
  - Wellcome Trust Centre for Human Genetics, University of Oxford, United Kingdom
50
|
Uh HW, Houwing-Duistermaat JJ, Putter H, van Houwelingen HC. Assessment of global phase uncertainty in case-control studies. BMC Genet 2009; 10:54. [PMID: 19751505 PMCID: PMC2760579 DOI: 10.1186/1471-2156-10-54] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/13/2009] [Accepted: 09/14/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND: In haplotype-based candidate gene studies, a problem is that the genotype data are unphased, which results in haplotype ambiguity. The $R_h^2$ measure [1] quantifies haplotype predictability from genotype data. It is computed for each individual haplotype, and for a measure of global relative efficiency a minimum $R_h^2$ value is suggested. Alternatively, we developed methods directly based on the information content of haplotype frequency estimates to obtain global relative efficiency measures: $R_A^2$ and $R_D^2$, based on A- and D-optimality, respectively. All three methods are designed for single populations; they can be applied in cases only, controls only or the whole data. Therefore they are not necessarily optimal for haplotype testing in case-control studies. RESULTS: A new global relative efficiency measure $R_T^2$ was derived to maximize power of a simple test statistic that compares haplotype frequencies in cases and controls. Application to real data showed that our proposed method $R_T^2$ gave a clear and summarizing measure for the case-control study conducted. Additionally, this measure might be used for selection of individuals who have the highest potential for improving power by resolving phase ambiguity. CONCLUSION: Instead of using a relative efficiency measure for cases only, controls only or their combined data, we link the uncertainty measure to case-control studies directly. Hence, our global efficiency measure might be useful to assess whether data are informative or have enough power for estimation of a specific haplotype risk.
Affiliation(s)
- Hae-Won Uh
  - Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
- Hein Putter
  - Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
- Hans C van Houwelingen
  - Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands