1
Zhang X, Bell JT. Detecting genetic effects on phenotype variability to capture gene-by-environment interactions: a systematic method comparison. G3 (BETHESDA, MD.) 2024; 14:jkae022. [PMID: 38289865 PMCID: PMC10989912 DOI: 10.1093/g3journal/jkae022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 01/16/2024] [Accepted: 01/19/2024] [Indexed: 02/01/2024]
Abstract
Genetically associated phenotypic variability has been widely observed across organisms and traits, including in humans. Both gene-gene and gene-environment interactions can lead to an increase in genetically associated phenotypic variability. Therefore, detecting the underlying genetic variants, or variance Quantitative Trait Loci (vQTLs), can provide novel insights into complex traits. Established approaches to detect vQTLs apply different methodologies from variance-only approaches to mean-variance joint tests, but a comprehensive comparison of these methods is lacking. Here, we review available methods to detect vQTLs in humans, carry out a simulation study to assess their performance under different biological scenarios of gene-environment interactions, and apply the optimal approaches for vQTL identification to gene expression data. Overall, with a minor allele frequency (MAF) of less than 0.2, the squared residual value linear model (SVLM) and the deviation regression model (DRM) are optimal when the data follow normal and non-normal distributions, respectively. In addition, the Brown-Forsythe (BF) test is one of the optimal methods when the MAF is 0.2 or larger, irrespective of phenotype distribution. Additionally, a larger sample size and more balanced sample distribution in different exposure categories increase the power of BF, SVLM, and DRM. Our results highlight vQTL detection methods that perform optimally under realistic simulation settings and show that their relative performance depends on the phenotype distribution, allele frequency, sample size, and the type of exposure in the interaction model underlying the vQTL.
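As a rough illustration of the variance-only tests named in this abstract, the Brown-Forsythe statistic can be sketched as a one-way ANOVA on absolute deviations from each genotype group's median. The function name and the toy phenotype values are hypothetical and are not taken from the paper. Turning the statistic into a p-value would additionally need an F distribution with (k - 1, N - k) degrees of freedom, which is not done here.

```python
from statistics import mean, median


def brown_forsythe_f(groups):
    """Return the Brown-Forsythe F statistic for equality of variances.

    `groups` is a list of phenotype lists, one per genotype class
    (a hypothetical layout chosen for this sketch).
    """
    # Absolute deviation of every observation from its own group median.
    z = [[abs(y - median(g)) for y in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    grand_mean = sum(sum(g) for g in z) / n_total
    group_means = [mean(g) for g in z]
    # One-way ANOVA on the deviations: between- and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(z, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Toy phenotypes for genotypes 0, 1 and 2 (made-up numbers).
same_spread = [[1, 2, 3, 4, 5], [11, 12, 13, 14, 15], [21, 22, 23, 24, 25]]
wider_in_last = [[1, 2, 3, 4, 5], [11, 12, 13, 14, 15], [-7, -2, 3, 8, 13]]
f_same = brown_forsythe_f(same_spread)
f_wider = brown_forsythe_f(wider_in_last)
```

In this toy input the groups of `same_spread` differ only in their means, so the statistic stays at zero, while the wider third group in `wider_in_last` pushes it up. This is the behaviour that makes the test sensitive to variance differences rather than mean shifts.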
Affiliation(s)
- Xiaopu Zhang
  - Department of Twin Research and Genetic Epidemiology, King's College London, St Thomas’ Hospital, Westminster Bridge Road, London SE1 7EH, UK
- Jordana T Bell
  - Department of Twin Research and Genetic Epidemiology, King's College London, St Thomas’ Hospital, Westminster Bridge Road, London SE1 7EH, UK
2
Alipour N, Kazemnejad A, Akbarzadeh M, Eskandari F, Zahedi AS, Daneshpour MS. Regularized Machine Learning Models for Prediction of Metabolic Syndrome Using GCKR, APOA5, and BUD13 Gene Variants: Tehran Cardiometabolic Genetic Study. CELL JOURNAL 2023; 25:536-545. [PMID: 37641415 PMCID: PMC10542204 DOI: 10.22074/cellj.2023.2000864.1294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 06/08/2023] [Accepted: 06/19/2023] [Indexed: 08/31/2023]
Abstract
OBJECTIVE Metabolic syndrome (MetS) is a complex multifactorial disorder that considerably burdens healthcare systems. We aim to classify MetS using regularized machine learning models in the presence of the risk variants of GCKR, BUD13 and APOA5, and environmental risk factors. MATERIALS AND METHODS A cohort study was conducted on 2,346 cases and 2,203 controls from eligible Tehran Cardiometabolic Genetic Study (TCGS) participants whose data were collected from 1999 to 2017. We used different regularization approaches [least absolute shrinkage and selection operator (LASSO), ridge regression (RR), elastic net (ENET), adaptive LASSO (aLASSO), and adaptive ENET (aENET)] and a classical logistic regression (LR) model to classify MetS and select influential variables that predict MetS. Demographics, clinical features, and common polymorphisms in the GCKR, BUD13 and APOA5 genes of eligible participants were assessed to classify TCGS participant status in MetS development. The models' performance was evaluated by 10-times repeated 10-fold cross-validation. Sensitivity, specificity, classification accuracy, and the areas under the receiver operating characteristic curve (AUC-ROC) and the precision-recall curve (AUC-PR) were used to compare the models. RESULTS During the follow-up period, 50.38% of participants developed MetS. The groups were not similar in terms of baseline characteristics and risk variants. MetS was significantly associated with age, gender, schooling years, body mass index (BMI), and alternate alleles in all the risk variants, as indicated by LR. A comparison of accuracy, AUC-ROC, and AUC-PR metrics indicated that the regularization models outperformed LR. Regularized machine learning models provided comparable classification performances, whereas the aLASSO model was more parsimonious and selected fewer predictors. CONCLUSION Regularized machine learning models provided more accurate and parsimonious MetS classifying models. These high-performing diagnostic models can lay the foundation for clinical decision support tools that use genetic and demographical variables to locate individuals at high risk for MetS.
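For readers unfamiliar with the threshold-based measures listed in this abstract, the sketch below shows how sensitivity, specificity, and classification accuracy follow from a 2x2 confusion table. The function name and the counts are hypothetical illustrations and are not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute basic classifier metrics from confusion-table counts."""
    return {
        # Proportion of true cases that the model flags as cases.
        "sensitivity": tp / (tp + fn),
        # Proportion of true controls that the model flags as controls.
        "specificity": tn / (tn + fp),
        # Proportion of all participants classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up counts: 100 cases (80 detected) and 100 controls (90 correctly cleared).
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
```

AUC-ROC and AUC-PR summarise the same trade-offs across all classification thresholds instead of at a single cut-off, which is why they are reported alongside these point measures.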
Affiliation(s)
- Nadia Alipour
  - Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Anoshirvan Kazemnejad
  - Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Mahdi Akbarzadeh
  - Cellular and Molecular Endocrine Research Centre, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Farzad Eskandari
  - Department of Statistics, Faculty of Statistics, Mathematics and Computer, Allameh Tabataba'i University, Tehran, Iran
- Asiyeh Sadat Zahedi
  - Cellular and Molecular Endocrine Research Centre, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Maryam S Daneshpour
  - Cellular and Molecular Endocrine Research Centre, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
3
Masjoudi S, Sedaghati-Khayat B, Givi NJ, Bonab LNH, Azizi F, Daneshpour MS. Kernel machine SNP set analysis finds the association of BUD13, ZPR1, and APOA5 variants with metabolic syndrome in Tehran Cardio-metabolic Genetics Study. Sci Rep 2021; 11:10305. [PMID: 33986338 PMCID: PMC8119714 DOI: 10.1038/s41598-021-89509-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/23/2020] [Accepted: 04/22/2021] [Indexed: 12/21/2022] Open
Abstract
Metabolic syndrome (MetS) is one of the most important risk factors for cardiovascular disease. The 11p23.3 chromosomal region plays a potential role in the pathogenesis of MetS. The present study aimed to assess the association of 18 single nucleotide polymorphisms (SNPs) located in the BUD13, ZPR1, and APOA5 genes with MetS in the Tehran Cardio-metabolic Genetics Study (TCGS). In 5421 MetS-affected and non-affected participants, we analyzed the data using two models. The first model (MetS model) examined the SNPs' association with MetS. The second model (HTg-MetS model) examined the association of the SNPs with MetS-affected participants who had high plasma triglyceride (TG). The four-gamete rule was used to build SNP sets from correlated nearby SNPs. Kernel machine regression models and single-SNP regression evaluated the association between SNP sets and MetS. The kernel machine results showed that two of the three sets of correlated SNPs have a significant joint effect in both models (p < 0.0001). Single-SNP regression results showed that the odds ratios (ORs) for both models are similar; however, the p-values were slightly more significant in the HTg-MetS model. The strongest ORs in the HTg-MetS model belonged to the G allele in rs2266788 (MetS: OR = 1.3, p = 3.6 × 10⁻⁷; HTg-MetS: OR = 1.4, p = 2.3 × 10⁻¹¹) and the T allele in rs651821 (MetS: OR = 1.3, p = 2.8 × 10⁻⁷; HTg-MetS: OR = 1.4, p = 3.6 × 10⁻¹¹). In the present study, the kernel machine regression models could help assess the association of the BUD13, ZPR1, and APOA5 gene variants (11p23.3 region) with lipid-related traits in MetS and in MetS with high TG.
Affiliation(s)
- Sajedeh Masjoudi
  - Cellular and Molecular Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, PO Box 19195-4763, Tehran, Iran
- Bahareh Sedaghati-Khayat
  - Cellular and Molecular Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, PO Box 19195-4763, Tehran, Iran
- Niloufar Javanrouh Givi
  - Cellular and Molecular Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, PO Box 19195-4763, Tehran, Iran
- Leila Najd Hassan Bonab
  - Cellular and Molecular Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, PO Box 19195-4763, Tehran, Iran
- Fereidoun Azizi
  - Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Maryam S Daneshpour
  - Cellular and Molecular Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, PO Box 19195-4763, Tehran, Iran
4
Imperfect Linkage Disequilibrium Generates Phantom Epistasis (& Perils of Big Data). G3-GENES GENOMES GENETICS 2019; 9:1429-1436. [PMID: 30877081 PMCID: PMC6505142 DOI: 10.1534/g3.119.400101] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Indexed: 12/27/2022]
Abstract
The genetic architecture of complex human traits and diseases is affected by a large number of possibly interacting genes, but detecting epistatic interactions can be challenging. In the last decade, several studies have alluded to problems that linkage disequilibrium (LD) can create when testing for epistatic interactions between DNA markers. However, these problems have not been formalized, nor have their consequences been quantified in a precise manner. Here we use a conceptually simple three-locus model involving a causal locus and two markers to show that imperfect LD can generate the illusion of epistasis, even when the underlying genetic architecture is purely additive. We describe necessary conditions for such "phantom epistasis" to emerge and quantify its relevance using simulations. Our empirical results demonstrate that phantom epistasis can be a very serious problem in GWAS (with rejection rates against the additive model greater than 0.28 for nominal p-values of 0.05, even when the model is purely additive). Some studies have sought to avoid this problem by only testing interactions between SNPs with R-squared (r²) < 0.1. We show that this threshold is not appropriate and demonstrate that the magnitude of the problem is even greater with large sample size, intermediate allele frequencies, and when the causal locus explains a large amount of phenotypic variance. We conclude that caution must be exercised when interpreting GWAS results derived from very large data sets showing strong evidence in support of epistatic interactions between markers.
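The R-squared threshold mentioned in this abstract refers to the usual pairwise LD measure between two markers. A minimal sketch of how it can be computed from phased haplotypes follows; the function name and the 0/1 allele coding are assumptions made for illustration and do not come from the paper.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic markers.

    `haplotypes` is a list of (allele_a, allele_b) pairs coded 0/1,
    one pair per phased chromosome (a hypothetical input layout).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at marker A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at marker B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a marker is monomorphic")
    return d * d / denom


# Two markers that always travel together, and two that are independent.
perfect_ld = ld_r_squared([(0, 0), (1, 1)] * 2)
no_ld = ld_r_squared([(0, 0), (0, 1), (1, 0), (1, 1)])
```

The abstract's point is that a low pairwise r² between two tested markers does not by itself rule out phantom epistasis, since both markers can still be in imperfect LD with an untyped causal locus.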
5
Zhang J, Wei Z, Cardinale CJ, Gusareva ES, Van Steen K, Sleiman P, Hakonarson H. Multiple Epistasis Interactions Within MHC Are Associated With Ulcerative Colitis. Front Genet 2019; 10:257. [PMID: 31001315 PMCID: PMC6456704 DOI: 10.3389/fgene.2019.00257] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/11/2018] [Accepted: 03/08/2019] [Indexed: 12/20/2022] Open
Abstract
Searching successfully for epistasis is very challenging and generally requires very large sample sizes and/or very dense marker information. We exploited the largest Crohn's disease (CD) dataset (18,000 cases + 34,000 controls) and ulcerative colitis (UC) dataset (14,000 cases + 34,000 controls) to date. Leveraging the dense marker information and large sample size of this IBD dataset, we employed a two-step approach to exhaustively search for epistasis. We detected abundant genome-wide significant (p < 1 × 10⁻¹³) epistatic signals, all within the MHC region. These signals were reduced substantially when conditioned on the additive background, but nine pairs remained significant at the Immunochip-wide level (P < 1.1 × 10⁻⁸) in conditional tests for UC. All nine epistatic interactions come from the MHC region, and each explains on average 0.15% of the phenotypic variance. Eight of them were replicated in a replication cohort. There are multiple but relatively weak interactions independent of the additive effects within the MHC region for UC. Our promising results warrant the search for epistasis in large data sets with dense markers, exploiting dependencies between markers.
Affiliation(s)
- Jie Zhang
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, United States; Adobe Inc., San Jose, CA, United States
- Zhi Wei
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, United States
- Christopher J Cardinale
  - The Children's Hospital of Philadelphia, Center for Applied Genomics, Philadelphia, PA, United States
- Elena S Gusareva
  - GIGA-R Medical Genomics - BIO3, University of Liege, Avenue de l'Hôpital 11, Liège, Belgium
- Kristel Van Steen
  - GIGA-R Medical Genomics - BIO3, University of Liege, Avenue de l'Hôpital 11, Liège, Belgium; WELBIO-Walloon Excellence in Life Sciences and BIOtechnology, Liège, Belgium
- Patrick Sleiman
  - The Children's Hospital of Philadelphia, Center for Applied Genomics, Philadelphia, PA, United States; Division of Human Genetics, Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Hakon Hakonarson
  - The Children's Hospital of Philadelphia, Center for Applied Genomics, Philadelphia, PA, United States; Division of Human Genetics, Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
6
Howey R, Cordell HJ. Further investigations of the W-test for pairwise epistasis testing. Wellcome Open Res 2017; 2:54. [PMID: 28852712 PMCID: PMC5553086 DOI: 10.12688/wellcomeopenres.11926.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Accepted: 07/17/2017] [Indexed: 12/30/2022] Open
Abstract
Background: In a recent paper, a novel W-test for pairwise epistasis testing was proposed that appeared, in computer simulations, to have higher power than competing alternatives. Application to genome-wide bipolar data detected significant epistasis between SNPs in genes of relevant biological function. Network analysis indicated that the implicated genes formed two separate interaction networks, each containing genes highly related to autism and neurodegenerative disorders. Methods: Here we investigate further the properties and performance of the W-test via theoretical evaluation, computer simulations and application to real data. Results: We demonstrate that, for common variants, the W-test is closely related to several existing tests of association allowing for interaction, including logistic regression on 8 degrees of freedom, although logistic regression can show inflated type I error for low minor allele frequencies, whereas the W-test shows good/conservative type I error control. Although in some situations the W-test can show higher power, logistic regression is not limited to tests on 8 degrees of freedom but can instead be tailored to impose greater structure on the assumed alternative hypothesis, offering a power advantage when the imposed structure matches the true structure. Conclusions: The W-test is a potentially useful method for testing for association - without necessarily implying interaction - between genetic variants and disease, particularly when one or more of the genetic variants are rare. For common variants, the advantages of the W-test are less clear, and, indeed, there are situations where existing methods perform better. In our investigations, we further uncover a number of problems with the practical implementation and application of the W-test (to bipolar disorder) previously described, apparently due to inadequate use of standard data quality-control procedures. This observation leads us to urge caution in interpretation of the previously presented results, most of which we consider highly likely to be artefacts.
Affiliation(s)
- Richard Howey
  - Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, NE1 3BZ, UK
- Heather J Cordell
  - Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, NE1 3BZ, UK
7
Zhang L, You Y, Wu Y, Zhang Y, Wang M, Song Y, Liu X, Kou C. Association of BUD13 polymorphisms with metabolic syndrome in Chinese population: a case-control study. Lipids Health Dis 2017; 16:127. [PMID: 28659142 PMCID: PMC5490231 DOI: 10.1186/s12944-017-0520-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/05/2017] [Accepted: 06/16/2017] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND BUD13 homolog (BUD13), one of the subunits of the retention and splicing complex, was identified in yeast as a splicing factor that affects nuclear pre-mRNA retention. A growing number of studies have demonstrated that BUD13 plays a potential role in the pathogenesis of metabolic syndrome (MetS). The objective was to reassess whether novel loci of BUD13 were linked to MetS and its individual components in the northeast of China. METHODS A total of 3850 individuals were recruited in this case-control study, including 1813 MetS cases and 2037 healthy controls. MetS was diagnosed according to the International Diabetes Federation (IDF) criteria. Metabolic components such as waist circumference (WC), triglyceride, high-density lipoprotein cholesterol (HDL-C), systolic and diastolic blood pressure (SBP and DBP), and fasting glucose were measured. We explored the association between two novel single nucleotide polymorphisms (SNPs) of BUD13 (rs7118999 and rs10488698) and MetS and its components. RESULTS Using binary logistic regression analysis, we found no significant associations between the SNPs and MetS under different inheritance models (all P > 0.05). However, the novel loci of BUD13 were linked to individual components in MetS cases. Rs7118999 was associated with WC (P = 0.016), and carriers of TT might have higher susceptibility to MetS. Rs10488698 was associated with HDL-C (P = 0.001), and carriers of TT had significantly higher levels of HDL-C. CONCLUSIONS We concluded that novel mutations in BUD13 did not confer risk for MetS in our study population, but these mutations changed the level of metabolic components.
Affiliation(s)
- Lili Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, No. 1163 Xinmin Street, Changchun, Jilin province 130021 China
- Yueyue You
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, No. 1163 Xinmin Street, Changchun, Jilin province 130021 China
- Yanhua Wu
  - Division of Clinical Epidemiology, First Hospital of Jilin University, Changchun, Jilin 130021 China
- Yangyu Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, No. 1163 Xinmin Street, Changchun, Jilin province 130021 China
- Mohan Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, No. 1163 Xinmin Street, Changchun, Jilin province 130021 China
- Yan Song
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, No. 1163 Xinmin Street, Changchun, Jilin province 130021 China
- Xinyu Liu
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, No. 1163 Xinmin Street, Changchun, Jilin province 130021 China
- Changgui Kou
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, No. 1163 Xinmin Street, Changchun, Jilin province 130021 China
8
Goudey B, Abraham G, Kikianty E, Wang Q, Rawlinson D, Shi F, Haviv I, Stern L, Kowalczyk A, Inouye M. Interactions within the MHC contribute to the genetic architecture of celiac disease. PLoS One 2017; 12:e0172826. [PMID: 28282431 PMCID: PMC5345796 DOI: 10.1371/journal.pone.0172826] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 11/20/2016] [Accepted: 02/10/2017] [Indexed: 01/04/2023] Open
Abstract
Interaction analysis of GWAS can detect signal that would be ignored by single-variant analysis, yet few robust interactions in humans have been detected. Recent work has highlighted interactions in the MHC region between known HLA risk haplotypes for various autoimmune diseases. To better understand the genetic interactions underlying celiac disease (CD), we have conducted exhaustive genome-wide scans for pairwise interactions in five independent CD case-control studies, using a rapid model-free approach to examine over 500 billion SNP pairs in total. We found 14 independent interaction signals within the MHC region that achieved stringent replication criteria across multiple studies and were independent of known CD risk HLA haplotypes. The strongest independent CD interaction signal corresponded to genes in the HLA class III region, in particular PRRC2A and GPANK1/C6orf47, which are known to contain variants for non-Hodgkin's lymphoma and early menopause, co-morbidities of celiac disease. Replicable evidence for statistical interaction outside the MHC was not observed. Both within and between European populations, we observed striking consistency of two-locus models and model distribution. Within the UK population, models of CD based on both interactions and additive single-SNP effects increased explained CD variance by approximately 1% over those of single SNPs. The interaction signal detected across the five cohorts indicates the presence of novel associations in the MHC region that cannot be detected using additive models. Our findings have implications for the determination of genetic architecture and, by extension, the use of human genetics for validation of therapeutic targets.
Affiliation(s)
- Benjamin Goudey
  - NICTA Victoria Research Lab, The University of Melbourne, Parkville, Victoria, Australia
  - Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
  - Department of Computing and Information Systems, The University of Melbourne, Parkville, Victoria, Australia
  - IBM Research, Australia, Level 5, Carlton, Victoria, Australia
- Gad Abraham
  - Centre for Systems Genomics, The University of Melbourne, Parkville, Victoria, Australia
  - School of BioSciences, The University of Melbourne, Parkville, Victoria, Australia
  - Department of Pathology, The University of Melbourne, Parkville, Victoria, Australia
- Eder Kikianty
  - Department of Mathematics, University of Johannesburg, Auckland Park, South Africa
- Qiao Wang
  - NICTA Victoria Research Lab, The University of Melbourne, Parkville, Victoria, Australia
  - Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
- Dave Rawlinson
  - NICTA Victoria Research Lab, The University of Melbourne, Parkville, Victoria, Australia
  - Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
- Fan Shi
  - NICTA Victoria Research Lab, The University of Melbourne, Parkville, Victoria, Australia
  - Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
- Izhak Haviv
  - Faculty of Medicine, Bar Ilan University, Safed, Israel
- Linda Stern
  - Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
- Adam Kowalczyk
  - NICTA Victoria Research Lab, The University of Melbourne, Parkville, Victoria, Australia
  - Centre for Epidemiology and Biostatistics, The University of Melbourne, Parkville, Victoria, Australia
  - Center for Neural Engineering, The University of Melbourne, Parkville, Victoria, Australia
- Michael Inouye
  - Centre for Systems Genomics, The University of Melbourne, Parkville, Victoria, Australia
  - School of BioSciences, The University of Melbourne, Parkville, Victoria, Australia
  - Department of Pathology, The University of Melbourne, Parkville, Victoria, Australia
9
Wei WH, Loh CY, Worthington J, Eyre S. Immunochip Analyses of Epistasis in Rheumatoid Arthritis Confirm Multiple Interactions within MHC and Suggest Novel Non-MHC Epistatic Signals. J Rheumatol 2016; 43:839-45. [PMID: 26879349 DOI: 10.3899/jrheum.150836] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Accepted: 01/06/2016] [Indexed: 12/22/2022]
Abstract
OBJECTIVE Studying statistical gene-gene interactions (epistasis) has been limited by the statistical and computational difficulty of analyzing sample numbers large enough to provide sufficient power. Three large Immunochip datasets from cohort samples recruited in the United Kingdom, United States, and Sweden with European ancestry were used to examine epistasis in rheumatoid arthritis (RA). METHODS A full pairwise search was conducted in the UK cohort using a high-throughput tool and the resultant significant epistatic signals were tested for replication in the United States and Swedish cohorts. A forward selection approach was applied to remove redundant signals, while conditioning on the preidentified additive effects. RESULTS We detected abundant genome-wide significant (p < 1.0e-13) epistatic signals, all within the MHC region. These signals were reduced substantially, but a proportion remained significant (p < 1.0e-03) in conditional tests. We identified 11 independent epistatic interactions across the entire MHC, each explaining on average 0.12% of the phenotypic variance, nearly all replicated in both replication cohorts. We also identified non-MHC epistatic interactions between RA susceptibility loci LOC100506023 and IRF5 with Immunochip-wide significance (p < 1.1e-08) and between 2 neighboring single-nucleotide polymorphisms near PTPN22 that were in low linkage disequilibrium with independent interaction (p < 1.0e-05). Both non-MHC epistatic interactions were statistically replicated with a similar interaction pattern in the US cohort only. CONCLUSION There are multiple but relatively weak interactions independent of the additive effects in RA and a larger sample number is required to confidently assign additional non-MHC epistasis.
Collapse
Affiliation(s)
- Wen-Hua Wei
- From the Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester; National Institute for Health Research (NIHR) Manchester Musculoskeletal Biomedical Research Unit, Central Manchester National Health Service (NHS) Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK.W.H. Wei*, PhD, Lecturer in Statistical Genetics, Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester; C.Y. Loh*, MRes, PhD Student, Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester; J. Worthington, PhD, Professor of Chronic Disease Genetics, Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester, and NIHR Manchester Musculoskeletal Biomedical Research Unit, Central Manchester NHS Foundation Trust, Manchester Academic Health Science Centre; S. Eyre, PhD, Senior Research Fellow on Rheumatological Disorders, Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester, and NIHR Manchester Musculoskeletal Biomedical Research Unit, Central Manchester NHS Foundation Trust, Manchester Academic Health Science Centre.
| | - Chia-Yin Loh
- From the Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester; National Institute for Health Research (NIHR) Manchester Musculoskeletal Biomedical Research Unit, Central Manchester National Health Service (NHS) Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK.W.H. Wei*, PhD, Lecturer in Statistical Genetics, Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester; C.Y. Loh*, MRes, PhD Student, Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester; J. Worthington, PhD, Professor of Chronic Disease Genetics, Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester, and NIHR Manchester Musculoskeletal Biomedical Research Unit, Central Manchester NHS Foundation Trust, Manchester Academic Health Science Centre; S. Eyre, PhD, Senior Research Fellow on Rheumatological Disorders, Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester, and NIHR Manchester Musculoskeletal Biomedical Research Unit, Central Manchester NHS Foundation Trust, Manchester Academic Health Science Centre
| | - Jane Worthington
- From the Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester; National Institute for Health Research (NIHR) Manchester Musculoskeletal Biomedical Research Unit, Central Manchester National Health Service (NHS) Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK.W.H. Wei*, PhD, Lecturer in Statistical Genetics, Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester; C.Y. Loh*, MRes, PhD Student, Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester; J. Worthington, PhD, Professor of Chronic Disease Genetics, Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester, and NIHR Manchester Musculoskeletal Biomedical Research Unit, Central Manchester NHS Foundation Trust, Manchester Academic Health Science Centre; S. Eyre, PhD, Senior Research Fellow on Rheumatological Disorders, Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester, and NIHR Manchester Musculoskeletal Biomedical Research Unit, Central Manchester NHS Foundation Trust, Manchester Academic Health Science Centre
| | - Stephen Eyre
|
10
|
Abstract
Genome-wide association studies (GWASs) have become the focus of the statistical analysis of complex traits in humans, successfully shedding light on several aspects of genetic architecture and biological aetiology. Single-nucleotide polymorphisms (SNPs) are usually modelled as having additive, cumulative and independent effects on the phenotype. Although evidently a useful approach, it is often argued that this is not a realistic biological model and that epistasis (that is, the statistical interaction between SNPs) should be included. The purpose of this Review is to summarize recent directions in methodology for detecting epistasis and to discuss evidence of the role of epistasis in human complex trait variation. We also discuss the relevance of epistasis in the context of GWASs and potential hazards in the interpretation of statistical interaction terms.
|
11
|
Wei WH, Guo Y, Kindt ASD, Merriman TR, Semple CA, Wang K, Haley CS. Abundant local interactions in the 4p16.1 region suggest functional mechanisms underlying SLC2A9 associations with human serum uric acid. Hum Mol Genet 2014; 23:5061-8. [PMID: 24821702 PMCID: PMC4159153 DOI: 10.1093/hmg/ddu227] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022]
Abstract
Human serum uric acid concentration (SUA) is a complex trait. A recent meta-analysis of multiple genome-wide association studies (GWAS) identified 28 loci associated with SUA jointly explaining only 7.7% of the SUA variance, with 3.4% explained by two major loci (SLC2A9 and ABCG2). Here we examined whether gene-gene interactions had any roles in regulating SUA using two large GWAS cohorts included in the meta-analysis [the Atherosclerosis Risk in Communities study cohort (ARIC) and the Framingham Heart Study cohort (FHS)]. We found abundant genome-wide significant local interactions in ARIC in the 4p16.1 region located mostly in an intergenic area near SLC2A9 that were not driven by linkage disequilibrium and were replicated in FHS. Taking the forward selection approach, we constructed a model of five SNPs with marginal effects and three epistatic SNP pairs in ARIC: three marginal SNPs were located within SLC2A9 and the remaining SNPs were all located in the nearby intergenic area. The full model explained 1.5% more SUA variance than that explained by the lead SNP alone, but only 0.3% was contributed by the marginal and epistatic effects of the SNPs in the intergenic area. Functional analysis revealed strong evidence that the epistatically interacting SNPs in the intergenic area were unusually enriched at enhancers active in ENCODE hepatic (HepG2, P = 4.7E-05) and precursor red blood (K562, P = 5.0E-06) cells, putatively regulating transcription of WDR1 and SLC2A9. These results suggest that exploring epistatic interactions is valuable in uncovering the complex functional mechanisms underlying the 4p16.1 region.
Affiliation(s)
- Wen-Hua Wei
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK; Arthritis Research UK Centre for Genetics and Genomics, Institute of Inflammation and Repair, Faculty of Medical and Human Sciences, Manchester Academic Health Science Centre, University of Manchester, Oxford Road, Manchester M13 9PT, UK
| | - Yunfei Guo
- Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, CA, USA
| | - Alida S D Kindt
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
| | - Tony R Merriman
- Department of Biochemistry, University of Otago, PO Box 56, Dunedin, New Zealand
| | - Colin A Semple
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
| | - Kai Wang
- Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, CA, USA
| | - Chris S Haley
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
|
12
|
Howey R, Cordell HJ. Imputation without doing imputation: a new method for the detection of non-genotyped causal variants. Genet Epidemiol 2014; 38:173-90. [PMID: 24535679 PMCID: PMC4150535 DOI: 10.1002/gepi.21792] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 10/22/2013] [Revised: 12/30/2013] [Accepted: 12/31/2013] [Indexed: 01/22/2023]
Abstract
Genome-wide association studies allow detection of non-genotyped disease-causing variants through testing of nearby genotyped SNPs. This approach may fail when there are no genotyped SNPs in strong LD with the causal variant. Several genotyped SNPs in weak LD with the causal variant may, however, considered together, provide equivalent information. This observation motivates popular but computationally intensive approaches based on imputation or haplotyping. Here we present a new method and accompanying software designed for this scenario. Our approach proceeds by selecting, for each genotyped "anchor" SNP, a nearby genotyped "partner" SNP, chosen via a specific algorithm we have developed. These two SNPs are used as predictors in linear or logistic regression analysis to generate a final significance test. In simulations, our method captures much of the signal captured by imputation, while taking a fraction of the time and disc space, and generating a smaller number of false-positives. We apply our method to a case/control study of severe malaria genotyped using the Affymetrix 500K array. Previous analysis showed that fine-scale sequencing of a Gambian reference panel in the region of the known causal locus, followed by imputation, increased the signal of association to genome-wide significance levels. Our method also increases the signal of association from P ≈ 2 × 10⁻⁶ to P ≈ 6 × 10⁻¹¹. Our method thus, in some cases, eliminates the need for more complex methods such as sequencing and imputation, and provides a useful additional test that may be used to identify genetic regions of interest.
Affiliation(s)
- Richard Howey
- Institute of Genetic Medicine, Newcastle University, International Centre for Life, Central Parkway, Newcastle upon Tyne, United Kingdom
| | - Heather J Cordell
- Institute of Genetic Medicine, Newcastle University, International Centre for Life, Central Parkway, Newcastle upon Tyne, United Kingdom
|