1. UBASH3A Interacts with PTPN22 to Regulate IL2 Expression and Risk for Type 1 Diabetes. Int J Mol Sci 2023; 24:ijms24108671. [PMID: 37240014 DOI: 10.3390/ijms24108671] [Received: 03/31/2023] [Revised: 05/09/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023]
Abstract
UBASH3A is a negative regulator of T cell activation and IL-2 production and plays key roles in autoimmunity. Although previous studies revealed the individual effects of UBASH3A on risk for type 1 diabetes (T1D; a common autoimmune disease), the relationship of UBASH3A with other T1D risk factors remains largely unknown. Given that another well-known T1D risk factor, PTPN22, also inhibits T cell activation and IL-2 production, we investigated the relationship between UBASH3A and PTPN22. We found that UBASH3A, via its Src homology 3 (SH3) domain, physically interacts with PTPN22 in T cells, and that this interaction is not altered by the T1D risk coding variant rs2476601 in PTPN22. Furthermore, our analysis of RNA-seq data from T1D cases showed that the amounts of UBASH3A and PTPN22 transcripts exert a cooperative effect on IL2 expression in human primary CD8+ T cells. Finally, our genetic association analyses revealed that two independent T1D risk variants, rs11203203 in UBASH3A and rs2476601 in PTPN22, interact statistically, jointly affecting risk for T1D. In summary, our study reveals novel interactions, both biochemical and statistical, between two independent T1D risk loci, and suggests how these interactions may affect T cell function and increase risk for T1D.
2. GADGETS: a genetic algorithm for detecting epistasis using nuclear families. Bioinformatics 2022; 38:1052-1058. [PMID: 34788792 PMCID: PMC10060691 DOI: 10.1093/bioinformatics/btab766] [Received: 05/26/2021] [Revised: 10/08/2021] [Accepted: 11/03/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Epistasis may play an etiologic role in complex diseases, but research has been hindered because identification of interactions among sets of single nucleotide polymorphisms (SNPs) requires exploration of immense search spaces. Current approaches using nuclear families accommodate at most several hundred candidate SNPs. RESULTS GADGETS detects epistatic SNP-sets by applying a genetic algorithm to case-parent or case-sibling data. To allow for multiple epistatic sets, island subpopulations of SNP-sets evolve separately under selection for evident joint relevance to disease risk. The software evaluates the identified SNP-sets via permutation testing and provides graphical visualization. GADGETS correctly identified epistatic SNP-sets in realistically simulated case-parent triads with 10 000 candidate SNPs, far more SNPs than competitors can handle, and it outperformed competitors in simulations with many fewer SNPs. Applying GADGETS to family-based oral-clefting data from dbGaP identified SNP-sets with possible epistatic effects on risk. AVAILABILITY AND IMPLEMENTATION GADGETS is part of the epistasisGA package at https://github.com/mnodzenski/epistasisGA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
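The permutation testing step mentioned above follows a general pattern that can be sketched in a few lines. The sketch below is a generic Monte Carlo permutation test on a two-group mean difference, a statistic chosen here only for illustration; GADGETS itself permutes family-based data and scores SNP-sets with its own fitness measure, which this sketch does not reproduce. The helper names are hypothetical.

```python
import random
from statistics import mean


def mean_difference(group_a, group_b):
    """Illustrative test statistic: difference in group means."""
    return mean(group_a) - mean(group_b)


def permutation_p_value(group_a, group_b, n_perm=999, seed=0):
    """One-sided Monte Carlo permutation p-value for a mean difference.

    Group labels are shuffled n_perm times. The +1 in numerator and
    denominator counts the observed labelling as one permutation,
    so the returned p-value is never exactly 0.
    """
    rng = random.Random(seed)
    observed = mean_difference(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_difference(pooled[:n_a], pooled[n_a:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

For two well-separated groups such as `[10, 11, 12, 13]` and `[0, 1, 2, 3]`, only 1 of the 70 possible splits reproduces the observed separation, so the estimated p-value is small; with identical groups it is large.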
3. Performance of model-based multifactor dimensionality reduction methods for epistasis detection by controlling population structure. BioData Min 2021; 14:16. [PMID: 33608043 PMCID: PMC7893746 DOI: 10.1186/s13040-021-00247-w] [Received: 06/29/2020] [Accepted: 02/07/2021] [Indexed: 12/15/2022]
Abstract
Background In genome-wide association studies the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering replication and the identification of causal variants. Several strategies have been developed to protect associations against confounding; the most popular is based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction (epistasis) association studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework. Methods To identify causal variants acting in synergy, and to improve the interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction approach for structured populations, namely MBMDR-PC, MBMDR-PG, and MBMDR-GC. Results Simulation results comparing the performance of various approaches show that, in the presence of population structure, MBMDR-PC and MBMDR-PG consistently control the type I error rate at the nominal level better than MBMDR-GC. Moreover, all three proposed methods of population structure correction outperform MDR-SP in terms of statistical power. Conclusion We demonstrate through extensive simulation studies the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity. Supplementary Information The online version contains supplementary material available at 10.1186/s13040-021-00247-w.
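Principal-component adjustment, the strategy the abstract identifies as the most popular, typically means computing the top principal components of the standardized genotype matrix and regressing them out of the trait before association or interaction testing. The NumPy sketch below shows only that generic adjustment step. It is not the MBMDR-PC procedure; the function names, and the choice to residualize the trait rather than include the PCs as covariates inside each test, are assumptions.

```python
import numpy as np


def top_principal_components(genotypes, k):
    """Return the top-k principal component scores (n x k) of a genotype matrix.

    genotypes: n subjects x m SNPs, coded as 0/1/2 allele dosages.
    Each SNP is centered and scaled to unit variance before the SVD;
    monomorphic SNPs (zero variance) stay at zero after centering.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    X = X / sd
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]


def residualize(trait, pcs):
    """Regress the trait on an intercept plus the PCs; return the residuals."""
    y = np.asarray(trait, dtype=float)
    design = np.column_stack([np.ones(len(y)), pcs])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta
```

The residualized trait would then replace the raw trait in whatever interaction screen follows; the residuals are orthogonal to every PC by construction of least squares.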
4. Exploring gene-gene interaction in family-based data with an unsupervised machine learning method: EPISFA. Genet Epidemiol 2020; 44:811-824. [PMID: 32869348 DOI: 10.1002/gepi.22342] [Received: 11/12/2019] [Revised: 06/06/2020] [Accepted: 06/21/2020] [Indexed: 11/06/2022]
Abstract
Gene-gene interaction (G × G) is thought to fill the gap between the estimated heritability of complex diseases and the limited genetic proportion explained by identified single-nucleotide polymorphisms. The current tools for exploring G × G were often developed for case-control designs, with less consideration for their application in families. Family-based studies are robust against bias arising from population stratification in genetic studies and are helpful in understanding G × G. We proposed two new algorithms based on unsupervised machine learning, epistasis sparse factor analysis (EPISFA) and epistasis sparse factor analysis for linkage disequilibrium (EPISFA-LD), to screen for G × G. Extensive simulations were performed to compare EPISFA/EPISFA-LD with a classical family-based algorithm, FAM-MDR (family-based multifactor dimensionality reduction). The results showed that EPISFA/EPISFA-LD combines high power with computational efficiency, can be applied in family designs, and is applicable to high-dimensional datasets. Finally, we applied EPISFA/EPISFA-LD to a real dataset drawn from the Fangshan/family-based Ischemic Stroke Study in China. Five G × G pairs were discovered by EPISFA/EPISFA-LD, including three pairs verified by other algorithms (FAM-MDR and logistic regression) and two additional pairs identified only by EPISFA/EPISFA-LD. The results from EPISFA might offer new insights for understanding the genetic etiology of complex diseases. EPISFA/EPISFA-LD was implemented in R. All relevant source code, as well as simulated data, can be freely downloaded from https://github.com/doublexism/episfa.
5. Measuring gene-gene interaction using Kullback-Leibler divergence. Ann Hum Genet 2019; 83:405-417. [PMID: 31206606 DOI: 10.1111/ahg.12324] [Received: 08/31/2018] [Revised: 03/30/2019] [Accepted: 04/12/2019] [Indexed: 12/29/2022]
Abstract
Genome-wide association studies (GWAS) are used to investigate genetic variants contributing to complex traits. Despite the discovery of many loci, a large proportion of "missing" heritability remains unexplained. Gene-gene interactions may help explain some of this gap. Traditionally, gene-gene interactions have been evaluated using parametric statistical methods such as linear and logistic regression, with multifactor dimensionality reduction (MDR) used to address data sparseness in high dimensions. We propose a method for the analysis of gene-gene interactions across independent single-nucleotide polymorphisms (SNPs) in two genes. Typical methods for this problem use statistics based on an asymptotic chi-squared mixture distribution, which is not easy to use. Here, we propose a Kullback-Leibler-type statistic that asymptotically follows a positive normal distribution under the null hypothesis of no relationship between SNPs in the two genes, and a normal distribution under the alternative hypothesis. The performance of the proposed method is evaluated by simulation studies, which show promising results. The method is also used to analyze real data and identifies gene-gene interactions among RAB3A, MADD, and PTPRN affecting type 2 diabetes (T2D) status.
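The abstract does not spell out its statistic, but the underlying quantity is the Kullback-Leibler divergence D(P || Q) = sum over x of p(x) log(p(x) / q(x)). One common way to apply it to two loci, assumed here purely for illustration, is to compare the observed joint genotype distribution with the product of its marginals, which yields the mutual information between the two SNPs. This sketch is not the authors' KL-type statistic or its normal approximation; the function names are hypothetical.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """D(P || Q) for discrete distributions given as {outcome: probability}.

    Terms with p(x) = 0 contribute nothing; q(x) must be > 0 wherever p(x) > 0.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def joint_vs_independence_kl(snp1, snp2):
    """KL divergence between the joint two-SNP genotype distribution and
    the product of its marginals (i.e. the mutual information, in nats)."""
    n = len(snp1)
    joint = Counter(zip(snp1, snp2))
    m1, m2 = Counter(snp1), Counter(snp2)
    p = {cell: count / n for cell, count in joint.items()}
    # Independence model: product of the observed marginal frequencies.
    q = {(a, b): (m1[a] / n) * (m2[b] / n) for (a, b) in joint}
    return kl_divergence(p, q)
```

Two SNPs whose genotypes are balanced across all four combinations give a divergence of 0, while two identical biallelic SNPs give log 2.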
6. Statistical Methods and Software for Substance Use and Dependence Genetic Research. Curr Genomics 2019; 20:172-183. [PMID: 31929725 PMCID: PMC6935956 DOI: 10.2174/1389202920666190617094930] [Received: 03/06/2019] [Revised: 05/16/2019] [Accepted: 05/24/2019] [Indexed: 12/04/2022]
Abstract
BACKGROUND Substantial substance use disorders and related health conditions emerged during the mid-20th century and continue to represent a remarkable 21st century global burden of disease. This burden is largely driven by the substance-dependence process, which is a complex process and is influenced by both genetic and environmental factors. During the past few decades, a great deal of progress has been made in identifying genetic variants associated with Substance Use and Dependence (SUD) through linkage, candidate gene association, genome-wide association and sequencing studies. METHODS Various statistical methods and software have been employed in different types of SUD genetic studies, facilitating the identification of new SUD-related variants. CONCLUSION In this article, we review statistical methods and software that are currently available for SUD genetic studies, and discuss their strengths and limitations.
7. TrioMDR: Detecting SNP interactions in trio families with model-based multifactor dimensionality reduction. Genomics 2018; 111:1176-1182. [PMID: 30055230 DOI: 10.1016/j.ygeno.2018.07.014] [Received: 01/26/2018] [Revised: 07/11/2018] [Accepted: 07/15/2018] [Indexed: 12/18/2022]
Abstract
Single nucleotide polymorphism (SNP) interactions can explain the missing heritability of common complex diseases. Many interaction detection methods have been proposed in genome-wide association studies, and they can be divided into two types: population-based and family-based. Compared with population-based methods, family-based methods are robust to population stratification. Several family-based methods have been proposed, among which Multifactor Dimensionality Reduction (MDR)-based methods are popular and powerful. However, current MDR-based methods suffer from a heavy computational burden. Furthermore, they do not allow for main-effect adjustment. In this work we develop a two-stage model-based MDR approach (TrioMDR) to detect multi-locus interactions in trio families (i.e., two parents and one affected child). TrioMDR combines the MDR framework with logistic regression models to check interactions, so it can adjust for main effects. In addition, unlike the time-consuming permutation procedures used in traditional MDR-based methods, TrioMDR uses a simple semi-parametric P-value correction procedure to control the type I error rate; this procedure needs only a few permutations to assess the significance of a multi-locus model, which substantially speeds up TrioMDR. We performed extensive experiments on simulated data to compare the type I error and power of TrioMDR under different scenarios. The results demonstrate that TrioMDR is fast and, in general, more powerful than some recently proposed methods for interaction detection in trios. The R code for TrioMDR is available at: https://github.com/TrioMDR/TrioMDR.
9. Detecting multi-way epistasis in family-based association studies. Brief Bioinform 2017; 18:394-402. [PMID: 27178992 DOI: 10.1093/bib/bbw039] [Received: 12/28/2015] [Indexed: 11/13/2022]
Abstract
The era of genome-wide association studies (GWAS) has led to the discovery of numerous genetic variants associated with disease. Better understanding of whether these or other variants interact, leading to differential risk compared with individual marker effects, will increase our understanding of the genetic architecture of disease; this question may be investigated using the family-based study design. We present M-TDT (the multi-locus transmission disequilibrium test), a tool for detecting family-based multi-locus multi-allelic effects for qualitative or quantitative traits, extended from the original transmission disequilibrium test (TDT). Tests to handle the comparison between additive and epistatic models, lack of independence between markers, and multiple offspring are described. The performance of M-TDT is compared with a multifactor dimensionality reduction (MDR) approach designed for investigating families in the hypothesis-free genome-wide setting (the multifactor dimensionality reduction pedigree disequilibrium test, MDR-PDT). Other methods derived from the TDT or MDR to investigate genetic interaction in the family-based design are also discussed. The case of three independent biallelic loci is illustrated using simulations for one- to three-locus alternative hypotheses. M-TDT identified joint-locus effects and distinguished effectively between additive and epistatic models. We show a practical example of M-TDT based on three genes already known to be implicated in malaria susceptibility. Our findings demonstrate the value of M-TDT in a hypothesis-driven context to test for multi-way epistasis underlying common disease etiology, whereas MDR-PDT-based methods are more appropriate in a hypothesis-free genome-wide setting.
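M-TDT extends the original single-locus transmission disequilibrium test (TDT). For reference, the classical TDT counts, over heterozygous parents, how often allele 1 is transmitted (b) versus not transmitted (c) to an affected child, and computes (b - c)^2 / (b + c), which is asymptotically chi-square with 1 degree of freedom under the null hypothesis. The sketch below implements only this single-locus TDT; the multi-locus, multi-allelic M-TDT tests are not reproduced, and the input format (one transmitted/untransmitted allele pair per parent) is an assumption.

```python
import math


def count_transmissions(parent_alleles, allele=1):
    """Count b (allele transmitted) and c (allele not transmitted).

    parent_alleles: iterable of (transmitted, untransmitted) allele pairs,
    one per parent. Homozygous parents are uninformative and skipped.
    """
    b = c = 0
    for transmitted, untransmitted in parent_alleles:
        if transmitted == untransmitted:
            continue  # homozygous parent carries no transmission information
        if transmitted == allele:
            b += 1
        elif untransmitted == allele:
            c += 1
    return b, c


def tdt(b, c):
    """McNemar-type TDT statistic and its 1-df chi-square p-value."""
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Chi-square(1) survival function: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, 30 transmissions versus 10 non-transmissions of allele 1 give a statistic of 10.0 and a p-value of about 0.0016.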
10. GCORE-sib: An efficient gene-gene interaction tool for genome-wide association studies based on discordant sib pairs. BMC Bioinformatics 2016; 17:273. [PMID: 27391654 PMCID: PMC4939061 DOI: 10.1186/s12859-016-1145-z] [Received: 02/04/2016] [Accepted: 07/01/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND A computationally efficient tool is required for genome-wide gene-gene interaction analysis, which tests an extremely large number of single-nucleotide polymorphism (SNP) interaction pairs in genome-wide association studies (GWAS). Current tools for GWAS interaction analysis are mainly developed for unrelated case-control samples. Relatively few tools for interaction analysis are available for complex disease studies with a family-based design, and these tools tend to be computationally expensive. RESULTS We developed a fast gene-gene interaction test, GCORE-sib, for discordant sib pairs and implemented the test in an efficient tool. We used simulations to demonstrate that GCORE-sib has correct type I error rates and power comparable to that of the regression-based interaction test. We also showed that GCORE-sib can run more than 10 times faster than the regression-based test. Finally, GCORE-sib was applied to a GWAS dataset with approximately 2,000 discordant sib pairs, and it finished testing 19,368,078,382 SNP pairs within 6 days. CONCLUSIONS An efficient gene-gene interaction tool for discordant sib pairs was developed. It will be very useful for genome-wide gene-gene interaction analysis in GWAS using discordant sib pairs. The tool can be downloaded free of charge at http://gcore-sib.sourceforge.net.
11. A Clustered Multiclass Likelihood-Ratio Ensemble Method for Family-Based Association Analysis Accounting for Phenotypic Heterogeneity. Genet Epidemiol 2016; 40:512-9. [PMID: 27321816 DOI: 10.1002/gepi.21987] [Received: 12/03/2015] [Revised: 05/04/2016] [Accepted: 05/08/2016] [Indexed: 12/24/2022]
Abstract
Although compelling evidence suggests that the genetic etiology of complex diseases could be heterogeneous in subphenotype groups, little attention has been paid to phenotypic heterogeneity in genetic association analysis of complex diseases. Simply ignoring phenotypic heterogeneity in association analysis could result in attenuated estimates of genetic effects and low power of association tests if subphenotypes with similar clinical manifestations have heterogeneous underlying genetic etiologies. To facilitate the family-based association analysis allowing for phenotypic heterogeneity, we propose a clustered multiclass likelihood-ratio ensemble (CMLRE) method. The proposed method provides an alternative way to model the complex relationship between disease outcomes and genetic variants. It allows for heterogeneous genetic causes of disease subphenotypes and can be applied to various pedigree structures. Through simulations, we found CMLRE outperformed the commonly adopted strategies in a variety of underlying disease scenarios. We further applied CMLRE to a family-based dataset from the International Consortium to Identify Genes and Interactions Controlling Oral Clefts (ICOC) to investigate the genetic variants and interactions predisposing to subphenotypes of oral clefts. The analysis suggested that two subphenotypes, nonsyndromic cleft lip without palate (CL) and cleft lip with palate (CLP), shared similar genetic etiologies, while cleft palate only (CP) had its own genetic mechanism. The analysis further revealed that rs10863790 (IRF6), rs7017252 (8q24), and rs7078160 (VAX1) were jointly associated with CL/CLP, while rs7969932 (TBK1), rs227731 (17q22), and rs2141765 (TBK1) jointly contributed to CP.
12. An efficient gene-gene interaction test for genome-wide association studies in trio families. Bioinformatics 2016; 32:1848-55. [PMID: 26873927 PMCID: PMC5939888 DOI: 10.1093/bioinformatics/btw077] [Received: 10/20/2015] [Revised: 01/04/2016] [Accepted: 02/04/2016] [Indexed: 12/23/2022]
Abstract
MOTIVATION Several efficient gene-gene interaction tests have been developed for unrelated case-control samples in genome-wide association studies (GWAS), making it possible to test tens of billions of interaction pairs of single-nucleotide polymorphisms (SNPs) in a reasonable timeframe. However, current family-based gene-gene interaction tests are computationally expensive and are not applicable to genome-wide interaction analysis. RESULTS We developed an efficient family-based gene-gene interaction test, GCORE, for trios (i.e. two parents and one affected child). GCORE compares interlocus correlations at two SNPs between the transmitted and non-transmitted alleles. We used simulation studies to compare statistical properties, such as type I error rates and power, of GCORE with several other family-based interaction tests under various scenarios. We applied GCORE to a family-based GWAS for autism consisting of approximately 2000 trios. Testing a total of 22 471 383 013 interaction pairs in the GWAS can be completed in 36 h by GCORE without large-scale computing resources, demonstrating that the test is practical for genome-wide gene-gene interaction analysis in trios. AVAILABILITY AND IMPLEMENTATION GCORE is implemented in C++ and is available at http://gscore.sourceforge.net. CONTACT rchung@nhri.org.tw SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
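The core idea stated above, comparing the correlation between two SNPs on transmitted versus non-transmitted alleles, can be illustrated under simplifying assumptions. The sketch uses 0/1/2 dosages of the affected children (transmitted) and of pseudo-controls built from the non-transmitted alleles, and compares the two Pearson correlations with a Fisher z-difference that treats the two samples as independent. Transmitted and non-transmitted data come from the same parents, so this is a toy illustration of the idea, not the GCORE statistic; all names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_contrast(trans_snp1, trans_snp2, untrans_snp1, untrans_snp2):
    """Compare SNP1-SNP2 correlation in transmitted vs non-transmitted data.

    Returns (r_transmitted, r_untransmitted, z), where z is a Fisher
    z-difference that (unrealistically for trios) assumes two independent
    samples of equal size n. Correlations must lie strictly inside (-1, 1).
    """
    n = len(trans_snp1)
    if n <= 3:
        raise ValueError("need more than 3 trios")
    r_t = pearson(trans_snp1, trans_snp2)
    r_u = pearson(untrans_snp1, untrans_snp2)
    z = (math.atanh(r_t) - math.atanh(r_u)) / math.sqrt(2.0 / (n - 3))
    return r_t, r_u, z
```

A positive z indicates stronger interlocus correlation among transmitted genotypes than among non-transmitted ones, which is the kind of signal an interaction test in trios looks for.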
13. Evaluation of potential novel variations and their interactions related to bipolar disorders: analysis of genome-wide association study data. Neuropsychiatr Dis Treat 2016; 12:2997-3004. [PMID: 27920536 PMCID: PMC5127431 DOI: 10.2147/ndt.s112558] [Indexed: 11/29/2022]
Abstract
BACKGROUND Multifactor dimensionality reduction (MDR) is a nonparametric approach that can be used to detect relevant interactions between single-nucleotide polymorphisms (SNPs). The aim of this study was to build the best genomic model based on SNP associations and to identify candidate polymorphisms that are the underlying molecular basis of the bipolar disorders. METHODS This study was performed on Whole-Genome Association Study of Bipolar Disorder (dbGaP [database of Genotypes and Phenotypes] study accession number: phs000017.v3.p1) data. After preprocessing of the genotyping data, three classification-based data mining methods (ie, random forest, naïve Bayes, and k-nearest neighbor) were performed. Additionally, as a nonparametric, model-free approach, the MDR method was used to evaluate the SNP profiles. The validity of these methods was evaluated using true classification rate, recall (sensitivity), precision (positive predictive value), and F-measure. RESULTS Random forests, naïve Bayes, and k-nearest neighbors identified 16, 13, and ten candidate SNPs, respectively. Surprisingly, the top six SNPs were reported by all three methods. Random forests and k-nearest neighbors were more successful than naïve Bayes, with recall values >0.95. On the other hand, MDR generated a model with comparable predictive performance based on five SNPs. Although different SNP profiles were identified in MDR compared to the classification-based models, all models mapped SNPs to the DOCK10 gene. CONCLUSION Three classification-based data mining approaches, random forests, naïve Bayes, and k-nearest neighbors, have prioritized similar SNP profiles as predictors of bipolar disorders, in contrast to MDR, which has found different SNPs through analysis of two-way and three-way interactions. 
The reduced number of associated SNPs discovered by MDR, without loss in the classification performance, would facilitate validation studies and decision support models, and would reduce the cost to develop predictive and diagnostic tests. Nevertheless, we need to emphasize that translation of genomic models to the clinical setting requires models with higher classification performance.
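The study above evaluates its classifiers by true classification rate, recall (sensitivity), precision (positive predictive value) and F-measure. Assuming the standard confusion-matrix definitions (the abstract does not give formulas), these can be computed as follows; the function names are hypothetical.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts (cases = positives)."""
    total = tp + fp + fn + tn
    accuracy = safe_div(tp + tn, total)   # true classification rate
    recall = safe_div(tp, tp + fn)        # sensitivity
    precision = safe_div(tp, tp + fp)     # positive predictive value
    f_measure = safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }
```

For instance, 9 true positives, 3 false positives, 1 false negative and 7 true negatives give recall 0.9, precision 0.75 and an F-measure of 9/11.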
14.
Abstract
Complex diseases are defined as being determined by multiple genetic and environmental factors, acting both individually and through their interactions. To analyze interactions in genetic data, many statistical methods have been suggested, most of them relying on statistical regression models. Given the known limitations of classical methods, approaches from the machine-learning community have also become attractive. From this latter family, a fast-growing collection of methods based on the Multifactor Dimensionality Reduction (MDR) approach has emerged. Since its introduction, MDR has enjoyed great popularity in applications and has been extended and modified multiple times. Based on a literature search, we provide a systematic and comprehensive overview of these methods. The methods are described in detail, and the availability of implementations is listed. Most recent approaches are designed to handle large-scale data sets and rare variants, which is why we expect these methods to gain even more popularity.
15. UGMDR: a unified conceptual framework for detection of multifactor interactions underlying complex traits. Heredity (Edinb) 2014; 114:255-61. [PMID: 25335557 DOI: 10.1038/hdy.2014.94] [Received: 01/07/2014] [Revised: 08/30/2014] [Accepted: 09/04/2014] [Indexed: 01/17/2023]
Abstract
Biological outcomes are governed by multiple genetic and environmental factors that act in concert. Determining multifactor interactions is the primary topic of interest in recent genetic studies but presents enormous statistical and mathematical challenges. The computationally efficient multifactor dimensionality reduction (MDR) approach has emerged as a promising tool for meeting these challenges. On the other hand, complex traits are expressed in various forms and have different data-generation mechanisms that cannot be appropriately modeled by a dichotomous model; moreover, the subjects in a study may be recruited according to the study's analytical goals, research strategies and available resources, and need not consist only of homogeneous unrelated individuals. Although several modifications and extensions of MDR have partly addressed these practical problems, they remain limited in the statistical analysis of diverse phenotypes, multivariate phenotypes and correlated observations, in correcting for potential population stratification, and in unifying unrelated and family samples into a more powerful analysis. I propose a comprehensive statistical framework, referred to as unified generalized MDR (UGMDR), for systematic extension of MDR. The proposed approach is quite versatile: it allows for covariate adjustment, is suitable for analyzing almost any trait type (for example, binary, count, continuous, polytomous, ordinal, time-to-onset, multivariate and others, as well as combinations of those), and is applicable to various study designs, including homogeneous and admixed unrelated-subject and family designs as well as mixtures of them. The proposed UGMDR offers an important addition to the arsenal of analytical tools for identifying nonlinear multifactor interactions and unraveling the genetic architecture of complex traits.
17. Gene-Gene and Gene-Environment Interactions Underlying Complex Traits and their Detection. Biometrics & Biostatistics International Journal 2014; 1:00007. [PMID: 25584363 PMCID: PMC4288817 DOI: 10.15406/bbij.2014.01.00007] [Indexed: 06/04/2023]
18. Multivariate generalized multifactor dimensionality reduction to detect gene-gene interactions. BMC Systems Biology 2013; 7 Suppl 6:S15. [PMID: 24565370 PMCID: PMC4029529 DOI: 10.1186/1752-0509-7-s6-s15] [Indexed: 12/21/2022]
Abstract
Background One of the greatest current challenges in genome-wide association studies is detecting gene-gene and/or gene-environment interactions for common complex human diseases. Ritchie et al. (2001) proposed the multifactor dimensionality reduction (MDR) method for interaction analysis. MDR is a combinatorial approach that reduces multi-locus genotypes into high-risk and low-risk groups. MDR has been widely used for case-control studies with binary phenotypes, and several extensions have been proposed. One of these, generalized MDR (GMDR), proposed by Lou et al. (2007), allows adjustment for covariates and can be applied to both dichotomous and continuous phenotypes. GMDR uses the residual score of a generalized linear model of the phenotypes to assign each multi-locus genotype to either the high-risk or the low-risk group, whereas MDR uses the ratio of cases to controls. Methods In this study, we propose multivariate GMDR, an extension of GMDR to multivariate phenotypes. Jointly analysing correlated multivariate phenotypes may provide more power to detect susceptibility genes and gene-gene interactions. We construct generalized estimating equations (GEE) with multivariate phenotypes to extend generalized linear models. Using the score vectors from GEE, we discriminate high-risk from low-risk groups. We applied the multivariate GMDR method to blood pressure data, systolic blood pressure (SBP) and diastolic blood pressure (DBP), from 7,546 subjects in the Korean Association Resource study. We compare the results of multivariate GMDR for SBP and DBP with the results from separate univariate GMDR analyses of SBP and DBP. We also applied the multivariate GMDR method to the repeatedly measured hypertension status of 5,466 subjects and compared its results with those of univariate GMDR at each time point.
Results Results from the univariate and multivariate GMDR in two-locus models, with both blood pressure and hypertension phenotypes, indicate the best combinations of SNPs whose interaction is significantly associated with risk for high blood pressure or hypertension. Although the test balanced accuracy (BA) of the multivariate analysis was not always greater than that of the univariate analysis, the multivariate BAs were more stable, with smaller standard deviations. Conclusions In this study, we have developed a multivariate GMDR method using a GEE approach. Multivariate GMDR is useful for analysing multiple correlated phenotypes of interest.
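The contrast drawn above between MDR (case:control ratio per multi-locus genotype cell) and GMDR (residual score per cell), together with the balanced accuracy (BA) used to compare analyses, can be made concrete with a small sketch. Assumptions: genotypes are pre-encoded as tuples, the MDR threshold defaults to the overall case:control ratio, the GMDR-style scores come from an intercept-only null model (residual = trait minus sample mean) with a cell labelled high-risk when its mean score is positive, and BA is computed on the same data used to label cells (no cross-validation, unlike a real MDR analysis). The function names are hypothetical; this is neither the GEE-based multivariate GMDR of the paper nor any published implementation.

```python
from collections import defaultdict


def mdr_high_risk_cells(genotypes, status, threshold=None):
    """MDR rule: a cell is high-risk when cases / controls > threshold.

    genotypes: list of multi-locus genotype tuples, one per subject.
    status: list of 0 (control) / 1 (case).
    """
    cases, controls = defaultdict(int), defaultdict(int)
    for g, y in zip(genotypes, status):
        if y == 1:
            cases[g] += 1
        else:
            controls[g] += 1
    n_case = sum(status)
    if threshold is None:
        threshold = n_case / (len(status) - n_case)
    cells = set(cases) | set(controls)
    # cases > threshold * controls avoids dividing by an empty control count.
    return {g: cases[g] > threshold * controls[g] for g in cells}


def gmdr_high_risk_cells(genotypes, scores):
    """GMDR-style rule: a cell is high-risk when its mean residual score > 0."""
    total, count = defaultdict(float), defaultdict(int)
    for g, s in zip(genotypes, scores):
        total[g] += s
        count[g] += 1
    return {g: total[g] / count[g] > 0 for g in count}


def balanced_accuracy(genotypes, status, labels):
    """(sensitivity + specificity) / 2, predicting 'case' for high-risk cells."""
    tp = fn = tn = fp = 0
    for g, y in zip(genotypes, status):
        predicted_case = labels.get(g, False)
        if y == 1:
            tp += predicted_case
            fn += not predicted_case
        else:
            fp += predicted_case
            tn += not predicted_case
    return (tp / (tp + fn) + tn / (tn + fp)) / 2
```

On a balanced toy sample, intercept-only residuals (y minus 0.5) make the two labelling rules agree, which mirrors the idea that GMDR generalizes MDR by replacing the case:control ratio with a model-based score.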
19. A unified GMDR method for detecting gene-gene interactions in family and unrelated samples with application to nicotine dependence. Hum Genet 2013; 133:139-50. [PMID: 24057800 DOI: 10.1007/s00439-013-1361-9] [Received: 07/02/2013] [Accepted: 09/05/2013] [Indexed: 11/26/2022]
Abstract
Gene-gene and gene-environment interactions govern a substantial portion of the variation in complex traits and diseases. Conventionally, either unrelated or family samples are used to detect such interactions; even when both kinds of data are available, the unrelated and family samples are analyzed separately, potentially leading to a loss of statistical power. In this report, to detect gene-gene interactions, we propose a generalized multifactor dimensionality reduction method that unifies analyses of nuclear families and unrelated subjects within the same statistical framework. We used principal components as genetic background controls against population stratification, and when sibling data are included, within-family controls were used to correct for potential spurious association at the tested loci. Through comprehensive simulations, we demonstrate that the proposed method can remarkably increase power by pooling unrelated and offspring samples, compared with individual analysis strategies and with Fisher's method of combining p values, while retaining a controlled type I error rate in the presence of population structure. In application to a real dataset, we detected one significant tetragenic interaction among CHRNA4, CHRNB2, BDNF, and NTRK2 associated with nicotine dependence in the Study of Addiction: Genetics and Environment sample, suggesting a biological role for these genes in the development of nicotine dependence.
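The simulations above benchmark the pooled analysis against Fisher's method of combining p values from separate analyses. That baseline (not the authors' unified GMDR) can be sketched as follows, assuming the p-values come from independent tests, which is the condition under which the chi-square reference distribution holds; the function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0.

    For even degrees of freedom 2k the chi-square survival function has the
    closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no SciPy is needed.
    """
    if not p_values or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total
```

A single p-value is returned unchanged, while two independent p-values of 0.05 combine to roughly 0.0175.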
Collapse
|
20
|
Catecholaminergic gene variants: contribution in ADHD and associated comorbid attributes in the eastern Indian probands. Biomed Res Int 2013; 2013:918410. [PMID: 24163823 PMCID: PMC3791561 DOI: 10.1155/2013/918410] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/04/2013] [Revised: 08/07/2013] [Accepted: 08/12/2013] [Indexed: 12/25/2022]
Abstract
The contribution of genes to attention deficit hyperactivity disorder (ADHD) has been explored in various populations, and several genes have been speculated to contribute small but additive effects. We assessed variants in four genes, DDC (rs3837091 and rs3735273), DRD2 (rs1800496, rs1801028, and rs1799732), DRD4 (rs4646984 and rs4646983), and COMT (rs165599 and rs740603), in Indian ADHD subjects with comorbid attributes. Cases were recruited following the Diagnostic and Statistical Manual of Mental Disorders-IV-TR after obtaining informed written consent. DNA isolated from peripheral blood leukocytes of ADHD probands (N = 170), their parents (N = 310), and ethnically matched controls (N = 180) was used for genotyping, followed by population- and family-based analyses with the UNPHASED program. DRD4 sites showed a significant difference in allelic frequencies in case-control analysis, while DDC and COMT exhibited bias in familial transmission (P < 0.05). rs3837091 “AGAG,” rs3735273 “A,” rs1799732 “C,” rs740603 “G,” rs165599 “G” and the single-repeat alleles of rs4646984/rs4646983 showed positive correlation with comorbid characteristics (P < 0.05). Multifactor dimensionality reduction analysis of case-control data revealed significant interactive effects of all four genes (P < 0.001), while family-based data showed an interaction between DDC and DRD2 (P = 0.04). This first study of these gene variants in Indo-Caucasoid ADHD probands and associated comorbid conditions indicates altered dopaminergic neurotransmission in ADHD.
|
21
|
Genetic association and gene-gene interaction analyses suggest likely involvement of ITGB3 and TPH2 with autism spectrum disorder (ASD) in the Indian population. Prog Neuropsychopharmacol Biol Psychiatry 2013; 45:131-43. [PMID: 23628433 DOI: 10.1016/j.pnpbp.2013.04.015] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 02/14/2013] [Revised: 04/12/2013] [Accepted: 04/22/2013] [Indexed: 11/19/2022]
Abstract
BACKGROUND Serotoninergic dysfunction leads to neurodevelopmental abnormalities and behavioral impairments. Platelet hyperserotoninemia is reported as the best-identified endophenotype for autism spectrum disorders. Therefore, in the present study we investigated the association with ASD, in the Indian population, of TPH2, which encodes the rate-limiting enzyme in 5-HT biosynthesis, and ITGB3, a serotonin quantitative trait locus. METHODS Population- and family-based genetic association and gene-gene interaction analyses were performed to evaluate the role of ITGB3 and TPH2 markers in ASD etiology. RESULTS Association tests using ITGB3 markers revealed significant paternal overtransmission of the T allele of rs5918 to male probands. Interestingly, for TPH2 we observed significant overrepresentation of the A-A (rs11179000-rs4290270), G-A (rs4570625-rs4290270), G-G-A (rs4570625-rs11179001-rs4290270) and A-G-A (rs11179000-rs11179001-rs4290270) haplotypes in the controls, and maternal preferential transmission of the A-A (rs11179001-rs7305115), T-A-A (rs4570625-rs11179001-rs7305115) and T-A-A (rs11179000-rs11179001-rs7305115) haplotypes and nontransmission of the G-G-A (rs4570625-rs11179001-rs7305115) haplotype to the affected offspring. Moreover, the interaction of the ITGB3 marker rs15908 with TPH2 markers was significant and influenced by the sex of the probands. Predicted individual risk, which varied from very mild to moderate, supports a combined effect of these markers in ASD. CONCLUSION Overall, the results of the present study indicate likely involvement of ITGB3 and TPH2 in the pathophysiology of ASD in the Indian population.
|
22
|
Abstract
Background Multifactor dimensionality reduction (MDR) is a powerful method for analyzing gene-gene interactions and has been successfully applied in many genetic studies of complex diseases. However, MDR has mainly been applied to binary traits, whereas traits with ordinal features are common in genetic studies (e.g., obesity classification: normal, pre-obese, mildly obese and severely obese). Methods We propose ordinal MDR (OMDR) to facilitate gene-gene interaction analysis of ordinal traits. As an alternative to balanced accuracy, we suggest using tau-b, a common ordinal association measure, to evaluate interactions. We also generalized cross-validation consistency (GCVC) to identify multiple best interactions. GCVC can be practically useful for analyzing complex traits, especially in large-scale genetic studies. Results and conclusions In simulations, OMDR showed fairly good performance in terms of power, predictability and selection stability, and it outperformed MDR. For demonstration, we used real body mass index (BMI) data and scanned one- to four-way interactions for ordinal and binary obesity traits derived from BMI, using OMDR and MDR, respectively. In the real data analysis, more interactions were identified for the ordinal trait than for the binary traits. On average, the commonly identified interactions showed higher predictability for the ordinal trait than for the binary traits. The proposed OMDR and GCVC were implemented in a C/C++ program; executables for Linux, Windows and macOS are freely available to non-commercial research institutions upon request.
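OMDR replaces balanced accuracy with Kendall's tau-b between the observed ordinal category and the category an interaction model assigns. The sketch below computes tau-b with the standard tie correction, tau_b = (n_c - n_d) / sqrt((n_0 - n_1)(n_0 - n_2)), where n_0 is the number of pairs and n_1, n_2 count pairs tied on each variable. How OMDR assigns a predicted category to each genotype cell is not shown; the predicted values in the example are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b with tie correction; O(n^2) pairwise version for clarity."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0:
            ties_x += 1  # pair tied on x (includes pairs tied on both)
        if dy == 0:
            ties_y += 1  # pair tied on y (includes pairs tied on both)
        if dx != 0 and dy != 0:
            if (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        return float("nan")  # undefined when either variable is constant
    return (concordant - discordant) / denom


# Hypothetical ordinal obesity categories (0 = normal ... 3 = severely obese)
# versus categories predicted by some interaction model.
observed = [0, 1, 2, 3, 1, 2]
predicted = [0, 1, 1, 3, 2, 2]
print(kendall_tau_b(observed, predicted))
```

The pairwise loop is meant for readability; for large samples, `scipy.stats.kendalltau` (whose default variant is tau-b) computes the same quantity faster.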
|
23
|
Efficient simulation of epistatic interactions in case-parent trios. Hum Hered 2013; 75:12-22. [PMID: 23548797 DOI: 10.1159/000348789] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/24/2012] [Accepted: 02/11/2013] [Indexed: 12/26/2022] Open
Abstract
Statistical approaches to evaluate interactions between single nucleotide polymorphisms (SNPs) and SNP-environment interactions are of great importance in genetic association studies, as susceptibility to complex disease might be related to the interaction of multiple SNPs and/or environmental factors. With these methods under active development, algorithms to simulate genomic data sets are needed to ensure proper type I error control of newly proposed methods and to compare power with existing methods. In this paper we propose an efficient method for a haplotype-based simulation of case-parent trios when the disease risk is thought to depend on possibly higher-order epistatic interactions or gene-environment interactions with binary exposures.
|
24
|
Efficient techniques for genotype-phenotype correlational analysis. BMC Med Inform Decis Mak 2013; 13:41. [PMID: 23557276 PMCID: PMC3686582 DOI: 10.1186/1472-6947-13-41] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2013] [Accepted: 03/19/2013] [Indexed: 11/16/2022] Open
Abstract
Background Single nucleotide polymorphisms (SNPs) are sequence variations found in individuals at specific points in the genomic sequence. As SNPs are highly conserved throughout evolution and within a population, the map of SNPs serves as an excellent genotypic marker. Conventional SNP analysis methods suffer from long run times, inefficient memory usage, and frequent overestimation. In this paper, we propose efficient, scalable, and reliable algorithms to select, from a large set of SNPs, a small subset that can together be used for phenotypic classification. Methods Our algorithms exploit gene selection and random projection techniques to identify a meaningful subset of SNPs. To the best of our knowledge, these techniques have not been employed before in the context of genotype-phenotype correlations. Random projections are used to project the input data into a lower-dimensional space while closely preserving distances. Gene selection is then applied to the projected data to identify a subset of the most relevant SNPs. Results We compared the performance of our algorithms with Multifactor Dimensionality Reduction (MDR), one of the best currently known algorithms, and with Principal Component Analysis (PCA). Experimental results demonstrate that our algorithms are superior in terms of both accuracy and run time. Conclusions In our proposed techniques, random projection maps data from a high-dimensional space to a lower-dimensional space and thus overcomes the curse of dimensionality. From this reduced space, we select the best subset of attributes. This mechanism is unique in the domain of SNP analysis and, to the best of our knowledge, has not been employed before. As our experimental results show, the proposed techniques offer the potential of high accuracy while keeping run times low.
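The random-projection step described above can be sketched with a dense Gaussian matrix, which is one standard construction; the abstract does not say which projection the authors used, so this choice, the target dimension k, and all names below are assumptions. The subsequent gene-selection step is not shown.

```python
import math
import random


def gaussian_projection_matrix(d, k, seed=0):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries.

    With this scaling, squared Euclidean lengths are preserved in expectation,
    and pairwise distances are approximately preserved with high probability
    when k is large enough (Johnson-Lindenstrauss lemma).
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, sample):
    """Map one d-dimensional sample (e.g. 0/1/2-coded SNPs) to k dimensions."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, sample)) for row in matrix]


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two hypothetical subjects genotyped at 2,000 SNPs, projected to 200 dimensions.
rng = random.Random(1)
a = [rng.choice([0, 1, 2]) for _ in range(2000)]
b = [rng.choice([0, 1, 2]) for _ in range(2000)]
R = gaussian_projection_matrix(d=2000, k=200, seed=42)
print(distance(a, b), distance(project(R, a), project(R, b)))
```

The two printed distances should be close but not identical; the distortion shrinks as k grows.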
|
25
|
SYMPHONY, an information-theoretic method for gene-gene and gene-environment interaction analysis of disease syndromes. Heredity (Edinb) 2013; 110:548-59. [PMID: 23423149 DOI: 10.1038/hdy.2012.123] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022] Open
Abstract
We develop an information-theoretic method for gene-gene (GGI) and gene-environment interaction (GEI) analysis of syndromes, defined as a phenotype vector comprising multiple quantitative traits (QTs). The K-way interaction information (KWII), an information-theoretic metric, was derived for multivariate normally distributed phenotype vectors. The method was tested on three simulated data sets, the Genetic Analysis Workshop 15 (GAW15) rheumatoid arthritis data set, a high-density lipoprotein (HDL) and atherosclerosis data set from a mouse QT locus study, and the 1000 Genomes data. The dependence of the KWII on effect size, minor allele frequency, linkage disequilibrium and population stratification/admixture, as well as the power and computational time requirements of the novel method, were systematically assessed in simulation studies. In these studies, phenotype vectors containing two and three constituent multivariate normally distributed QTs were used, and the KWII was found to be effective at detecting GEI associated with the phenotype. High KWII values were observed for variables and variable combinations associated with the syndrome phenotype compared with uninformative variables not associated with the phenotype. The KWII values for the phenotype-associated combinations increased monotonically with increasing effect size. The KWII also exhibited utility in simulations with non-linear dependence between the constituent QTs. Analysis of the HDL and atherosclerosis data set indicated that the simultaneous analysis of both phenotypes identified interactions not detected in the analysis of the individual traits. The information-theoretic approach may be useful for non-parametric analysis of GGI and GEI in complex syndromes.
|
26
|
SVM-based generalized multifactor dimensionality reduction approaches for detecting gene-gene interactions in family studies. Genet Epidemiol 2013; 36:88-98. [PMID: 22851472 DOI: 10.1002/gepi.21602] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/30/2022]
Abstract
Gene-gene interaction plays an important role in the etiology of complex diseases and may exist without a genetic main effect. Most current statistical approaches, however, focus on assessing an interaction effect in the presence of the genes' main effects. It would be very helpful to develop methods that can detect not only the genes' main effects but also gene-gene interaction effects, regardless of the existence of main effects, while adjusting for confounding factors. In addition, when a disease variant is rare or the sample size is quite limited, asymptotic statistical properties do not apply; therefore, approaches based on a reasonable and applicable computational framework would be practical and widely applicable. In this study, we developed an extended support vector machine (SVM) method and an SVM-based pedigree-based generalized multifactor dimensionality reduction (PGMDR) method to study interactions in the presence or absence of main effects of genes, with adjustment for covariates, using limited samples of families. A new test statistic is proposed for classifying affected and unaffected individuals in the SVM-based PGMDR approach to improve performance in detecting gene-gene interactions. Simulation studies under various scenarios were performed to compare the performance of the proposed and original methods. The proposed and original approaches were applied to a real data example for illustration and comparison. Both the simulation and real data studies show that the proposed SVM and SVM-based PGMDR methods have high prediction accuracy, consistency, and power in detecting gene-gene interactions.
|
27
|
Comparative Power of Family-Based Association Strategies to Detect Disease-Causing Variants Under Two-Locus Models. Genet Epidemiol 2012; 36:848-55. [DOI: 10.1002/gepi.21672] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/24/2011] [Revised: 06/15/2012] [Accepted: 07/02/2012] [Indexed: 11/08/2022]
|
29
|
A novel method to identify high order gene-gene interactions in genome-wide association studies: gene-based MDR. BMC Bioinformatics 2012; 13 Suppl 9:S5. [PMID: 22901090 PMCID: PMC3372457 DOI: 10.1186/1471-2105-13-s9-s5] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Indexed: 11/16/2022] Open
Abstract
Background Because common complex diseases are affected by multiple genes and environmental factors, it is essential to investigate gene-gene and/or gene-environment interactions to understand the genetic architecture of complex diseases. After the great success of large-scale genome-wide association (GWA) studies using high-density single nucleotide polymorphism (SNP) chips, the study of gene-gene interaction has become the next challenge. Multifactor dimensionality reduction (MDR) analysis has been widely used for gene-gene interaction analysis. In practice, however, it is not easy to perform high-order gene-gene interaction analyses via MDR at the genome-wide level, because this requires exploring a huge search space and carries a heavy computational burden due to high dimensionality. Results We propose a dimensionality reduction analysis, Gene-MDR, for fast and efficient high-order gene-gene interaction analysis. The proposed Gene-MDR method consists of a two-step application of MDR: within- and between-gene MDR analyses. First, within-gene MDR analysis summarizes each gene's effect by combining multiple SNPs from the same gene. Second, between-gene MDR analysis performs interaction analysis using the summarized gene effects from the within-gene MDR analysis. We applied the Gene-MDR method to bipolar disorder (BD) GWA data from the Wellcome Trust Case Control Consortium (WTCCC). The results demonstrate that Gene-MDR is capable of detecting high-order gene-gene interactions associated with BD. Conclusion By reducing the dimension of genome-wide data from the SNP level to the gene level, Gene-MDR efficiently identifies high-order gene-gene interactions. Therefore, Gene-MDR can provide a key to understanding complex disease etiology.
|
30
|
A comparison of methods sensitive to interactions with small main effects. Genet Epidemiol 2012; 36:303-11. [PMID: 22460684 DOI: 10.1002/gepi.21622] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/07/2011] [Revised: 12/14/2011] [Accepted: 01/02/2012] [Indexed: 01/26/2023]
Abstract
Numerous genetic variants have been successfully identified for complex traits, yet these variants account for only a modest portion of the predicted variance due to genetic factors. This has led to increased interest in other approaches to account for the "missing" genetic contributions to phenotype, including joint gene-gene or gene-environment analysis. A variety of methods for such analysis have been advocated; however, they have seldom been compared systematically. To facilitate such comparisons, the developers of multifactor dimensionality reduction (MDR) simulated 100 data replicates for each of 96 two-locus models displaying negligible marginal effects from either locus (16 variations on each of six basic genetic models). The genetic models, based on a dichotomous phenotype, had varying minor allele frequencies and from two to eight distinct risk levels associated with genotype. The basic models were modified to include "noise" from combinations of missing data, genotyping error, genetic heterogeneity, and phenocopies. This study compares the performance of three methods designed to be sensitive to joint effects (MDR, support vector machines (SVMs), and the restricted partition method (RPM)) on these simulated data. In these tests, the RPM consistently outperformed the other two methods for each of the six classes of genetic models. In contrast, the comparison between the other two methods had mixed results: MDR outperformed SVM when the true model had only a few well-separated risk classes, while SVM outperformed MDR on more complicated models. Of these methods, only MDR has a well-developed user interface.
|
31
|
A family-based association test to detect gene-gene interactions in the presence of linkage. Eur J Hum Genet 2012; 20:973-80. [PMID: 22419171 DOI: 10.1038/ejhg.2012.45] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022] Open
Abstract
For many complex diseases, quantitative traits contain more information than dichotomous traits. One approach used to analyse these traits in family-based association studies is the quantitative transmission disequilibrium test (QTDT). The QTDT is a regression-based approach that simultaneously models linkage and association. It splits the association effect into between- and within-family genetic components to adjust and test for population stratification, and includes a variance components method to model linkage. We extend this approach to detect gene-gene interactions between two unlinked QTLs by adjusting the definition of the between- and within-family components and the variance components included in the model. We simulate data to investigate the influence of the epistasis model, linkage disequilibrium patterns between the markers and the QTLs, and allele frequencies on the power and type I error rates of the approach. Results show that for some of the investigated settings, power gains are obtained in comparison with FAM-MDR. We conclude that our approach shows promising results for candidate-gene studies in which too few markers are available to correct for population stratification using standard methods (for example, EIGENSTRAT). The proposed method is applied to real-life data on hypertension from the FLEMENGHO study.
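The between/within split described above can be illustrated with the orthogonal decomposition of an additive genotype score, as we understand it from the QTDT literature: b is the family mean of offspring genotype scores and w is each offspring's deviation from it. The trait is then regressed on b and w, and the test on the w coefficient is the one protected against population stratification. The sketch below computes only b and w for sibships without parental genotypes; the gene-gene extension, the variance components for linkage, and the function name are illustrative and not taken from the paper.

```python
from collections import defaultdict


def between_within(family_ids, genotypes):
    """Split additive genotype scores into between- (b) and within-family (w) parts.

    b is the mean offspring score of each subject's family; w = g - b, so that
    b + w reproduces g and w sums to zero within every family.
    """
    members = defaultdict(list)
    for fam, g in zip(family_ids, genotypes):
        members[fam].append(g)
    family_mean = {fam: sum(gs) / len(gs) for fam, gs in members.items()}
    b = [family_mean[fam] for fam in family_ids]
    w = [g - m for g, m in zip(genotypes, b)]
    return b, w


# Hypothetical sibships: genotype = number of copies of the tested allele.
fams = ["F1", "F1", "F1", "F2", "F2"]
geno = [0, 1, 2, 2, 2]
b, w = between_within(fams, geno)
print(b)  # [1.0, 1.0, 1.0, 2.0, 2.0]
print(w)  # [-1.0, 0.0, 1.0, 0.0, 0.0]
```

Family F2 has no within-family genotype variation, so it contributes nothing to the w term; only families with differing offspring genotypes inform the stratification-robust test.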
|
32
|
Abstract
Complex diseases are presumed to result from multiple genes and environmental factors, which emphasizes the importance of gene-gene and gene-environment interactions. Traditional parametric approaches are limited in their ability to detect high-order interactions and handle sparse data, and standard stepwise procedures may miss interactions with undetectable main effects. To address these limitations, the multifactor dimensionality reduction (MDR) method was developed. MDR is well suited for examining high-order interactions and detecting interactions without main effects. Like most statistical methods in genetic association studies, MDR may also produce false positives in the presence of population stratification. Although many statistical methods have been proposed to detect main effects and control for population stratification using genomic markers, few methods are available to detect interactions and control for population stratification at the same time. In this article, we developed a novel test, MDR in structured populations (MDR-SP), to detect interactions while controlling for population stratification. MDR-SP is applicable to both quantitative and qualitative traits and can incorporate covariates. We present simulation studies to demonstrate the validity of the test and to evaluate its power.
|
33
|
Entropy-based information gain approaches to detect and to characterize gene-gene and gene-environment interactions/correlations of complex diseases. Genet Epidemiol 2011; 35:706-21. [PMID: 22009792 PMCID: PMC3384547 DOI: 10.1002/gepi.20621] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 01/17/2023]
Abstract
For complex diseases, the relationship between genotypes, environmental factors, and phenotype is usually complex and nonlinear. Our understanding of the genetic architecture of diseases has increased considerably in recent years. However, both conceptually and methodologically, detecting gene-gene and gene-environment interactions remains a challenge, despite the existence of a number of efficient methods. One method that offers great promise but has not yet been widely applied to genomic data is the entropy-based approach of information theory. In this article, we first develop entropy-based test statistics to identify two-way and higher-order gene-gene and gene-environment interactions. We then apply these methods to a bladder cancer data set to test their power and identify their strengths and weaknesses. For two-way interactions, we propose an information gain (IG) approach based on mutual information. For three-way and higher-order interactions, an interaction IG approach is used. In both cases, we develop one-dimensional test statistics to analyze sparse data. Compared with the naive chi-square test, the test statistics we develop have similar or higher power and are robust. Applying them to the bladder cancer data set allowed us to investigate the complex interactions between DNA repair gene single nucleotide polymorphisms, smoking status, and bladder cancer susceptibility. Although not yet widely applied, entropy-based approaches appear to be a useful tool for detecting gene-gene and gene-environment interactions. The test statistics we develop add to a growing body of methodologies that will gradually shed light on the complex architecture of common diseases.
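The abstract says the two-way information gain is "based on mutual information" but does not give the formula. A minimal sketch, assuming the common definition IG = I(G1,G2; D) - I(G1; D) - I(G2; D) with plug-in entropies in bits: a positive value means the two factors together carry more information about disease status D than the sum of their separate contributions. The significance test the authors build on top of IG is not shown, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(*columns):
    """Plug-in joint Shannon entropy, in bits, of one or more aligned discrete columns."""
    counts = Counter(zip(*columns))
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def information_gain(g1, g2, d):
    """Assumed definition: IG = I(G1,G2; D) - I(G1; D) - I(G2; D)."""
    joint = list(zip(g1, g2))  # treat the two-locus genotype as one variable
    return (mutual_information(joint, d)
            - mutual_information(g1, d)
            - mutual_information(g2, d))


# Purely interactive toy example: disease status is the XOR of two binary factors.
g1 = [0, 0, 1, 1]
g2 = [0, 1, 0, 1]
d = [0, 1, 1, 0]
print(information_gain(g1, g2, d))  # 1.0
```

In the toy XOR data each factor alone is uninformative (mutual information 0), yet together they determine D completely, so the full 1 bit of joint information is attributed to the interaction.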
|
34
|
Abstract
The goal of this unit is to introduce gene-gene interactions (epistasis) as a significant complicating factor in the search for disease susceptibility genes. This unit begins with an overview of gene-gene interactions and why they are likely to be common. Then, it reviews several statistical and computational methods for detecting and characterizing genes with effects that are dependent on other genes. The focus of this unit is genetic association studies of discrete and quantitative traits because most of the methods for detecting gene-gene interactions have been developed specifically for these study designs.
|
35
|
Detecting genetic interactions for quantitative traits with U-statistics. Genet Epidemiol 2011; 35:457-68. [PMID: 21618602 DOI: 10.1002/gepi.20594] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 01/25/2011] [Revised: 03/09/2011] [Accepted: 04/19/2011] [Indexed: 11/08/2022]
Abstract
The genetic etiology of complex human diseases has been commonly viewed as a process that involves multiple genetic variants, environmental factors, and their interactions. Statistical approaches, such as multifactor dimensionality reduction (MDR) and generalized MDR (GMDR), have recently been proposed to test the joint association of multiple genetic variants with either dichotomous or continuous traits. In this study, we propose a novel Forward U-Test to evaluate the combined effect of multiple loci on quantitative traits, taking gene-gene/gene-environment interactions into account. In this new approach, a U-statistic-based forward algorithm is first used to select potential disease-susceptibility loci, and a weighted U-statistic is then used to test the joint association of the selected loci with the disease. In a simulation study, the Forward U-Test had greater power than GMDR. In addition, our approach is less computationally intensive, making it feasible for high-dimensional gene-gene/gene-environment research. We illustrate our method with a real data application to nicotine dependence (ND), using three independent datasets from the Study of Addiction: Genetics and Environment. Our gene-gene interaction analysis of 155 SNPs in 67 candidate genes identified two SNPs, rs16969968 in CHRNA5 and rs1122530 in NTRK2, that are jointly associated with the level of ND (P-value = 5.31e-7). The association, which involves essential interaction, was replicated in two independent datasets, with P-values of 1.08e-5 and 0.02, respectively. Our finding suggests that joint action may exist between the two gene products.
|
36
|
Perspectives on genome-wide multi-stage family-based association studies. Stat Med 2011; 30:2201-21. [DOI: 10.1002/sim.4259] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/16/2010] [Accepted: 03/07/2011] [Indexed: 01/03/2023]
|
37
|
Association of ABCA4 and MAFB with non-syndromic cleft lip with or without cleft palate. Am J Med Genet A 2011; 155A:1469-71. [PMID: 21567910 DOI: 10.1002/ajmg.a.33940] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 10/18/2010] [Accepted: 01/17/2011] [Indexed: 11/10/2022]
|
38
|
Role of gene-gene/gene-environment interaction in the etiology of eastern Indian ADHD probands. Prog Neuropsychopharmacol Biol Psychiatry 2011; 35:577-87. [PMID: 21216270 DOI: 10.1016/j.pnpbp.2010.12.027] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 09/13/2010] [Revised: 12/23/2010] [Accepted: 12/23/2010] [Indexed: 11/20/2022]
Abstract
Associations between attention deficit hyperactivity disorder (ADHD) and genetic polymorphisms in dopamine receptors, the dopamine transporter and metabolizing enzymes have been reported in different ethnic groups. Gene variants may affect disease outcome by acting synergistically or antagonistically, so their combined effect is an important aspect to study in disease etiology. In the present investigation, interactions among ten functional polymorphisms in the DRD4, DAT1, MAOA, COMT, and DBH genes were explored in the Indo-Caucasoid population. ADHD cases were recruited based on DSM-IV criteria. Peripheral blood samples were collected from ADHD probands (N=126), their parents (N=233) and controls (N=96) after obtaining informed written consent for participation. Genomic DNA was subjected to PCR-based analysis of single nucleotide polymorphisms and variable number of tandem repeats (VNTRs). The data obtained were examined by population- as well as family-based association analyses. While case-control analysis revealed a higher occurrence of the DAT1 intron 8 VNTR 5R allele (P=0.02) in cases, significant preferential transmission of the 7R-T (DRD4 exon3 VNTR-rs1800955) and 3R-T (MAOA-u VNTR-rs6323) haplotypes from parents to probands was noticed (P=0.02 and 0.002, respectively). Gene-gene interaction analysis revealed a significant additive effect of DBH rs1108580 and DRD4 rs1800955, with significant main effects of DRD4 exon3 VNTR, DAT1 3'UTR and intron 8 VNTR, MAOA u-VNTR, rs6323, COMT rs4680, rs362204, DBH rs1611115 and rs1108580, pointing toward a strong association of these markers with ADHD. Correlation between gene variants, high ADHD scores and low DBH enzymatic activity was also noticed, especially in male probands. From these observations, an impact of the studied sites on disease etiology could be speculated in this ethnic group.
|
39
|
Genome-wide association scan of Korean autism spectrum disorders with language delay: a preliminary study. Psychiatry Investig 2011; 8:61-6. [PMID: 21519539 PMCID: PMC3079188 DOI: 10.4306/pi.2011.8.1.61] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 03/10/2010] [Revised: 06/03/2010] [Accepted: 06/26/2010] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVE Communication problems are a prevalent symptom of autism spectrum disorders (ASDs), which have a genetic background. Although several genome-wide studies of ASD have suggested a number of candidate genes, few studies have reported the association or linkage of specific endophenotypes with ASDs. METHODS Forty-two Korean ASD patients who showed language delay were enrolled in this study together with their parents. We performed a genome-wide scan using the Affymetrix SNP Array 5.0 platform to identify candidate genes responsible for language delay in ASDs. RESULTS Using the transmission disequilibrium test and the multifactor dimensionality reduction test, we detected candidate single-nucleotide polymorphisms (SNPs) on chromosome 11, rs11212733 (p-value=9.76×10⁻⁶) and rs7125479 (p-value=1.48×10⁻⁴), as markers of language delay in ASD. CONCLUSION Although our results suggest that several SNPs, including rs11212733, are associated with language delay in ASD, we did not observe any significant results after correction for multiple comparisons. This may imply that more samples are required to identify genes associated with language delay in ASD.
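The transmission disequilibrium test named above compares, among heterozygous parents, how often each allele of a biallelic marker is passed to the affected child. A minimal sketch of the classical TDT statistic, (b − c)²/(b + c) referred to a chi-square distribution with 1 degree of freedom, follows. The function name `tdt` and the transmission counts in the example are hypothetical and are not taken from the study.

```python
import math


def tdt(b: int, c: int) -> tuple[float, float]:
    """Classical (McNemar-style) TDT for one biallelic marker.

    b: heterozygous parents who transmitted allele A to the affected child
    c: heterozygous parents who transmitted the other allele
    Returns (chi-square statistic with 1 df, p-value).
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    stat = (b - c) ** 2 / (b + c)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


stat, p = tdt(60, 40)  # hypothetical transmission counts
print(f"chi2 = {stat:.2f}, p = {p:.4f}")  # chi2 = 4.00, p = 0.0455
```

Genome-wide use adds steps this sketch omits, such as handling missing parental genotypes and correcting for the number of markers tested.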
|
40
|
Common genetic variation in the GAD1 gene and the entire family of DLX homeobox genes and autism spectrum disorders. Am J Med Genet B Neuropsychiatr Genet 2011; 156:233-9. [PMID: 21302352 PMCID: PMC3088769 DOI: 10.1002/ajmg.b.31148] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 04/20/2010] [Accepted: 10/26/2010] [Indexed: 12/13/2022]
Abstract
Biological and positional evidence supports the involvement of the GAD1 and distal-less homeobox genes (DLXs) in the etiology of autism. We investigated 42 single nucleotide polymorphisms in these genes as risk factors for autism spectrum disorders (ASD) in a large family-based association study of 715 nuclear families. No single marker showed significant association after correction for multiple testing. A rare haplotype in the DLX1 promoter was associated with ASD (P-value = 0.001). Given the importance of rare variants to the etiology of autism revealed in recent studies, the observed rare haplotype may be relevant to future investigations. Our observations, when taken together with previous findings, suggest that common genetic variation in the GAD1 and DLX genes is unlikely to play a critical role in ASD susceptibility.
|
41
|
Practical and theoretical considerations in study design for detecting gene-gene interactions using MDR and GMDR approaches. PLoS One 2011; 6:e16981. [PMID: 21386969 PMCID: PMC3046176 DOI: 10.1371/journal.pone.0016981] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 08/27/2010] [Accepted: 01/19/2011] [Indexed: 12/25/2022]
Abstract
Detection of interacting risk factors for complex traits is challenging. The choice of an appropriate method, sample size, and allocation of cases and controls are serious concerns. To provide empirical guidelines for planning such studies and data analyses, we investigated the performance of the multifactor dimensionality reduction (MDR) and generalized MDR (GMDR) methods under various experimental scenarios. We derived the mathematical expectation of accuracy and used it as an indicator parameter for planning a gene-gene interaction study. We then examined the statistical power of GMDR and MDR within the plausible range of accuracy (0.50–0.65) reported in the literature. The GMDR with covariate adjustment had a power of >80% in a case-control design with a sample size of ≥2000, with theoretical accuracy ranging from 0.56 to 0.62. However, when the accuracy was <0.56, a sample size of ≥4000 was required to have sufficient power. In our simulations, the GMDR outperformed the MDR under all models with accuracy ranging from 0.56 to 0.62 for a sample size of 1000–2000. However, the two methods performed similarly when the accuracy was outside this range or the sample was substantially larger. We conclude that, with adjustment for a covariate, GMDR performs better than MDR, and that a sample size of 1000–2000 is reasonably large for detecting gene-gene interactions in the range of effect sizes reported in the current literature, whereas a larger sample size is required for more subtle interactions with accuracy <0.56.
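MDR, whose power is evaluated above, pools multilocus genotype cells into a single high-risk/low-risk attribute and then scores that attribute as a classifier; the "accuracy" in the abstract refers to this classification accuracy. A minimal sketch of that core step for a case-control sample follows, assuming genotypes coded 0/1/2 and scoring with balanced accuracy (the mean of sensitivity and specificity). The function `mdr_balanced_accuracy` and the toy data are hypothetical. Real MDR implementations also search over SNP combinations and use cross-validation, and GMDR replaces raw case and control counts with score-based statistics from a covariate-adjusted model; none of that is shown here.

```python
from collections import defaultdict


def mdr_balanced_accuracy(genotypes, labels, threshold=None):
    """Label genotype cells high/low risk and score the pooled attribute.

    genotypes: one tuple per subject, e.g. (0, 2) for two SNPs coded 0/1/2
    labels:    1 for a case, 0 for a control
    threshold: case:control ratio at or above which a cell is high risk;
               defaults to the overall case:control ratio of the sample
    Returns (balanced accuracy, set of high-risk cells).
    """
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need both cases and controls")
    if threshold is None:
        threshold = n_case / n_ctrl

    # Count cases and controls in every observed genotype combination.
    counts = defaultdict(lambda: [0, 0])  # cell -> [cases, controls]
    for g, y in zip(genotypes, labels):
        counts[g][0 if y == 1 else 1] += 1

    # ca / co >= threshold, written without division so that
    # cells containing only cases are labeled high risk.
    high_risk = {g for g, (ca, co) in counts.items() if ca >= threshold * co}

    # Treat "genotype falls in a high-risk cell" as the predicted case status.
    tp = sum(1 for g, y in zip(genotypes, labels) if y == 1 and g in high_risk)
    tn = sum(1 for g, y in zip(genotypes, labels) if y == 0 and g not in high_risk)
    return 0.5 * (tp / n_case + tn / n_ctrl), high_risk


# Hypothetical two-SNP toy data: 4 cases and 4 controls.
genotypes = [(0, 0), (0, 0), (1, 1), (1, 1), (1, 1), (0, 0), (1, 1), (0, 1)]
labels = [0, 0, 1, 1, 0, 1, 1, 0]
ba, cells = mdr_balanced_accuracy(genotypes, labels)
print(ba, cells)  # 0.75 {(1, 1)}
```

In the toy data only the (1, 1) cell reaches the overall case:control ratio of 1, so it is the sole high-risk cell. This is training accuracy; the accuracies discussed in the abstract would be estimated on held-out data.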
|
42
|
|
43
|
On the use of multifactor dimensionality reduction (MDR) and classification and regression tree (CART) to identify haplotype–haplotype interactions in genetic studies. Genomics 2011; 97:77-85. [DOI: 10.1016/j.ygeno.2010.11.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 08/18/2010] [Revised: 11/05/2010] [Accepted: 11/14/2010] [Indexed: 11/18/2022]
|
44
|
Folate pathway and nonsyndromic cleft lip and palate. Birth Defects Res A Clin Mol Teratol 2011; 91:50-60. [PMID: 21254359 PMCID: PMC4098909 DOI: 10.1002/bdra.20740] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.2] [Received: 05/13/2010] [Revised: 07/19/2010] [Accepted: 08/12/2010] [Indexed: 11/08/2022]
Abstract
BACKGROUND Nonsyndromic cleft lip with or without cleft palate (NSCLP) is a common complex birth defect. Periconceptional supplementation with folic acid, a key component in DNA synthesis and cell division, has reduced the birth prevalence of neural tube defects and may similarly reduce the birth prevalence of other complex birth defects including NSCLP. Past studies investigating the role of two common methylenetetrahydrofolate reductase (MTHFR) single-nucleotide polymorphisms (SNPs), C677T (rs1801133) and A1298C (rs1801131), in NSCLP have produced conflicting results. Most studies of folate pathway genes have been limited in scope, as few genes/SNPs have been interrogated. Here, we asked whether variations in a more comprehensive group of folate pathway genes were associated with NSCLP, and whether there were detectable interactions between these genes and environmental exposures. METHODS Fourteen folate metabolism-related genes were interrogated using 89 SNPs in multiplex and simplex non-Hispanic white and Hispanic NSCLP families. RESULTS Evidence for a risk association between NSCLP and SNPs in NOS3 and TYMS was detected in the non-Hispanic white group, whereas associations with MTR, BHMT2, MTHFS, and SLC19A1 were detected in the Hispanic group. Evidence for over-transmission of haplotypes and gene interactions in the methionine arm was detected. CONCLUSIONS These results suggest that perturbations of the genes in the folate pathway may contribute to NSCLP. There was evidence for an interaction between several SNPs and maternal smoking, and for one SNP with gender of the offspring. These results provide support for other studies that suggest that high maternal homocysteine levels may contribute to NSCLP and should be further investigated.
|
45
|
Abstract
Ensemble methods (such as Bagging and Random Forests) take advantage of unstable base learners (such as decision trees) to improve predictions, and offer measures of variable importance useful for variable selection. LogicFS has been proposed as such an ensemble learner for case-control studies when interactions of single nucleotide polymorphisms (SNPs) are of particular interest. LogicFS uses bootstrap samples of the data and employs the Boolean trees derived in logic regression as base learners to create ensembles of models that allow for the quantification of the contributions of epistatic interactions to the disease risk. In this article, we propose an extension of logicFS suitable for case-parent trio data, and derive an additional importance measure that is much less influenced by linkage disequilibrium between SNPs than the measure originally used in logicFS. We illustrate the performance of the novel procedure in simulation studies and in a case study of 461 case-parent trios with autistic children.
|
46
|
A comparison of internal validation techniques for multifactor dimensionality reduction. BMC Bioinformatics 2010; 11:394. [PMID: 20650002 PMCID: PMC2920275 DOI: 10.1186/1471-2105-11-394] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 08/14/2009] [Accepted: 07/22/2010] [Indexed: 11/24/2022]
Abstract
BACKGROUND It is hypothesized that common, complex diseases may be due to complex interactions between genetic and environmental factors, which are difficult to detect in high-dimensional data using traditional statistical approaches. Multifactor Dimensionality Reduction (MDR) is the most commonly used data-mining method to detect epistatic interactions. In all data-mining methods, it is important to consider internal validation procedures to obtain prediction estimates, prevent model over-fitting and reduce potential false positive findings. Currently, MDR uses cross-validation for internal validation. In this study, we incorporate a three-way split (3WS) of the data, in combination with a post-hoc pruning procedure, as an alternative to cross-validation for internal model validation that reduces computation time without impairing performance. We compare the power to detect true disease-causing loci using MDR with both 5- and 10-fold cross-validation to MDR with 3WS for a range of single-locus and epistatic disease models. Additionally, we analyze a dataset in HIV immunogenetics to demonstrate the results of the two strategies on real data. RESULTS MDR with 3WS is approximately five times faster computationally than MDR with 5-fold cross-validation. The power to find the exact true disease loci without detecting false positive loci is higher with 5-fold cross-validation than with 3WS before pruning. However, the power to find the true disease-causing loci together with false positive loci is equivalent for the two approaches. With the incorporation of a pruning procedure after the 3WS, the power of the 3WS approach to detect only the exact disease loci is equivalent to that of MDR with cross-validation. In the real data application, the cross-validation and 3WS analyses indicate the same two-locus model. CONCLUSIONS Our results reveal that the performance of the two internal validation methods is equivalent with the use of pruning procedures. The specific pruning procedure should be chosen with the trade-off in mind between identifying all relevant genetic effects at the cost of including false positives, and missing important genetic factors. This implies that 3WS may be a powerful and computationally efficient approach to screen for epistatic effects, and could be used to identify candidate interactions in large-scale genetic studies.
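The two internal validation schemes compared in this abstract differ mainly in how the data are partitioned. The sketch below, assuming a simple random partition of subject indices, contrasts a three-way split (separate training, testing and validation sets, with each candidate model fitted once) with k-fold cross-validation (each candidate model fitted k times). The 50/25/25 proportions and the function names are illustrative assumptions, not the settings used in the study.

```python
import random


def three_way_split(n, fractions=(0.5, 0.25, 0.25), seed=0):
    """Randomly partition indices 0..n-1 into training, testing and validation sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * fractions[0])
    n_test = round(n * fractions[1])
    # The validation set receives whatever remains after rounding.
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


def k_fold_indices(n, k=5, seed=0):
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


train, test, val = three_way_split(100)
folds = k_fold_indices(100, k=5)
print(len(train), len(test), len(val))  # 50 25 25
print([len(f) for f in folds])          # [20, 20, 20, 20, 20]
```

Because every candidate model is fitted once under a 3WS rather than k times under k-fold cross-validation, one would expect a speed-up of roughly k-fold, which is in line with the approximately fivefold gain over 5-fold cross-validation reported above.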
|
47
|
Family-based study shows heterogeneity of a susceptibility locus on chromosome 8q24 for nonsyndromic cleft lip and palate. Birth Defects Res A Clin Mol Teratol 2010; 88:256-9. [PMID: 20196142 DOI: 10.1002/bdra.20659] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]
Abstract
BACKGROUND Nonsyndromic cleft lip with or without cleft palate is a common birth defect. Although a number of susceptibility loci have been reported, replication has often been lacking. This is likely due, in part, to the heterogeneity of datasets and methodologies. Two independent genome-wide association studies of individuals of largely western European extraction have identified a possible susceptibility locus on 8q24.21. METHODS To determine the overall effect of this locus, we genotyped six of the previously associated single nucleotide polymorphisms in our Hispanic and non-Hispanic white family-based datasets and evaluated them for linkage and association. In addition, we genotyped a large African American family with nonsyndromic cleft lip with or without cleft palate that we had previously mapped to the 8q21.3-24.12 region to test for linkage. RESULTS There was no evidence for linkage to this region in any of the three ethnic groups. Nevertheless, strong evidence for association was noted in the non-Hispanic white group, whereas none was detected in the Hispanic dataset. CONCLUSION These results confirm the previously reported association and provide evidence suggesting that there is ethnically based heterogeneity for this locus.
|
48
|
A cross-validation procedure for general pedigrees and matched odds ratio fitness metric implemented for the multifactor dimensionality reduction pedigree disequilibrium test. Genet Epidemiol 2010; 34:194-9. [PMID: 19697353 DOI: 10.1002/gepi.20447] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/11/2022]
Abstract
As genetic epidemiology looks beyond mapping single disease susceptibility loci, interest in detecting epistatic interactions between genes has grown. The dimensionality and comparisons required to search the epistatic space and the inference for a significant result pose challenges for testing epistatic disease models. The multifactor dimensionality reduction-pedigree disequilibrium test (MDR-PDT) was developed to test for multilocus models in pedigree data. In the present study we rigorously tested MDR-PDT with new cross-validation (CV) (both 5- and 10-fold) and omnibus model selection algorithms by simulating a range of heritabilities, odds ratios, minor allele frequencies, sample sizes, and numbers of interacting loci. Power was evaluated using 100, 500, and 1,000 families, with minor allele frequencies 0.2 and 0.4 and broad-sense heritabilities of 0.005, 0.01, 0.03, 0.05, and 0.1 for 2- and 3-locus purely epistatic penetrance models. We also compared the prediction error (PE) measure of effect with a predicted matched odds ratio (MOR) for final model selection and testing. We report that the CV procedure is valid with the permutation test, MDR-PDT performs similarly with 5- and 10-fold CV, and that the MOR is more powerful than PE as the fitness metric for MDR-PDT.
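For pedigree data, cross-validation has to keep relatives together so that no family contributes to both the training and the testing folds. A minimal sketch of assigning whole families to k folds follows. The abstract does not specify how MDR-PDT forms its folds, so the round-robin assignment over shuffled family IDs, the function name `family_folds` and the family labels in the example are all hypothetical.

```python
import random


def family_folds(family_ids, k=5, seed=0):
    """Assign whole families to k folds.

    family_ids: the family ID of each individual, in sample order
    Returns k lists of individual indices; relatives always share a fold.
    """
    families = sorted(set(family_ids))
    random.Random(seed).shuffle(families)
    # Round-robin over shuffled families balances the number of families per fold.
    fold_of = {fam: i % k for i, fam in enumerate(families)}
    folds = [[] for _ in range(k)]
    for idx, fam in enumerate(family_ids):
        folds[fold_of[fam]].append(idx)
    return folds


ids = ["F1", "F1", "F1", "F2", "F2", "F3", "F4", "F4", "F5", "F5"]  # hypothetical pedigrees
folds = family_folds(ids, k=5)
print([[ids[i] for i in f] for f in folds])
```

The folds are balanced by number of families rather than number of individuals; with very unequal pedigree sizes, a scheme that balances individuals or informative transmissions might be preferable.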
|
49
|
FAM-MDR: a flexible family-based multifactor dimensionality reduction technique to detect epistasis using related individuals. PLoS One 2010; 5:e10304. [PMID: 20421984 PMCID: PMC2858665 DOI: 10.1371/journal.pone.0010304] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Received: 01/12/2010] [Accepted: 03/01/2010] [Indexed: 12/05/2022]
Abstract
We propose a novel multifactor dimensionality reduction method for epistasis detection in small or extended pedigrees, FAM-MDR. It combines features of the Genome-wide Rapid Association using Mixed Model And Regression approach (GRAMMAR) with Model-Based MDR (MB-MDR). We focus on continuous traits, although the method is general and can be used for outcomes of any type, including binary and censored traits. When comparing FAM-MDR with Pedigree-based Generalized MDR (PGMDR), which is a generalization of Multifactor Dimensionality Reduction (MDR) to continuous traits and related individuals, FAM-MDR was found to outperform PGMDR in terms of power in most of the considered simulated scenarios. Additional simulations revealed that PGMDR does not appropriately deal with multiple testing and consequently gives rise to overly optimistic results. In contrast, FAM-MDR adequately deals with multiple testing in epistasis screens and is, by construction, rather conservative. Furthermore, simulations show that correcting for lower order (main) effects is of utmost importance when claiming epistasis. As Type 2 Diabetes Mellitus (T2DM) is a complex phenotype likely influenced by gene-gene interactions, we applied FAM-MDR to examine data on glucose area-under-the-curve (GAUC), an endophenotype of T2DM for which multiple independent genetic associations have been observed, in the Amish Family Diabetes Study (AFDS). This application reveals that FAM-MDR makes more efficient use of the available data than PGMDR and can deal with multi-generational pedigrees more easily. In conclusion, we have validated FAM-MDR and compared it to PGMDR, the current state-of-the-art MDR method for family data, using both simulations and a practical dataset. FAM-MDR is found to outperform PGMDR in that it handles the multiple testing issue more correctly, has increased power, and efficiently uses all available information.
|
50
|
Abstract
Following the identification of several disease-associated polymorphisms by genome-wide association (GWA) analysis, interest is now focusing on the detection of effects that, owing to their interaction with other genetic or environmental factors, might not be identified by using standard single-locus tests. In addition to increasing the power to detect associations, it is hoped that detecting interactions between loci will allow us to elucidate the biological and biochemical pathways that underpin disease. Here I provide a critical survey of the methods and related software packages currently used to detect the interactions between genetic loci that contribute to human genetic disease. I also discuss the difficulties in determining the biological relevance of statistical interactions.
|