1
|
Alamin M, Sultana MH, Lou X, Jin W, Xu H. Dissecting Complex Traits Using Omics Data: A Review on the Linear Mixed Models and Their Application in GWAS. PLANTS (BASEL, SWITZERLAND) 2022; 11:3277. [PMID: 36501317 PMCID: PMC9739826 DOI: 10.3390/plants11233277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/21/2022] [Revised: 11/23/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
Genome-wide association study (GWAS) is the most popular approach to dissecting complex traits in plants, humans, and animals. Numerous methods and tools have been proposed to discover the causal variants for GWAS data analysis. Among them, linear mixed models (LMMs) are widely used statistical methods for regulating confounding factors, including population structure, resulting in increased computational proficiency and statistical power in GWAS studies. Recently more attention has been paid to pleiotropy, multi-trait, gene-gene interaction, gene-environment interaction, and multi-locus methods with the growing availability of large-scale GWAS data and relevant phenotype samples. In this review, we have demonstrated all possible LMMs-based methods available in the literature for GWAS. We briefly discuss the different LMM methods, software packages, and available open-source applications in GWAS. Then, we include the advantages and weaknesses of the LMMs in GWAS. Finally, we discuss the future perspective and conclusion. The present review paper would be helpful to the researchers for selecting appropriate LMM models and methods quickly for GWAS data analysis and would benefit the scientific society.
Collapse
Affiliation(s)
- Md. Alamin
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | | | - Xiangyang Lou
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Wenfei Jin
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | - Haiming Xu
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
| |
Collapse
|
2
|
Nodzenski M, Shi M, Krahn JM, Wise AS, Li Y, Li L, Umbach DM, Weinberg CR. GADGETS: a genetic algorithm for detecting epistasis using nuclear families. Bioinformatics 2022; 38:1052-1058. [PMID: 34788792 PMCID: PMC10060691 DOI: 10.1093/bioinformatics/btab766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2021] [Revised: 10/08/2021] [Accepted: 11/03/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Epistasis may play an etiologic role in complex diseases, but research has been hindered because identification of interactions among sets of single nucleotide polymorphisms (SNPs) requires exploration of immense search spaces. Current approaches using nuclear families accommodate at most several hundred candidate SNPs. RESULTS GADGETS detects epistatic SNP-sets by applying a genetic algorithm to case-parent or case-sibling data. To allow for multiple epistatic sets, island subpopulations of SNP-sets evolve separately under selection for evident joint relevance to disease risk. The software evaluates the identified SNP-sets via permutation testing and provides graphical visualization. GADGETS correctly identified epistatic SNP-sets in realistically simulated case-parent triads with 10 000 candidate SNPs, far more SNPs than competitors can handle, and it outperformed competitors in simulations with many fewer SNPs. Applying GADGETS to family-based oral-clefting data from dbGaP identified SNP-sets with possible epistatic effects on risk. AVAILABILITY AND IMPLEMENTATION GADGETS is part of the epistasisGA package at https://github.com/mnodzenski/epistasisGA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Michael Nodzenski
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC 27709, USA
| | - Min Shi
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC 27709, USA
| | - Juno M Krahn
- Genome Integrity and Structural Biology Laboratory, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC 27709, USA
| | - Alison S Wise
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC 27709, USA
| | - Yuanyuan Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC 27709, USA
| | - Leping Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC 27709, USA
| | - David M Umbach
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC 27709, USA
| | - Clarice R Weinberg
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC 27709, USA
| |
Collapse
|
3
|
Abegaz F, Van Lishout F, Mahachie John JM, Chiachoompu K, Bhardwaj A, Duroux D, Gusareva ES, Wei Z, Hakonarson H, Van Steen K. Performance of model-based multifactor dimensionality reduction methods for epistasis detection by controlling population structure. BioData Min 2021; 14:16. [PMID: 33608043 PMCID: PMC7893746 DOI: 10.1186/s13040-021-00247-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2020] [Accepted: 02/07/2021] [Indexed: 12/15/2022] Open
Abstract
Background In genome-wide association studies the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering replication, and the identification of causal variants. Several strategies have been developed for protecting associations against confounding, the most popular one is based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction association epistasis studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework. Methods To identify causal variants in synergy, to improve interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction approach for structured populations, namely MBMDR-PC, MBMDR-PG, and MBMDR-GC. Results Simulation results comparing the performance of various approaches show that in the presence of population structure MBMDR-PC and MBMDR-PG consistently better control type I error rate at the nominal level than MBMDR-GC. Moreover, our proposed three methods of population structure correction outperform MDR-SP in terms of statistical power. Conclusion We demonstrate through extensive simulation studies the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity. Supplementary Information The online version contains supplementary material available at 10.1186/s13040-021-00247-w.
Collapse
Affiliation(s)
- Fentaw Abegaz
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium.
| | | | | | | | - Archana Bhardwaj
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
| | - Diane Duroux
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
| | - Elena S Gusareva
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
| | - Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
| | - Hakon Hakonarson
- Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Pediatrics, Division of Human Genetics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Kristel Van Steen
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium.,WELBIO (Walloon Excellence in Lifesciences and Biotechnology), University of Liège, Liège, Belgium
| |
Collapse
|
4
|
Xiang X, Wang S, Liu T, Wang M, Li J, Jiang J, Wu T, Hu Y. Exploring gene-gene interaction in family-based data with an unsupervised machine learning method: EPISFA. Genet Epidemiol 2020; 44:811-824. [PMID: 32869348 DOI: 10.1002/gepi.22342] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2019] [Revised: 06/06/2020] [Accepted: 06/21/2020] [Indexed: 11/06/2022]
Abstract
Gene-gene interaction (G × G) is thought to fill the gap between the estimated heritability of complex diseases and the limited genetic proportion explained by identified single-nucleotide polymorphisms. The current tools for exploring G × G were often developed for case-control designs with less considerations for their applications in families. Family-based studies are robust against bias led from population stratification in genetic studies and helpful in understanding G × G. We proposed a new algorithm epistasis sparse factor analysis (EPISFA) and epistasis sparse factor analysis for linkage disequilibrium (EPISFA-LD) based on unsupervised machine learning to screen G × G. Extensive simulations were performed to compare EPISFA/EPISFA-LD with a classical family-based algorithm FAM-MDR (family-based multifactor dimensionality reduction). The results showed that EPISFA/EPISFA-LD is a tool of both high power and computational efficiency that could be applied in family designs and is applicable within high-dimensionality datasets. Finally, we applied EPISFA/EPISFA-LD to a real dataset drawn from the Fangshan/family-based Ischemic Stroke Study in China. Five pairs of G × G were discovered by EPISFA/EPISFA-LD, including three pairs verified by other algorithms (FAM-MDR and logistic), and an additional two pairs uniquely identified by EPISFA/EPISFA-LD only. The results from EPISFA might offer new insights for understanding the genetic etiology of complex diseases. EPISFA/EPISFA-LD was implemented in R. All relevant source code as well as simulated data could be freely downloaded from https://github.com/doublexism/episfa.
Collapse
Affiliation(s)
- Xiao Xiang
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
| | - Siyue Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
| | - Tianyi Liu
- Department of Epidemiology and Biostatistics, School of Public Health, Capital Medical University, Beijing, China
| | - Mengying Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
| | - Jiawen Li
- Department of Clinical Medicine, School of Medicine, Peking University, Beijing, China
| | - Jin Jiang
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
| | - Tao Wu
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
| | - Yonghua Hu
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
| |
Collapse
|
5
|
Joiret M, Mahachie John JM, Gusareva ES, Van Steen K. Confounding of linkage disequilibrium patterns in large scale DNA based gene-gene interaction studies. BioData Min 2019; 12:11. [PMID: 31198442 PMCID: PMC6558841 DOI: 10.1186/s13040-019-0199-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2019] [Accepted: 05/09/2019] [Indexed: 01/07/2023] Open
Abstract
Background In Genome-Wide Association Studies (GWAS), the concept of linkage disequilibrium is important as it allows identifying genetic markers that tag the actual causal variants. In Genome-Wide Association Interaction Studies (GWAIS), similar principles hold for pairs of causal variants. However, Linkage Disequilibrium (LD) may also interfere with the detection of genuine epistasis signals in that there may be complete confounding between Gametic Phase Disequilibrium (GPD) and interaction. GPD may involve unlinked genetic markers, even residing on different chromosomes. Often GPD is eliminated in GWAIS, via feature selection schemes or so-called pruning algorithms, to obtain unconfounded epistasis results. However, little is known about the optimal degree of GPD/LD-pruning that gives a balance between false positive control and sufficient power of epistasis detection statistics. Here, we focus on Model-Based Multifactor Dimensionality Reduction as one large-scale epistasis detection tool. Its performance has been thoroughly investigated in terms of false positive control and power, under a variety of scenarios involving different trait types and study designs, as well as error-free and noisy data, but never with respect to multicollinear SNPs. Results Using real-life human LD patterns from a homogeneous subpopulation of British ancestry, we investigated the impact of LD-pruning on the statistical sensitivity of MB-MDR. We considered three different non-fully penetrant epistasis models with varying effect sizes. There is a clear advantage in pre-analysis pruning using sliding windows at r2 of 0.75 or lower, but using a threshold of 0.20 has a detrimental effect on the power to detect a functional interactive SNP pair (power < 25%). Signal sensitivity, directly using LD-block information to determine whether an epistasis signal is present or not, benefits from LD-pruning as well (average power across scenarios: 87%), but is largely hampered by functional loci residing at the boundaries of an LD-block. Conclusions Our results confirm that LD patterns and the position of causal variants in LD blocks do have an impact on epistasis detection, and that pruning strategies and LD-blocks definitions combined need careful attention, if we wish to maximize the power of large-scale epistasis screenings.
Collapse
Affiliation(s)
- Marc Joiret
- BIO3, GIGA-R Medical Genomics, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium.,Biomechanics Research Unit, GIGA-R in-silico medicine, Liège, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium
| | | | - Elena S Gusareva
- BIO3, GIGA-R Medical Genomics, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium
| | - Kristel Van Steen
- BIO3, GIGA-R Medical Genomics, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium.,WELBIO researcher, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium
| |
Collapse
|
6
|
Jung HY, Leem S, Park T. Fuzzy set-based generalized multifactor dimensionality reduction analysis of gene-gene interactions. BMC Med Genomics 2018; 11:32. [PMID: 29697366 PMCID: PMC5918459 DOI: 10.1186/s12920-018-0343-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Gene-gene interactions (GGIs) are a known cause of missing heritability. Multifactor dimensionality reduction (MDR) is one of most commonly used methods for GGI detection. The generalized multifactor dimensionality reduction (GMDR) method is an extension of MDR method that is applicable to various types of traits, and allows covariate adjustments. Our previous Fuzzy MDR (FMDR) is another extension for overcoming simple binary classification. FMDR uses continuous member-ship values instead of binary membership values 0 and 1, improving power for detecting causal SNPs and more intuitive interpretations in real data analysis. Here, we propose the fuzzy generalized multifactor dimensionality reduction (FGMDR) method, as a combined analysis of fuzzy set-based analysis and GMDR method, to detect GGIs associated with diseases using fuzzy set theory. RESULTS Through simulation studies for different types of traits, the proposed FGMDR showed a higher detection ratio of causal SNPs, compared to GMDR. We then applied FGMDR to two real data: Crohn's disease (CD) data from the Wellcome Trust Case Control Consortium (WTCCC) with a binary phenotype and the Homeostasis Model Assessment of Insulin Resistance (HOMA-IR) data from Korean population with a continuous phenotype. The interactions derived by our method include the pre-reported interactions associated with phenotypes. CONCLUSIONS The proposed FGMDR performs well for GGI detection with covariate adjustments. The program written in R for FGMDR is available at http://statgen.snu.ac.kr/software/FGMDR .
Collapse
Affiliation(s)
- Hye-Young Jung
- Faculty of Liberal Education, Seoul National University, Seoul, 08826 South Korea
| | - Sangseob Leem
- Department of Statistics, Seoul National University, Seoul, 08826 South Korea
| | - Taesung Park
- Department of Statistics, Seoul National University, Seoul, 08826 South Korea
| |
Collapse
|
7
|
Cole BS, Hall MA, Urbanowicz RJ, Gilbert‐Diamond D, Moore JH. Analysis of Gene‐Gene Interactions. ACTA ACUST UNITED AC 2018; 95:1.14.1-1.14.10. [DOI: 10.1002/cphg.45] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Affiliation(s)
- Brian S. Cole
- Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania Philadelphia Pennsylvania
| | - Molly A. Hall
- Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania Philadelphia Pennsylvania
- The Center for Systems Genomics, The Pennsylvania State University, University Park Pennsylvania
| | - Ryan J. Urbanowicz
- Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania Philadelphia Pennsylvania
| | - Diane Gilbert‐Diamond
- Institute for Quantitative Biomedical Sciences at Dartmouth Hanover New Hampshire
- Department of Epidemiology, Geisel School of Medicine at Dartmouth Hanover New Hampshire
| | - Jason H. Moore
- Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania Philadelphia Pennsylvania
| |
Collapse
|
8
|
Yu W, Lee S, Park T. A unified model based multifactor dimensionality reduction framework for detecting gene-gene interactions. Bioinformatics 2017; 32:i605-i610. [PMID: 27587680 DOI: 10.1093/bioinformatics/btw424] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Gene-gene interaction (GGI) is one of the most popular approaches for finding and explaining the missing heritability of common complex traits in genome-wide association studies. The multifactor dimensionality reduction (MDR) method has been widely studied for detecting GGI effects. However, there are several disadvantages of the existing MDR-based approaches, such as the lack of an efficient way of evaluating the significance of multi-locus models and the high computational burden due to intensive permutation. Furthermore, the MDR method does not distinguish marginal effects from pure interaction effects. METHODS We propose a two-step unified model based MDR approach (UM-MDR), in which, the significance of a multi-locus model, even a high-order model, can be easily obtained through a regression framework with a semi-parametric correction procedure for controlling Type I error rates. In comparison to the conventional permutation approach, the proposed semi-parametric correction procedure avoids heavy computation in order to achieve the significance of a multi-locus model. The proposed UM-MDR approach is flexible in the sense that it is able to incorporate different types of traits and evaluate significances of the existing MDR extensions. RESULTS The simulation studies and the analysis of a real example are provided to demonstrate the utility of the proposed method. UM-MDR can achieve at least the same power as MDR for most scenarios, and it outperforms MDR especially when there are some single nucleotide polymorphisms that only have marginal effects, which masks the detection of causal epistasis for the existing MDR approaches. CONCLUSIONS UM-MDR provides a very good supplement of existing MDR method due to its efficiency in achieving significance for every multi-locus model, its power and its flexibility of handling different types of traits. AVAILABILITY AND IMPLEMENTATION A R package "umMDR" and other source codes are freely available at http://statgen.snu.ac.kr/software/umMDR/ CONTACT: tspark@stats.snu.ac.kr SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Wenbao Yu
- Department of Statistics, Seoul National University, Shilim-Dong, Kwanak-Gu, Seoul 151-742, Korea
| | - Seungyeoun Lee
- Department of Mathematics and Statistics, Sejong University, Seoul 143-747, Korea
| | - Taesung Park
- Department of Statistics, Seoul National University, Shilim-Dong, Kwanak-Gu, Seoul 151-742, Korea
| |
Collapse
|
9
|
Abo Alchamlat S, Farnir F. KNN-MDR: a learning approach for improving interactions mapping performances in genome wide association studies. BMC Bioinformatics 2017; 18:184. [PMID: 28327091 PMCID: PMC5361736 DOI: 10.1186/s12859-017-1599-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2016] [Accepted: 03/11/2017] [Indexed: 12/30/2022] Open
Abstract
Background Finding epistatic interactions in large association studies like genome-wide association studies (GWAS) with the nowadays-available large volume of genomic data is a challenging and largely unsolved issue. Few previous studies could handle genome-wide data due to the intractable difficulties met in searching a combinatorial explosive search space and statistically evaluating epistatic interactions given a limited number of samples. Our work is a contribution to this field. We propose a novel approach combining K-Nearest Neighbors (KNN) and Multi Dimensional Reduction (MDR) methods for detecting gene-gene interactions as a possible alternative to existing algorithms, e especially in situations where the number of involved determinants is high. After describing the approach, a comparison of our method (KNN-MDR) to a set of the other most performing methods (i.e., MDR, BOOST, BHIT, MegaSNPHunter and AntEpiSeeker) is carried on to detect interactions using simulated data as well as real genome-wide data. Results Experimental results on both simulated data and real genome-wide data show that KNN-MDR has interesting properties in terms of accuracy and power, and that, in many cases, it significantly outperforms its recent competitors. Conclusions The presented methodology (KNN-MDR) is valuable in the context of loci and interactions mapping and can be seen as an interesting addition to the arsenal used in complex traits analyses. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1599-7) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Sinan Abo Alchamlat
- Department of Biostatistics, Faculty of Veterinary Medicine, FARAH, University of Liège, Sart Tilman B43, 4000, Liege, Belgium
| | - Frédéric Farnir
- Department of Biostatistics, Faculty of Veterinary Medicine, FARAH, University of Liège, Sart Tilman B43, 4000, Liege, Belgium.
| |
Collapse
|
10
|
A novel fuzzy set based multifactor dimensionality reduction method for detecting gene-gene interaction. Comput Biol Chem 2016; 65:193-202. [PMID: 27765491 DOI: 10.1016/j.compbiolchem.2016.09.006] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2016] [Accepted: 09/07/2016] [Indexed: 11/20/2022]
Abstract
BACKGROUND Gene-gene interaction (GGI) is one of the most popular approaches for finding the missing heritability of common complex traits in genetic association studies. The multifactor dimensionality reduction (MDR) method has been widely studied for detecting GGIs. In order to identify the best interaction model associated with disease susceptibility, MDR compares all possible genotype combinations in terms of their predictability of disease status from a simple binary high(H) and low(L) risk classification. However, this simple binary classification does not reflect the uncertainty of H/L classification. METHODS We regard classifying H/L as equivalent to defining the degree of membership of two risk groups H/L. By adopting the fuzzy set theory, we propose Fuzzy MDR which takes into account the uncertainty of H/L classification. Fuzzy MDR allows the possibility of partial membership of H/L through a membership function which transforms the degree of uncertainty into a [0,1] scale. The best genotype combinations can be selected which maximizes a new fuzzy set based accuracy measure. RESULTS Two simulation studies are conducted to compare the power of the proposed Fuzzy MDR with that of MDR. Our results show that Fuzzy MDR has higher power than MDR. We illustrate the proposed Fuzzy MDR by analysing bipolar disorder (BD) trait of the WTCCC dataset to detect GGI associated with BD. CONCLUSIONS We propose a novel Fuzzy MDR method to detect gene-gene interaction by taking into account the uncertainly of H/L classification and show that it has higher power than MDR. Fuzzy MDR can be easily extended to handle continuous phenotypes as well. The program written in R for the proposed Fuzzy MDR is available at https://statgen.snu.ac.kr/software/FuzzyMDR.
Collapse
|
11
|
Sung PY, Wang YT, Yu YW, Chung RH. An efficient gene-gene interaction test for genome-wide association studies in trio families. Bioinformatics 2016; 32:1848-55. [PMID: 26873927 PMCID: PMC5939888 DOI: 10.1093/bioinformatics/btw077] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2015] [Revised: 01/04/2016] [Accepted: 02/04/2016] [Indexed: 12/23/2022] Open
Abstract
MOTIVATION Several efficient gene-gene interaction tests have been developed for unrelated case-control samples in genome-wide association studies (GWAS), making it possible to test tens of billions of interaction pairs of single-nucleotide polymorphisms (SNPs) in a reasonable timeframe. However, current family-based gene-gene interaction tests are computationally expensive and are not applicable to genome-wide interaction analysis. RESULTS We developed an efficient family-based gene-gene interaction test, GCORE, for trios (i.e. two parents and one affected sib). The GCORE compares interlocus correlations at two SNPs between the transmitted and non-transmitted alleles. We used simulation studies to compare the statistical properties such as type I error rates and power for the GCORE with several other family-based interaction tests under various scenarios. We applied the GCORE to a family-based GWAS for autism consisting of approximately 2000 trios. Testing a total of 22 471 383 013 interaction pairs in the GWAS can be finished in 36 h by the GCORE without large-scale computing resources, demonstrating that the test is practical for genome-wide gene-gene interaction analysis in trios. AVAILABILITY AND IMPLEMENTATION GCORE is implemented with C ++ and is available at http://gscore.sourceforge.net CONTACT rchung@nhri.org.tw SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Pei-Yuan Sung
- Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan and
| | - Yi-Ting Wang
- Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan and
| | - Ya-Wen Yu
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan
| | - Ren-Hua Chung
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan
| |
Collapse
|
12
|
Software for detecting gene-gene interactions in genome wide association studies. BIOTECHNOL BIOPROC E 2015. [DOI: 10.1007/s12257-015-0064-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
13
|
Yu W, Kwon MS, Park T. Multivariate Quantitative Multifactor Dimensionality Reduction for Detecting Gene-Gene Interactions. Hum Hered 2015. [PMID: 26201702 DOI: 10.1159/000377723] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVES To determine gene-gene interactions and missing heritability of complex diseases is a challenging topic in genome-wide association studies. The multifactor dimensionality reduction (MDR) method is one of the most commonly used methods for identifying gene-gene interactions with dichotomous phenotypes. For quantitative phenotypes, the generalized MDR or quantitative MDR (QMDR) methods have been proposed. These methods are known as univariate methods because they consider only one phenotype. To date, there are few methods for analyzing multiple phenotypes. METHODS To address this problem, we propose a multivariate QMDR method (Multi-QMDR) for multivariate correlated phenotypes. We summarize the multivariate phenotypes into a univariate score by dimensional reduction analysis, and then classify the samples accordingly into high-risk and low-risk groups. We use different ways of summarizing mainly based on the principal components. Multi-QMDR is model-free and easy to implement. RESULTS Multi-QMDR is applied to lipid-related traits. The properties of Multi- QMDR were investigated through simulation studies. Empirical studies show that Multi-QMDR outperforms existing univariate and multivariate methods at identifying causal interactions. CONCLUSIONS The Multi-QMDR approach improves the performance of QMDR when multiple quantitative phenotypes are available.
Collapse
Affiliation(s)
- Wenbao Yu
- Department of Statistic, Seoul National University, Seoul, South Korea
| | | | | |
Collapse
|
14
|
Ni S, Lv J, Cheng Z, Li M. Novel Online Dimensionality Reduction Method with Improved Topology Representing and Radial Basis Function Networks. PLoS One 2015; 10:e0131631. [PMID: 26161960 PMCID: PMC4498733 DOI: 10.1371/journal.pone.0131631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2015] [Accepted: 06/05/2015] [Indexed: 11/19/2022] Open
Abstract
This paper presents improvements to the conventional Topology Representing Network to build more appropriate topology relationships. Based on this improved Topology Representing Network, we propose a novel method for online dimensionality reduction that integrates the improved Topology Representing Network and Radial Basis Function Network. This method can find meaningful low-dimensional feature structures embedded in high-dimensional original data space, process nonlinear embedded manifolds, and map the new data online. Furthermore, this method can deal with large datasets for the benefit of improved Topology Representing Network. Experiments illustrate the effectiveness of the proposed method.
Collapse
Affiliation(s)
- Shengqiao Ni
- Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, P. R. China
| | - Jiancheng Lv
- Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, P. R. China
| | - Zhehao Cheng
- Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, P. R. China
| | - Mao Li
- Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, P. R. China
| |
Collapse
|
15
|
Gola D, Mahachie John JM, van Steen K, König IR. A roadmap to multifactor dimensionality reduction methods. Brief Bioinform 2015; 17:293-308. [PMID: 26108231 PMCID: PMC4793893 DOI: 10.1093/bib/bbv038] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2015] [Indexed: 02/02/2023] Open
Abstract
Complex diseases are defined to be determined by multiple genetic and environmental factors alone as well as in interactions. To analyze interactions in genetic data, many statistical methods have been suggested, with most of them relying on statistical regression models. Given the known limitations of classical methods, approaches from the machine-learning community have also become attractive. From this latter family, a fast-growing collection of methods emerged that are based on the Multifactor Dimensionality Reduction (MDR) approach. Since its first introduction, MDR has enjoyed great popularity in applications and has been extended and modified multiple times. Based on a literature search, we here provide a systematic and comprehensive overview of these suggested methods. The methods are described in detail, and the availability of implementations is listed. Most recent approaches offer to deal with large-scale data sets and rare variants, which is why we expect these methods to even gain in popularity.
Collapse
|
16
|
Gusareva ES, Van Steen K. Practical aspects of genome-wide association interaction analysis. Hum Genet 2014; 133:1343-58. [DOI: 10.1007/s00439-014-1480-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2014] [Accepted: 08/18/2014] [Indexed: 12/31/2022]
|
17
|
Family studies of type 1 diabetes reveal additive and epistatic effects between MGAT1 and three other polymorphisms. Genes Immun 2014; 15:218-23. [PMID: 24572742 PMCID: PMC4047175 DOI: 10.1038/gene.2014.7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2013] [Revised: 01/13/2014] [Accepted: 01/17/2014] [Indexed: 11/08/2022]
Abstract
In a recent study on multiple sclerosis (MS), we observed additive effects and epistatic interactions between variants of four genes that converge to induce T-cell hyperactivity by altering Asn-(N)-linked protein glycosylation: namely, the Golgi enzyme MGAT1, cytotoxic T-lymphocyte antigen 4 (CTLA-4), interleukin-2 receptor-α (IL2RA) and interleukin-7 receptor-α (IL7RA). As the CTLA-4, IL2RA and IL7RA variants are associated with type 1 diabetes (T1D), we examined for joint effects in T1D. Employing a novel conditional logistic regression for family-based data sets, epistatic and additive effects were observed using 1423 multiplex families from the Type 1 Diabetes Genetic Consortium data set. The IL2RA and IL7RA variants had univariate association in MS and T1D, whereas the MGAT1 and CTLA-4 variants associated with only MS or T1D, respectively. However, similar to MS, the MGAT1 variant haplotype interacted with CTLA4 (P=0.03), and a combination of IL2RA and IL7RA (P=0.01). The joint effects of MGAT1, CTLA4, IL2RA, IL7RA and the two interactions using a multiple conditional logistic regression were statistically highly significant (P<5 × 10(-10)). The MGAT1-CTLA-4 interaction was replicated (P=0.01) in 179 trio families from the Genetics of Kidneys in Diabetes study. These data are consistent with defective N-glycosylation of T cells contributing to T1D pathogenesis.
Collapse
|
18
|
Zhang W, Langefeld CD, Grunwald GK, Fingerlin TE. Testing gene-environment interactions in family-based association studies using trait-based ascertained samples. Stat Med 2013; 33:304-18. [PMID: 23922213 DOI: 10.1002/sim.5930] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2012] [Revised: 07/03/2013] [Accepted: 07/08/2013] [Indexed: 01/15/2023]
Abstract
The study of gene-environment interactions is an increasingly important aspect of genetic epidemiological investigation. Historically, it has been difficult to study gene-environment interactions using a family-based design for quantitative traits or when parent-offspring trios were incomplete. The QBAT-I provides researchers a tool to estimate and test for a gene-environment interaction in families of arbitrary structure that are sampled without regard to the phenotype of interest, but is vulnerable to inflated type I error if families are ascertained on the basis of the phenotype. In this study, we verified the potential for type I error of the QBAT-I when applied to samples ascertained on a trait of interest. The magnitude of the inflation increases as the main genetic effect increases and as the ascertainment becomes more extreme. We propose an ascertainment-corrected score test that allows the use of the QBAT-I to test for gene-environment interactions in ascertained samples. Our results indicate that the score test and an ad hoc method we propose can often restore the nominal type I error rate, and in cases where complete restoration is not possible, dramatically reduce the inflation of the type I error rate in ascertained samples.
Collapse
Affiliation(s)
- Weiming Zhang
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, U.S.A
| | | | | | | |
Collapse
|
19
|
Mahachie John JM, Van Lishout F, Gusareva ES, Van Steen K. A robustness study of parametric and non-parametric tests in model-based multifactor dimensionality reduction for epistasis detection. BioData Min 2013; 6:9. [PMID: 23618370 PMCID: PMC3668290 DOI: 10.1186/1756-0381-6-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2012] [Accepted: 04/20/2013] [Indexed: 11/10/2022] Open
Abstract
Background Applying a statistical method implies identifying underlying (model) assumptions and checking their validity in the particular context. One of these contexts is association modeling for epistasis detection. Here, depending on the technique used, violation of model assumptions may result in increased type I error, power loss, or biased parameter estimates. Remedial measures for violated underlying conditions or assumptions include data transformation or selecting a more relaxed modeling or testing strategy. Model-Based Multifactor Dimensionality Reduction (MB-MDR) for epistasis detection relies on association testing between a trait and a factor consisting of multilocus genotype information. For quantitative traits, the framework is essentially Analysis of Variance (ANOVA) that decomposes the variability in the trait amongst the different factors. In this study, we assess through simulations, the cumulative effect of deviations from normality and homoscedasticity on the overall performance of quantitative Model-Based Multifactor Dimensionality Reduction (MB-MDR) to detect 2-locus epistasis signals in the absence of main effects. Methodology Our simulation study focuses on pure epistasis models with varying degrees of genetic influence on a quantitative trait. Conditional on a multilocus genotype, we consider quantitative trait distributions that are normal, chi-square or Student’s t with constant or non-constant phenotypic variances. All data are analyzed with MB-MDR using the built-in Student’s t-test for association, as well as a novel MB-MDR implementation based on Welch’s t-test. Traits are either left untransformed or are transformed into new traits via logarithmic, standardization or rank-based transformations, prior to MB-MDR modeling. Results Our simulation results show that MB-MDR controls type I error and false positive rates irrespective of the association test considered. Empirically-based MB-MDR power estimates for MB-MDR with Welch’s t-tests are generally lower than those for MB-MDR with Student’s t-tests. Trait transformations involving ranks tend to lead to increased power compared to the other considered data transformations. Conclusions When performing MB-MDR screening for gene-gene interactions with quantitative traits, we recommend to first rank-transform traits to normality and then to apply MB-MDR modeling with Student’s t-tests as internal tests for association.
Collapse
|
20
|
Van Lishout F, Mahachie John JM, Gusareva ES, Urrea V, Cleynen I, Théâtre E, Charloteaux B, Calle ML, Wehenkel L, Van Steen K. An efficient algorithm to perform multiple testing in epistasis screening. BMC Bioinformatics 2013; 14:138. [PMID: 23617239 PMCID: PMC3648350 DOI: 10.1186/1471-2105-14-138] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2012] [Accepted: 04/12/2013] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Research in epistasis or gene-gene interaction detection for human complex traits has grown over the last few years. It has been marked by promising methodological developments, improved translation efforts of statistical epistasis to biological epistasis and attempts to integrate different omics information sources into the epistasis screening to enhance power. The quest for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm rises linearly with the amount of hypothesis tests. Gene-gene interaction studies will require a memory proportional to the squared number of SNPs. A genome-wide epistasis search would therefore require terabytes of memory. Hence, cache problems are likely to occur, increasing the computation time. In this work we present a new version of maxT, requiring an amount of memory independent from the number of genetic effects to be investigated. This algorithm was implemented in C++ in our epistasis screening software MBMDR-3.0.3. We evaluate the new implementation in terms of memory efficiency and speed using simulated data. The software is illustrated on real-life data for Crohn's disease. RESULTS In the case of a binary (affected/unaffected) trait, the parallel workflow of MBMDR-3.0.3 analyzes all gene-gene interactions with a dataset of 100,000 SNPs typed on 1000 individuals within 4 days and 9 hours, using 999 permutations of the trait to assess statistical significance, on a cluster composed of 10 blades, containing each four Quad-Core AMD Opteron(tm) Processor 2352 2.1 GHz. In the case of a continuous trait, a similar run takes 9 days. Our program found 14 SNP-SNP interactions with a multiple-testing corrected p-value of less than 0.05 on real-life Crohn's disease (CD) data. CONCLUSIONS Our software is the first implementation of the MB-MDR methodology able to solve large-scale SNP-SNP interactions problems within a few days, without using much memory, while adequately controlling the type I error rates. A new implementation to reach genome-wide epistasis screening is under construction. In the context of Crohn's disease, MBMDR-3.0.3 could identify epistasis involving regions that are well known in the field and could be explained from a biological point of view. This demonstrates the power of our software to find relevant phenotype-genotype higher-order associations.
Collapse
Affiliation(s)
- François Van Lishout
- Systems and Modeling Unit, Montefiore Institute, University of Liège, 4000 Liège, Belgium.
| | | | | | | | | | | | | | | | | | | |
Collapse
|
21
|
Fang YH, Chiu YF. SVM-based generalized multifactor dimensionality reduction approaches for detecting gene-gene interactions in family studies. Genet Epidemiol 2013; 36:88-98. [PMID: 22851472 DOI: 10.1002/gepi.21602] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
Gene-gene interaction plays an important role in the etiology of complex diseases, which may exist without a genetic main effect. Most current statistical approaches, however, focus on assessing an interaction effect in the presence of the gene's main effects. It would be very helpful to develop methods that can detect not only the gene's main effects but also gene-gene interaction effects regardless of the existence of the gene's main effects while adjusting for confounding factors. In addition, when a disease variant is rare or when the sample size is quite limited, the statistical asymptotic properties are not applicable; therefore, approaches based on a reasonable and applicable computational framework would be practical and frequently applied. In this study, we have developed an extended support vector machine (SVM) method and an SVM-based pedigree-based generalized multifactor dimensionality reduction (PGMDR) method to study interactions in the presence or absence of main effects of genes with an adjustment for covariates using limited samples of families. A new test statistic is proposed for classifying the affected and the unaffected in the SVM-based PGMDR approach to improve performance in detecting gene-gene interactions. Simulation studies under various scenarios have been performed to compare the performances of the proposed and the original methods. The proposed and original approaches have been applied to a real data example for illustration and comparison. Both the simulation and real data studies show that the proposed SVM and SVM-based PGMDR methods have great prediction accuracies, consistencies, and power in detecting gene-gene interactions.
Collapse
Affiliation(s)
- Yao-Hwei Fang
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Miaoli, Taiwan, ROC
| | | |
Collapse
|
22
|
Basu M, Das T, Ghosh A, Majumder S, Maji AK, Kanjilal SD, Mukhopadhyay I, Roychowdhury S, Banerjee S, Sengupta S. Gene-gene interaction and functional impact of polymorphisms on innate immune genes in controlling Plasmodium falciparum blood infection level. PLoS One 2012; 7:e46441. [PMID: 23071570 PMCID: PMC3470565 DOI: 10.1371/journal.pone.0046441] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2012] [Accepted: 08/30/2012] [Indexed: 12/19/2022] Open
Abstract
Genetic variations in toll-like receptors and cytokine genes of the innate immune pathways have been implicated in controlling parasite growth and the pathogenesis of Plasmodium falciparum mediated malaria. We previously published genetic association of TLR4 non-synonymous and TNF-α promoter polymorphisms with P.falciparum blood infection level and here we extend the study considerably by (i) investigating genetic dependence of parasite-load on interleukin-12B polymorphisms, (ii) reconstructing gene-gene interactions among candidate TLRs and cytokine loci, (iii) exploring genetic and functional impact of epistatic models and (iv) providing mechanistic insights into functionality of disease-associated regulatory polymorphisms. Our data revealed that carriage of AA (P = 0.0001) and AC (P = 0.01) genotypes of IL12B 3′UTR polymorphism was associated with a significant increase of mean log-parasitemia relative to rare homozygous genotype CC. Presence of IL12B+1188 polymorphism in five of six multifactor models reinforced its strong genetic impact on malaria phenotype. Elevation of genetic risk in two-component models compared to the corresponding single locus and reduction of IL12B (2.2 fold) and lymphotoxin-α (1.7 fold) expressions in patients'peripheral-blood-mononuclear-cells under TLR4Thr399Ile risk genotype background substantiated the role of Multifactor Dimensionality Reduction derived models. Marked reduction of promoter activity of TNF-α risk haplotype (C-C-G-G) compared to wild-type haplotype (T-C-G-G) with (84%) and without (78%) LPS stimulation and the loss of binding of transcription factors detected in-silico supported a causal role of TNF-1031. Significantly lower expression of IL12B+1188 AA (5 fold) and AC (9 fold) genotypes compared to CC and under-representation (P = 0.0048) of allele A in transcripts of patients' PBMCs suggested an Allele-Expression-Imbalance. Allele (A+1188C) dependent differential stability (2 fold) of IL12B-transcripts upon actinomycin-D treatment and observed structural modulation (P = 0.013) of RNA-ensemble were the plausible explanations for AEI. In conclusion, our data provides functional support to the hypothesis that de-regulated receptor-cytokine axis of innate immune pathway influences blood infection level in P. falciparum malaria.
Collapse
Affiliation(s)
- Madhumita Basu
- Department of Biochemistry, University of Calcutta, Kolkata, West Bengal, India
| | - Tania Das
- Cancer & Cell Biology Division, Indian Institute of Chemical Biology, Kolkata, West Bengal, India
| | - Alip Ghosh
- Centre for Liver Research, The Institute of Post-Graduate Medical Education & Research, Kolkata, West Bengal, India
| | - Subhadipa Majumder
- Department of Biochemistry, University of Calcutta, Kolkata, West Bengal, India
| | - Ardhendu Kumar Maji
- Department of Protozoology, The Calcutta School of Tropical Medicine, Kolkata, West Bengal, India
| | - Sumana Datta Kanjilal
- Department of Pediatric Medicine, Calcutta National Medical College, Kolkata, West Bengal, India
| | | | - Susanta Roychowdhury
- Cancer & Cell Biology Division, Indian Institute of Chemical Biology, Kolkata, West Bengal, India
| | - Soma Banerjee
- Centre for Liver Research, The Institute of Post-Graduate Medical Education & Research, Kolkata, West Bengal, India
| | - Sanghamitra Sengupta
- Department of Biochemistry, University of Calcutta, Kolkata, West Bengal, India
- * E-mail:
| |
Collapse
|
23
|
Multilocus family-based association analysis of seven candidate polymorphisms with essential hypertension in an african-derived semi-isolated brazilian population. Int J Hypertens 2012; 2012:859219. [PMID: 23056922 PMCID: PMC3463917 DOI: 10.1155/2012/859219] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2012] [Accepted: 07/11/2012] [Indexed: 12/16/2022] Open
Abstract
Background. It has been widely suggested that analyses considering multilocus effects would be crucial to characterize the relationship between gene variability and essential hypertension (EH). Objective. To test for the presence of multilocus effects between/among seven polymorphisms (six genes) on blood pressure-related traits in African-derived semi-isolated Brazilian populations (quilombos). Methods. Analyses were carried out using a family-based design in a sample of 652 participants (97 families). Seven variants were investigated: ACE (rs1799752), AGT (rs669), ADD2 (rs3755351), NOS3 (rs1799983), GNB3 (rs5441 and rs5443), and GRK4 (rs1801058). Sensitivity analyses were further performed under a case-control design with unrelated participants only. Results. None of the investigated variants were associated individually with both systolic and diastolic BP levels (SBP and DBP, respectively) or EH (as a binary outcome). Multifactor dimensionality reduction-based techniques revealed a marginal association of the combined effect of both GNB3 variants on DBP levels in a family-based design (P = 0.040), whereas a putative NOS3-GRK4 interaction also in relation to DBP levels was observed in the case-control design only (P = 0.004). Conclusion. Our results provide limited support for the hypothesis of multilocus effects between/among the studied variants on blood pressure in quilombos. Further larger studies are needed to validate our findings.
Collapse
|
24
|
Aschard H, Lutz S, Maus B, Duell EJ, Fingerlin TE, Chatterjee N, Kraft P, Van Steen K. Challenges and opportunities in genome-wide environmental interaction (GWEI) studies. Hum Genet 2012; 131:1591-613. [PMID: 22760307 DOI: 10.1007/s00439-012-1192-0] [Citation(s) in RCA: 110] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2012] [Accepted: 06/11/2012] [Indexed: 02/03/2023]
Abstract
The interest in performing gene-environment interaction studies has seen a significant increase with the increase of advanced molecular genetics techniques. Practically, it became possible to investigate the role of environmental factors in disease risk and hence to investigate their role as genetic effect modifiers. The understanding that genetics is important in the uptake and metabolism of toxic substances is an example of how genetic profiles can modify important environmental risk factors to disease. Several rationales exist to set up gene-environment interaction studies and the technical challenges related to these studies-when the number of environmental or genetic risk factors is relatively small-has been described before. In the post-genomic era, it is now possible to study thousands of genes and their interaction with the environment. This brings along a whole range of new challenges and opportunities. Despite a continuing effort in developing efficient methods and optimal bioinformatics infrastructures to deal with the available wealth of data, the challenge remains how to best present and analyze genome-wide environmental interaction (GWEI) studies involving multiple genetic and environmental factors. Since GWEIs are performed at the intersection of statistical genetics, bioinformatics and epidemiology, usually similar problems need to be dealt with as for genome-wide association gene-gene interaction studies. However, additional complexities need to be considered which are typical for large-scale epidemiological studies, but are also related to "joining" two heterogeneous types of data in explaining complex disease trait variation or for prediction purposes.
Collapse
Affiliation(s)
- Hugues Aschard
- Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA.
| | | | | | | | | | | | | | | |
Collapse
|
25
|
Gyenesei A, Moody J, Semple CAM, Haley CS, Wei WH. High-throughput analysis of epistasis in genome-wide association studies with BiForce. ACTA ACUST UNITED AC 2012; 28:1957-64. [PMID: 22618535 PMCID: PMC3400955 DOI: 10.1093/bioinformatics/bts304] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
MOTIVATION Gene-gene interactions (epistasis) are thought to be important in shaping complex traits, but they have been under-explored in genome-wide association studies (GWAS) due to the computational challenge of enumerating billions of single nucleotide polymorphism (SNP) combinations. Fast screening tools are needed to make epistasis analysis routinely available in GWAS. RESULTS We present BiForce to support high-throughput analysis of epistasis in GWAS for either quantitative or binary disease (case-control) traits. BiForce achieves great computational efficiency by using memory efficient data structures, Boolean bitwise operations and multithreaded parallelization. It performs a full pair-wise genome scan to detect interactions involving SNPs with or without significant marginal effects using appropriate Bonferroni-corrected significance thresholds. We show that BiForce is more powerful and significantly faster than published tools for both binary and quantitative traits in a series of performance tests on simulated and real datasets. We demonstrate BiForce in analysing eight metabolic traits in a GWAS cohort (323 697 SNPs, >4500 individuals) and two disease traits in another (>340 000 SNPs, >1750 cases and 1500 controls) on a 32-node computing cluster. BiForce completed analyses of the eight metabolic traits within 1 day, identified nine epistatic pairs of SNPs in five metabolic traits and 18 SNP pairs in two disease traits. BiForce can make the analysis of epistasis a routine exercise in GWAS and thus improve our understanding of the role of epistasis in the genetic regulation of complex traits. AVAILABILITY AND IMPLEMENTATION The software is free and can be downloaded from http://bioinfo.utu.fi/BiForce/. CONTACT wenhua.wei@igmm.ed.ac.uk SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Attila Gyenesei
- Finnish Microarray and Sequencing Centre, Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, 20520, Turku, Finland
| | | | | | | | | |
Collapse
|
26
|
De Lobel L, Thijs L, Kouznetsova T, Staessen JA, Van Steen K. A family-based association test to detect gene-gene interactions in the presence of linkage. Eur J Hum Genet 2012; 20:973-80. [PMID: 22419171 DOI: 10.1038/ejhg.2012.45] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022] Open
Abstract
For many complex diseases, quantitative traits contain more information than dichotomous traits. One of the approaches used to analyse these traits in family-based association studies is the quantitative transmission disequilibrium test (QTDT). The QTDT is a regression-based approach that models simultaneously linkage and association. It splits up the association effect in a between- and a within-family genetic component to adjust and test for population stratification and includes a variance components method to model linkage. We extend this approach to detect gene-gene interactions between two unlinked QTLs by adjusting the definition of the between- and within-family component and the variance components included in the model. We simulate data to investigate the influence of the epistasis model, linkage disequilibrium patterns between the markers and the QTLs, and allele frequencies on the power and type I error rates of the approach. Results show that for some of the investigated settings, power gains are obtained in comparison with FAM-MDR. We conclude that our approach shows promising results for candidate-gene studies where too few markers are available to correct for population stratification using standard methods (for example EIGENSTRAT). The proposed method is applied to real-life data on hypertension from the FLEMENGHO study.
Collapse
Affiliation(s)
- Lizzy De Lobel
- Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium.
| | | | | | | | | |
Collapse
|
27
|
Mahachie John JM, Cattaert T, Van Lishout F, Gusareva ES, Van Steen K. Lower-order effects adjustment in quantitative traits model-based multifactor dimensionality reduction. PLoS One 2012; 7:e29594. [PMID: 22242176 PMCID: PMC3252336 DOI: 10.1371/journal.pone.0029594] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2011] [Accepted: 12/01/2011] [Indexed: 11/18/2022] Open
Abstract
Identifying gene-gene interactions or gene-environment interactions in studies of human complex diseases remains a big challenge in genetic epidemiology. An additional challenge, often forgotten, is to account for important lower-order genetic effects. These may hamper the identification of genuine epistasis. If lower-order genetic effects contribute to the genetic variance of a trait, identified statistical interactions may simply be due to a signal boost of these effects. In this study, we restrict attention to quantitative traits and bi-allelic SNPs as genetic markers. Moreover, our interaction study focuses on 2-way SNP-SNP interactions. Via simulations, we assess the performance of different corrective measures for lower-order genetic effects in Model-Based Multifactor Dimensionality Reduction epistasis detection, using additive and co-dominant coding schemes. Performance is evaluated in terms of power and familywise error rate. Our simulations indicate that empirical power estimates are reduced with correction of lower-order effects, likewise familywise error rates. Easy-to-use automatic SNP selection procedures, SNP selection based on “top” findings, or SNP selection based on p-value criterion for interesting main effects result in reduced power but also almost zero false positive rates. Always accounting for main effects in the SNP-SNP pair under investigation during Model-Based Multifactor Dimensionality Reduction analysis adequately controls false positive epistasis findings. This is particularly true when adopting a co-dominant corrective coding scheme. In conclusion, automatic search procedures to identify lower-order effects to correct for during epistasis screening should be avoided. The same is true for procedures that adjust for lower-order effects prior to Model-Based Multifactor Dimensionality Reduction and involve using residuals as the new trait. We advocate using “on-the-fly” lower-order effects adjusting when screening for SNP-SNP interactions using Model-Based Multifactor Dimensionality Reduction analysis.
Collapse
Affiliation(s)
- Jestinah M. Mahachie John
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Liege, Belgium
- Bioinformatics and Modeling, GIGA-R, University of Liege, Liege, Belgium
- * E-mail:
| | - Tom Cattaert
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Liege, Belgium
- Bioinformatics and Modeling, GIGA-R, University of Liege, Liege, Belgium
| | - François Van Lishout
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Liege, Belgium
- Bioinformatics and Modeling, GIGA-R, University of Liege, Liege, Belgium
| | - Elena S. Gusareva
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Liege, Belgium
- Bioinformatics and Modeling, GIGA-R, University of Liege, Liege, Belgium
| | - Kristel Van Steen
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Liege, Belgium
- Bioinformatics and Modeling, GIGA-R, University of Liege, Liege, Belgium
| |
Collapse
|
28
|
Niu A, Zhang S, Sha Q. A novel method to detect gene-gene interactions in structured populations: MDR-SP. Ann Hum Genet 2011; 75:742-54. [PMID: 21972964 DOI: 10.1111/j.1469-1809.2011.00681.x] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
Complex diseases are presumed to be the result of multiple genes and environmental factors, which emphasize the importance of gene - gene and gene - environment interactions. Traditional parametric approaches are limited in their ability to detect high-order interactions and handle sparse data, and standard stepwise procedures may miss interactions with undetectable main effects. To address these limitations, the multifactor dimensionality reduction (MDR) method was developed. MDR is well suited for examining high-order interactions and detecting interactions without main effects. Like most statistical methods in genetic association studies, MDR may also lead to a false positive in the presence of population stratification. Although many statistical methods have been proposed to detect main effects and control for population stratification using genomic markers, not many methods are available to detect interactions and control for population stratification at the same time. In this article, we developed a novel test, MDR in structured populations (MDR-SP), to detect the interactions and control for population stratification. MDR-SP is applicable to both quantitative and qualitative traits and can incorporate covariates. We present simulation studies to demonstrate the validity of the test and to evaluate its power.
Collapse
Affiliation(s)
- Adan Niu
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
| | | | | |
Collapse
|
29
|
Mahachie John JM, Cattaert T, De Lobel L, Van Lishout F, Empain A, Van Steen K. Comparison of genetic association strategies in the presence of rare alleles. BMC Proc 2011; 5 Suppl 9:S32. [PMID: 22373505 PMCID: PMC3287868 DOI: 10.1186/1753-6561-5-s9-s32] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022] Open
Abstract
In the quest for the missing heritability of most complex diseases, rare variants have received increased attention. Advances in large-scale sequencing have led to a shift from the common disease/common variant hypothesis to the common disease/rare variant hypothesis or have at least reopened the debate about the relevance and importance of rare variants for gene discoveries. The investigation of modeling and testing approaches to identify significant disease/rare variant associations is in full motion. New methods to better deal with parameter estimation instabilities, convergence problems, or multiple testing corrections in the presence of rare variants or effect modifiers of rare variants are in their infancy. Using a recently developed semiparametric strategy to detect causal variants, we investigate the performance of the model-based multifactor dimensionality reduction (MB-MDR) technique in terms of power and family-wise error rate (FWER) control in the presence of rare variants, using population-based and family-based data (FAM-MDR). We compare family-based results obtained from MB-MDR analyses to screening findings from a quantitative trait Pedigree-based association test (PBAT). Population-based data were further examined using penalized regression models. We restrict attention to all available single-nucleotide polymorphisms on chromosome 4 and consider Q1 as the outcome of interest. The considered family-based methods identified marker C4S4935 in the VEGFC gene with estimated power not exceeding 0.35 (FAM-MDR), when FWER was kept under control. The considered population-based methods gave rise to highly inflated FWERs (up to 90% for PBAT screening).
Collapse
Affiliation(s)
- Jestinah M Mahachie John
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Grande Traverse 10, 4000 Liège, Belgium.,Bioinformatics and Modeling, GIGA-R, University of Liege, Avenue de l'Hôpital 1, 4000 Liège, Belgium
| | - Tom Cattaert
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Grande Traverse 10, 4000 Liège, Belgium.,Bioinformatics and Modeling, GIGA-R, University of Liege, Avenue de l'Hôpital 1, 4000 Liège, Belgium
| | - Lizzy De Lobel
- Department of Applied Mathematics and Computer Science, Ghent University, Krijgslaan 281 S9, 9000 Ghent, Belgium
| | - François Van Lishout
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Grande Traverse 10, 4000 Liège, Belgium.,Bioinformatics and Modeling, GIGA-R, University of Liege, Avenue de l'Hôpital 1, 4000 Liège, Belgium
| | - Alain Empain
- Bioinformatics and Modeling, GIGA-R, University of Liege, Avenue de l'Hôpital 1, 4000 Liège, Belgium
| | - Kristel Van Steen
- Systems and Modeling Unit, Montefiore Institute, University of Liege, Grande Traverse 10, 4000 Liège, Belgium.,Bioinformatics and Modeling, GIGA-R, University of Liege, Avenue de l'Hôpital 1, 4000 Liège, Belgium
| |
Collapse
|
30
|
Gilbert-Diamond D, Moore JH. Analysis of gene-gene interactions. CURRENT PROTOCOLS IN HUMAN GENETICS 2011; Chapter 1:Unit1.14. [PMID: 21735376 PMCID: PMC4086055 DOI: 10.1002/0471142905.hg0114s70] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
The goal of this unit is to introduce gene-gene interactions (epistasis) as a significant complicating factor in the search for disease susceptibility genes. This unit begins with an overview of gene-gene interactions and why they are likely to be common. Then, it reviews several statistical and computational methods for detecting and characterizing genes with effects that are dependent on other genes. The focus of this unit is genetic association studies of discrete and quantitative traits because most of the methods for detecting gene-gene interactions have been developed specifically for these study designs.
Collapse
Affiliation(s)
- Diane Gilbert-Diamond
- Computational Genetics Laboratory, Departments of Genetics and Community and Family Medicine, Dartmouth Medical School, Lebanon, New Hampshire, USA
| | | |
Collapse
|
31
|
Van Steen K. Perspectives on genome-wide multi-stage family-based association studies. Stat Med 2011; 30:2201-21. [DOI: 10.1002/sim.4259] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2010] [Accepted: 03/07/2011] [Indexed: 01/03/2023]
|
32
|
Abstract
Over the last few years, main effect genetic association analysis has proven to be a successful tool to unravel genetic risk components to a variety of complex diseases. In the quest for disease susceptibility factors and the search for the 'missing heritability', supplementary and complementary efforts have been undertaken. These include the inclusion of several genetic inheritance assumptions in model development, the consideration of different sources of information, and the acknowledgement of disease underlying pathways of networks. The search for epistasis or gene-gene interaction effects on traits of interest is marked by an exponential growth, not only in terms of methodological development, but also in terms of practical applications, translation of statistical epistasis to biological epistasis and integration of omics information sources. The current popularity of the field, as well as its attraction to interdisciplinary teams, each making valuable contributions with sometimes rather unique viewpoints, renders it impossible to give an exhaustive review of to-date available approaches for epistasis screening. The purpose of this work is to give a perspective view on a selection of currently active analysis strategies and concerns in the context of epistasis detection, and to provide an eye to the future of gene-gene interaction analysis.
Collapse
Affiliation(s)
- Kristel Van Steen
- Department of Electrical Engineering and Computer Science (Montefiore Institute), Grande Traverse, Bioinformatique 4000 Liège 1, Belgium.
| |
Collapse
|
33
|
Model-Based Multifactor Dimensionality Reduction to detect epistasis for quantitative traits in the presence of error-free and noisy data. Eur J Hum Genet 2011; 19:696-703. [PMID: 21407267 DOI: 10.1038/ejhg.2011.17] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022] Open
Abstract
Detecting gene-gene interactions or epistasis in studies of human complex diseases is a big challenge in the area of epidemiology. To address this problem, several methods have been developed, mainly in the context of data dimensionality reduction. One of these methods, Model-Based Multifactor Dimensionality Reduction, has so far mainly been applied to case-control studies. In this study, we evaluate the power of Model-Based Multifactor Dimensionality Reduction for quantitative traits to detect gene-gene interactions (epistasis) in the presence of error-free and noisy data. Considered sources of error are genotyping errors, missing genotypes, phenotypic mixtures and genetic heterogeneity. Our simulation study encompasses a variety of settings with varying minor allele frequencies and genetic variance for different epistasis models. On each simulated data, we have performed Model-Based Multifactor Dimensionality Reduction in two ways: with and without adjustment for main effects of (known) functional SNPs. In line with binary trait counterparts, our simulations show that the power is lowest in the presence of phenotypic mixtures or genetic heterogeneity compared to scenarios with missing genotypes or genotyping errors. In addition, empirical power estimates reduce even further with main effects corrections, but at the same time, false-positive percentages are reduced as well. In conclusion, phenotypic mixtures and genetic heterogeneity remain challenging for epistasis detection, and careful thought must be given to the way important lower-order effects are accounted for in the analysis.
Collapse
|
34
|
Grady BJ, Ritchie MD. Statistical Optimization of Pharmacogenomics Association Studies: Key Considerations from Study Design to Analysis. CURRENT PHARMACOGENOMICS AND PERSONALIZED MEDICINE 2011; 9:41-66. [PMID: 21887206 PMCID: PMC3163263 DOI: 10.2174/187569211794728805] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]
Abstract
Research in human genetics and genetic epidemiology has grown significantly over the previous decade, particularly in the field of pharmacogenomics. Pharmacogenomics presents an opportunity for rapid translation of associated genetic polymorphisms into diagnostic measures or tests to guide therapy as part of a move towards personalized medicine. Expansion in genotyping technology has cleared the way for widespread use of whole-genome genotyping in the effort to identify novel biology and new genetic markers associated with pharmacokinetic and pharmacodynamic endpoints. With new technology and methodology regularly becoming available for use in genetic studies, a discussion on the application of such tools becomes necessary. In particular, quality control criteria have evolved with the use of GWAS as we have come to understand potential systematic errors which can be introduced into the data during genotyping. There have been several replicated pharmacogenomic associations, some of which have moved to the clinic to enact change in treatment decisions. These examples of translation illustrate the strength of evidence necessary to successfully and effectively translate a genetic discovery. In this review, the design of pharmacogenomic association studies is examined with the goal of optimizing the impact and utility of this research. Issues of ascertainment, genotyping, quality control, analysis and interpretation are considered.
Collapse
Affiliation(s)
- Benjamin J. Grady
- Department of Molecular Physiology & Biophysics, Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA
| | - Marylyn D. Ritchie
- Department of Molecular Physiology & Biophysics, Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA
| |
Collapse
|
35
|
Practical and theoretical considerations in study design for detecting gene-gene interactions using MDR and GMDR approaches. PLoS One 2011; 6:e16981. [PMID: 21386969 PMCID: PMC3046176 DOI: 10.1371/journal.pone.0016981] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2010] [Accepted: 01/19/2011] [Indexed: 12/25/2022] Open
Abstract
Detection of interacting risk factors for complex traits is challenging. The choice of an appropriate method, sample size, and allocation of cases and controls are serious concerns. To provide empirical guidelines for planning such studies and data analyses, we investigated the performance of the multifactor dimensionality reduction (MDR) and generalized MDR (GMDR) methods under various experimental scenarios. We developed the mathematical expectation of accuracy and used it as an indicator parameter to perform a gene-gene interaction study. We then examined the statistical power of GMDR and MDR within the plausible range of accuracy (0.50∼0.65) reported in the literature. The GMDR with covariate adjustment had a power of>80% in a case-control design with a sample size of≥2000, with theoretical accuracy ranging from 0.56 to 0.62. However, when the accuracy was<0.56, a sample size of≥4000 was required to have sufficient power. In our simulations, the GMDR outperformed the MDR under all models with accuracy ranging from 0.56∼0.62 for a sample size of 1000–2000. However, the two methods performed similarly when the accuracy was outside this range or the sample was significantly larger. We conclude that with adjustment of a covariate, GMDR performs better than MDR and a sample size of 1000∼2000 is reasonably large for detecting gene-gene interactions in the range of effect size reported by the current literature; whereas larger sample size is required for more subtle interactions with accuracy<0.56.
Collapse
|
36
|
Cattaert T, Calle ML, Dudek SM, Mahachie John JM, Van Lishout F, Urrea V, Ritchie MD, Van Steen K. Model-based multifactor dimensionality reduction for detecting epistasis in case-control data in the presence of noise. Ann Hum Genet 2010; 75:78-89. [PMID: 21158747 DOI: 10.1111/j.1469-1809.2010.00604.x] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023]
Abstract
Analyzing the combined effects of genes and/or environmental factors on the development of complex diseases is a great challenge from both the statistical and computational perspective, even using a relatively small number of genetic and nongenetic exposures. Several data-mining methods have been proposed for interaction analysis, among them, the Multifactor Dimensionality Reduction Method (MDR) has proven its utility in a variety of theoretical and practical settings. Model-Based Multifactor Dimensionality Reduction (MB-MDR), a relatively new MDR-based technique that is able to unify the best of both nonparametric and parametric worlds, was developed to address some of the remaining concerns that go along with an MDR analysis. These include the restriction to univariate, dichotomous traits, the absence of flexible ways to adjust for lower order effects and important confounders, and the difficulty in highlighting epistatic effects when too many multilocus genotype cells are pooled into two new genotype groups. We investigate the empirical power of MB-MDR to detect gene-gene interactions in the absence of any noise and in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Power is generally higher for MB-MDR than for MDR, in particular in the presence of genetic heterogeneity, phenocopy, or low minor allele frequencies.
Collapse
Affiliation(s)
- Tom Cattaert
- Montefiore Institute, University of Liege, Belgium
| | | | | | | | | | | | | | | |
Collapse
|
37
|
Comparison of information-theoretic to statistical methods for gene-gene interactions in the presence of genetic heterogeneity. BMC Genomics 2010; 11:487. [PMID: 20815886 PMCID: PMC2996983 DOI: 10.1186/1471-2164-11-487] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2010] [Accepted: 09/03/2010] [Indexed: 11/10/2022] Open
Abstract
Background Multifactorial diseases such as cancer and cardiovascular diseases are caused by the complex interplay between genes and environment. The detection of these interactions remains challenging due to computational limitations. Information theoretic approaches use computationally efficient directed search strategies and thus provide a feasible solution to this problem. However, the power of information theoretic methods for interaction analysis has not been systematically evaluated. In this work, we compare power and Type I error of an information-theoretic approach to existing interaction analysis methods. Methods The k-way interaction information (KWII) metric for identifying variable combinations involved in gene-gene interactions (GGI) was assessed using several simulated data sets under models of genetic heterogeneity driven by susceptibility increasing loci with varying allele frequency, penetrance values and heritability. The power and proportion of false positives of the KWII was compared to multifactor dimensionality reduction (MDR), restricted partitioning method (RPM) and logistic regression. Results The power of the KWII was considerably greater than MDR on all six simulation models examined. For a given disease prevalence at high values of heritability, the power of both RPM and KWII was greater than 95%. For models with low heritability and/or genetic heterogeneity, the power of the KWII was consistently greater than RPM; the improvements in power for the KWII over RPM ranged from 4.7% to 14.2% at for α = 0.001 in the three models at the lowest heritability values examined. KWII performed similar to logistic regression. Conclusions Information theoretic models are flexible and have excellent power to detect GGI under a variety of conditions that characterize complex diseases.
Collapse
|
38
|
Machine learning techniques for single nucleotide polymorphism--disease classification models in schizophrenia. Molecules 2010; 15:4875-89. [PMID: 20657396 PMCID: PMC6257637 DOI: 10.3390/molecules15074875] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2010] [Revised: 07/08/2010] [Accepted: 07/09/2010] [Indexed: 11/16/2022] Open
Abstract
Single nucleotide polymorphisms (SNPs) can be used as inputs in disease computational studies such as pattern searching and classification models. Schizophrenia is an example of a complex disease with an important social impact. The multiple causes of this disease create the need of new genetic or proteomic patterns that can diagnose patients using biological information. This work presents a computational study of disease machine learning classification models using only single nucleotide polymorphisms at the HTR2A and DRD3 genes from Galician (Northwest Spain) schizophrenic patients. These classification models establish for the first time, to the best knowledge of the authors, a relationship between the sequence of the nucleic acid molecule and schizophrenia (Quantitative Genotype – Disease Relationships) that can automatically recognize schizophrenia DNA sequences and correctly classify between 78.3–93.8% of schizophrenia subjects when using datasets which include simulated negative subjects and a linear artificial neural network.
Collapse
|
39
|
Moore JH. Detecting, characterizing, and interpreting nonlinear gene-gene interactions using multifactor dimensionality reduction. ADVANCES IN GENETICS 2010; 72:101-16. [PMID: 21029850 DOI: 10.1016/b978-0-12-380862-2.00005-9] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
Human health is a complex process that is dependent on many genes, many environmental factors and chance events that are perhaps not measurable with current technology or are simply unknowable. Success in the design and execution of population-based association studies to identify those genetic and environmental factors that play an important role in human disease will depend on our ability to embrace, rather that ignore, complexity in the genotype to phenotype mapping relationship for any given human ecology. We review here three general computational challenges that must be addressed. First, data mining and machine learning methods are needed to model nonlinear interactions between multiple genetic and environmental factors. Second, filter and wrapper methods are needed to identify attribute interactions in large and complex solution landscapes. Third, visualization methods are needed to help interpret computational models and results. We provide here an overview of the multifactor dimensionality reduction (MDR) method that was developed for addressing each of these challenges.
Collapse
Affiliation(s)
- Jason H Moore
- Institute for Quantitative Biomedical Sciences, Departments of Genetics and Community and Family Medicine, Dartmouth Medical School, Lebanon, New Hampshire, USA
| |
Collapse
|