1
Ren W, Liang Z. Review on GPU accelerated methods for genome-wide SNP-SNP interactions. Mol Genet Genomics 2024; 300:10. [PMID: 39738695 DOI: 10.1007/s00438-024-02214-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Accepted: 12/11/2024] [Indexed: 01/02/2025]
Abstract
Detecting genome-wide SNP-SNP interactions (epistasis) efficiently is essential to harnessing the vast data now available from modern biobanks. With millions of SNPs and genetic information from hundreds of thousands of individuals, researchers are positioned to uncover new insights into complex disease pathways. However, this data scale brings significant computational and statistical challenges. To address these, recent approaches leverage GPU-based parallel computing for high-throughput, cost-effective analysis and refine algorithms to improve time and memory efficiency. In this survey, we systematically review GPU-accelerated methods for exhaustive epistasis detection, detailing the statistical models used and the computational strategies employed to enhance performance. Our findings indicate substantial speedups with GPU implementations over traditional CPU approaches. We conclude that while GPU-based solutions hold promise for advancing genomic research, continued innovation in both algorithm design and hardware optimization is necessary to meet future data challenges in the field.
Affiliation(s)
- Wenlong Ren
- Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, Nantong, 226019, China.
- Zhikai Liang
- Department of Plant Sciences, North Dakota State University, Fargo, 58108, USA
2
Wang Q, Tang TM, Youlton N, Weldy CS, Kenney AM, Ronen O, Weston Hughes J, Chin ET, Sutton SC, Agarwal A, Li X, Behr M, Kumbier K, Moravec CS, Wilson Tang WH, Margulies KB, Cappola TP, Butte AJ, Arnaout R, Brown JB, Priest JR, Parikh VN, Yu B, Ashley EA. Epistasis regulates genetic control of cardiac hypertrophy. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2023.11.06.23297858. [PMID: 37987017 PMCID: PMC10659487 DOI: 10.1101/2023.11.06.23297858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
The combinatorial effect of genetic variants is often assumed to be additive. Although genetic variation can clearly interact non-additively, methods to uncover epistatic relationships remain in their infancy. We develop low-signal signed iterative random forests to elucidate the complex genetic architecture of cardiac hypertrophy. We derive deep learning-based estimates of left ventricular mass from the cardiac MRI scans of 29,661 individuals enrolled in the UK Biobank. We report epistatic genetic variation including variants close to CCDC141, IGF1R, TTN, and TNKS. Several loci where variants were deemed insignificant in univariate genome-wide association analyses are identified. Functional genomic and integrative enrichment analyses reveal a complex gene regulatory network in which genes mapped from these loci share biological processes and myogenic regulatory factors. Through a network analysis of transcriptomic data from 313 explanted human hearts, we found strong gene co-expression correlations between these statistical epistasis contributors in healthy hearts and a significant connectivity decrease in failing hearts. We assess causality of epistatic effects via RNA silencing of gene-gene interactions in human induced pluripotent stem cell-derived cardiomyocytes. Finally, single-cell morphology analysis using a novel high-throughput microfluidic system shows that cardiomyocyte hypertrophy is non-additively modifiable by specific pairwise interactions between CCDC141 and both TTN and IGF1R. Our results expand the scope of genetic regulation of cardiac structure to epistasis.
3
Abstract
BACKGROUND Autoimmune hepatitis has an unknown cause and genetic associations that are not disease-specific or always present. Clarification of its missing causality and heritability could improve prevention and management strategies. AIMS Describe the key epigenetic and genetic mechanisms that could account for missing causality and heritability in autoimmune hepatitis; indicate the prospects of these mechanisms as pivotal factors; and encourage investigations of their pathogenic role and therapeutic potential. METHODS English abstracts were identified in PubMed using multiple key search phrases. Several hundred abstracts and 210 full-length articles were reviewed. RESULTS Environmental induction of epigenetic changes is the prime candidate for explaining the missing causality of autoimmune hepatitis. Environmental factors (diet, toxic exposures) can alter chromatin structure and the production of micro-ribonucleic acids that affect gene expression. Epistatic interaction between unsuspected genes is the prime candidate for explaining the missing heritability. The non-additive, interactive effects of multiple genes could enhance their impact on the propensity and phenotype of autoimmune hepatitis. Transgenerational inheritance of acquired epigenetic marks constitutes another mechanism of transmitting parental adaptations that could affect susceptibility. Management strategies could range from lifestyle adjustments and nutritional supplements to precision editing of the epigenetic landscape. CONCLUSIONS Autoimmune hepatitis has a missing causality that might be explained by epigenetic changes induced by environmental factors and a missing heritability that might reflect epistatic gene interactions or transgenerational transmission of acquired epigenetic marks. These unassessed or under-evaluated areas warrant investigation.
Affiliation(s)
- Albert J Czaja
- Mayo Clinic College of Medicine and Science, Rochester, MN, USA.
- Professor Emeritus of Medicine, Mayo Clinic College of Medicine and Science, 200 First Street SW, Rochester, MN, 55905, USA.
4
Alamin M, Sultana MH, Lou X, Jin W, Xu H. Dissecting Complex Traits Using Omics Data: A Review on the Linear Mixed Models and Their Application in GWAS. PLANTS (BASEL, SWITZERLAND) 2022; 11:3277. [PMID: 36501317 PMCID: PMC9739826 DOI: 10.3390/plants11233277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/23/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
Genome-wide association study (GWAS) is the most popular approach to dissecting complex traits in plants, humans, and animals. Numerous methods and tools have been proposed to discover the causal variants for GWAS data analysis. Among them, linear mixed models (LMMs) are widely used statistical methods for regulating confounding factors, including population structure, resulting in increased computational proficiency and statistical power in GWAS studies. Recently more attention has been paid to pleiotropy, multi-trait, gene-gene interaction, gene-environment interaction, and multi-locus methods with the growing availability of large-scale GWAS data and relevant phenotype samples. In this review, we have demonstrated all possible LMMs-based methods available in the literature for GWAS. We briefly discuss the different LMM methods, software packages, and available open-source applications in GWAS. Then, we include the advantages and weaknesses of the LMMs in GWAS. Finally, we discuss the future perspective and conclusion. The present review paper would be helpful to the researchers for selecting appropriate LMM models and methods quickly for GWAS data analysis and would benefit the scientific society.
Affiliation(s)
- Md. Alamin
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Xiangyang Lou
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Wenfei Jin
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Haiming Xu
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
5
Amorim ST, Stafuzza NB, Kluska S, Peripolli E, Pereira ASC, Muller da Silveira LF, de Albuquerque LG, Baldi F. Genome-wide interaction study reveals epistatic interactions for beef lipid-related traits in Nellore cattle. Anim Genet 2021; 53:35-48. [PMID: 34407235 DOI: 10.1111/age.13124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/02/2021] [Indexed: 11/27/2022]
Abstract
Gene-gene interactions cause hidden genetic variation in natural populations and could be responsible for the lack of replication that is typically observed in complex traits studies. This study aimed to identify gene-gene interactions using the empirical Hilbert-Schmidt Independence Criterion method to test for epistasis in beef fatty acid profile traits of Nellore cattle. The dataset contained records from 963 bulls, genotyped using a 777 962 SNP chip. Meat samples of Longissimus muscle were taken to measure fatty acid composition, which was quantified by gas chromatography. We chose to work with the sums of saturated (SFA), monounsaturated (MUFA), polyunsaturated (PUFA), omega-3 (OM3), omega-6 (OM6), SFA:PUFA and OM3:OM6 fatty acid ratios. The SNPs in the interactions where P < 10⁻⁸ were mapped individually and used to search for candidate genes. Totals of 602, 3, 13, 23, 13, 215 and 169 candidate genes for SFAs, MUFAs, PUFAs, OM3s, OM6s and SFA:PUFA and OM3:OM6 ratios were identified respectively. The candidate genes found were associated with cholesterol, lipid regulation, low-density lipoprotein receptors, feed efficiency and inflammatory response. Enrichment analysis revealed 57 significant GO and 18 KEGG terms (P < 0.05), most of them related to meat quality and complementary terms. Our results showed substantial genetic interactions associated with lipid profile, meat quality, carcass and feed efficiency traits for the first time in Nellore cattle. The knowledge of these SNP-SNP interactions could improve understanding of the genetic and physiological mechanisms that contribute to lipid-related traits and improve human health by the selection of healthier meat products.
Affiliation(s)
- S T Amorim
- Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, Via de acesso Prof. Paulo Donato Castellane, s/no, Jaboticabal, CEP 14884-900, Brazil
- N B Stafuzza
- Instituto de Zootecnia - Centro de Pesquisa em Bovinos de Corte, Rodovia Carlos Tonanni, Km94, Sertãozinho, 14174-000, Brazil
- S Kluska
- Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, Via de acesso Prof. Paulo Donato Castellane, s/no, Jaboticabal, CEP 14884-900, Brazil
- E Peripolli
- Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, Via de acesso Prof. Paulo Donato Castellane, s/no, Jaboticabal, CEP 14884-900, Brazil
- A S C Pereira
- Faculdade de Zootecnia e Engenharia de Alimentos, Núcleo de Apoio à Pesquisa em Melhoramento Animal, Biotecnologia e Transgenia, Universidade de São Paulo, Rua Duque de Caxias Norte, 225, Pirassununga, CEP 13635-900, Brazil
- L F Muller da Silveira
- Faculdade de Zootecnia e Engenharia de Alimentos, Núcleo de Apoio à Pesquisa em Melhoramento Animal, Biotecnologia e Transgenia, Universidade de São Paulo, Rua Duque de Caxias Norte, 225, Pirassununga, CEP 13635-900, Brazil
- L G de Albuquerque
- Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, Via de acesso Prof. Paulo Donato Castellane, s/no, Jaboticabal, CEP 14884-900, Brazil
- F Baldi
- Departamento de Zootecnia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, Via de acesso Prof. Paulo Donato Castellane, s/no, Jaboticabal, CEP 14884-900, Brazil
6
Pecanka J, Jonker MA. Two-Stage Testing for Epistasis: Screening and Verification. Methods Mol Biol 2021; 2212:69-92. [PMID: 33733351 DOI: 10.1007/978-1-0716-0947-7_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
Abstract
Undiscovered gene-to-gene interaction (epistasis) is a possible explanation for the "missing heritability" of complex traits and diseases. On a genome-wide scale, screening for epistatic effects among all possible pairs of genetic markers faces two main complications. Firstly, the classical statistical methods for modeling epistasis are computationally very expensive, which makes them impractical on such a large scale. Secondly, straightforward corrections for multiple testing using the classical methods tend to be too coarse and inefficient at discovering the epistatic effects in such a large-scale application. In this chapter, we describe both the underlying framework and practical examples of two-stage statistical testing methods that alleviate both of the aforementioned complications.
7
Tyler AL, Emerson J, El Kassaby B, Wells AE, Philip VM, Carter GW. The Combined Analysis of Pleiotropy and Epistasis (CAPE). Methods Mol Biol 2021; 2212:55-67. [PMID: 33733350 DOI: 10.1007/978-1-0716-0947-7_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Epistasis, or gene-gene interaction, contributes substantially to trait variation in organisms ranging from yeast to humans, and modeling epistasis directly is critical to understanding the genotype-phenotype map. However, inference of genetic interactions is challenging compared to inference of individual allele effects due to low statistical power. Furthermore, genetic interactions can appear inconsistent across different quantitative traits, presenting a challenge for the interpretation of detected interactions. Here we present a method called the Combined Analysis of Pleiotropy and Epistasis (CAPE) that combines information across multiple quantitative traits to infer directed epistatic interactions. By combining information across multiple traits, CAPE not only increases power to detect genetic interactions but also interprets these interactions across traits to identify a single interaction that is consistent across all observed data. This method generates informative, interpretable interaction networks that explain how variants interact with each other to influence groups of related traits. This method could potentially be used to link genetic variants to gene expression, physiological endophenotypes, and higher-level disease traits.
8
Genetic control of non-genetic inheritance in mammals: state-of-the-art and perspectives. Mamm Genome 2020; 31:146-156. [PMID: 32529318 PMCID: PMC7369129 DOI: 10.1007/s00335-020-09841-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/27/2020] [Accepted: 06/03/2020] [Indexed: 12/12/2022]
Abstract
Thought to be directly and uniquely dependent on genotypes, the ontogeny of individual phenotypes is much more complicated. Individual genetics, environmental exposures, and their interaction are the three main determinants of an individual's phenotype. This picture was further complicated a decade ago when the Lamarckian theory of acquired inheritance was rekindled with the discovery of epigenetic inheritance, according to which acquired phenotypes can be transmitted through fertilization and affect phenotypes across generations. The results of Genome-Wide Association Studies have also highlighted a large degree of missing heritability in genetics and have provided hints that not only acquired phenotypes, but also individuals' genotypes affect phenotypes intergenerationally through indirect genetic effects. Here, we review available examples of indirect genetic effects in mammals, what is known of the underlying molecular mechanisms and their potential impact for our understanding of missing heritability, phenotypic variation, and individual disease risk.
9
Wang H, Yue T, Yang J, Wu W, Xing EP. Deep mixed model for marginal epistasis detection and population stratification correction in genome-wide association studies. BMC Bioinformatics 2019; 20:656. [PMID: 31881907 PMCID: PMC6933893 DOI: 10.1186/s12859-019-3300-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/16/2019] [Accepted: 12/02/2019] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Genome-wide Association Studies (GWAS) have contributed to unraveling associations between genetic variants in the human genome and complex traits for more than a decade. While many methods have been developed as follow-ups to detect interactions between SNPs, epistasis has yet to be modeled and discovered more thoroughly. RESULTS In this paper, following the previous study of detecting marginal epistasis signals, and motivated by the universal approximation power of deep learning, we propose a neural network method that can potentially model arbitrary interactions between SNPs in genetic association studies as an extension to the mixed models in correcting confounding factors. Our method, namely Deep Mixed Model, consists of two components: 1) a confounding factor correction component, which is a large-kernel convolutional neural network that focuses on calibrating the residual phenotypes by removing factors such as population stratification, and 2) a fixed-effect estimation component, which mainly consists of a Long Short-Term Memory (LSTM) model that estimates the association effect size of SNPs with the residual phenotype. CONCLUSIONS After validating the performance of our method using simulation experiments, we further apply it to Alzheimer's disease data sets. Our results help gain some exploratory understanding of the genetic architecture of Alzheimer's disease.
Affiliation(s)
- Haohan Wang
- Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA USA
- Tianwei Yue
- Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA USA
- Jingkang Yang
- Department of Electrical and Computer Engineering, Rice University, Houston, TX USA
- Wei Wu
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA USA
- Eric P. Xing
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA USA
10
Zhu S, Fang G. MatrixEpistasis: ultrafast, exhaustive epistasis scan for quantitative traits with covariate adjustment. Bioinformatics 2019; 34:2341-2348. [PMID: 29509873 DOI: 10.1093/bioinformatics/bty094] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 08/18/2017] [Accepted: 02/28/2018] [Indexed: 12/22/2022] Open
Abstract
Motivation For many traits, causal loci uncovered by genetic mapping studies explain only a minority of the heritable contribution to trait variation. Multiple explanations for this 'missing heritability' have been proposed. Single nucleotide polymorphism (SNP)-SNP interaction (epistasis), as one of the compelling models, has been widely studied. However, the genome-wide scan of epistasis, especially for quantitative traits, poses huge computational challenges. Moreover, covariate adjustment is largely ignored in epistasis analysis due to the massive extra computational undertaking. Results In the current study, we found striking differences among epistasis models using both simulation data and real biological data, suggesting that not only can covariate adjustment remove confounding bias, it can also improve power. Furthermore, we derived mathematical formulas, which enable the exhaustive epistasis scan together with full covariate adjustment to be expressed in terms of large matrix operation, therefore substantially improving the computational efficiency (∼10⁴× faster than existing methods). We call the new method MatrixEpistasis. With MatrixEpistasis, we re-analyze a large real yeast dataset comprising 11 623 SNPs, 1008 segregants and 46 quantitative traits with covariates fully adjusted and detect thousands of novel putative epistasis with P-values < 1.48e-10. Availability and implementation The method is implemented in R and available at https://github.com/fanglab/MatrixEpistasis. Supplementary information Supplementary data are available at Bioinformatics online.
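As a rough illustration of how a pairwise scan can be written as one matrix operation, consider the following generic identity. It is not the set of formulas derived by the MatrixEpistasis authors, and the symbols $X$, $y$ and $C$ are assumptions introduced here: $X$ is an $n \times p$ genotype matrix and $y$ a mean-centered trait vector of length $n$. The unnormalized covariance between the trait and every product term $x_j x_k$ then comes out of a single matrix product:

```latex
C = X^{\top}\,\operatorname{diag}(y)\,X,
\qquad
C_{jk} = \sum_{i=1}^{n} y_i\, x_{ij}\, x_{ik},
\qquad 1 \le j, k \le p .
```

Covariate adjustment, main-effect adjustment and normalization into a test statistic would need further terms that are not shown in this sketch.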
Affiliation(s)
- Shijia Zhu
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Gang Fang
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
11
Yan J, Risacher SL, Shen L, Saykin AJ. Network approaches to systems biology analysis of complex disease: integrative methods for multi-omics data. Brief Bioinform 2019; 19:1370-1381. [PMID: 28679163 DOI: 10.1093/bib/bbx066] [Citation(s) in RCA: 135] [Impact Index Per Article: 22.5] [Received: 02/23/2017] [Indexed: 11/14/2022] Open
Abstract
In the past decade, significant progress has been made in complex disease research across multiple omics layers from genome, transcriptome and proteome to metabolome. There is an increasing awareness of the importance of biological interconnections, and much success has been achieved using systems biology approaches. However, because of the typical focus on one single omics layer at a time, existing systems biology findings explain only a modest portion of complex disease. Recent advances in multi-omics data collection and sharing present us new opportunities for studying complex diseases in a more comprehensive fashion, and yet simultaneously create new challenges considering the unprecedented data dimensionality and diversity. Here, our goal is to review extant and emerging network approaches that can be applied across multiple biological layers to facilitate a more comprehensive and integrative multilayered omics analysis of complex diseases.
Affiliation(s)
- Jingwen Yan
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis, USA
- Shannon L Risacher
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
- Li Shen
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
- Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
12
Joiret M, Mahachie John JM, Gusareva ES, Van Steen K. Confounding of linkage disequilibrium patterns in large scale DNA based gene-gene interaction studies. BioData Min 2019; 12:11. [PMID: 31198442 PMCID: PMC6558841 DOI: 10.1186/s13040-019-0199-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 01/17/2019] [Accepted: 05/09/2019] [Indexed: 01/07/2023] Open
Abstract
Background In Genome-Wide Association Studies (GWAS), the concept of linkage disequilibrium is important as it allows identifying genetic markers that tag the actual causal variants. In Genome-Wide Association Interaction Studies (GWAIS), similar principles hold for pairs of causal variants. However, Linkage Disequilibrium (LD) may also interfere with the detection of genuine epistasis signals in that there may be complete confounding between Gametic Phase Disequilibrium (GPD) and interaction. GPD may involve unlinked genetic markers, even residing on different chromosomes. Often GPD is eliminated in GWAIS, via feature selection schemes or so-called pruning algorithms, to obtain unconfounded epistasis results. However, little is known about the optimal degree of GPD/LD-pruning that gives a balance between false positive control and sufficient power of epistasis detection statistics. Here, we focus on Model-Based Multifactor Dimensionality Reduction as one large-scale epistasis detection tool. Its performance has been thoroughly investigated in terms of false positive control and power, under a variety of scenarios involving different trait types and study designs, as well as error-free and noisy data, but never with respect to multicollinear SNPs. Results Using real-life human LD patterns from a homogeneous subpopulation of British ancestry, we investigated the impact of LD-pruning on the statistical sensitivity of MB-MDR. We considered three different non-fully penetrant epistasis models with varying effect sizes. There is a clear advantage in pre-analysis pruning using sliding windows at r² of 0.75 or lower, but using a threshold of 0.20 has a detrimental effect on the power to detect a functional interactive SNP pair (power < 25%).
Signal sensitivity, directly using LD-block information to determine whether an epistasis signal is present or not, benefits from LD-pruning as well (average power across scenarios: 87%), but is largely hampered by functional loci residing at the boundaries of an LD-block. Conclusions Our results confirm that LD patterns and the position of causal variants in LD blocks do have an impact on epistasis detection, and that pruning strategies and LD-blocks definitions combined need careful attention, if we wish to maximize the power of large-scale epistasis screenings.
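The sliding-window pruning at an r² threshold mentioned in this abstract can be sketched as follows. This is a simplified greedy variant written for illustration, not the procedure used with MB-MDR or by any particular tool. The function names `r_squared` and `ld_prune`, the window size, and the use of the squared Pearson correlation of 0/1/2 genotype dosages as a stand-in for LD r² are all assumptions.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation of two genotype vectors coded 0/1/2.

    Assumption: dosage correlation is used as a proxy for LD r^2.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return sxy * sxy / (sxx * syy)


def ld_prune(genotypes, threshold=0.75, window=50):
    """Greedy pruning: keep a SNP only if its r^2 with every already-kept
    SNP inside the preceding window is at or below the threshold.

    genotypes: list of SNPs in map order, each a list of per-sample dosages.
    Returns the indices of the retained SNPs.
    """
    kept = []
    for j, snp in enumerate(genotypes):
        recent = [k for k in kept if j - k < window]
        if all(r_squared(snp, genotypes[k]) <= threshold for k in recent):
            kept.append(j)
    return kept


# Hypothetical toy data: SNP 1 duplicates SNP 0, SNP 2 is uncorrelated with it.
toy = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [2, 0, 1, 1, 0, 2],
]
retained = ld_prune(toy, threshold=0.75)
```

On this toy input the duplicate SNP is dropped and the other two are retained. Lowering `threshold` prunes more aggressively, which is the trade-off between false positive control and power that the abstract discusses.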
Affiliation(s)
- Marc Joiret
- BIO3, GIGA-R Medical Genomics, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium.,Biomechanics Research Unit, GIGA-R in-silico medicine, Liège, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium
- Elena S Gusareva
- BIO3, GIGA-R Medical Genomics, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium
- Kristel Van Steen
- BIO3, GIGA-R Medical Genomics, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium.,WELBIO researcher, Avenue de l'Hôpital 1-B34-CHU, Liège, 4000 Belgium
13
Van Steen K, Moore JH. How to increase our belief in discovered statistical interactions via large-scale association studies? Hum Genet 2019; 138:293-305. [PMID: 30840129 PMCID: PMC6483943 DOI: 10.1007/s00439-019-01987-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/26/2018] [Accepted: 02/20/2019] [Indexed: 12/31/2022]
Abstract
The understanding that differences in biological epistasis may impact disease risk, diagnosis, or disease management stands in wide contrast to the unavailability of widely accepted large-scale epistasis analysis protocols. Several choices in the analysis workflow will impact false-positive and false-negative rates. One of these choices relates to the exploitation of particular modelling or testing strategies. The strengths and limitations of these need to be well understood, as well as the contexts in which these hold. This will contribute to determining the potentially complementary value of epistasis detection workflows and is expected to increase replication success with biological relevance. In this contribution, we take a recently introduced regression-based epistasis detection tool as a leading example to review the key elements that need to be considered to fully appreciate the value of analytical epistasis detection performance assessments. We point out unresolved hurdles and give our perspectives towards overcoming these.
Affiliation(s)
- K Van Steen
- WELBIO, GIGA-R Medical Genomics-BIO3, University of Liège, Liege, Belgium.
- Department of Human Genetics, University of Leuven, Leuven, Belgium.
- J H Moore
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, USA
14
Carmelo VAO, Kogelman LJA, Madsen MB, Kadarmideen HN. WISH-R- a fast and efficient tool for construction of epistatic networks for complex traits and diseases. BMC Bioinformatics 2018; 19:277. [PMID: 30064383 PMCID: PMC6069724 DOI: 10.1186/s12859-018-2291-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 11/06/2017] [Accepted: 07/18/2018] [Indexed: 12/28/2022] Open
Abstract
Background Genetic epistasis is an often-overlooked area in the study of the genomics of complex traits. Genome-wide association studies are a useful tool for revealing potential causal genetic variants, but in this context, epistasis is generally ignored. Data complexity and interpretation issues make it difficult to process and interpret epistasis. As the number of interactions grows exponentially with the number of variants, computational limitation is a bottleneck. Gene network-based strategies have been successful in integrating biological data and identifying relevant hub genes and pathways related to complex traits. In this study, epistatic interactions and network-based analysis are combined in the Weighted Interaction SNP hub (WISH) method and implemented in an efficient and easy-to-use R package. Results The WISH R package (WISH-R) was developed to calculate epistatic interactions on a genome-wide level based on genomic data. It is easy to use and install, and works on regular genomic data. The package filters data based on linkage disequilibrium and calculates epistatic interaction coefficients between SNP pairs based on parallelized, efficient linear model and generalized linear model implementations. Normalized epistatic coefficients are analyzed in a network framework, alleviating multiple testing issues and integrating biological signal to identify modules and pathways related to complex traits. Functions for visualizing results and testing runtimes are also provided. Conclusion The WISH-R package is an efficient implementation for analyzing genome-wide epistasis for complex diseases and traits. It includes methods and strategies for analyzing epistasis from initial data filtering until final data interpretation. WISH offers a new way to analyze genomic data by combining epistasis and network-based analysis in one method and provides options for visualizations. This alleviates many of the existing hurdles in the analysis of genomic interactions.
Affiliation(s)
- Victor A O Carmelo
- Quantitative and Systems Genomics Group, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, Building 208, 2800, Kgs. Lyngby, Denmark.,Animal Breeding, Quantitative Genetics and Systems Biology group, Department of Large Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Lisette J A Kogelman
- Animal Breeding, Quantitative Genetics and Systems Biology group, Department of Large Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark.,Danish Headache Center, Department of Neurology, Rigshospitalet Glostrup, Nordre Ringvej 69, 2600, Glostrup, Denmark
| | - Majbritt Busk Madsen
- Institute of Biological Psychiatry, Mental Health Centre, Sct. Hans, Roskilde, Capital Region of Denmark, Denmark
| | - Haja N Kadarmideen
- Quantitative and Systems Genomics Group, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, Building 208, 2800, Kgs. Lyngby, Denmark. .,Animal Breeding, Quantitative Genetics and Systems Biology group, Department of Large Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark.
| |
|
15
|
Chatelain C, Durand G, Thuillier V, Augé F. Performance of epistasis detection methods in semi-simulated GWAS. BMC Bioinformatics 2018; 19:231. [PMID: 29914375 PMCID: PMC6006572 DOI: 10.1186/s12859-018-2229-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2017] [Accepted: 06/04/2018] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Part of the missing heritability in Genome Wide Association Studies (GWAS) is expected to be explained by interactions between genetic variants, also called epistasis. Various statistical methods have been developed to detect epistasis in case-control GWAS. These methods face major statistical challenges due to the number of tests required, the complexity of the Linkage Disequilibrium (LD) structure, and the lack of consensus regarding the definition of epistasis. Their limited impact in terms of uncovering new biological knowledge might be explained in part by the limited amount of experimental data available to validate their statistical performances in a realistic GWAS context. In this paper, we introduce a simulation pipeline for generating real scale GWAS data, including epistasis and realistic LD structure. We evaluate five exhaustive bivariate interaction methods, fastepi, GBOOST, SHEsisEpi, DSS, and IndOR. Two hundred thirty four different disease scenarios are considered in extensive simulations. We report the performances of each method in terms of false positive rate control, power, area under the ROC curve (AUC), and computation time using a GPU. Finally we compare the results of each method on a real GWAS of type 2 diabetes from the Wellcome Trust Case Control Consortium. RESULTS GBOOST, SHEsisEpi and DSS allow a satisfactory control of the false positive rate. fastepi and IndOR present an increase in false positive rate in presence of LD between causal SNPs, with our definition of epistasis. DSS performs best in terms of power and AUC in most scenarios with no or weak LD between causal SNPs. All methods can exhaustively analyze a GWAS with 6 × 10^5 SNPs and 15,000 samples in a couple of hours using a GPU. CONCLUSION This study confirms that computation time is no longer a limiting factor for performing an exhaustive search of epistasis in large GWAS. 
For this task, using DSS on SNP pairs with limited LD seems to be a good strategy to achieve the best statistical performance. A combination approach using both DSS and GBOOST is supported by the simulation results and the analysis of the WTCCC dataset demonstrated that this approach can detect distinct genes in epistasis. Finally, weak epistasis between common variants will be detectable with existing methods when GWAS of a few tens of thousands cases and controls are available.
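To make the scale of an exhaustive bivariate search concrete, the short sketch below counts the unordered SNP pairs for the 6 × 10^5 SNPs mentioned in the abstract and derives a per-test threshold. The Bonferroni correction and the 0.05 family-wise level are assumptions chosen for illustration; the abstract does not state which correction the evaluated methods use.

```python
from math import comb


def n_pairwise_tests(n_snps):
    """Number of unordered SNP pairs in an exhaustive two-locus scan."""
    return comb(n_snps, 2)


def bonferroni_threshold(alpha, n_tests):
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


pairs = n_pairwise_tests(6 * 10**5)            # 179,999,700,000 pairs
threshold = bonferroni_threshold(0.05, pairs)  # roughly 2.8e-13 per test
```

Restricting the scan to pairs with limited LD, as the conclusion suggests, reduces the number of tests somewhat, but the count stays on the order of 10^11.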
Affiliation(s)
| | - Guillermo Durand
- Laboratoire de Probabilités et Modèles Aléatoires, Université Pierre et Marie Curie, 4, place Jussieu, Paris Cedex 05, 75252 France
| | - Vincent Thuillier
- SANOFI R&D, Biostatistics & Programming, Chilly Mazarin, 91385 France
| | - Franck Augé
- SANOFI R&D, Translational Sciences, Chilly Mazarin, 91385 France
| |
|
16
|
Pecanka J, Jonker MA, Bochdanovits Z, Van Der Vaart AW. A powerful and efficient two-stage method for detecting gene-to-gene interactions in GWAS. Biostatistics 2018; 18:477-494. [PMID: 28334077 DOI: 10.1093/biostatistics/kxw060] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2016] [Accepted: 11/05/2016] [Indexed: 11/13/2022] Open
Abstract
For over a decade functional gene-to-gene interaction (epistasis) has been suspected to be a determinant in the "missing heritability" of complex traits. However, searching for epistasis on the genome-wide scale has been challenging due to the prohibitively large number of tests which result in a serious loss of statistical power as well as computational challenges. In this article, we propose a two-stage method applicable to existing case-control data sets, which aims to lessen both of these problems by pre-assessing whether a candidate pair of genetic loci is involved in epistasis before it is actually tested for interaction with respect to a complex phenotype. The pre-assessment is based on a two-locus genotype independence test performed in the sample of cases. Only the pairs of loci that exhibit non-equilibrium frequencies are analyzed via a logistic regression score test, thereby reducing the multiple testing burden. Since only the computationally simple independence tests are performed for all pairs of loci while the more demanding score tests are restricted to the most promising pairs, genome-wide association study (GWAS) for epistasis becomes feasible. By design our method provides strong control of the type I error. Its favourable power properties especially under the practically relevant misspecification of the interaction model are illustrated. Ready-to-use software is available. Using the method we analyzed Parkinson's disease in four cohorts and identified possible interactions within several SNP pairs in multiple cohorts.
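The two-stage idea described above can be sketched as follows: a cheap case-only genotype independence test screens every pair, and only pairs that pass are handed to the more expensive test. This is a minimal sketch under stated assumptions, not the authors' software. The Pearson chi-square statistic, the names `chi2_sf`, `independence_pvalue` and `two_stage`, and the `stage2_test` callable (standing in for the logistic regression score test, which is not implemented here) are all hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution for df in {1, 2, 4}."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 4:
        return math.exp(-x / 2) * (1 + x / 2)
    raise ValueError("unsupported degrees of freedom")


def independence_pvalue(g1, g2):
    """Pearson chi-square test of independence between two genotype vectors."""
    n = len(g1)
    joint = Counter(zip(g1, g2))
    rows, cols = Counter(g1), Counter(g2)
    df = (len(rows) - 1) * (len(cols) - 1)
    if df == 0:
        return 1.0  # a monomorphic locus carries no information
    stat = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            expected = row_total * col_total / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return chi2_sf(stat, df)


def two_stage(cases, pairs, stage1_alpha, stage2_test):
    """Run stage2_test only on pairs that pass the case-only screen."""
    results = {}
    for i, j in pairs:
        if independence_pvalue(cases[i], cases[j]) < stage1_alpha:
            results[(i, j)] = stage2_test(i, j)
    return results
```

Here `cases` maps a SNP identifier to the genotype vector observed in cases only. The type I error guarantee claimed in the abstract depends on properties of the two tests that this sketch does not verify.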
Affiliation(s)
- Jakub Pecanka
- Leiden University Medical Center, Department of Medical Statistics and Bioinformatics, Leiden, The Netherlands and VU University, Department of Mathematics, Amsterdam, the Netherlands
| | - Marianne A Jonker
- VU University Medical Center, Department of Epidemiology and Biostatistics, Amsterdam, The Netherlands and Radboud University medical center, Radboud Institute for Health Sciences, Nijmegen, The Netherlands
| | | | - Zoltan Bochdanovits
- VU University Medical Center, Department of Clinical Genetics, Amsterdam, The Netherlands
| | | |
|
17
|
Hill A, Loh PR, Bharadwaj RB, Pons P, Shang J, Guinan E, Lakhani K, Kilty I, Jelinsky SA. Stepwise Distributed Open Innovation Contests for Software Development: Acceleration of Genome-Wide Association Analysis. Gigascience 2018; 6:1-10. [PMID: 28327993 PMCID: PMC5467032 DOI: 10.1093/gigascience/gix009] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2016] [Accepted: 12/18/2016] [Indexed: 11/12/2022] Open
Abstract
Background: The association of differing genotypes with disease-related phenotypic traits offers great potential to both help identify new therapeutic targets and support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency by which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets. Results: Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set size of 6678 subjects by 645 863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds. 
Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project. Conclusions: Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics.
Affiliation(s)
- Andrew Hill
- Research Business Technology, Pfizer Research, 1 Portland Street, Cambridge, Massachusetts, 02139 USA
| | - Po-Ru Loh
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.,Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, Massachusetts, 02142 USA
| | - Ragu B Bharadwaj
- Babbage Analytic and Innovation, Boston Massachusetts, USA.,Current affiliation, Nyrasta LLC
| | - Pascal Pons
- Current affiliation, Criteo Labs, 32 Rue Blanche, 75009, Paris, France
| | - Jingbo Shang
- Current affiliation, Computer Science Department, University of Illinois at Urbana-Champaign 201 N Goodwin Ave, Urbana, Illinois, USA
| | - Eva Guinan
- Department of Radiation Oncology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, Massachusetts, 02215, USA.,Harvard Business School, Boston, Massachusetts, 02163 USA
| | - Karim Lakhani
- Babbage Analytic and Innovation, Boston Massachusetts, USA.,Harvard Business School, Boston, Massachusetts, 02163 USA.,Harvard-NASA Tournament Lab, Institute for Quantitative Social Science 1737 Cambridge Street, Cambridge Massachusetts, 02138, USA
| | - Iain Kilty
- Department of Inflammation and Immunology, Pfizer Research, 1 Portland Street, Cambridge, Massachusetts, 02139, USA
| | - Scott A Jelinsky
- Department of Inflammation and Immunology, Pfizer Research, 1 Portland Street, Cambridge, Massachusetts, 02139, USA
| |
|
18
|
Gumpinger AC, Roqueiro D, Grimm DG, Borgwardt KM. Methods and Tools in Genome-wide Association Studies. Methods Mol Biol 2018; 1819:93-136. [PMID: 30421401 DOI: 10.1007/978-1-4939-8618-7_5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]
Abstract
Many traits, such as height, the response to a given drug, or the susceptibility to certain diseases are presumably co-determined by genetics. Especially in the field of medicine, it is of major interest to identify genetic aberrations that alter an individual's risk to develop a certain phenotypic trait. Addressing this question requires the availability of comprehensive, high-quality genetic datasets. The technological advancements and the decreasing cost of genotyping in the last decade led to an increase in such datasets. Parallel to and in line with this technological progress, an analysis framework under the name of genome-wide association studies was developed to properly collect and analyze these data. Genome-wide association studies aim at finding statistical dependencies (or associations) between a trait of interest and point-mutations in the DNA. The statistical models used to detect such associations are diverse, spanning the whole range from the frequentist to the Bayesian setting. Since genetic datasets are inherently high-dimensional, the search for associations poses not only a statistical but also a computational challenge. As a result, a variety of toolboxes and software packages have been developed, each implementing different statistical methods while using various optimizations and mathematical techniques to enhance the computations. This chapter is devoted to the discussion of widely used methods and tools in genome-wide association studies. We present the different statistical models and the assumptions on which they are based, explain peculiarities of the data that have to be accounted for and, most importantly, introduce commonly used tools and software packages for the different tasks in a genome-wide association study, complemented with examples for their application.
Affiliation(s)
- Anja C Gumpinger
- Machine Learning and Computational Biology Lab, D-BSSE, ETH Zurich, Basel, Switzerland. .,SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.
| | - Damian Roqueiro
- Machine Learning and Computational Biology Lab, D-BSSE, ETH Zurich, Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Dominik G Grimm
- Machine Learning and Computational Biology Lab, D-BSSE, ETH Zurich, Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Karsten M Borgwardt
- Machine Learning and Computational Biology Lab, D-BSSE, ETH Zurich, Basel, Switzerland. .,SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.
| |
|
19
|
Li R, Kim D, Ritchie MD. Methods to analyze big data in pharmacogenomics research. Pharmacogenomics 2017; 18:807-820. [PMID: 28612644 DOI: 10.2217/pgs-2016-0152] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open
Abstract
The scale and scope of pharmacogenomics research continues to expand as the cost and efficiency of molecular data generation techniques advance. These new technologies give rise to enormous opportunity for the identification of important genetic and genomic factors important for drug treatment response. With this opportunity come significant challenges. Most of these can be categorized as 'big data' issues, facing not only pharmacogenomics, but other fields in the life sciences as well. In this review, we describe some of the analysis techniques and tools being implemented for genetic/genomic discovery in pharmacogenomics.
Affiliation(s)
- Ruowang Li
- Bioinformatics & Genomics Graduate Program, The Pennsylvania State University, University Park, PA 16802, USA
| | - Dokyoon Kim
- Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA 17821, USA
| | - Marylyn D Ritchie
- Bioinformatics & Genomics Graduate Program, The Pennsylvania State University, University Park, PA 16802, USA.,Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA 17821, USA
| |
|
20
|
Kao PYP, Leung KH, Chan LWC, Yip SP, Yap MKH. Pathway analysis of complex diseases for GWAS, extending to consider rare variants, multi-omics and interactions. Biochim Biophys Acta Gen Subj 2016; 1861:335-353. [PMID: 27888147 DOI: 10.1016/j.bbagen.2016.11.030] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2016] [Revised: 10/17/2016] [Accepted: 11/19/2016] [Indexed: 12/20/2022]
Abstract
BACKGROUND Genome-wide association studies (GWAS) is a major method for studying the genetics of complex diseases. Finding all sequence variants to explain fully the aetiology of a disease is difficult because of their small effect sizes. To better explain disease mechanisms, pathway analysis is used to consolidate the effects of multiple variants, and hence increase the power of the study. While pathway analysis has previously been performed within GWAS only, it can now be extended to examining rare variants, other "-omics" and interaction data. SCOPE OF REVIEW 1. Factors to consider in the choice of software for GWAS pathway analysis. 2. Examples of how pathway analysis is used to analyse rare variants, other "-omics" and interaction data. MAJOR CONCLUSIONS To choose appropriate software tools, factors for consideration include covariate compatibility, null hypothesis, one- or two-step analysis required, curation method of gene sets, size of pathways, and size of flanking regions to define gene boundaries. For rare variants, analysis performance depends on consistency between assumed and actual effect distribution of variants. Integration of other "-omics" data and interaction can better explain gene functions. GENERAL SIGNIFICANCE Pathway analysis methods will be more readily used for integration of multiple sources of data, and enable more accurate prediction of phenotypes.
Affiliation(s)
- Patrick Y P Kao
- Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hong Kong SAR, China
| | - Kim Hung Leung
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China
| | - Lawrence W C Chan
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China
| | - Shea Ping Yip
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China.
| | - Maurice K H Yap
- Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hong Kong SAR, China
| |
|
21
|
Zhang F, Xie D, Liang M, Xiong M. Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits. PLoS Genet 2016; 12:e1005965. [PMID: 27104857 PMCID: PMC4841563 DOI: 10.1371/journal.pgen.1005965] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2015] [Accepted: 03/08/2016] [Indexed: 12/02/2022] Open
Abstract
To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of the complex diseases. Despite their importance in uncovering the genetic structure of complex traits, the statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes. The widely used statistical methods test interaction for a single phenotype. However, we often observe pleiotropic genetic interaction effects. The simultaneous gene-gene (GxG) interaction analysis of multiple complementary traits will increase statistical power to detect GxG interactions. 
Although GxG interactions play an important role in uncovering the genetic structure of complex traits, the statistical methods for detecting GxG interactions in multiple phenotypes remain less developed owing to their potential complexity. Therefore, we extend the functional regression model from single variate to multivariate for simultaneous GxG interaction analysis of multiple correlated phenotypes. Large-scale simulations are conducted to evaluate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare power with traditional multivariate pair-wise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for interaction analysis is applied to five phenotypes of exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) to detect pleiotropic GxG interactions. 267 pairs of genes that formed a genetic interaction network showed significant evidence of interactions influencing five traits.
Affiliation(s)
- Futao Zhang
- Department of Computer Science, College of Internet of Things, Hohai University, Changzhou, China
| | - Dan Xie
- College of Information Engineering, Hubei University of Chinese Medicine, Hubei, China
| | - Meimei Liang
- Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China
| | - Momiao Xiong
- Human Genetics Center, Division of Biostatistics, The University of Texas School of Public Health, Houston, Texas, United States of America
| |
|
22
|
Shen J, Li Z, Chen J, Song Z, Zhou Z, Shi Y. SHEsisPlus, a toolset for genetic studies on polyploid species. Sci Rep 2016; 6:24095. [PMID: 27048905 PMCID: PMC4822172 DOI: 10.1038/srep24095] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2015] [Accepted: 03/17/2016] [Indexed: 11/09/2022] Open
Abstract
Currently, algorithms and software for genetic analysis of diploid organisms with bi-allelic markers are well-established, while those for polyploids are limited. Here, we present SHEsisPlus, the online algorithm toolset for both dichotomous and quantitative trait genetic analysis on polyploid species (compatible with haploids and diploids, too). SHEsisPlus is also optimized for handling multiple-allele datasets. It is free, open source and also designed to perform a range of analyses, including haplotype inference, linkage disequilibrium analysis, epistasis detection, Hardy-Weinberg equilibrium and single locus association tests. Meanwhile, we developed an accurate and efficient haplotype inference algorithm for polyploids and proposed an entropy-based algorithm to detect epistasis in the context of quantitative traits. A study of both simulated and real datasets showed that our haplotype inference algorithm was much faster and more accurate than existing ones. Our epistasis detection algorithm was the first attempt to apply information theory to characterizing the gene interactions in quantitative trait datasets. Results showed that its statistical power was significantly higher than conventional approaches. SHEsisPlus is freely available on the web at http://shesisplus.bio-x.cn/. Source code is freely available for download at https://github.com/celaoforever/SHEsisPlus.
Affiliation(s)
- Jiawei Shen
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China.,School of Bio-medical Engineering, Shanghai Jiao Tong University, Shanghai 200230, P.R. China.,Institute of Social Cognitive and Behavioral Sciences, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
| | - Zhiqiang Li
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China.,Institute of Social Cognitive and Behavioral Sciences, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
| | - Jianhua Chen
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China.,Institute of Social Cognitive and Behavioral Sciences, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
| | - Zhijian Song
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China.,Institute of Social Cognitive and Behavioral Sciences, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
| | - Zhaowei Zhou
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China.,Shandong Provincial Key Laboratory of Metabolic Disease, the Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao 266003, China.,Institute of Clinical Research, the Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao 266003, China
| | - Yongyong Shi
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China.,School of Bio-medical Engineering, Shanghai Jiao Tong University, Shanghai 200230, P.R. China.,Shanghai Changning Mental Health Center, Shanghai 200042, P.R. China.,Department of Psychiatry, the First Teaching Hospital of Xinjiang Medical University, Urumqi 830054, P.R. China
| |
|
23
|
FHSA-SED: Two-Locus Model Detection for Genome-Wide Association Study with Harmony Search Algorithm. PLoS One 2016; 11:e0150669. [PMID: 27014873 PMCID: PMC4807955 DOI: 10.1371/journal.pone.0150669] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2015] [Accepted: 02/16/2016] [Indexed: 12/24/2022] Open
Abstract
Motivation The two-locus model is a typical significant disease model to be identified in genome-wide association study (GWAS). Due to intensive computational burden and diversity of disease models, existing methods have drawbacks of low detection power, high computation cost, and preference for some types of disease models. Method In this study, two scoring functions (Bayesian network based K2-score and Gini-score) are used for characterizing two SNP loci as a candidate model; the two criteria are adopted simultaneously for improving identification power and tackling the preference problem to disease models. Harmony search algorithm (HSA) is improved for quickly finding the most likely candidate models among all two-locus models, in which a local search algorithm with two-dimensional tabu table is presented to avoid repeatedly evaluating some disease models that have strong marginal effect. Finally, the G-test statistic is used to further test the candidate models. Results We investigate our method named FHSA-SED on 82 simulated datasets and a real AMD dataset, and compare it with two typical methods (MACOED and CSE) which have been developed recently based on swarm intelligent search algorithm. The results of simulation experiments indicate that our method outperforms the two compared algorithms in terms of detection power, computation time, evaluation times, sensitivity (TPR), specificity (SPC), positive predictive value (PPV) and accuracy (ACC). Our method has identified two SNPs (rs3775652 and rs10511467) that may be also associated with disease in AMD dataset.
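The final G-test step mentioned above can be illustrated with a small sketch that computes G = 2 * sum(O * ln(O / E)) over the contingency table of two-locus genotype combinations against case/control status. The function name `g_statistic` and the input layout are assumptions for illustration; the K2-score, Gini-score and harmony search stages of FHSA-SED are not reproduced here.

```python
import math
from collections import Counter


def g_statistic(g1, g2, status):
    """G-test statistic for a two-locus genotype combination versus disease status."""
    n = len(status)
    combos = list(zip(g1, g2))
    observed_cells = Counter(zip(combos, status))
    combo_totals = Counter(combos)
    status_totals = Counter(status)
    g = 0.0
    # Cells with zero observations contribute nothing, and Counter omits them.
    for (combo, cls), observed in observed_cells.items():
        expected = combo_totals[combo] * status_totals[cls] / n
        g += observed * math.log(observed / expected)
    return 2 * g
```

Under the null hypothesis of no association, G is approximately chi-square distributed with (number of observed genotype combinations - 1) * (number of status classes - 1) degrees of freedom, so the statistic can be converted to a p-value for the candidate pairs that survive the search stage.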
|
24
|
Abstract
In the single locus strategy a number of genetic variants are analyzed, in order to find variants that are distributed significantly differently between controls and patients. A supplementary strategy is to analyze combinations of genetic variants. A combination that is the genetic basis for a polygenic disorder will not occur in control persons genetically unrelated to patients, so the strategy is to analyze combinations of genetic variants present exclusively in patients. In a previous study of oral cancer and leukoplakia 325 SNPs were analyzed. This study has been supplemented with an analysis of combinations of two SNP genotypes from among the 325 SNPs. Two clusters of combinations containing 95 patient specific combinations were significantly associated with oral cancer or leukoplakia. Of 373 patients with oral cancer 205 patients had a number of these 95 combinations in their genome, whereas none of 535 control persons had any of these combinations in their genome.
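The patient-exclusive strategy described above can be sketched as a set difference: enumerate every combination of two SNP genotypes carried by each person, then keep the combinations seen in at least one patient and in no control. This is only a sketch of the enumeration step. The function names, the list-of-genotypes input format, and the omission of the clustering and significance assessment reported in the abstract are all assumptions.

```python
from itertools import combinations


def pair_combos(genotypes):
    """All (snp_i, genotype_i, snp_j, genotype_j) combinations carried by one person."""
    return {
        (i, genotypes[i], j, genotypes[j])
        for i, j in combinations(range(len(genotypes)), 2)
    }


def patient_exclusive(patients, controls):
    """Two-SNP genotype combinations found in some patient but in no control."""
    seen_in_controls = set()
    for person in controls:
        seen_in_controls |= pair_combos(person)
    exclusive = set()
    for person in patients:
        exclusive |= pair_combos(person) - seen_in_controls
    return exclusive
```

Each person is represented as a list of genotype codes, one per SNP, in a fixed SNP order. With many SNPs and few controls, a large number of combinations will be patient-exclusive by chance alone, which is why a separate significance step is needed.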
Affiliation(s)
- Erling Mellerup
- Laboratory of Neuropsychiatry, Department of Neuroscience and Pharmacology, Faculty of Health, University of Copenhagen, Denmark
| | - Gert Lykke Moeller
- Genokey ApS, ScionDTU, Technical University of Denmark, Hoersholm, Denmark
| | | | - Susanta Roychoudhury
- Cancer Biology and Inflammatory Disorder Division, CSIR-Indian Institute of Chemical Biology, Kolkata, India
| |
|
25
|
Lishout FV, Gadaleta F, Moore JH, Wehenkel L, Steen KV. gammaMAXT: a fast multiple-testing correction algorithm. BioData Min 2015; 8:36. [PMID: 26594243 PMCID: PMC4654922 DOI: 10.1186/s13040-015-0069-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2015] [Accepted: 11/08/2015] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND The purpose of the MaxT algorithm is to provide a significance test algorithm that controls the family-wise error rate (FWER) during simultaneous hypothesis testing. However, the requirements in terms of computing time and memory of this procedure are proportional to the number of investigated hypotheses. The memory issue has been solved in 2013 by Van Lishout's implementation of MaxT, which makes the memory usage independent from the size of the dataset. This algorithm is implemented in MBMDR-3.0.3, a software that is able to identify genetic interactions, for a variety of SNP-SNP based epistasis models effectively. On the other hand, that implementation turned out to be less suitable for genome-wide interaction analysis studies, due to the prohibitive computational burden. RESULTS In this work we introduce gammaMAXT, a novel implementation of the maxT algorithm for multiple testing correction. The algorithm was implemented in software MBMDR-4.2.2, as part of the MB-MDR framework to screen for SNP-SNP, SNP-environment or SNP-SNP-environment interactions at a genome-wide level. We show that, in the absence of interaction effects, test-statistics produced by the MB-MDR methodology follow a mixture distribution with a point mass at zero and a shifted gamma distribution for the top 10 % of the strictly positive values. We show that the gammaMAXT algorithm has a power comparable to MaxT and maintains FWER, but requires less computational resources and time. We analyze a dataset composed of 10^6 SNPs and 1000 individuals within one day on a 256-core computer cluster. The same analysis would take about 10^4 times longer with MBMDR-3.0.3. CONCLUSIONS These results are promising for future GWAIs. However, the proposed gammaMAXT algorithm offers a general significance assessment and multiple testing approach, applicable to any context that requires performing hundreds of thousands of tests. 
It offers new perspectives for fast and efficient permutation-based significance assessment in large-scale (integrated) omics studies.
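As a rough illustration of the permutation-based maxT idea described in this abstract, the sketch below computes single-step maxT adjusted p-values. It is not the gammaMAXT algorithm and does not use the MB-MDR statistic: the test statistic is swapped for the absolute Pearson correlation between each feature and the phenotype, and the function name `maxt_adjusted_pvalues`, the simulated data, and all parameters are assumptions made for the example.

```python
import numpy as np

def maxt_adjusted_pvalues(X, y, n_perm=999, seed=0):
    """Single-step maxT adjusted p-values.

    Illustrative statistic (an assumption, not MB-MDR): |Pearson r|
    between each column of X and the phenotype vector y.
    """
    rng = np.random.default_rng(seed)
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature

    def stats(labels):
        lc = (labels - labels.mean()) / labels.std()
        return np.abs(Xc.T @ lc) / len(labels)

    observed = stats(y)
    # Per permutation, keep only the maximum statistic over all hypotheses,
    # so the stored null distribution has n_perm values regardless of
    # how many hypotheses are tested.
    max_null = np.array([stats(rng.permutation(y)).max() for _ in range(n_perm)])
    exceed = (max_null[None, :] >= observed[:, None]).sum(axis=1)
    return (1 + exceed) / (n_perm + 1)

# Hypothetical data: 200 individuals, 50 features, only feature 0 is causal.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(200, 50))
y_demo = (X_demo[:, 0] + 0.5 * demo_rng.normal(size=200) > 0).astype(float)
p_adj = maxt_adjusted_pvalues(X_demo, y_demo)
print(p_adj[:5])
```

Comparing every observed statistic against the permutation maxima is what gives family-wise error control. The cost that remains is recomputing all statistics for each permutation, which is the burden the abstract says gammaMAXT reduces.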
Affiliation(s)
- François Van Lishout
- Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium
- Francesco Gadaleta
- Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium
- Jason H Moore
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, 19104-6021 PA USA
- Louis Wehenkel
- Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium
- Kristel Van Steen
- Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium
|
26
|
Mellerup E, Andreassen OA, Bennike B, Dam H, Djurovic S, Hansen T, Jorgensen MB, Kessing LV, Koefoed P, Melle I, Mors O, Werge T, Moeller GL. Combinations of Genetic Data Present in Bipolar Patients, but Absent in Control Persons. PLoS One 2015; 10:e0143432. [PMID: 26587987 PMCID: PMC4654514 DOI: 10.1371/journal.pone.0143432] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/02/2015] [Accepted: 11/04/2015] [Indexed: 11/19/2022]
Abstract
The main objective of the study was to find combinations of genetic variants significantly associated with bipolar disorder. In a previous study of bipolar disorder, combinations of three single nucleotide polymorphism (SNP) genotypes taken from 803 SNPs were analyzed, and four clusters of combinations were found to be significantly associated with bipolar disorder. In the present study, combinations of four SNP genotypes taken from the same 803 SNPs were analyzed, and one cluster of combinations was found to be significantly associated with bipolar disorder. Combinations from the new cluster and from the four previous clusters were identified in the genomes of 209 of the 607 patients in the study whereas none of the 1355 control participants had any of these combinations in their genome.
Affiliation(s)
- Erling Mellerup
- Laboratory of Neuropsychiatry, Department of Neuroscience and Pharmacology, University of Copenhagen, Blegdamsvej 9 O-6102, DK-2100 Copenhagen, Denmark
- Ole A. Andreassen
- Department of Psychiatry, Oslo University Hospital and Institute of Psychiatry, University of Oslo, Kirkeveien 166. 0407 Oslo, Norway
- Bente Bennike
- Laboratory of Neuropsychiatry, Department of Neuroscience and Pharmacology, University of Copenhagen, Blegdamsvej 9 O-6102, DK-2100 Copenhagen, Denmark
- Henrik Dam
- Psychiatric Centre Copenhagen, Department O, Copenhagen University Hospital, Rigshospitalet, Blegdamsvej 9 O-6102, DK-2100 Copenhagen, Denmark
- Srdjan Djurovic
- Department of Medical Genetics, Oslo University Hospital and Institute of Psychiatry, University of Oslo, Kirkeveien 166. 0407 Oslo, Norway
- Thomas Hansen
- Department of Biological Psychiatry, Mental Health Centre Sct. Hans, Copenhagen University Hospital, Boserupvej 2, DK-4000 Roskilde, Denmark
- Martin Balslev Jorgensen
- Psychiatric Centre Copenhagen, Department O, Copenhagen University Hospital, Rigshospitalet, Blegdamsvej 9 O-6102, DK-2100 Copenhagen, Denmark
- Lars Vedel Kessing
- Psychiatric Centre Copenhagen, Department O, Copenhagen University Hospital, Rigshospitalet, Blegdamsvej 9 O-6102, DK-2100 Copenhagen, Denmark
- Pernille Koefoed
- Laboratory of Neuropsychiatry, Department of Neuroscience and Pharmacology, University of Copenhagen, Blegdamsvej 9 O-6102, DK-2100 Copenhagen, Denmark
- Psychiatric Centre Copenhagen, Department O, Copenhagen University Hospital, Rigshospitalet, Blegdamsvej 9 O-6102, DK-2100 Copenhagen, Denmark
- Ingrid Melle
- Department of Psychiatry, Oslo University Hospital and Institute of Psychiatry, University of Oslo, Kirkeveien 166. 0407 Oslo, Norway
- Ole Mors
- Centre for Psychiatric Research, Aarhus University Hospital, Skovagervej 2, DK-8240 Risskov, Denmark
- Thomas Werge
- Department of Biological Psychiatry, Mental Health Centre Sct. Hans, Copenhagen University Hospital, Boserupvej 2, DK-4000 Roskilde, Denmark
- Gert Lykke Moeller
- Genokey ApS, ScionDTU, Technical University Denmark, Agern Allé 3, DK-2970 Hoersholm, Denmark
|
27
|
Software for detecting gene-gene interactions in genome wide association studies. BIOTECHNOL BIOPROC E 2015. [DOI: 10.1007/s12257-015-0064-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
28
|
Al-jouie A, Esfandiari M, Ramakrishnan S, Roshan U. Chi8: a GPU program for detecting significant interacting SNPs with the Chi-square 8-df test. BMC Res Notes 2015; 8:436. [PMID: 26369336 PMCID: PMC4568583 DOI: 10.1186/s13104-015-1392-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/10/2014] [Accepted: 08/24/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Determining interacting SNPs in genome-wide association studies is computationally expensive yet of considerable interest in genomics. FINDINGS We present a program Chi8 that calculates the Chi-square 8 degree of freedom test between all pairs of SNPs in a brute force manner on a Graphics Processing Unit. We analyze each of the seven WTCCC genome-wide association studies, which have about 5000 total cases and controls and 400,000 SNPs, in an average of 9.6 h on a single GPU. We also study the power, false positives, and area under curve of our program on simulated data and provide a comparison to the GBOOST program. Our program source code is freely available from http://www.cs.njit.edu/usman/Chi8.
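The 8 degrees of freedom come from cross-tabulating the 9 joint genotypes of a SNP pair against case/control status, giving (9 - 1) × (2 - 1) = 8. The following CPU sketch computes that statistic for one pair with NumPy. It is not the Chi8 GPU code: the function name `chi8_statistic`, the 0/1/2 genotype coding, and the toy data are assumptions for illustration only.

```python
import numpy as np

def chi8_statistic(snp_a, snp_b, is_case):
    """Pearson chi-square on the 2 x 9 table (status x joint genotype).

    Genotypes are assumed to be coded 0/1/2. Cells whose expected count
    is zero are skipped, so the effective df can fall below 8.
    """
    snp_a = np.asarray(snp_a, dtype=int)
    snp_b = np.asarray(snp_b, dtype=int)
    status = np.asarray(is_case, dtype=int)
    combo = 3 * snp_a + snp_b                  # joint genotype index 0..8
    table = np.zeros((2, 9))
    np.add.at(table, (status, combo), 1)       # count individuals per cell
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row * col / table.sum()
    mask = expected > 0
    return float((((table - expected) ** 2)[mask] / expected[mask]).sum())

# Toy data: 10 cases all carry genotype (0, 0), 10 controls all carry (2, 2).
geno = np.array([0] * 10 + [2] * 10)
status_demo = np.array([1] * 10 + [0] * 10)
stat_perfect = chi8_statistic(geno, geno, status_demo)

# Number of SNP pairs in an exhaustive scan of 400,000 SNPs.
n_pairs = 400_000 * 399_999 // 2
print(stat_perfect, n_pairs)
```

With the figures quoted in the abstract, 400,000 SNPs give about 8 × 10^10 pairs. Completing them in 9.6 h implies an average of roughly 2.3 million pair tests per second, which is why the per-pair work has to stay this small.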
Affiliation(s)
- Abdulrhman Al-jouie
- King Abdullah Medical Research Center, King Saud Bin Abdulaziz University for Health Sciences, P.O. Box 22490, Riyadh, 11426, Saudi Arabia; Department of Computer Science, New Jersey Institute of Technology, GITC 4400, University Heights, Newark, NJ, 07102, USA.
- Mohammadreza Esfandiari
- Department of Computer Science, New Jersey Institute of Technology, GITC 4400, University Heights, Newark, NJ, 07102, USA.
- Usman Roshan
- Department of Computer Science, New Jersey Institute of Technology, GITC 4400, University Heights, Newark, NJ, 07102, USA.
|
29
|
Upton A, Trelles O, Cornejo-García JA, Perkins JR. Review: High-performance computing to detect epistasis in genome scale data sets. Brief Bioinform 2015; 17:368-79. [PMID: 26272945 DOI: 10.1093/bib/bbv058] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 03/31/2015] [Indexed: 11/14/2022]
Abstract
It is becoming clear that most human diseases have a complex etiology that cannot be explained by single nucleotide polymorphisms (SNPs) or simple additive combinations; the general consensus is that they are caused by combinations of multiple genetic variations. The limited success of some genome-wide association studies is partly a result of this focus on single genetic markers. A more promising approach is to take into account epistasis, by considering the association of multiple SNP interactions with disease. However, as genomic data continues to grow in resolution, and genome and exome sequencing become more established, the number of combinations of variants to consider increases rapidly. Two potential solutions should be considered: the use of high-performance computing, which allows us to consider a larger number of variables, and heuristics to make the solution more tractable, essential in the case of genome sequencing. In this review, we look at different computational methods to analyse epistatic interactions within disease-related genetic data sets created by microarray technology. We also review efforts to use epistatic analysis results to produce biomarkers for diagnostic tests and give our views on future directions in this field in light of advances in sequencing technology and variants in non-coding regions.
|
30
|
Zhang FT, Zhu ZH, Tong XR, Zhu ZX, Qi T, Zhu J. Mixed Linear Model Approaches of Association Mapping for Complex Traits Based on Omics Variants. Sci Rep 2015. [PMID: 26223539 PMCID: PMC5155518 DOI: 10.1038/srep10298] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 11/29/2022]
Abstract
Precise prediction of the genetic architecture of complex traits is impeded by the limited understanding of the genetic effects of complex traits, especially gene-by-gene (GxG) and gene-by-environment (GxE) interactions. In the past decades, an explosion of high-throughput technologies has enabled omics studies at multiple levels (such as genomics, transcriptomics, proteomics, and metabolomics). The analyses of large omics data, especially two-locus interaction analysis, are very time intensive. Integrating the diverse omics data and environmental effects in the analyses also remains a challenge. We proposed mixed linear model approaches using GPU (Graphics Processing Unit) computation to simultaneously dissect various genetic effects. Analyses can be performed for estimating genetic main effects, GxG epistasis effects, and GxE environment interaction effects on large-scale omics data for complex traits, and for estimating heritability of specific genetic effects. Both mouse data analyses and Monte Carlo simulations demonstrated that genetic effects and environment interaction effects could be estimated without bias and with high statistical power by using the proposed approaches.
Affiliation(s)
- Fu-Tao Zhang
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Zhi-Hong Zhu
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Xiao-Ran Tong
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Zhi-Xiang Zhu
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Ting Qi
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Jun Zhu
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
|
31
|
Brenndörfer J, Altmann A, Widner-Andrä R, Pütz B, Czamara D, Tilch E, Kam-Thong T, Weber P, Rex-Haffner M, Bettecken T, Bultmann A, Müller-Myhsok B, Binder EE, Landgraf R, Czibere L. Connecting Anxiety and Genomic Copy Number Variation: A Genome-Wide Analysis in CD-1 Mice. PLoS One 2015; 10:e0128465. [PMID: 26011321 PMCID: PMC4444327 DOI: 10.1371/journal.pone.0128465] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 02/04/2014] [Accepted: 04/27/2015] [Indexed: 12/05/2022]
Abstract
Genomic copy number variants (CNVs) have been implicated in multiple psychiatric disorders, but not much is known about their influence on anxiety disorders specifically. Using next-generation sequencing (NGS) and two additional array-based genotyping approaches, we detected CNVs in a mouse model consisting of two inbred mouse lines showing high (HAB) and low (LAB) anxiety-related behavior, respectively. An influence of CNVs on gene expression in the central (CeA) and basolateral (BLA) amygdala, paraventricular nucleus (PVN), and cingulate cortex (Cg) was shown by a two-proportion Z-test (p = 1.6 × 10^-31), with a positive correlation in the CeA (p = 0.0062), PVN (p = 0.0046) and Cg (p = 0.0114), indicating a contribution of CNVs to the genetic predisposition to trait anxiety in the specific context of HAB/LAB mice. In order to confirm anxiety-relevant CNVs and corresponding genes in a second mouse model, we further examined CD-1 outbred mice. We revealed the distribution of CNVs by genotyping 64 CD-1 individuals using a high-density genotyping array (Jackson Laboratory). 78 genes within those CNVs were identified to show nominally significant association (48 genes), or a statistical trend in their association (30 genes) with the time animals spent on the open arms of the elevated plus-maze (EPM). Fifteen of them were considered promising candidate genes of anxiety-related behavior as we could show a significant overlap (permutation test, p = 0.0051) with genes within HAB/LAB CNVs. Thus, here we provide what is to our knowledge the first extensive catalogue of CNVs in CD-1 mice and potential corresponding candidate genes linked to anxiety-related behavior in mice.
Affiliation(s)
- Julia Brenndörfer
- Department of Behavioral Neuroendocrinology, Max Planck Institute of Psychiatry, Munich, Germany
- André Altmann
- Department of Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany
- Regina Widner-Andrä
- Department of Behavioral Neuroendocrinology, Max Planck Institute of Psychiatry, Munich, Germany
- Benno Pütz
- Department of Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany
- Darina Czamara
- Department of Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany
- Erik Tilch
- Institute of Human Genetics, Helmholtz Zentrum München, Munich, Germany
- Institute of Human Genetics, Technische Universität München, Munich, Germany
- Tony Kam-Thong
- Department of Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany
- Peter Weber
- Department of Molecular Genetics of Affective Disorders, Max Planck Institute of Psychiatry, Munich, Germany
- Monika Rex-Haffner
- Department of Molecular Genetics of Affective Disorders, Max Planck Institute of Psychiatry, Munich, Germany
- Thomas Bettecken
- Department of Behavioral Neuroendocrinology, Max Planck Institute of Psychiatry, Munich, Germany
- Andrea Bultmann
- Department of Behavioral Neuroendocrinology, Max Planck Institute of Psychiatry, Munich, Germany
- Bertram Müller-Myhsok
- Department of Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany
- Elisabeth E. Binder
- Department of Molecular Genetics of Affective Disorders, Max Planck Institute of Psychiatry, Munich, Germany
- Rainer Landgraf
- Department of Behavioral Neuroendocrinology, Max Planck Institute of Psychiatry, Munich, Germany
- Ludwig Czibere
- Department of Behavioral Neuroendocrinology, Max Planck Institute of Psychiatry, Munich, Germany
|
32
|
Brossard M, Fang S, Vaysse A, Wei Q, Chen WV, Mohamdi H, Maubec E, Lavielle N, Galan P, Lathrop M, Avril MF, Lee JE, Amos CI, Demenais F. Integrated pathway and epistasis analysis reveals interactive effect of genetic variants at TERF1 and AFAP1L2 loci on melanoma risk. Int J Cancer 2015; 137:1901-1909. [PMID: 25892537 DOI: 10.1002/ijc.29570] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 12/17/2014] [Revised: 03/12/2015] [Accepted: 03/30/2015] [Indexed: 12/18/2022]
Abstract
Genome-wide association studies (GWASs) have characterized 13 loci associated with melanoma, which only account for a small part of melanoma risk. To identify new genes with too small an effect to be detected individually but which collectively influence melanoma risk and/or show interactive effects, we used a two-step analysis strategy including pathway analysis of genome-wide SNP data, in a first step, and epistasis analysis within significant pathways, in a second step. Pathway analysis, using the gene-set enrichment analysis (GSEA) approach and the gene ontology (GO) database, was applied to the outcomes of MELARISK (3,976 subjects) and MDACC (2,827 subjects) GWASs. Cross-gene SNP-SNP interaction analysis within melanoma-associated GOs was performed using the INTERSNP software. Five GO categories were significantly enriched in genes associated with melanoma (false discovery rate ≤ 5% in both studies): response to light stimulus, regulation of mitotic cell cycle, induction of programmed cell death, cytokine activity and oxidative phosphorylation. Epistasis analysis, within each of the five significant GOs, showed significant evidence for interaction for one SNP pair at TERF1 and AFAP1L2 loci (p_meta-int = 2.0 × 10^-7, which met both the pathway and overall multiple-testing corrected thresholds that are equal to 9.8 × 10^-7 and 2.0 × 10^-7, respectively) and suggestive evidence for another pair involving correlated SNPs at the same loci (p_meta-int = 3.6 × 10^-6). This interaction has important biological relevance given the key role of TERF1 in telomere biology and the reported physical interaction between TERF1 and AFAP1L2 proteins. This finding brings a novel piece of evidence for the emerging role of telomere dysfunction in melanoma development.
Affiliation(s)
- Myriam Brossard
- INSERM, Genetic Variation and Human Diseases Unit, UMR-946, Paris, France; Université Paris Diderot, Sorbonne Paris Cité, Institut Universitaire d'Hématologie, Paris, France
- Shenying Fang
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Amaury Vaysse
- INSERM, Genetic Variation and Human Diseases Unit, UMR-946, Paris, France; Université Paris Diderot, Sorbonne Paris Cité, Institut Universitaire d'Hématologie, Paris, France
- Qingyi Wei
- Duke Cancer Institute, Duke University Medical center and Department of Medicine, Duke University School of Medicine, Durham, NC, USA
- Wei V Chen
- Laboratory Informatics System, Department of Clinical Applications & Support, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA
- Hamida Mohamdi
- INSERM, Genetic Variation and Human Diseases Unit, UMR-946, Paris, France; Université Paris Diderot, Sorbonne Paris Cité, Institut Universitaire d'Hématologie, Paris, France
- Eve Maubec
- INSERM, Genetic Variation and Human Diseases Unit, UMR-946, Paris, France; Université Paris Diderot, Sorbonne Paris Cité, Institut Universitaire d'Hématologie, Paris, France; AP-HP (Assistance Publique-Hôpitaux de Paris), Hôpital Bichat, Service de Dermatologie, Université Paris Diderot, Paris, France
- Nolwenn Lavielle
- INSERM, Genetic Variation and Human Diseases Unit, UMR-946, Paris, France; Université Paris Diderot, Sorbonne Paris Cité, Institut Universitaire d'Hématologie, Paris, France
- Pilar Galan
- INSERM, UMR U557; Institut national de la Recherche Agronomique, U1125; Conservatoire national des arts et métiers, Centre de Recherche en Nutrition Humaine, Ile de France, Bobigny, France
- Mark Lathrop
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- Jeffrey E Lee
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Christopher I Amos
- Department of Community and Family Medicine, Geisel College of Medicine, Dartmouth College, Hanover, New Hampshire, USA
- Florence Demenais
- INSERM, Genetic Variation and Human Diseases Unit, UMR-946, Paris, France; Université Paris Diderot, Sorbonne Paris Cité, Institut Universitaire d'Hématologie, Paris, France
|
33
|
Daya M, van der Merwe L, van Helden PD, Möller M, Hoal EG. Investigating the Role of Gene-Gene Interactions in TB Susceptibility. PLoS One 2015; 10:e0123970. [PMID: 25919455 PMCID: PMC4412713 DOI: 10.1371/journal.pone.0123970] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 10/10/2014] [Accepted: 02/24/2015] [Indexed: 11/22/2022]
Abstract
Tuberculosis (TB) is the second leading cause of mortality from infectious disease worldwide. One of the factors involved in developing disease is the genetics of the host, yet the field of TB susceptibility genetics has not yielded the answers that were expected. A commonly posited explanation for the missing heritability of complex disease is gene-gene interactions, also referred to as epistasis. In this study we investigate the role of gene-gene interactions in genetic susceptibility to TB using a cohort recruited from a high TB incidence community from Cape Town, South Africa. Our discovery data set incorporates genotypes from a large number of candidate gene studies as well as genome-wide data. After limiting our search space to pairs of putative TB susceptibility genes, as well as pairs of genes that have been curated in online databases as potential interactors, we use statistical modelling to identify pairs of interacting SNPs. We attempt to validate the top models identified in our discovery data set using an independent genome-wide TB case-control data set from The Gambia. A number of models were successfully validated, indicating that interplay between the NRG1 - NRG3, GRIK1 - GRIK3 and IL23R - ATG4C gene pairs may modify susceptibility to TB. Gene pairs involved in the NF-κB pathway were also identified in the discovery data set (SFTPD - NOD2, ISG15 - TLR8 and NLRC5 - IL12RB1), but could not be tested in the Gambian study group due to lack of overlapping data.
Affiliation(s)
- Michelle Daya
- SA MRC Centre for TB Research, DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Lize van der Merwe
- SA MRC Centre for TB Research, DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Paul D. van Helden
- SA MRC Centre for TB Research, DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Marlo Möller
- SA MRC Centre for TB Research, DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Eileen G. Hoal
- SA MRC Centre for TB Research, DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
|
34
|
Babron MC, Etcheto A, Dizier MH. A New Correction for Multiple Testing in Gene-Gene Interaction Studies. Ann Hum Genet 2015; 79:380-384. [PMID: 25912889 DOI: 10.1111/ahg.12113] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 10/22/2014] [Accepted: 02/26/2015] [Indexed: 11/30/2022]
Abstract
A major problem in gene-gene interaction studies in large marker panels is how to correct for multiple testing while accounting for the dependence between marker pairs due to the presence of linkage disequilibrium. The "gold standard" approach is to perform permutations of case/control labels. However, this is often not feasible in practice, due to computational demands. Here, we propose a correction based on the effective number of independent tests of interaction between marker pairs. This number depends on the effective number of independent single-marker tests. We tested its validity using simulated samples, as well as that of another correction of marker pair tests. We showed that our approach was valid while the other correction strongly underestimated the effective number of independent tests. Our method provides estimates of the effective number of independent tests close to those reported in the literature for a Genome-Wide Interaction Study on a 550K chip. Our correction method is quick and simple, and can be applied whatever the marker panel and the underlying linkage disequilibrium pattern.
Affiliation(s)
- Marie-Claude Babron
- Inserm, UMR946, Genetic variation and Human diseases, F-75010, Paris, France; Université Paris-Diderot, Sorbonne Paris-Cité, UMR946, F-75010, Paris, France
- Adrien Etcheto
- Inserm, UMR946, Genetic variation and Human diseases, F-75010, Paris, France; Université de Nantes, Nantes, F-44000, France
- Marie-Helene Dizier
- Inserm, UMR946, Genetic variation and Human diseases, F-75010, Paris, France; Université Paris-Diderot, Sorbonne Paris-Cité, UMR946, F-75010, Paris, France
|
35
|
Grange L, Bureau JF, Nikolayeva I, Paul R, Van Steen K, Schwikowski B, Sakuntabhai A. Filter-free exhaustive odds ratio-based genome-wide interaction approach pinpoints evidence for interaction in the HLA region in psoriasis. BMC Genet 2015; 16:11. [PMID: 25655172 PMCID: PMC4341885 DOI: 10.1186/s12863-015-0174-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/29/2014] [Accepted: 01/23/2015] [Indexed: 12/02/2022]
Abstract
Background Deciphering the genetic architecture of complex traits is still a major challenge for human genetics. In most cases, genome-wide association studies have only partially explained the heritability of traits and diseases. Epistasis, one potentially important cause of this missing heritability, is difficult to explore at the genome-wide level. Here, we develop and assess a tool based on interactive odds ratios (IOR), Fast Odds Ratio-based sCan for Epistasis (FORCE), as a novel approach for exhaustive genome-wide epistasis search. IOR is the ratio between the multiplicative term of the odds ratio (OR) of having each variant over the OR of having both of them. By definition, an IOR that significantly deviates from 1 suggests the occurrence of an interaction (epistasis). As the IOR is fast to calculate, we used the IOR to rank and select pairs of interacting polymorphisms for P value estimation, which is more time consuming. Results FORCE displayed power and accuracy similar to existing parametric and non-parametric methods, and is fast enough to complete a filter-free genome-wide epistasis search in a few days on a standard computer. Analysis of psoriasis data uncovered novel epistatic interactions in the HLA region, corroborating the known major and complex role of the HLA region in psoriasis susceptibility. Conclusions Our systematic study revealed the ability of FORCE to uncover novel interactions, highlighted the importance of exhaustiveness, as well as its specificity for certain types of interactions that were not detected by existing approaches. We therefore believe that FORCE is a valuable new tool for decoding the genetic basis of complex diseases. Electronic supplementary material The online version of this article (doi:10.1186/s12863-015-0174-3) contains supplementary material, which is available to authorized users.
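One possible reading of the IOR definition in this abstract can be sketched as follows. Carriers of variant A only, variant B only, and both variants are each compared with non-carriers, and the product of the two single-variant odds ratios is divided by the double-carrier odds ratio. The carrier grouping, the reference group, the function names, and the counts are all assumptions for illustration. The abstract does not specify FORCE's genotype coding, and this is not its implementation.

```python
def odds_ratio(cases_exposed, controls_exposed, cases_ref, controls_ref):
    """Odds ratio of an exposed group against a reference group.

    No guard against zero counts: a real scan would need one.
    """
    return (cases_exposed * controls_ref) / (controls_exposed * cases_ref)

def interactive_odds_ratio(counts):
    """IOR = (OR_A * OR_B) / OR_AB under the assumed carrier coding.

    `counts` maps 'neither', 'a_only', 'b_only', 'both' to a
    (cases, controls) tuple. A value near 1 means the joint effect is
    close to the product of the single-variant effects.
    """
    ref_cases, ref_controls = counts["neither"]
    or_a = odds_ratio(*counts["a_only"], ref_cases, ref_controls)
    or_b = odds_ratio(*counts["b_only"], ref_cases, ref_controls)
    or_ab = odds_ratio(*counts["both"], ref_cases, ref_controls)
    return (or_a * or_b) / or_ab

# Hypothetical counts: OR_A = 2, OR_B = 3, OR_AB = 6 (purely multiplicative).
no_interaction = {
    "neither": (100, 100), "a_only": (200, 100),
    "b_only": (300, 100), "both": (600, 100),
}
# Same single-variant effects, but double carriers have OR_AB = 12.
with_interaction = dict(no_interaction, both=(1200, 100))

ior_null = interactive_odds_ratio(no_interaction)
ior_alt = interactive_odds_ratio(with_interaction)
print(ior_null, ior_alt)
```

Because the ratio needs only a handful of multiplications per pair, it is cheap enough to rank all pairs before the slower P value estimation, which matches the screening role the abstract describes.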
Affiliation(s)
- Laura Grange
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France; Université Paris Diderot, Paris, 75013, France.
- Jean-François Bureau
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France.
- Iryna Nikolayeva
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; Department of Genomes and Genetics, Institut Pasteur, Systems Biology Lab, Paris, 75015, France; Université Paris-Descartes, Sorbonne Paris Cité, Paris, France.
- Richard Paul
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France.
- Kristel Van Steen
- Systems and Modeling Unit, Montefiore Institute, University of Liège, Liège, Belgium; Bioinformatics and Modeling, GiGA-R, University of Liège, Liège, Belgium.
- Benno Schwikowski
- Department of Genomes and Genetics, Institut Pasteur, Systems Biology Lab, Paris, 75015, France.
- Anavaj Sakuntabhai
- Department of Genomes and Genetics, Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris, 75015, France; CNRS URA3012, Paris, 75015, France.
|
36
|
Hibar DP, Stein JL, Jahanshad N, Kohannim O, Hua X, Toga AW, McMahon KL, de Zubicaray GI, Martin NG, Wright MJ, Weiner MW, Thompson PM. Genome-wide interaction analysis reveals replicated epistatic effects on brain structure. Neurobiol Aging 2015; 36 Suppl 1:S151-8. [PMID: 25264344 PMCID: PMC4332874 DOI: 10.1016/j.neurobiolaging.2014.02.033] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 05/02/2013] [Revised: 02/10/2014] [Accepted: 02/16/2014] [Indexed: 11/24/2022]
Abstract
The discovery of several genes that affect the risk for Alzheimer's disease ignited a worldwide search for single-nucleotide polymorphisms (SNPs), common genetic variants that affect the brain. Genome-wide search of all possible SNP-SNP interactions is challenging and rarely attempted because of the complexity of conducting approximately 10^11 pairwise statistical tests. However, recent advances in machine learning, for example, iterative sure independence screening, make it possible to analyze data sets with vastly more predictors than observations. Using an implementation of the sure independence screening algorithm (called EPISIS), we performed a genome-wide interaction analysis testing all possible SNP-SNP interactions affecting regional brain volumes measured on magnetic resonance imaging and mapped using tensor-based morphometry. We identified a significant SNP-SNP interaction between rs1345203 and rs1213205 that explains 1.9% of the variance in temporal lobe volume. We mapped the whole brain, voxelwise effects of the interaction in the Alzheimer's Disease Neuroimaging Initiative data set and separately in an independent replication data set of healthy twins (Queensland Twin Imaging). Each additional loading in the interaction effect was associated with approximately 5% greater regional brain volume (a protective effect) in both Alzheimer's Disease Neuroimaging Initiative and Queensland Twin Imaging samples.
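The figure of roughly 10^11 pairwise tests follows from the number of unordered SNP pairs, n(n - 1)/2. The abstract does not state the panel size, so the sketch below assumes a hypothetical 500,000 SNPs. The Bonferroni threshold is shown only to convey the scale of the multiple-testing burden and is not claimed to be the correction the authors used.

```python
import math

n_snps = 500_000                       # assumed panel size, not from the abstract
n_pair_tests = math.comb(n_snps, 2)    # unordered pairs: n * (n - 1) / 2
bonferroni_alpha = 0.05 / n_pair_tests # per-test threshold for FWER 0.05

print(f"{n_pair_tests:.3e} pairwise tests")
print(f"Bonferroni per-test threshold: {bonferroni_alpha:.2e}")
```

Under this assumption the scan involves about 1.25 × 10^11 tests, so a single pair would need a p-value near 4 × 10^-13 to survive a Bonferroni correction.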
Collapse
Affiliation(s)
- Derrek P Hibar
- Imaging Genetics Center, Institute for Neuroimaging and Informatics, University of Southern California, Los Angeles, CA, USA
| | - Jason L Stein
- Imaging Genetics Center, Institute for Neuroimaging and Informatics, University of Southern California, Los Angeles, CA, USA
| | - Neda Jahanshad
- Imaging Genetics Center, Institute for Neuroimaging and Informatics, University of Southern California, Los Angeles, CA, USA
| | - Omid Kohannim
- Imaging Genetics Center, Institute for Neuroimaging and Informatics, University of Southern California, Los Angeles, CA, USA
| | - Xue Hua
- Imaging Genetics Center, Institute for Neuroimaging and Informatics, University of Southern California, Los Angeles, CA, USA
| | - Arthur W Toga
- Imaging Genetics Center, Institute for Neuroimaging and Informatics, University of Southern California, Los Angeles, CA, USA
| | - Katie L McMahon
- Centre for Magnetic Resonance, School of Psychology, University of Queensland, Brisbane, Queensland, Australia
| | - Greig I de Zubicaray
- Functional Magnetic Resonance Imaging Laboratory, School of Psychology, University of Queensland, Brisbane, Queensland, Australia
| | - Nicholas G Martin
- Genetic Epidemiology Laboratory, Queensland Institute of Medical Research, Brisbane, Australia
| | - Margaret J Wright
- Genetic Epidemiology Laboratory, Queensland Institute of Medical Research, Brisbane, Australia
| | - Michael W Weiner
- Department of Radiology, UC San Francisco, San Francisco, CA, USA; Department of Medicine, UC San Francisco, San Francisco, CA, USA; Department of Psychiatry, UC San Francisco, San Francisco, CA, USA; Department of Veterans Affairs Medical Center, San Francisco, CA, USA
| | - Paul M Thompson
- Imaging Genetics Center, Institute for Neuroimaging and Informatics, University of Southern California, Los Angeles, CA, USA.
|
37
|
Levin L, Mishmar D. A Genetic View of the Mitochondrial Role in Ageing: Killing Us Softly. Advances in Experimental Medicine and Biology 2015; 847:89-106. [DOI: 10.1007/978-1-4939-2404-2_4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/01/2023]
|
38
|
Abstract
Genome-wide association studies (GWASs) have become the focus of the statistical analysis of complex traits in humans, successfully shedding light on several aspects of genetic architecture and biological aetiology. Single-nucleotide polymorphisms (SNPs) are usually modelled as having additive, cumulative and independent effects on the phenotype. Although evidently a useful approach, it is often argued that this is not a realistic biological model and that epistasis (that is, the statistical interaction between SNPs) should be included. The purpose of this Review is to summarize recent directions in methodology for detecting epistasis and to discuss evidence of the role of epistasis in human complex trait variation. We also discuss the relevance of epistasis in the context of GWASs and potential hazards in the interpretation of statistical interaction terms.
|
39
|
Gusareva ES, Van Steen K. Practical aspects of genome-wide association interaction analysis. Hum Genet 2014; 133:1343-58. [DOI: 10.1007/s00439-014-1480-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 05/21/2014] [Accepted: 08/18/2014] [Indexed: 12/31/2022]
|
40
|
Ueki M. On the choice of degrees of freedom for testing gene-gene interactions. Stat Med 2014; 33:4934-48. [DOI: 10.1002/sim.6264] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/28/2013] [Revised: 06/13/2014] [Accepted: 06/20/2014] [Indexed: 12/24/2022]
Affiliation(s)
- Masao Ueki
- Tohoku Medical Megabank Organization; Tohoku University, Graduate School of Medicine; 2-1 Seiryo-machi, Aoba-ku Sendai 980-8573 Japan
|
41
|
Sluga D, Curk T, Zupan B, Lotric U. Heterogeneous computing architecture for fast detection of SNP-SNP interactions. BMC Bioinformatics 2014; 15:216. [PMID: 24964802 PMCID: PMC4230497 DOI: 10.1186/1471-2105-15-216] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 11/14/2013] [Accepted: 06/19/2014] [Indexed: 12/04/2022] Open
Abstract
Background: The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphics Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, the MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. Results: We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their use resulted in execution times an order of magnitude shorter than those of the single-threaded CPU implementation. The GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. Conclusions: General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand, the new MIC architecture, albeit lacking in performance, reduces the programming effort and makes up for it with a more general architecture suitable for a wider range of problems.
Affiliation(s)
- Uros Lotric
- Faculty of Computer and Information Science, University of Ljubljana, Trzaska 25, SI 1000 Ljubljana, SI, Slovenia.
|
42
|
Zhang Q, Long Q, Ott J. AprioriGWAS, a new pattern mining strategy for detecting genetic variants associated with disease through interaction effects. PLoS Comput Biol 2014; 10:e1003627. [PMID: 24901472 PMCID: PMC4046917 DOI: 10.1371/journal.pcbi.1003627] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 04/30/2013] [Accepted: 04/01/2014] [Indexed: 12/11/2022] Open
Abstract
Identifying gene-gene interaction is a hot topic in genome-wide association studies. Two fundamental challenges are: (1) how to smartly identify combinations of variants that may be associated with the trait from the astronomical number of all possible combinations; and (2) how to test epistatic interaction when all potential combinations are available. We developed AprioriGWAS, which brings two innovations. (1) Based on Apriori, a successful method in the field of Frequent Itemset Mining (FIM) in which a pattern growth strategy is leveraged to effectively and accurately reduce the search space, AprioriGWAS can efficiently identify genetically associated genotype patterns. (2) To test the hypotheses of epistasis, we adopt a new conditional permutation procedure to obtain reliable statistical inference of Pearson's chi-square test for the contingency table generated by associated variants. By applying AprioriGWAS to age-related macular degeneration (AMD) data, we found that: (1) angiopoietin 1 (ANGPT1) and four retinal genes interact with Complement Factor H (CFH). (2) GO term “glycosaminoglycan biosynthetic process” was enriched in AMD interacting genes. The epistatic interactions newly found by AprioriGWAS on AMD data are likely true interactions, since genes interacting with CFH are retinal genes, and GO term enrichment also verified that interaction between glycosaminoglycans (GAGs) and CFH plays an important role in disease pathology of AMD. By applying AprioriGWAS to bipolar disorder in WTCCC data, we found that variants without marginal effect show significant interactions, for example, multiple-SNP genotype patterns inside the genes GABRB2 and GRIA1 (AMPA subunit 1 receptor gene). AMPARs are found in many parts of the brain and are the most commonly found receptor in the nervous system. GABRB2 mediates the fastest inhibitory synaptic transmission in the central nervous system. The relevance of GRIA1 and GABRB2 to mental disorders is supported by multiple lines of evidence.
Genes do not operate in a vacuum. They interact with each other in many ways. Therefore, to figure out genetic causes of disease by case-control association studies, it is important to take interactions into account. There are two fundamental challenges in interaction-focused analysis. The first is that the number of possible combinations of genetic variants easily becomes astronomical, beyond current computational capacity, a problem referred to as “the curse of dimensionality” in the field of computer science. The other is that, even if all potential combinations could be exhaustively checked, genuine signals are likely to be buried by false positives composed of a single variant with a large main effect and some other irrelevant variant. In this work, we propose AprioriGWAS, which employs Apriori, an algorithm that pioneered the branch of “Frequent Itemset Mining” in computer science, to cope with daunting numbers of combinations, and conditional permutation, to make real signals stand out. By applying AprioriGWAS to age-related macular degeneration (AMD) data and bipolar disorder (BD) in WTCCC data, we found interesting interactions between genes that are plausible in terms of disease. Consequently, AprioriGWAS could be a good tool to find epistatic interactions in GWA data.
Affiliation(s)
- Qingrun Zhang
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multi-scale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
| | - Quan Long
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multi-scale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
| | - Jurg Ott
- Institute of Psychology, Chinese Academy of Sciences, Chaoyang District, Beijing, PR China
- Laboratory of Statistical Genetics, The Rockefeller University, New York, New York, United States of America
|
43
|
Sun X, Lu Q, Mukherjee S, Crane PK, Elston R, Ritchie MD. Analysis pipeline for the epistasis search - statistical versus biological filtering. Front Genet 2014; 5:106. [PMID: 24817878 PMCID: PMC4012196 DOI: 10.3389/fgene.2014.00106] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 01/11/2014] [Accepted: 04/10/2014] [Indexed: 12/15/2022] Open
Abstract
Gene-gene interactions may contribute to the genetic variation underlying complex traits but have not always been taken fully into account. Statistical analyses that consider gene-gene interaction may increase the power of detecting associations, especially for low-marginal-effect markers, and may explain in part the "missing heritability." Detecting pair-wise and higher-order interactions genome-wide requires enormous computational power. Filtering pipelines increase the computational speed by limiting the number of tests performed. We summarize existing filtering approaches to detect epistasis, after distinguishing the purposes that lead us to search for epistasis. Statistical filtering includes quality control on the basis of single marker statistics to avoid the analysis of bad and least informative data, and limits the search space for finding interactions. Biological filtering includes targeting specific pathways, integrating various databases based on known biological and metabolic pathways, gene function ontology and protein-protein interactions. It is increasingly possible to target single-nucleotide polymorphisms that have defined functions on gene expression, though not belonging to protein-coding genes. Filtering can improve the power of an interaction association study, but also increases the chance of missing important findings.
Affiliation(s)
- Xiangqing Sun
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
| | - Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA
| | | | - Paul K. Crane
- Department of Medicine, University of Washington, Seattle, WA, USA
| | - Robert Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
| | - Marylyn D. Ritchie
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
|
44
|
Gou J, Zhao Y, Wei Y, Wu C, Zhang R, Qiu Y, Zeng P, Tan W, Yu D, Wu T, Hu Z, Lin D, Shen H, Chen F. Stability SCAD: a powerful approach to detect interactions in large-scale genomic study. BMC Bioinformatics 2014; 15:62. [PMID: 24580776 PMCID: PMC3984751 DOI: 10.1186/1471-2105-15-62] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/30/2013] [Accepted: 02/18/2014] [Indexed: 11/25/2022] Open
Abstract
Background: Evidence suggests that common complex diseases may be partially due to SNP-SNP interactions, but detection of such interactions is yet to be fully established in high-dimensional small-sample (small-n-large-p) studies. A number of penalized regression techniques are gaining popularity within the statistical community, and are now being applied to detect interactions. These techniques tend to over-fit and are prone to false positives. The recently developed stability least absolute shrinkage and selection operator (SLASSO) has been used to control the family-wise error rate, but often at the expense of power (and thus at the cost of false negative results). Results: Here, we propose an alternative stability selection procedure known as stability smoothly clipped absolute deviation (SSCAD). Briefly, this method applies a smoothly clipped absolute deviation (SCAD) algorithm to multiple sub-samples, and then identifies a cluster ensemble of interactions across the sub-samples. The proposed method was compared with SLASSO and two kinds of traditional penalized methods by intensive simulation. The simulation revealed higher power and lower false discovery rate (FDR) with SSCAD. An analysis using the new method on the previously published GWAS of lung cancer confirmed all significant interactions identified with SLASSO, and identified two additional interactions not reported with SLASSO analysis. Conclusions: Based on the results obtained in this study, SSCAD appears to be a powerful procedure for the detection of SNP-SNP interactions in large-scale genomic data.
Affiliation(s)
- Feng Chen
- Department of Epidemiology and Biostatistics and Ministry of Education (MOE) Key Lab for Modern Toxicology, School of Public Health, Nanjing Medical University, Nanjing, China.
|
45
|
Hu JK, Wang X, Wang P. Testing gene-gene interactions in genome wide association studies. Genet Epidemiol 2014; 38:123-34. [PMID: 24431225 DOI: 10.1002/gepi.21786] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 06/30/2013] [Revised: 10/11/2013] [Accepted: 12/02/2013] [Indexed: 11/07/2022]
Abstract
Detection of gene-gene interaction has become increasingly popular over the past decade in genome wide association studies (GWAS). Besides traditional logistic regression analysis for detecting interactions between two markers, new methods have been developed in recent years such as comparing linkage disequilibrium (LD) in case and control groups. All these methods form the building blocks of most screening strategies for disease susceptibility loci in GWAS. In this paper, we are interested in comparing the competing methods and providing practical guidelines for selecting appropriate testing methods for interaction in GWAS. We first review a series of existing statistical methods to detect interactions, and then examine different definitions of interactions to gain insight into the theoretical relationship between the existing testing methods. Lastly, we perform extensive simulations to compare powers of various methods to detect either interaction between two markers at two unlinked loci or the overall association allowing for both interaction and main effects. This investigation reveals informative characteristics of various methods that are helpful to GWAS investigators.
Affiliation(s)
- Jie Kate Hu
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
|
46
|
Chen GK, Guo Y. Discovering epistasis in large scale genetic association studies by exploiting graphics cards. Front Genet 2013; 4:266. [PMID: 24348518 PMCID: PMC3848199 DOI: 10.3389/fgene.2013.00266] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 09/27/2013] [Accepted: 11/16/2013] [Indexed: 11/13/2022] Open
Abstract
Despite the enormous investments made in collecting DNA samples and generating germline variation data across thousands of individuals in modern genome-wide association studies (GWAS), progress has been frustratingly slow in explaining much of the heritability in common disease. Today's paradigm of testing independent hypotheses on each single nucleotide polymorphism (SNP) marker is unlikely to adequately reflect the complex biological processes in disease risk. Alternatively, modeling risk as an ensemble of SNPs that act in concert in a pathway, and/or interact non-additively on log risk for example, may be a more sensible way to approach gene mapping in modern studies. Implementing such analyses genome-wide can quickly become intractable because even modest-size SNP panels on modern genotype arrays (500k markers) pose a combinatorial nightmare, requiring tens of billions of models to be tested for evidence of interaction. In this article, we provide an in-depth analysis of programs that have been developed explicitly to overcome these enormous computational barriers through the use of processors on graphics cards known as Graphics Processing Units (GPUs). We include tutorials on GPU technology, which convey why GPUs are growing in appeal with today's numerical scientists. One obvious advantage is the impressive density of microprocessor cores available on a single GPU. Whereas high-end servers feature up to 24 Intel or AMD CPU cores, the latest GPU offerings from nVidia feature over 2600 cores. Each compute node may be outfitted with up to 4 GPU devices. Success on GPUs varies across problems. However, epistasis screens fare well because of the high degree of parallelism exposed in these problems. Papers that we review routinely report GPU speedups of over two orders of magnitude (>100x) over standard CPU implementations.
Affiliation(s)
- Gary K Chen
- Division of Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
| | - Yunfei Guo
- Division of Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA; Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, CA, USA
|
47
|
Bonifaci N, Colas E, Serra-Musach J, Karbalai N, Brunet J, Gómez A, Esteller M, Fernández-Taboada E, Berenguer A, Reventós J, Müller-Myhsok B, Amundadottir L, Duell EJ, Pujana MÀ. Integrating gene expression and epidemiological data for the discovery of genetic interactions associated with cancer risk. Carcinogenesis 2013; 35:578-85. [PMID: 24296589 DOI: 10.1093/carcin/bgt403] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/20/2022] Open
Abstract
Dozens of common genetic variants associated with cancer risk have been identified through genome-wide association studies (GWASs). However, these variants only explain a modest fraction of the heritability of disease. The missing heritability has been attributed to several factors, among them the existence of genetic interactions (G × G). Systematic screens for G × G in model organisms have revealed their fundamental influence in complex phenotypes. In this scenario, G × G overlap significantly with other types of gene and/or protein relationships. Here, by integrating predicted G × G from GWAS data and complex- and context-defined gene coexpression profiles, we provide evidence for G × G associated with cancer risk. G × G predicted from a breast cancer GWAS dataset identified significant overlaps [relative enrichments (REs) of 8-36%, empirical P values < 0.05 to 10^-4] with complex (non-linear) gene coexpression in breast tumors. The use of gene or protein data not specific for breast cancer did not reveal overlaps. According to the predicted G × G, experimental assays demonstrated functional interplay between lipoma-preferred partner and transforming growth factor-β signaling in the MCF10A non-tumorigenic mammary epithelial cell model. Next, integration of pancreatic tumor gene expression profiles with pancreatic cancer G × G predicted from a GWAS corroborated the observations made for breast cancer risk (REs of 25-59%). The method presented here can potentially support the identification of genetic interactions associated with cancer risk, providing novel mechanistic hypotheses for carcinogenesis.
Affiliation(s)
- Núria Bonifaci
- Breast Cancer and Systems Biology Unit, Translational Research Laboratory, Catalan Institute of Oncology (ICO), Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona 08908, Catalonia, Spain
|
48
|
From interaction to co-association --a Fisher r-to-z transformation-based simple statistic for real world genome-wide association study. PLoS One 2013; 8:e70774. [PMID: 23923021 PMCID: PMC3726765 DOI: 10.1371/journal.pone.0070774] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/18/2013] [Accepted: 06/21/2013] [Indexed: 12/21/2022] Open
Abstract
Currently, the genetic variants identified by genome-wide association studies (GWAS) generally account for only a small proportion of the total heritability of complex disease. One crucial reason is the underutilization of gene-gene joint effects commonly encountered in GWAS, which include main effects and co-association. However, gene-gene co-association is often vaguely subsumed under the framework of gene-gene interaction. From the causal graph perspective, we elucidate in detail the concept and rationale of gene-gene co-association as well as its relationship with traditional gene-gene interaction, and propose two simple statistics based on the Fisher r-to-z transformation to detect it. Three series of simulations further highlight that gene-gene co-association refers to the extent to which the joint effects of two genes differ from the main effects, owing not only to the traditional interaction under the nearly independent condition but also to the correlation between the two genes. The proposed statistics are more powerful than logistic regression under various situations, are not affected by linkage disequilibrium, and have an acceptable false positive rate as long as a reasonable GWAS data analysis roadmap is strictly followed. Furthermore, an application to gene pathway analysis associated with leprosy confirms in practice that our proposed gene-gene co-association concepts and the corresponding statistics are strongly in line with reality.
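For orientation, the classical Fisher r-to-z comparison of two correlations can be sketched as follows: the correlation between two SNPs' genotype codes is computed separately in cases and in controls, each correlation is transformed with atanh, and the difference is scaled by its approximate standard error. This is the textbook two-sample form and only an assumption about the general shape of such a statistic; it is not claimed to reproduce either of the statistics proposed in the paper, and all function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_difference(r_cases, n_cases, r_controls, n_controls):
    """Z statistic comparing two independent correlations via atanh (r-to-z)."""
    z_cases = math.atanh(r_cases)
    z_controls = math.atanh(r_controls)
    # Approximate standard error of the difference of two Fisher z values.
    se = math.sqrt(1.0 / (n_cases - 3) + 1.0 / (n_controls - 3))
    return (z_cases - z_controls) / se


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Illustrative inputs only: SNP-SNP correlation of 0.5 in 103 cases
# versus 0.0 in 103 controls.
z = fisher_z_difference(0.5, 103, 0.0, 103)
print(f"z = {z:.3f}, p = {two_sided_p(z):.2e}")
```

In practice `pearson_r` would be applied to genotype codes (for example 0, 1, 2 copies of the minor allele) within each group before calling `fisher_z_difference`.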
|
49
|
Lewinger JP, Morrison JL, Thomas DC, Murcray CE, Conti DV, Li D, Gauderman WJ. Efficient two-step testing of gene-gene interactions in genome-wide association studies. Genet Epidemiol 2013; 37:440-51. [PMID: 23633124 DOI: 10.1002/gepi.21720] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 03/06/2012] [Revised: 12/12/2012] [Accepted: 02/06/2013] [Indexed: 11/06/2022]
Abstract
Exhaustive testing of all possible SNP pairs in a genome-wide association study (GWAS) generally yields low power to detect gene-gene (G × G) interactions because of small effect sizes and stringent requirements for multiple-testing correction. We introduce a new two-step procedure for testing G × G interactions in case-control GWAS to detect interacting single nucleotide polymorphisms (SNPs) regardless of their marginal effects. In an initial screening step, all SNP pairs are tested for gene-gene association in the combined sample of cases and controls. In the second step, the pairs that pass the screening are followed up with a traditional test for G × G interaction. We show that the two-step method is substantially more powerful to detect G × G interactions than the exhaustive testing approach. For example, with 2,000 cases and 2,000 controls, the two-step method can have more than 90% power to detect an interaction odds ratio of 2.0 compared to less than 50% power for the exhaustive testing approach. Moreover, we show that a hybrid two-step approach that combines our newly proposed two-step test and the two-step test that screens for marginal effects retains the best power properties of both. The two-step procedures we introduce have the potential to uncover genetic signals that have not been previously identified in an initial single-SNP GWAS. We demonstrate the computational feasibility of the two-step G × G procedure by performing a G × G scan in the asthma GWAS of the University of Southern California Children's Health Study.
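The multiple-testing argument can be made concrete with a small sketch. It assumes, for illustration only, a Bonferroni correction at a family-wise level of 0.05, a hypothetical panel of 500,000 SNPs, and a hypothetical 10,000 pairs surviving the screening step. None of these numbers come from the paper, and the paper's actual correction procedure may differ.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


alpha = 0.05                 # assumed family-wise error rate
n_snps = 500_000             # hypothetical panel size
n_passing_screen = 10_000    # hypothetical number of pairs kept by step 1

# Exhaustive scan: every unordered SNP pair is tested for interaction.
exhaustive = bonferroni_threshold(alpha, math.comb(n_snps, 2))
# Two-step scan: only the pairs that passed screening are tested in step 2.
two_step = bonferroni_threshold(alpha, n_passing_screen)

print(f"exhaustive per-test threshold: {exhaustive:.1e}")
print(f"two-step per-test threshold:   {two_step:.1e}")
```

Under these assumptions the step-2 threshold is several orders of magnitude less stringent, which is one intuition for the power gain described in the abstract.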
Affiliation(s)
- Juan Pablo Lewinger
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California 90032, USA.
|
50
|
Hemani G, Knott S, Haley C. An evolutionary perspective on epistasis and the missing heritability. PLoS Genet 2013; 9:e1003295. [PMID: 23509438 PMCID: PMC3585114 DOI: 10.1371/journal.pgen.1003295] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Received: 07/21/2012] [Accepted: 12/17/2012] [Indexed: 01/04/2023] Open
Abstract
The relative importance between additive and non-additive genetic variance has been widely argued in quantitative genetics. By approaching this question from an evolutionary perspective we show that, while additive variance can be maintained under selection at a low level for some patterns of epistasis, the majority of the genetic variance that will persist is actually non-additive. We propose that one reason that the problem of the “missing heritability” arises is because the additive genetic variation that is estimated to be contributing to the variance of a trait will most likely be an artefact of the non-additive variance that can be maintained over evolutionary time. In addition, it can be shown that even a small reduction in linkage disequilibrium between causal variants and observed SNPs rapidly erodes estimates of epistatic variance, leading to an inflation in the perceived importance of additive effects. We demonstrate that the perception of independent additive effects comprising the majority of the genetic architecture of complex traits is biased upwards and that the search for causal variants in complex traits under selection is potentially underpowered by parameterising for additive effects alone. Given dense SNP panels the detection of causal variants through genome-wide association studies may be improved by searching for epistatic effects explicitly.
In this study we have shown that two independent problems may have a common cause. Why do traits under selection exhibit additive genetic variance, and why is the proportion of the heritability explained by additive effects much smaller than the total heritability estimated to exist? Our results indicate that epistatic interactions can allow deleterious mutations to persist under selection and that these interactions can abate the depletion of additive genetic variation. Furthermore, a much larger element of non-additive genetic variance is maintained, which supports the notion that the heritability estimated from family studies could be a mixture of both additive and non-additive components. We show that searching directly for epistatic effects greatly improves the discovery of variants under selection, despite the multiple testing penalty being much larger. Finally, we demonstrate that common practices in genome-wide association studies could lead to both an ascertainment bias in detecting additive effects and a confirmation bias in perceiving that most of the genetic variance is additive.
Affiliation(s)
- Gibran Hemani
- The Roslin Institute and Royal (Dick) School of Veterinary Science, University of Edinburgh, Edinburgh, United Kingdom
| | - Sara Knott
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
| | - Chris Haley
- The Roslin Institute and Royal (Dick) School of Veterinary Science, University of Edinburgh, Edinburgh, United Kingdom
- Institute for Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
|