1
BridGE: a pathway-based analysis tool for detecting genetic interactions from GWAS. Nat Protoc 2024; 19:1400-1435. [PMID: 38514837 DOI: 10.1038/s41596-024-00954-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Accepted: 11/22/2023] [Indexed: 03/23/2024]
Abstract
Genetic interactions have the potential to modulate phenotypes, including human disease. In principle, genome-wide association studies (GWAS) provide a platform for detecting genetic interactions; however, traditional methods for identifying them, which tend to focus on testing individual variant pairs, lack statistical power. In this protocol, we describe a novel computational approach, called Bridging Gene sets with Epistasis (BridGE), for discovering genetic interactions between biological pathways from GWAS data. We present a Python-based implementation of BridGE along with instructions for its application to a typical human GWAS cohort. The major stages include initial data processing and quality control, construction of a variant-level genetic interaction network, measurement of pathway-level genetic interactions, evaluation of statistical significance using sample permutations and generation of results in a standardized output format. The BridGE software pipeline includes options for running the analysis on multiple cores and multiple nodes for users who have access to computing clusters or a cloud computing environment. In a cluster computing environment with 10 nodes and 100 GB of memory per node, the method can be run in less than 24 h for typical human GWAS cohorts. Using BridGE requires knowledge of running Python programs and basic shell script programming experience.
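The permutation stage mentioned in this abstract can be illustrated with a minimal sketch. It is not the BridGE implementation: the function names, the toy carrier data, and the choice of statistic (difference in carrier rate between cases and controls) are all assumptions made for illustration. The sketch shows only the general idea of estimating an empirical p-value by shuffling case/control labels.

```python
import random

def permutation_p_value(labels, statistic, n_permutations=1000, seed=0):
    """Empirical p-value of statistic(labels) under random relabeling of samples."""
    rng = random.Random(seed)
    observed = statistic(labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between labels and genotypes
        if statistic(shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Toy data (hypothetical): 1 = sample carries a given joint genotype.
carriers = [1] * 8 + [0] * 12
labels = [1] * 10 + [0] * 10  # 1 = case, 0 = control

def carrier_rate_difference(sample_labels):
    """Carrier rate among cases minus carrier rate among controls."""
    cases = [c for c, y in zip(carriers, sample_labels) if y == 1]
    controls = [c for c, y in zip(carriers, sample_labels) if y == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)

p = permutation_p_value(labels, carrier_rate_difference)
print(f"empirical p-value: {p:.4f}")
```

In a real pipeline the statistic would be a pathway-level interaction score recomputed for each permuted cohort, which is the expensive step and the reason cluster parallelism matters.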
2
Genetic interactions of schizophrenia using gene-based statistical epistasis exclusively identify nervous system-related pathways and key hub genes. Front Genet 2024; 14:1301150. [PMID: 38259618 PMCID: PMC10800577 DOI: 10.3389/fgene.2023.1301150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2023] [Accepted: 12/12/2023] [Indexed: 01/24/2024]
Abstract
Background: The relationship between genotype and phenotype is governed by numerous genetic interactions (GIs), and the mapping of GI networks is of interest for two main reasons: 1) By modelling biological robustness, GIs provide a powerful opportunity to infer compensatory biological mechanisms via the identification of functional relationships between genes, which is of interest for biological discovery and translational research. Biological systems have evolved to compensate for genetic (i.e., variations and mutations) and environmental (i.e., drug efficacy) perturbations by exploiting compensatory relationships between genes, pathways and biological processes; 2) GIs facilitate the identification of the direction (alleviating or aggravating interactions) and magnitude of epistatic interactions that influence the phenotypic outcome. The generation of GIs for human diseases is impossible using experimental biology approaches such as systematic deletion analysis. Moreover, the generation of disease-specific GIs has never been undertaken in humans. Methods: We used our Indian schizophrenia case-control genetic dataset (816 cases, 900 controls) to implement the workflow. The standard GWAS sample quality control procedure was followed. We used the imputed genetic data to increase the SNP coverage and to analyse epistatic interactions across the genome comprehensively. Using the odds ratio (OR), we identified the GIs that increase or decrease the risk of a disease phenotype. The SNP-based epistatic results were transformed into gene-based epistatic results. Results: We have developed a novel approach by conducting gene-based statistical epistatic analysis using an Indian schizophrenia case-control genetic dataset and transforming these results to infer GIs that increase the risk of schizophrenia. There were ∼9.5 million GIs with a p-value ≤ 1 × 10⁻⁵. Approximately 4.8 million GIs showed an increased risk (OR > 1.0), while ∼4.75 million GIs had a decreased risk (OR < 1.0) for schizophrenia. Conclusion: Unlike model organisms, this approach is specifically viable in humans due to the availability of abundant disease-specific genome-wide genotype datasets. The study exclusively identified brain/nervous system-related processes, affirming the findings. This computational approach fills a critical gap by generating practically non-existent heritable disease-specific human GIs from human genetic data. These novel datasets can train innovative deep-learning models, potentially surpassing the limitations of conventional GWAS.
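The abstract does not say how the odds ratio was computed, so the following is only a sketch of how an OR from a 2 × 2 carrier table could be used to label an interaction as risk-increasing or risk-decreasing. The function names, the continuity correction, and the example counts are assumptions, not details from the study.

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds ratio for carrying a joint genotype, from a 2 x 2 count table."""
    cells = [case_carriers, case_noncarriers, control_carriers, control_noncarriers]
    if 0 in cells:
        # Haldane-Anscombe correction (an assumption here) avoids division by zero.
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)

def classify_interaction(or_value):
    """Label an interaction by the direction of its effect on disease risk."""
    if or_value > 1.0:
        return "risk-increasing"
    if or_value < 1.0:
        return "risk-decreasing"
    return "neutral"

# Hypothetical counts: 30 of 100 cases and 10 of 100 controls carry the genotype pair.
example_or = odds_ratio(30, 70, 10, 90)
print(round(example_or, 2), classify_interaction(example_or))
```

A full analysis would also attach a p-value or confidence interval to each OR before applying a threshold such as the 1 × 10⁻⁵ cut-off quoted above.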
3
A pharmacogenetic interaction analysis of bevacizumab with paclitaxel in advanced breast cancer patients. NPJ Breast Cancer 2022; 8:33. [PMID: 35314692 PMCID: PMC8938486 DOI: 10.1038/s41523-022-00400-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2021] [Accepted: 02/07/2022] [Indexed: 11/18/2022]
Abstract
This study investigated pharmacogenetic interactions among VEGF-A, VEGFR-2, IL-8, HIF-1α, EPAS-1, and TSP-1 SNPs and their role in progression-free survival (PFS) in metastatic breast cancer (MBC) patients treated with bevacizumab plus first-line paclitaxel or with paclitaxel alone. Analyses were performed on germline DNA, and SNPs were investigated by real-time PCR. The multifactor dimensionality reduction (MDR) methodology was applied to investigate the interaction between SNPs. The present study was an explorative, ambidirectional cohort study: 307 patients from 11 Oncology Units were evaluated retrospectively from 2009 to 2016, then followed prospectively (NCT01935102). Two hundred and fifteen patients were treated with paclitaxel and bevacizumab, whereas 92 patients received paclitaxel alone. In the bevacizumab plus paclitaxel group, the MDR software provided two pharmacogenetic interaction profiles consisting of combinations of specific VEGF-A rs833061 and VEGFR-2 rs1870377 genotypes. Median PFS was 16.8 months for the favorable genetic profile vs. 10.6 months for the unfavorable genetic profile (p = 0.0011). A Cox proportional hazards model showed an adjusted hazard ratio of 0.64 (95% CI, 0.5–0.9; p = 0.004). Median OS was 39.6 months for the favorable genetic profile vs. 28 months for the unfavorable genetic profile (p = 0.0103). A Cox proportional hazards model revealed an adjusted hazard ratio of 0.71 (95% CI, 0.5–1.01; p = 0.058). In the 92 patients treated with paclitaxel alone, the favorable genetic profile showed no effect, compared with the unfavorable genetic profile, on either PFS (p = 0.509) or OS (p = 0.732). The pharmacogenetic statistical interaction between VEGF-A rs833061 and VEGFR-2 rs1870377 genotypes may identify a population of bevacizumab-treated patients with a better PFS.
4
Abstract
AIMS: Coronary artery disease (CAD) has a strong genetic predisposition. However, despite substantial discoveries made by genome-wide association studies (GWAS), a large proportion of heritability awaits identification. Non-additive genetic effects might be responsible for part of the unaccounted genetic variance. Here, we attempted a proof-of-concept study to identify non-additive genetic effects, namely epistatic interactions, associated with CAD. METHODS AND RESULTS: We tested for epistatic interactions in 10 CAD case-control studies and UK Biobank with focus on 8068 SNPs at 56 loci with known associations with CAD risk. We identified a SNP pair located in cis at the LPA locus, rs1800769 and rs9458001, to be jointly associated with risk for CAD [odds ratio (OR) = 1.37, P = 1.07 × 10⁻¹¹], peripheral arterial disease (OR = 1.22, P = 2.32 × 10⁻⁴), aortic stenosis (OR = 1.47, P = 6.95 × 10⁻⁷), hepatic lipoprotein(a) (Lp(a)) transcript levels (beta = 0.39, P = 1.41 × 10⁻⁸), and Lp(a) serum levels (beta = 0.58, P = 8.7 × 10⁻³²), while individual SNPs displayed no association. Further exploration of the LPA locus revealed a strong dependency of these associations on a rare variant, rs140570886, that was previously associated with Lp(a) levels. We confirmed increased CAD risk for individuals heterozygous (relative OR = 1.46, P = 9.97 × 10⁻³²) and homozygous (relative OR = 1.77, P = 0.09) for the minor allele of rs140570886. Using forward model selection, we also show that epistatic interactions between rs140570886, rs9458001, and rs1800769 modulate the effects of the rs140570886 risk allele. CONCLUSIONS: These results demonstrate the feasibility of a large-scale knowledge-based epistasis scan and provide rare evidence of an epistatic interaction in a complex human disease. We were directed to a variant (rs140570886) influencing risk through additive genetic as well as epistatic effects. In summary, this study provides deeper insights into the genetic architecture of a locus important for cardiovascular diseases.
5
Evaluation of Existing Methods for High-Order Epistasis Detection. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:912-926. [PMID: 33055017 DOI: 10.1109/tcbb.2020.3030312] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/11/2023]
Abstract
Finding epistatic interactions among loci when expressing a phenotype is a widely employed strategy to understand the genetic architecture of complex traits in GWAS. The abundance of methods dedicated to the same purpose, however, makes it increasingly difficult for scientists to decide which method is more suitable for their studies. This work compares the different epistasis detection methods published during the last decade in terms of runtime, detection power and type I error rate, with a special emphasis on high-order interactions. Results show that in terms of detection power, the only methods that perform well across all experiments are the exhaustive methods, although their computational cost may be prohibitive in large-scale studies. Regarding non-exhaustive methods, not one could consistently find epistasis interactions when marginal effects are absent. If marginal effects are present, there are methods that perform well for high-order interactions, such as BADTrees, FDHE-IW, SingleMI or SNPHarvester. As for false-positive control, only SNPHarvester, FDHE-IW and DCHE show good results. The study concludes that there is no single epistasis detection method to recommend in all scenarios. Authors should prioritize exhaustive methods when sufficient computational resources are available considering the data set size, and resort to non-exhaustive methods when the analysis time is prohibitive.
6
RIL-StEp: epistasis analysis of rice recombinant inbred lines reveals candidate interacting genes that control seed hull color and leaf chlorophyll content. G3 (Bethesda) 2021; 11:jkab130. [PMID: 33871605 PMCID: PMC8496299 DOI: 10.1093/g3journal/jkab130] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/01/2021] [Accepted: 04/10/2021] [Indexed: 11/19/2022]
Abstract
Characterizing epistatic gene interactions is fundamental for understanding the genetic architecture of complex traits. However, due to the large number of potential gene combinations, detecting epistatic gene interactions is computationally demanding. A simple, easy-to-perform method for sensitive detection of epistasis is required. Due to their homozygous nature, use of recombinant inbred lines excludes the dominance effect of alleles and interactions involving heterozygous genotypes, thereby allowing detection of epistasis in a simple and interpretable model. Here, we present an approach called RIL-StEp (recombinant inbred lines stepwise epistasis detection) to detect epistasis using single-nucleotide polymorphisms in the genome. We applied the method to reveal epistasis affecting rice (Oryza sativa) seed hull color and leaf chlorophyll content and successfully identified pairs of genomic regions that presumably control these phenotypes. This method has the potential to improve our understanding of the genetic architecture of various traits of crops and other organisms.
7
Genetic interaction analysis in microbial pathogens: unravelling networks of pathogenesis, antimicrobial susceptibility and host interactions. FEMS Microbiol Rev 2021; 45:fuaa055. [PMID: 33145589 DOI: 10.1093/femsre/fuaa055] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/01/2020] [Accepted: 10/16/2020] [Indexed: 12/13/2022]
Abstract
Genetic interaction (GI) analysis is a powerful genetic strategy that analyzes the fitness and phenotypes of single- and double-gene mutant cells in order to dissect the epistatic interactions between genes, categorize genes into biological pathways, and characterize genes of unknown function. GI analysis has been extensively employed in model organisms for foundational, systems-level assessment of the epistatic interactions between genes. More recently, GI analysis has been applied to microbial pathogens and has been instrumental for the study of clinically important infectious organisms. Here, we review recent advances in systems-level GI analysis of diverse microbial pathogens, including bacterial and fungal species. We focus on important applications of GI analysis across pathogens, including GI analysis as a means to decipher complex genetic networks regulating microbial virulence, antimicrobial drug resistance and host-pathogen dynamics, and GI analysis as an approach to uncover novel targets for combination antimicrobial therapeutics. Together, this review bridges our understanding of GI analysis and complex genetic networks, with applications to diverse microbial pathogens, to further our understanding of virulence, the use of antimicrobial therapeutics and host-pathogen interactions.
8
Impact of modular mitochondrial epistatic interactions on the evolution of human subpopulations. Mitochondrion 2021; 58:111-122. [PMID: 33618020 DOI: 10.1016/j.mito.2021.02.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/19/2019] [Revised: 09/22/2020] [Accepted: 02/03/2021] [Indexed: 12/23/2022]
Abstract
Investigation of human mitochondrial (mt) genome variation has been shown to provide insights into human history and natural selection. By analyzing 24,167 human mt-genome samples, collected from five continents, we have developed a co-mutation network model to investigate characteristic human evolutionary patterns. The analysis highlighted richer co-mutating regions of the mt-genome, suggesting the presence of epistasis. Specifically, a large portion of COX genes was found to co-mutate in Asian and American populations, whereas, in African, European, and Oceanic populations, there was greater co-mutation bias in hypervariable regions. Interestingly, this study demonstrated hierarchical modularity as a crucial agent for these co-mutation networks. More profoundly, our ancestry-based co-mutation module analyses showed that mutations cluster preferentially in known mitochondrial haplogroups. Contemporary human mt-genome nucleotides most closely resembled the ancestral state, and very few of them were found to be ancestral-variants. Overall, these results demonstrated that subpopulation-based biases may favor mitochondrial gene specific epistasis.
9
Gene networks determine predisposition to AMD. Genomics 2021; 113:514-522. [PMID: 32979492 DOI: 10.1016/j.ygeno.2020.09.044] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/10/2020] [Revised: 08/13/2020] [Accepted: 09/21/2020] [Indexed: 01/18/2023]
Abstract
PURPOSE: AMD genetic studies have revealed various genetic loci as causal to AMD pathology. We have described the genetic complexity of Indian AMD by describing the interaction of genotypes and subsequent changes in protein expression under the influence of environmental factors. This can be utilized to enhance the diagnostic and therapeutic efficacy in AMD patients. DESIGN: Genotype association was studied in 464 participants (AMD = 277, controls = 187) for eight genetic variants and their corresponding protein expression. METHODS: SNP analysis and protein expression analysis were carried out in AMD patients and controls, in tandem with longitudinal assessment of protein levels during the course of AMD pathology. ANCOVA and contrast analysis were used to examine the genotypic interactions and corresponding alterations in protein levels. Logistic regression (LR) modeling was carried out to identify the important genetic variants, and the area under the receiver operating characteristic curve (AUROC) was computed to validate the model. RESULTS: We found genetic variants rs5749482 (TIMP-3), rs11200638 (HTRA1), rs769449 (APOE) and rs6795735 (ADAMTS9) to be associated with AMD, concomitant with significant alterations in the levels of the studied proteins. Analysis also revealed that the genetic interaction between APOE-HTRA1 genotypes and changes in LIPC levels (>6 pg/ug) by one unit change in SNP play a crucial role in AMD. The LR model suggested that seven factors (including both genetic and environmental) can be utilized to predict AMD cases with 88% efficacy and 95.6% AUROC. CONCLUSION: Results suggest that diagnostic and therapeutic strategy for Indian AMD must include estimation of genetic interaction and concomitant changes in expression levels of proteins under the influence of environmental factors.
10
A parallelized strategy for epistasis analysis based on Empirical Bayesian Elastic Net models. Bioinformatics 2020; 36:3803-3810. [PMID: 32227194 DOI: 10.1093/bioinformatics/btaa216] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/19/2019] [Revised: 03/05/2020] [Accepted: 03/26/2020] [Indexed: 11/14/2022]
Abstract
MOTIVATION Epistasis reflects the distortion on a particular trait or phenotype resulting from the combinatorial effect of two or more genes or genetic variants. Epistasis is an important genetic foundation underlying quantitative traits in many organisms as well as in complex human diseases. However, there are two major barriers in identifying epistasis using large genomic datasets. One is that epistasis analysis will induce over-fitting of an over-saturated model with the high-dimensionality of a genomic dataset. Therefore, the problem of identifying epistasis demands efficient statistical methods. The second barrier comes from the intensive computing time for epistasis analysis, even when the appropriate model and data are specified. RESULTS In this study, we combine statistical techniques and computational techniques to scale up epistasis analysis using Empirical Bayesian Elastic Net (EBEN) models. Specifically, we first apply a matrix manipulation strategy for pre-computing the correlation matrix and pre-filter to narrow down the search space for epistasis analysis. We then develop a parallelized approach to further accelerate the modeling process. Our experiments on synthetic and empirical genomic data demonstrate that our parallelized methods offer tens of fold speed up in comparison with the classical EBEN method which runs in a sequential manner. We applied our parallelized approach to a yeast dataset, and we were able to identify both main and epistatic effects of genetic variants associated with traits such as fitness. AVAILABILITY AND IMPLEMENTATION The software is available at github.com/shilab/parEBEN.
11
Genome-wide association and epistatic interactions of flowering time in soybean cultivar. PLoS One 2020; 15:e0228114. [PMID: 31968016 PMCID: PMC6975553 DOI: 10.1371/journal.pone.0228114] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/29/2019] [Accepted: 01/07/2020] [Indexed: 12/02/2022]
Abstract
Genome-wide association studies (GWAS) have enabled the discovery of candidate markers that play significant roles in various complex traits in plants. Recently, with increased interest in the search for candidate markers, studies on epistatic interactions between single nucleotide polymorphism (SNP) markers have also increased, thus enabling the identification of more candidate markers along with GWAS on single-variant-additive-effect. Here, we focused on the identification of candidate markers associated with flowering time in soybean (Glycine max). A large population of 2,662 cultivated soybean accessions was genotyped using the 180k Axiom® SoyaSNP array, and the genomic architecture of these accessions was investigated to confirm the population structure. Then, GWAS was conducted to evaluate the association between SNP markers and flowering time. A total of 93 significant SNP markers were detected within 59 significant genes, including E1 and E3, which are the main determinants of flowering time. Based on the GWAS results, multilocus epistatic interactions were examined between the significant and non-significant SNP markers. Two significant and 16 non-significant SNP markers were discovered as candidate markers affecting flowering time via interactions with each other. These 18 candidate SNP markers mapped to 18 candidate genes including E1 and E3, and the 18 candidate genes were involved in six major flowering pathways. Although further biological validation is needed, our results provide additional information on the existing flowering time markers and present another option to marker-assisted breeding programs for regulating flowering time of soybean.
12
Discovering genetic interactions bridging pathways in genome-wide association studies. Nat Commun 2019; 10:4274. [PMID: 31537791 PMCID: PMC6753138 DOI: 10.1038/s41467-019-12131-7] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 03/05/2019] [Accepted: 08/20/2019] [Indexed: 12/20/2022]
Abstract
Genetic interactions have been reported to underlie phenotypes in a variety of systems, but the extent to which they contribute to complex disease in humans remains unclear. In principle, genome-wide association studies (GWAS) provide a platform for detecting genetic interactions, but existing methods for identifying them from GWAS data tend to focus on testing individual locus pairs, which undermines statistical power. Importantly, a global genetic network mapped for a model eukaryotic organism revealed that genetic interactions often connect genes between compensatory functional modules in a highly coherent manner. Taking advantage of this expected structure, we developed a computational approach called BridGE that identifies pathways connected by genetic interactions from GWAS data. Applying BridGE broadly, we discover significant interactions in Parkinson's disease, schizophrenia, hypertension, prostate cancer, breast cancer, and type 2 diabetes. Our novel approach provides a general framework for mapping complex genetic networks underlying human disease from genome-wide genotype data.
13
How to increase our belief in discovered statistical interactions via large-scale association studies? Hum Genet 2019; 138:293-305. [PMID: 30840129 PMCID: PMC6483943 DOI: 10.1007/s00439-019-01987-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/26/2018] [Accepted: 02/20/2019] [Indexed: 12/31/2022]
Abstract
The understanding that differences in biological epistasis may impact disease risk, diagnosis, or disease management stands in wide contrast to the unavailability of widely accepted large-scale epistasis analysis protocols. Several choices in the analysis workflow will impact false-positive and false-negative rates. One of these choices relates to the exploitation of particular modelling or testing strategies. The strengths and limitations of these need to be well understood, as well as the contexts in which these hold. This will contribute to determining the potentially complementary value of epistasis detection workflows and is expected to increase replication success with biological relevance. In this contribution, we take a recently introduced regression-based epistasis detection tool as a leading example to review the key elements that need to be considered to fully appreciate the value of analytical epistasis detection performance assessments. We point out unresolved hurdles and give our perspectives towards overcoming these.
14
Collective feature selection to identify crucial epistatic variants. BioData Min 2018; 11:5. [PMID: 29713383 PMCID: PMC5907720 DOI: 10.1186/s13040-018-0168-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 11/29/2017] [Accepted: 04/04/2018] [Indexed: 01/17/2023]
Abstract
Background: Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex disease/traits. Detection of epistatic interactions still remains a challenge due to the large number of features and relatively small sample size as input, thus leading to the so-called "short fat data" problem. The efficiency of machine learning methods can be increased by limiting the number of input features. Thus, it is very important to perform variable selection before searching for epistasis. Many methods have been evaluated and proposed to perform feature selection, but no single method works best in all scenarios. We demonstrate this by conducting two separate simulation analyses to evaluate the proposed collective feature selection approach. Results: Through our simulation study we propose a collective feature selection approach to select features that are in the "union" of the best performing methods. We explored various parametric, non-parametric, and data mining approaches to perform feature selection. We choose our top performing methods to select the union of the resulting variables based on a user-defined percentage of variants selected from each method to take to downstream analysis. Our simulation analysis shows that non-parametric data mining approaches, such as MDR, may work best under one simulation criteria for the high effect size (penetrance) datasets, while non-parametric methods designed for feature selection, such as Ranger and Gradient boosting, work best under other simulation criteria. Thus, using a collective approach proves to be more beneficial for selecting variables with epistatic effects also in low effect size datasets and different genetic architectures. Following this, we applied our proposed collective feature selection approach to select the top 1% of variables to identify potential interacting variables associated with Body Mass Index (BMI) in ~44,000 samples obtained from Geisinger's MyCode Community Health Initiative (on behalf of DiscovEHR collaboration). Conclusions: In this study, we were able to show that selecting variables using a collective feature selection approach could help in selecting true positive epistatic variables more frequently than applying any single method for feature selection via simulation studies. We were able to demonstrate the effectiveness of collective feature selection along with a comparison of many methods in our simulation analysis. We also applied our method to identify non-linear networks associated with obesity.
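The "union of top-ranked variables" idea described in this abstract can be sketched as follows. This is an illustrative reading rather than the authors' code: the function names, the importance scores, and the method labels in the example are hypothetical, and each method is assumed to return one importance score per variant.

```python
import math

def top_fraction(scores, fraction):
    """Return the highest-scoring variants, keeping at least one."""
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def collective_selection(score_tables, fraction):
    """Union of the top `fraction` of variants from every method's ranking."""
    selected = set()
    for scores in score_tables.values():
        selected |= top_fraction(scores, fraction)
    return selected

# Hypothetical importance scores from two feature selection methods.
score_tables = {
    "mdr": {"snp1": 0.9, "snp2": 0.1, "snp3": 0.5, "snp4": 0.2},
    "gradient_boosting": {"snp1": 0.2, "snp2": 0.8, "snp3": 0.3, "snp4": 0.1},
}
print(sorted(collective_selection(score_tables, 0.25)))
```

Taking the union rather than the intersection means a variant only needs to rank highly under one method to survive, which suits the observation that different methods do best under different genetic architectures.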
15
The search for gene-gene interactions in genome-wide association studies: challenges in abundance of methods, practical considerations, and biological interpretation. Ann Transl Med 2018; 6:157. [PMID: 29862246 DOI: 10.21037/atm.2018.04.05] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Indexed: 12/18/2022]
Abstract
One of the primary goals in this era of precision medicine is to understand the biology of human diseases and their treatment, such that each individual patient receives the best possible treatment for their disease based on their genetic and environmental exposures. One way to work towards achieving this goal is to identify the environmental exposures and genetic variants that are relevant to each disease in question, as well as the complex interplay between genes and environment. Genome-wide association studies (GWAS) have allowed for a greater understanding of the genetic component of many complex traits. However, these genetic effects are largely small and thus, our ability to use these GWAS findings for precision medicine is limited. As more and more GWAS have been performed, rather than focusing only on common single nucleotide polymorphisms (SNPs) and additive genetic models, many researchers have begun to explore alternative heritable components of complex traits including rare variants, structural variants, epigenetics, and genetic interactions. While genetic interactions are a plausible reality that could explain some of the heritability that has not yet been identified, especially when one considers the identification of genetic interactions in model organisms as well as our understanding of biological complexity, there are still significant challenges and considerations in identifying these genetic interactions. Broadly, these can be summarized in three categories: abundance of methods, practical considerations, and biological interpretation. In this review, we will discuss these important elements in the search for genetic interactions along with some potential solutions. While genetic interactions are theoretically understood to be important for complex human disease, the body of evidence is still building to support this component of the underlying genetic architecture of complex human traits. Our hope is that more sophisticated modeling approaches and more robust computational techniques will enable the community to identify these important genetic interactions and improve our ability to implement precision medicine in the future.
16
Abstract
Most common disorders affecting human health are not attributable to simple Mendelian (single-gene) inheritance patterns. Rather, the risk of developing a complex disease is often the result of interactions across genes, whereby one gene modifies the phenotype of another gene. These types of interactions can occur between two or more genes and are referred to as epistasis. There are five major types of epistatic interactions, but in human genetics, additive epistasis is most often discussed and includes both positive and negative subtypes. Detecting epistatic interactions can be quite difficult because seemingly unrelated genes can interact with and influence each other. As a result of this complexity, statistical geneticists are constantly developing new methods to enhance detection, but there are disadvantages to each proposed method. In this article, we explore the concept of epistasis, discuss different types of epistatic interactions, and provide a brief introduction to statistical methods researchers use to uncover sets of epistatic interactions. Then, we consider Alzheimer's disease as an exemplar for a disease with epistatic effects. Finally, we provide helpful resources, where nurses can learn more about epistasis in order to incorporate these methods into their own program of research.
|
17
|
The Influence of Higher-Order Epistasis on Biological Fitness Landscape Topography. Journal of Statistical Physics 2018; 172:208-225. [PMID: 29904213 PMCID: PMC5986866 DOI: 10.1007/s10955-018-1975-3] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 07/17/2017] [Accepted: 01/24/2018] [Indexed: 05/31/2023]
Abstract
The effect of a mutation on the organism often depends on what other mutations are already present in its genome. Geneticists refer to such mutational interactions as epistasis. Pairwise epistatic effects have been recognized for over a century, and their evolutionary implications have received theoretical attention for nearly as long. However, pairwise epistatic interactions themselves can vary with genomic background. This is called higher-order epistasis, and its consequences for evolution are much less well understood. Here, we assess the influence that higher-order epistasis has on the topography of 16 published, biological fitness landscapes. We find that, on average, the effect of epistatic interactions on fitness landscapes declines with order, and suggest that notable exceptions to this trend may deserve experimental scrutiny. We conclude by highlighting opportunities for further theoretical and experimental work dissecting the influence that epistasis of all orders has on fitness landscape topography and on the efficiency of evolution by natural selection.
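The distinction the abstract draws between pairwise and higher-order epistasis can be illustrated with a small sketch. It assumes an additive fitness scale, three biallelic loci, and made-up fitness values; the function names are hypothetical and this is only one of several conventions for quantifying epistasis, not necessarily the one used in the paper.

```python
# Genotypes are tuples (locus1, locus2, locus3) with alleles 0 or 1.
# Fitness values below are invented for illustration only.
fitness = {
    (0, 0, 0): 1.00, (1, 0, 0): 1.10, (0, 1, 0): 1.20, (1, 1, 0): 1.30,
    (0, 0, 1): 1.05, (1, 0, 1): 1.15, (0, 1, 1): 1.25, (1, 1, 1): 1.50,
}


def pairwise_epistasis(fitness, background):
    """Deviation from additivity for loci 1 and 2, with locus 3 fixed."""
    return (
        fitness[(1, 1, background)]
        - fitness[(1, 0, background)]
        - fitness[(0, 1, background)]
        + fitness[(0, 0, background)]
    )


def third_order_epistasis(fitness):
    """How much the pairwise term changes when the locus-3 allele changes."""
    return pairwise_epistasis(fitness, 1) - pairwise_epistasis(fitness, 0)


eps_bg0 = pairwise_epistasis(fitness, 0)  # loci 1 and 2 are additive here
eps_bg1 = pairwise_epistasis(fitness, 1)  # but interact on the other background
eps_3 = third_order_epistasis(fitness)
print(eps_bg0, eps_bg1, eps_3)
```

In this toy landscape the two loci combine additively on one background and synergistically on the other, so the pairwise term alone would misdescribe the landscape.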
|
18
|
Another Round of "Clue" to Uncover the Mystery of Complex Traits. Genes (Basel) 2018; 9:E61. [PMID: 29370075 PMCID: PMC5852557 DOI: 10.3390/genes9020061] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/01/2017] [Revised: 12/19/2017] [Accepted: 01/15/2018] [Indexed: 12/13/2022] Open
Abstract
A plethora of genetic association analyses have identified several genetic risk loci. Technological and statistical advancements have now led to the identification of not only common genetic variants, but also low-frequency variants, structural variants, and environmental factors, as well as multi-omics variations that affect the phenotypic variance of complex traits in a population, thus referred to as complex trait architecture. The concept of heritability, or the proportion of phenotypic variance due to genetic inheritance, has been studied for several decades, but its application is mainly in addressing the narrow sense heritability (or additive genetic component) from Genome-Wide Association Studies (GWAS). In this commentary, we reflect on our perspective on the complexity of understanding heritability for human traits in comparison to model organisms, highlighting another round of clues beyond GWAS and an alternative approach, investigating these clues comprehensively to help in elucidating the genetic architecture of complex traits.
|
19
|
Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases. Am J Epidemiol 2017; 186:753-761. [PMID: 28978193 PMCID: PMC5860428 DOI: 10.1093/aje/kwx227] [Citation(s) in RCA: 106] [Impact Index Per Article: 15.1] [Received: 11/07/2016] [Revised: 03/14/2017] [Accepted: 03/16/2017] [Indexed: 12/25/2022] Open
Abstract
Recently, many new approaches, study designs, and statistical and analytical methods have emerged for studying gene-environment interactions (G×Es) in large-scale studies of human populations. There are opportunities in this field, particularly with respect to the incorporation of -omics and next-generation sequencing data and continual improvement in measures of environmental exposures implicated in complex disease outcomes. In a workshop called "Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases," held October 17-18, 2014, by the National Institute of Environmental Health Sciences and the National Cancer Institute in conjunction with the annual American Society of Human Genetics meeting, participants explored new approaches and tools that have been developed in recent years for G×E discovery. This paper highlights current and critical issues and themes in G×E research that need additional consideration, including the improved data analytical methods, environmental exposure assessment, and incorporation of functional data and annotations.
|
20
|
A novel role for ciliary function in atopy: ADGRV1 and DNAH5 interactions. J Allergy Clin Immunol 2017; 141:1659-1667.e11. [PMID: 28927820 DOI: 10.1016/j.jaci.2017.06.050] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/09/2016] [Revised: 05/30/2017] [Accepted: 06/21/2017] [Indexed: 12/30/2022]
Abstract
BACKGROUND Atopy, an endotype underlying allergic diseases, has a substantial genetic component. OBJECTIVE Our goal was to identify novel genes associated with atopy in asthma-ascertained families. METHODS We implemented a 3-step analysis strategy in 3 data sets: the Epidemiological Study on the Genetics and Environment of Asthma (EGEA) data set (1660 subjects), the Saguenay-Lac-Saint-Jean study data set (1138 subjects), and the Medical Research Council (MRC) data set (446 subjects). This strategy included a single nucleotide polymorphism (SNP) genome-wide association study (GWAS), the selection of related gene pairs based on statistical filtering of GWAS results, and text-mining filtering using Gene Relationships Across Implicated Loci and SNP-SNP interaction analysis of selected gene pairs. RESULTS We identified the 5q14 locus, harboring the adhesion G protein-coupled receptor V1 (ADGRV1) gene, which showed genome-wide significant association with atopy (rs4916831, meta-analysis P value = 6.8 × 10⁻⁹). Statistical filtering of GWAS results followed by text-mining filtering revealed relationships between ADGRV1 and 3 genes showing suggestive association with atopy (P ≤ 10⁻⁴). SNP-SNP interaction analysis between ADGRV1 and these 3 genes showed significant interaction between ADGRV1 rs17554723 and 2 correlated SNPs (rs2134256 and rs1354187) within the dynein axonemal heavy chain 5 (DNAH5) gene (Pmeta-int = 3.6 × 10⁻⁵ and 6.1 × 10⁻⁵, which met the multiple-testing corrected threshold of 7.3 × 10⁻⁵). Further conditional analysis indicated that rs2134256 alone accounted for the interaction signal with rs17554723. CONCLUSION Because both DNAH5 and ADGRV1 contribute to ciliary function, this study suggests that ciliary dysfunction might represent a novel mechanism underlying atopy. Combining GWAS and epistasis analysis driven by statistical and knowledge-based evidence represents a promising approach for identifying new genes involved in complex traits.
|
21
|
Discovery and replication of SNP-SNP interactions for quantitative lipid traits in over 60,000 individuals. BioData Min 2017; 10:25. [PMID: 28770004 PMCID: PMC5525436 DOI: 10.1186/s13040-017-0145-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/02/2017] [Accepted: 07/12/2017] [Indexed: 12/01/2022] Open
Abstract
BACKGROUND The genetic etiology of human lipid quantitative traits is not fully elucidated, and interactions between variants may play a role. We performed a gene-centric interaction study for four different lipid traits: low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), total cholesterol (TC), and triglycerides (TG). RESULTS Our analysis consisted of a discovery phase using a merged dataset of five different cohorts (n = 12,853 to n = 16,849 depending on lipid phenotype) and a replication phase with ten independent cohorts totaling up to 36,938 additional samples. Filters are often applied before interaction testing to correct for the burden of testing all pairwise interactions. We used two different filters: 1. A filter that tested only single nucleotide polymorphisms (SNPs) with a main effect of p < 0.001 in a previous association study. 2. A filter that only tested interactions identified by Biofilter 2.0. Pairwise models that reached an interaction significance level of p < 0.001 in the discovery dataset were tested for replication. We identified thirteen SNP-SNP models that were significant in more than one replication cohort after accounting for multiple testing. CONCLUSIONS These results may reveal novel insights into the genetic etiology of lipid levels. Furthermore, we developed a pipeline to perform a computationally efficient interaction analysis with multi-cohort replication.
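The main-effect filter described in this abstract (test only SNPs with a main-effect p < 0.001, then examine pairwise models) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the SNP names and p-values are invented, the helper names are hypothetical, and the Bonferroni line is just one common way to account for the reduced number of tests.

```python
from itertools import combinations


def filter_by_main_effect(main_effect_p, threshold=0.001):
    """Keep SNPs whose main-effect p-value is below the threshold."""
    return sorted(snp for snp, p in main_effect_p.items() if p < threshold)


def candidate_pairs(snps):
    """Enumerate every unordered SNP pair among the retained SNPs."""
    return list(combinations(snps, 2))


# Invented main-effect p-values for five hypothetical SNPs.
main_effect_p = {"snpA": 2e-5, "snpB": 0.4, "snpC": 8e-4, "snpD": 0.03, "snpE": 1e-6}

kept = filter_by_main_effect(main_effect_p)
pairs = candidate_pairs(kept)
all_pairs = candidate_pairs(sorted(main_effect_p))

# Fewer tests means a less severe multiple-testing correction.
bonferroni_alpha = 0.05 / len(pairs)
print(kept, len(pairs), len(all_pairs), bonferroni_alpha)
```

The design trade-off is the one the abstract hints at: filtering shrinks the testing burden, but interactions between SNPs with no marginal effect are never examined.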
|
22
|
Functional regression method for whole genome eQTL epistasis analysis with sequencing data. BMC Genomics 2017; 18:385. [PMID: 28521784 PMCID: PMC5436462 DOI: 10.1186/s12864-017-3777-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/21/2016] [Accepted: 05/09/2017] [Indexed: 12/02/2022] Open
Abstract
Background Epistasis plays an essential role in understanding regulatory mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored due to great computational challenges and limited data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position-level read count curves. A single number for measuring gene expression, which is widely used for microarray-measured gene expression analysis, is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. Methods We develop a nonlinear functional regression model (FRGM) with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as the basic unit for epistasis analysis, which tests for the interaction of all possible pairs of genes and uses all the information that can be accessed to collectively test interaction between all possible pairs of SNPs within two genome regions. Results By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis can achieve the correct type 1 error and has higher power to detect the interactions between genes than the existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project. 
The numbers of pairs of significantly interacting genes after Bonferroni correction identified using FRGM, RPKM and DESeq were 16,2361, 260 and 51, respectively, from the 350 European samples. Conclusions The proposed FRGM for epistasis analysis of RNA-seq can capture isoform and position-level information and will have broad application. Both simulations and real data analysis highlight the potential for the FRGM to be a good choice for epistasis analysis with sequencing data.
|
23
|
Identifying gene-gene interactions that are highly associated with four quantitative lipid traits across multiple cohorts. Hum Genet 2016; 136:165-178. [PMID: 27848076 DOI: 10.1007/s00439-016-1738-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 05/25/2016] [Accepted: 10/07/2016] [Indexed: 10/20/2022]
Abstract
Genetic loci explain only 25-30 % of the heritability observed in plasma lipid traits. Epistasis, or gene-gene interactions may contribute to a portion of this missing heritability. Using the genetic data from five NHLBI cohorts of 24,837 individuals, we combined the use of the quantitative multifactor dimensionality reduction (QMDR) algorithm with two SNP-filtering methods to exhaustively search for SNP-SNP interactions that are associated with HDL cholesterol (HDL-C), LDL cholesterol (LDL-C), total cholesterol (TC) and triglycerides (TG). SNPs were filtered either on the strength of their independent effects (main effect filter) or the prior knowledge supporting a given interaction (Biofilter). After the main effect filter, QMDR identified 20 SNP-SNP models associated with HDL-C, 6 associated with LDL-C, 3 associated with TC, and 10 associated with TG (permutation P value <0.05). With the use of Biofilter, we identified 2 SNP-SNP models associated with HDL-C, 3 associated with LDL-C, 1 associated with TC and 8 associated with TG (permutation P value <0.05). In an independent dataset of 7502 individuals from the eMERGE network, we replicated 14 of the interactions identified after main effect filtering: 11 for HDL-C, 1 for LDL-C and 2 for TG. We also replicated 23 of the interactions found to be associated with TG after applying Biofilter. Prior knowledge supports the possible role of these interactions in the genetic etiology of lipid traits. This study also presents a computationally efficient pipeline for analyzing data from large genotyping arrays and detecting SNP-SNP interactions that are not primarily driven by strong main effects.
|
24
|
A comprehensive genome-wide analysis of melanoma Breslow thickness identifies interaction between CDC42 and SCIN genetic variants. Int J Cancer 2016; 139:2012-20. [PMID: 27347659 PMCID: PMC5116391 DOI: 10.1002/ijc.30245] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 01/19/2016] [Accepted: 06/07/2016] [Indexed: 12/23/2022]
Abstract
Breslow thickness (BT) is a major prognostic factor of cutaneous melanoma (CM), the most fatal skin cancer. The genetic component of BT has only been explored by candidate gene studies with inconsistent results. Our objective was to uncover the genetic factors underlying BT using a hypothesis-free genome-wide approach. Our analysis strategy integrated a genome-wide association study (GWAS) of single nucleotide polymorphisms (SNPs) for BT followed by pathway analysis of GWAS outcomes using the gene-set enrichment analysis (GSEA) method and epistasis analysis within BT-associated pathways. This strategy was applied to two large CM datasets with HapMap3-imputed SNP data: the French MELARISK study for discovery (966 cases) and the MD Anderson Cancer Center study (1,546 cases) for replication. While no marginal effect of individual SNPs was revealed through GWAS, three pathways, defined by gene ontology (GO) categories, were significantly enriched in genes associated with BT (false discovery rate ≤5% in both studies): hormone activity, cytokine activity and myeloid cell differentiation. Epistasis analysis, within each significant GO, identified a statistically significant interaction between CDC42 and SCIN SNPs (pmeta-int = 2.2 × 10⁻⁶, which met the overall multiple-testing corrected threshold of 2.5 × 10⁻⁶). These two SNPs (and proxies) are strongly associated with CDC42 and SCIN gene expression levels and map to regulatory elements in skin cells. This interaction has important biological relevance since CDC42 and SCIN proteins have opposite effects in actin cytoskeleton organization and dynamics, a key mechanism underlying melanoma cell migration and invasion.
|
25
|
Epistatic Gene-Based Interaction Analyses for Glaucoma in eMERGE and NEIGHBOR Consortium. PLoS Genet 2016; 12:e1006186. [PMID: 27623284 PMCID: PMC5021356 DOI: 10.1371/journal.pgen.1006186] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 01/26/2016] [Accepted: 06/22/2016] [Indexed: 12/22/2022] Open
Abstract
Primary open angle glaucoma (POAG) is a complex disease and is one of the major leading causes of blindness worldwide. Genome-wide association studies have successfully identified several common variants associated with glaucoma; however, most of these variants only explain a small proportion of the genetic risk. Apart from the standard approach to identify main effects of variants across the genome, it is believed that gene-gene interactions can help elucidate part of the missing heritability by allowing for the test of interactions between genetic variants to mimic the complex nature of biology. To explain the etiology of glaucoma, we first performed a genome-wide association study (GWAS) on glaucoma case-control samples obtained from electronic medical records (EMR) to establish the utility of EMR data in detecting non-spurious and relevant associations; this analysis was aimed at confirming already known associations with glaucoma and validating the EMR derived glaucoma phenotype. Our findings from GWAS suggest consistent evidence of several known associations in POAG. We then performed an interaction analysis for variants found to be marginally associated with glaucoma (SNPs with main effect p-value <0.01) and observed interesting findings in the electronic MEdical Records and GEnomics Network (eMERGE) network dataset. Genes from the top epistatic interactions from eMERGE data (Likelihood Ratio Test i.e. LRT p-value <1e-05) were then tested for replication in the NEIGHBOR consortium dataset. To replicate our findings, we performed a gene-based SNP-SNP interaction analysis in NEIGHBOR and observed significant gene-gene interactions (p-value <0.001) among the top 17 gene-gene models identified in the discovery phase. 
Variants from gene-gene interaction analysis that we found to be associated with POAG explain 3.5% of additional genetic variance in eMERGE dataset above what is explained by the SNPs in genes that are replicated from previous GWAS studies (which was only 2.1% variance explained in eMERGE dataset); in the NEIGHBOR dataset, adding replicated SNPs from gene-gene interaction analysis explain 3.4% of total variance whereas GWAS SNPs alone explain only 2.8% of variance. Exploring gene-gene interactions may provide additional insights into many complex traits when explored in properly designed and powered association studies. The complex nature of primary-open angle glaucoma (POAG) has left researchers exploring the genetic architecture and searching for the missing heritability using a number of different study designs. Over the past decade, many studies have been conducted to explain the etiology of POAG; however, a high proportion of estimated heritability still remains unexplained. GWA studies for POAG have identified significant associations but these associations have only explained a small proportion of the genetic risk (odds ratios range between 1–3). In this paper, we sought to confirm the primary genome-wide significant associations that have been discovered so far for glaucoma in phenotypes developed from EMR data in an effort to show that EMR data can be a powerful resource for finding genetic variants influencing POAG susceptibility. Next, we tested for statistical interactions, which can be presented as an important tool in an attempt to explain POAG heritability. We used a reduced list of variants filtered by marginal main effect analysis to look for epistatic interactions. We present our results from replication of gene-based interaction analyses performed in eMERGE and the NEIGHBOR consortium data. 
Using expression data and annotations from various publicly available databases, the most significant genes that replicated in our analyses show expression in the eye and trabecular meshwork. Analysis for estimation of genetic variance explained by significant associations from previous GWAS and replicated variants from gene-based interactions suggest that these explain 5.6% of variance in eMERGE dataset and also explain 3.4% variance in NEIGHBOR dataset.
|
26
|
Identification of genetic interaction networks via an evolutionary algorithm evolved Bayesian network. BioData Min 2016; 9:18. [PMID: 27168765 PMCID: PMC4862166 DOI: 10.1186/s13040-016-0094-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 10/05/2015] [Accepted: 04/18/2016] [Indexed: 12/01/2022] Open
Abstract
Background The future of medicine is moving towards the phase of precision medicine, with the goal to prevent and treat diseases by taking inter-individual variability into account. A large part of the variability lies in our genetic makeup. With the fast-paced improvement of high-throughput methods for genome sequencing, a tremendous amount of genetic data has already been generated. The next hurdle for precision medicine is to have sufficient computational tools for analyzing large sets of data. Genome-Wide Association Studies (GWAS) have been the primary method to assess the relationship between single nucleotide polymorphisms (SNPs) and disease traits. While GWAS is sufficient in finding individual SNPs with strong main effects, it does not capture potential interactions among multiple SNPs. In many traits, a large proportion of variation remains unexplained by using main effects alone, leaving the door open for exploring the role of genetic interactions. However, identifying genetic interactions in large-scale genomics data poses a challenge even for modern computing. Results For this study, we present a new algorithm, Grammatical Evolution Bayesian Network (GEBN), that utilizes Bayesian Networks to identify interactions in the data, and at the same time, uses an evolutionary algorithm to reduce the computational cost associated with network optimization. GEBN excelled in simulation studies where the data contained main effects and interaction effects. We also applied GEBN to a Type 2 diabetes (T2D) dataset obtained from the Marshfield Personalized Medicine Research Project (PMRP). We were able to identify genetic interactions for T2D cases and controls and use information from those interactions to classify T2D samples. We obtained an average testing area under the curve (AUC) of 86.8 %. We also identified several interacting genes such as INADL and LPP that are known to be associated with T2D. 
Conclusions Developing the computational tools to explore genetic associations beyond main effects remains a critically important challenge in human genetics. Methods, such as GEBN, demonstrate the utility of considering genetic interactions, as they likely explain some of the missing heritability.
|
27
|
Phenome-wide interaction study (PheWIS) in AIDS Clinical Trials Group data (ACTG). Pacific Symposium on Biocomputing 2016; 21:57-68. [PMID: 26776173 PMCID: PMC4722952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Association studies have shown and continue to show a substantial amount of success in identifying links between multiple single nucleotide polymorphisms (SNPs) and phenotypes. These studies are also believed to provide insights toward identification of new drug targets and therapies. Despite this success, challenges still remain for applying and prioritizing these associations based on available biological knowledge. Along with single variant association analysis, genetic interactions also play an important role in uncovering the etiology and progression of complex traits. For gene-gene interaction analysis, selection of the variants to test for associations still poses a challenge in identifying epistatic interactions among the large list of variants available in high-throughput, genome-wide datasets. Therefore, in this study, we propose a pipeline to identify interactions among genetic variants that are associated with multiple phenotypes by prioritizing previously published results from main effect association analysis (genome-wide and phenome-wide association analysis) based on a priori biological knowledge in AIDS Clinical Trials Group (ACTG) data. We approached the prioritization and filtration of variants by using the results of a previously published single variant PheWAS and then utilizing biological information from the Roadmap Epigenome project. We removed variants in low functional activity regions based on chromatin state annotation and then conducted an exhaustive pairwise interaction search using linear regression analysis. We performed this analysis in two independent pre-treatment clinical trial datasets from ACTG to allow for both discovery and replication. 
Using a regression framework, we observed 50,798 associations that replicate at p-value 0.01 for 26 phenotypes, among which 2,176 associations for 212 unique SNPs for fasting blood glucose phenotype reach Bonferroni significance and an additional 9,970 interactions for high-density lipoprotein (HDL) phenotype and fasting blood glucose (total of 12,146 associations) reach FDR significance. We conclude that this method of prioritizing variants to look for epistatic interactions can be used extensively for generating hypotheses for genomewide and phenome-wide interaction analyses. This original Phenome-wide Interaction study (PheWIS) can be applied further to patients enrolled in randomized clinical trials to establish the relationship between patient's response to a particular drug therapy and non-linear combination of variants that might be affecting the outcome.
|
28
|
Identifying gene-gene interactions that are highly associated with Body Mass Index using Quantitative Multifactor Dimensionality Reduction (QMDR). BioData Min 2015; 8:41. [PMID: 26674805 PMCID: PMC4678717 DOI: 10.1186/s13040-015-0074-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/12/2015] [Accepted: 12/04/2015] [Indexed: 11/22/2022] Open
Abstract
Background Despite heritability estimates of 40–70 % for obesity, less than 2 % of its variation is explained by Body Mass Index (BMI) associated loci that have been identified so far. Epistasis, or gene-gene interactions are a plausible source to explain portions of the missing heritability of BMI. Methods Using genotypic data from 18,686 individuals across five study cohorts – ARIC, CARDIA, FHS, CHS, MESA – we filtered SNPs (Single Nucleotide Polymorphisms) using two parallel approaches. SNPs were filtered either on the strength of their main effects of association with BMI, or on the number of knowledge sources supporting a specific SNP-SNP interaction in the context of BMI. Filtered SNPs were specifically analyzed for interactions that are highly associated with BMI using QMDR (Quantitative Multifactor Dimensionality Reduction). QMDR is a nonparametric, genetic model-free method that detects non-linear interactions associated with a quantitative trait. Results We identified seven novel, epistatic models with a Bonferroni corrected p-value of association < 0.1. Prior experimental evidence helps explain the plausible biological interactions highlighted within our results and their relationship with obesity. We identified interactions between genes involved in mitochondrial dysfunction (POLG2), cholesterol metabolism (SOAT2), lipid metabolism (CYP11B2), cell adhesion (EZR), cell proliferation (MAP2K5), and insulin resistance (IGF1R). Moreover, we found an 8.8 % increase in the variance in BMI explained by these seven SNP-SNP interactions, beyond what is explained by the main effects of an index FTO SNP and the SNPs within these interactions. We also replicated one of these interactions and 58 proxy SNP-SNP models representing it in an independent dataset from the eMERGE study. Conclusion This study highlights a novel approach for discovering gene-gene interactions by combining methods such as QMDR with traditional statistics. 
|
29
|
Functional interaction between COL4A1/COL4A2 and SMAD3 risk loci for coronary artery disease. Atherosclerosis 2015; 242:543-52. [DOI: 10.1016/j.atherosclerosis.2015.08.008] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 04/08/2015] [Revised: 07/24/2015] [Accepted: 08/06/2015] [Indexed: 12/24/2022]
|
30
|
Prognostic and Predictive Values and Statistical Interactions in the Era of Targeted Treatment. Genet Epidemiol 2015; 39:509-17. [PMID: 26349638 DOI: 10.1002/gepi.21917] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 02/27/2015] [Accepted: 07/17/2015] [Indexed: 12/25/2022]
Abstract
The current era of targeted treatment has accelerated the interest in studying gene-treatment, gene-gene, and gene-environment interactions using statistical models in the health sciences. Interactions are incorporated into models as product terms of risk factors. The statistical significance of interactions is traditionally examined using a likelihood ratio test (LRT). Epidemiological and clinical studies also evaluate interactions in order to understand the prognostic and predictive values of genetic factors. However, it is not clear how different types and magnitudes of interaction effects are related to prognostic and predictive values. The contribution of interaction to prognostic values can be examined via improvements in the area under the receiver operating characteristic curve due to the inclusion of interaction terms in the model (ΔAUC). We develop a resampling based approach to test the significance of this improvement and show that it is equivalent to LRT. Predictive values provide insights into whether carriers of genetic factors benefit from specific treatment or preventive interventions relative to noncarriers, under some definition of treatment benefit. However, there is no unique definition of the term treatment benefit. We show that ΔAUC and relative excess risk due to interaction (RERI) measure predictive values under two specific definitions of treatment benefit. We investigate the properties of LRT, ΔAUC, and RERI using simulations. We illustrate these approaches using published melanoma data to understand the benefits of possible intervention on sun exposure in relation to the MC1R gene. The goal is to evaluate possible interventions on sun exposure in relation to MC1R.
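Two of the quantities named in this abstract have simple closed forms that a short sketch can make concrete. The code below assumes the standard definition RERI = RR11 - RR10 - RR01 + 1 and a likelihood ratio test for a single added product term (1 degree of freedom); the relative risks and log-likelihoods are invented numbers, and the function names are hypothetical rather than taken from the paper.

```python
import math


def reri(rr10, rr01, rr11):
    """Relative excess risk due to interaction on the additive scale.

    rr10, rr01: relative risks with only one factor present.
    rr11: relative risk with both factors present.
    """
    return rr11 - rr10 - rr01 + 1.0


def lrt_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for one extra parameter (the product term).

    For a chi-square variable with 1 degree of freedom, the upper tail
    probability equals erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Invented example values.
excess = reri(rr10=2.0, rr01=3.0, rr11=6.0)
stat, p_value = lrt_1df(loglik_reduced=-105.0, loglik_full=-100.0)
print(excess, stat, p_value)
```

A positive RERI indicates that the joint effect exceeds the sum of the separate effects on the additive scale, which is a different question from whether the product term in a multiplicative model is significant.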
|
31
|
Review: High-performance computing to detect epistasis in genome scale data sets. Brief Bioinform 2015; 17:368-79. [PMID: 26272945 DOI: 10.1093/bib/bbv058] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 03/31/2015] [Indexed: 11/14/2022] Open
Abstract
It is becoming clear that most human diseases have a complex etiology that cannot be explained by single nucleotide polymorphisms (SNPs) or simple additive combinations; the general consensus is that they are caused by combinations of multiple genetic variations. The limited success of some genome-wide association studies is partly a result of this focus on single genetic markers. A more promising approach is to take into account epistasis, by considering the association of multiple SNP interactions with disease. However, as genomic data continues to grow in resolution, and genome and exome sequencing become more established, the number of combinations of variants to consider increases rapidly. Two potential solutions should be considered: the use of high-performance computing, which allows us to consider a larger number of variables, and heuristics to make the solution more tractable, essential in the case of genome sequencing. In this review, we look at different computational methods to analyse epistatic interactions within disease-related genetic data sets created by microarray technology. We also review efforts to use epistatic analysis results to produce biomarkers for diagnostic tests and give our views on future directions in this field in light of advances in sequencing technology and variants in non-coding regions.
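As a rough illustration of how quickly the number of variant combinations grows, the sketch below counts the unordered combinations that an exhaustive interaction scan would need to test. The function name and the variant counts are hypothetical and are chosen only to show the scaling.

```python
from math import comb


def n_interaction_tests(n_variants, order=2):
    """Number of unordered variant combinations of the given order."""
    return comb(n_variants, order)


# Hypothetical panel sizes; pairwise tests grow roughly with the square.
for n in (1_000, 100_000, 1_000_000):
    print(n, n_interaction_tests(n))
```

Moving from pairs to triples (`order=3`) multiplies the count again by a factor on the order of the number of variants, which is why both parallel hardware and search heuristics come into play.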
|
32
|
Integrated pathway and epistasis analysis reveals interactive effect of genetic variants at TERF1 and AFAP1L2 loci on melanoma risk. Int J Cancer 2015; 137:1901-1909. [PMID: 25892537 DOI: 10.1002/ijc.29570] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 12/17/2014] [Revised: 03/12/2015] [Accepted: 03/30/2015] [Indexed: 12/18/2022]
Abstract
Genome-wide association studies (GWASs) have characterized 13 loci associated with melanoma, which only account for a small part of melanoma risk. To identify new genes with too small an effect to be detected individually but which collectively influence melanoma risk and/or show interactive effects, we used a two-step analysis strategy including pathway analysis of genome-wide SNP data, in a first step, and epistasis analysis within significant pathways, in a second step. Pathway analysis, using the gene-set enrichment analysis (GSEA) approach and the gene ontology (GO) database, was applied to the outcomes of MELARISK (3,976 subjects) and MDACC (2,827 subjects) GWASs. Cross-gene SNP-SNP interaction analysis within melanoma-associated GOs was performed using the INTERSNP software. Five GO categories were significantly enriched in genes associated with melanoma (false discovery rate ≤ 5% in both studies): response to light stimulus, regulation of mitotic cell cycle, induction of programmed cell death, cytokine activity and oxidative phosphorylation. Epistasis analysis, within each of the five significant GOs, showed significant evidence for interaction for one SNP pair at TERF1 and AFAP1L2 loci (p_meta-int = 2.0 × 10^-7, which met both the pathway and overall multiple-testing corrected thresholds that are equal to 9.8 × 10^-7 and 2.0 × 10^-7, respectively) and suggestive evidence for another pair involving correlated SNPs at the same loci (p_meta-int = 3.6 × 10^-6). This interaction has important biological relevance given the key role of TERF1 in telomere biology and the reported physical interaction between TERF1 and AFAP1L2 proteins. This finding brings a novel piece of evidence for the emerging role of telomere dysfunction into melanoma development.
|
33
|
Biology-Driven Gene-Gene Interaction Analysis of Age-Related Cataract in the eMERGE Network. Genet Epidemiol 2015; 39:376-84. [PMID: 25982363 PMCID: PMC4550090 DOI: 10.1002/gepi.21902] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 12/05/2014] [Revised: 02/27/2015] [Accepted: 03/13/2015] [Indexed: 01/19/2023]
Abstract
Bioinformatics approaches to examine gene-gene models provide a means to discover interactions between multiple genes that underlie complex disease. Extensive computational demands and adjusting for multiple testing make uncovering genetic interactions a challenge. Here, we address these issues using our knowledge-driven filtering method, Biofilter, to identify putative single nucleotide polymorphism (SNP) interaction models for cataract susceptibility, thereby reducing the number of models for analysis. Models were evaluated in 3,377 European Americans (1,185 controls, 2,192 cases) from the Marshfield Clinic, a study site of the Electronic Medical Records and Genomics (eMERGE) Network, using logistic regression. All statistically significant models from the Marshfield Clinic were then evaluated in an independent dataset of 4,311 individuals (742 controls, 3,569 cases), using independent samples from additional study sites in the eMERGE Network: Mayo Clinic, Group Health/University of Washington, Vanderbilt University Medical Center, and Geisinger Health System. Eighty-three SNP-SNP models replicated in the independent dataset at likelihood ratio test P < 0.05. Among the most significant replicating models was rs12597188 (intron of CDH1)-rs11564445 (intron of CTNNB1). These genes are known to be involved in processes that include: cell-to-cell adhesion signaling, cell-cell junction organization, and cell-cell communication. Further Biofilter analysis of all replicating models revealed a number of common functions among the genes harboring the 83 replicating SNP-SNP models, which included signal transduction and PI3K-Akt signaling pathway. These findings demonstrate the utility of Biofilter as a biology-driven method, applicable for any genome-wide association study dataset.
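The likelihood ratio test used to judge SNP-SNP models like these can be sketched as follows. The sketch assumes the full logistic model adds a single interaction term to the reduced model, so the statistic has 1 degree of freedom. The abstract does not state the genotype encoding or degrees of freedom used, and the function name and log-likelihood values are hypothetical.

```python
from math import erfc, sqrt


def lrt_pvalue_1df(loglik_reduced, loglik_full):
    """P-value of a 1-degree-of-freedom likelihood ratio test.

    loglik_reduced: log-likelihood of the model without the interaction term.
    loglik_full: log-likelihood of the model with the interaction term.
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(stat / 2.0))


# Hypothetical log-likelihoods from two nested logistic regression fits.
print(lrt_pvalue_1df(loglik_reduced=-2100.0, loglik_full=-2097.5))
```

With more than one added parameter, the chi-square tail for the matching degrees of freedom would be needed in place of the closed form used here.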
|
34
|
A cautionary note on the impact of protocol changes for genome-wide association SNP × SNP interaction studies: an example on ankylosing spondylitis. Hum Genet 2015; 134:761-73. [DOI: 10.1007/s00439-015-1560-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 01/19/2015] [Accepted: 04/26/2015] [Indexed: 12/11/2022]
|
35
|
The foundation of precision medicine: integration of electronic health records with genomics through basic, clinical, and translational research. Front Genet 2015; 6:104. [PMID: 25852745 PMCID: PMC4362332 DOI: 10.3389/fgene.2015.00104] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 02/17/2015] [Accepted: 02/27/2015] [Indexed: 12/30/2022] Open
|
36
|
Genome-wide genetic interaction analysis of glaucoma using expert knowledge derived from human phenotype networks. Pac Symp Biocomput 2015; 20:207-18. [PMID: 25592582 PMCID: PMC4299930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
The large volume of GWAS data poses great computational challenges for analyzing genetic interactions associated with common human diseases. We propose a computational framework for characterizing epistatic interactions among large sets of genetic attributes in GWAS data. We build the human phenotype network (HPN) and focus around a disease of interest. In this study, we use the GLAUGEN glaucoma GWAS dataset and apply the HPN as a biological knowledge-based filter to prioritize genetic variants. Then, we use the statistical epistasis network (SEN) to identify a significant connected network of pairwise epistatic interactions among the prioritized SNPs. These clearly highlight the complex genetic basis of glaucoma. Furthermore, we identify key SNPs by quantifying structural network characteristics. Through functional annotation of these key SNPs using Biofilter, software that accesses multiple publicly available human genetic data sources, we find supporting biomedical evidence linking glaucoma to an array of genetic diseases, proving our concept. We conclude by suggesting hypotheses for a better understanding of the disease.
|
37
|
|
38
|
Genome-wide association studies of suicidal behaviors: a review. Eur Neuropsychopharmacol 2014; 24:1567-77. [PMID: 25219938 DOI: 10.1016/j.euroneuro.2014.08.006] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 03/21/2014] [Revised: 07/24/2014] [Accepted: 08/10/2014] [Indexed: 11/17/2022]
Abstract
Suicidal behaviors represent a fatal dimension of mental ill-health, involving both environmental and heritable (genetic) influences. Until recent years, the putative genetic components of suicidal behaviors were mainly investigated by hypothesis-driven research (of "candidate genes"). But technological progress in genotyping has opened up possibilities for (hypothesis-generating) genomic screens and novel opportunities to explore polygenic perspectives, now spanning a wide array of possible analyses falling under the term Genome-Wide Association Study (GWAS). Here we introduce and broadly discuss some apparent limitations but also certain developing opportunities of GWAS. We summarize the results from all eight GWAS conducted to date that focused on suicidality outcomes: treatment-emergent suicidal ideation (3 studies), suicide attempts (4 studies) and completed suicides (1 study). Clearly, few (if any) genome-wide significant and reproducible findings have yet been demonstrated. We then discuss and pinpoint certain future considerations in relation to sample sizes, the units of genetic associations used, study designs and outcome definitions, psychiatric diagnoses or biological measures, as well as the use of genomic sequencing. We conclude that GWAS has considerably more potential for suicidal outcomes than has yet been realized.
|
39
|
Abstract
Genome-wide association studies (GWASs) have become the focus of the statistical analysis of complex traits in humans, successfully shedding light on several aspects of genetic architecture and biological aetiology. Single-nucleotide polymorphisms (SNPs) are usually modelled as having additive, cumulative and independent effects on the phenotype. Although evidently a useful approach, it is often argued that this is not a realistic biological model and that epistasis (that is, the statistical interaction between SNPs) should be included. The purpose of this Review is to summarize recent directions in methodology for detecting epistasis and to discuss evidence of the role of epistasis in human complex trait variation. We also discuss the relevance of epistasis in the context of GWASs and potential hazards in the interpretation of statistical interaction terms.
|