151
|
Pattin KA, Moore JH. Role for protein-protein interaction databases in human genetics. Expert Rev Proteomics 2010; 6:647-59. [PMID: 19929610 DOI: 10.1586/epr.09.86] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 12/30/2022]
Abstract
Proteomics and the study of protein-protein interactions are becoming increasingly important in our effort to understand human diseases on a system-wide level. Thanks to the development and curation of protein-interaction databases, up-to-date information on these interaction networks is accessible and publicly available to the scientific community. As our knowledge of protein-protein interactions increases, it is important to give thought to the different ways that these resources can impact biomedical research. In this article, we highlight the importance of protein-protein interactions in human genetics and genetic epidemiology. Since protein-protein interactions demonstrate one of the strongest functional relationships between genes, combining genomic data with available proteomic data may provide us with a more in-depth understanding of common human diseases. In this review, we will discuss some of the fundamentals of protein interactions, the databases that are publicly available and how information from these databases can be used to facilitate genome-wide genetic studies.
Affiliation(s)
- Kristine A Pattin
- Computational Genetics Laboratory and Department of Genetics, Dartmouth Medical School, Lebanon, NH, USA.
|
152
|
Prevalent positive epistasis in Escherichia coli and Saccharomyces cerevisiae metabolic networks. Nat Genet 2010; 42:272-6. [PMID: 20101242 PMCID: PMC2837480 DOI: 10.1038/ng.524] [Citation(s) in RCA: 107] [Impact Index Per Article: 7.1] [Received: 07/21/2009] [Accepted: 12/18/2009] [Indexed: 11/30/2022]
Abstract
Epistasis refers to the interaction between genes. Although high-throughput epistasis data from model organisms are being generated and used to construct genetic networks [1-3], to what extent genetic epistasis reflects biologically meaningful interactions remains unclear [4-6]. We address this question by in silico mapping of positive and negative epistatic interactions amongst biochemical reactions within the metabolic networks of E. coli and S. cerevisiae using flux balance analysis. We found that negative epistasis occurs mainly between nonessential reactions with overlapping functions, whereas positive epistasis usually involves essential reactions, is highly abundant, and surprisingly, often occurs between reactions without overlapping functions. We offered mechanistic explanations of these findings and experimentally validated them for 61 S. cerevisiae gene pairs.
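As a rough illustration of how positive and negative epistasis are often scored, the sketch below assumes the common multiplicative definition, in which the epistasis value is the observed double-mutant fitness minus the product of the two single-mutant fitnesses (all relative to wild type). The function names and fitness values are hypothetical and are not taken from the paper, whose exact scaling may differ.

```python
def epistasis(w_x: float, w_y: float, w_xy: float) -> float:
    """Observed double-mutant fitness minus the multiplicative expectation.

    All fitness values are relative to wild type (wild type = 1.0).
    """
    return w_xy - w_x * w_y


def classify(eps: float, tol: float = 1e-9) -> str:
    """Label an epistasis value as positive, negative, or none (within tol)."""
    if eps > tol:
        return "positive"
    if eps < -tol:
        return "negative"
    return "none"


# Hypothetical example: each single deletion halves fitness, and the
# double deletion is no worse than either single one.
print(classify(epistasis(0.5, 0.5, 0.5)))
```

Under this definition a double mutant that is fitter than the multiplicative expectation is scored as positive epistasis, and one that is less fit as negative epistasis.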
|
153
|
Abstract
Motivation: The sequencing of the human genome has made it possible to identify an informative set of >1 million single nucleotide polymorphisms (SNPs) across the genome that can be used to carry out genome-wide association studies (GWASs). The availability of massive amounts of GWAS data has necessitated the development of new biostatistical methods for quality control, imputation and analysis issues including multiple testing. This work has been successful and has enabled the discovery of new associations that have been replicated in multiple studies. However, it is now recognized that most SNPs discovered via GWAS have small effects on disease susceptibility and thus may not be suitable for improving health care through genetic testing. One likely explanation for the mixed results of GWAS is that the current biostatistical analysis paradigm is by design agnostic or unbiased in that it ignores all prior knowledge about disease pathobiology. Further, the linear modeling framework that is employed in GWAS often considers only one SNP at a time, thus ignoring their genomic and environmental context. There is now a shift away from the biostatistical approach toward a more holistic approach that recognizes the complexity of the genotype–phenotype relationship that is characterized by significant heterogeneity and gene–gene and gene–environment interaction. We argue here that bioinformatics has an important role to play in addressing the complexity of the underlying genetic basis of common human diseases. The goal of this review is to identify and discuss those GWAS challenges that will require computational methods. Contact: jason.h.moore@dartmouth.edu
Affiliation(s)
- Jason H Moore
- Department of Genetics, Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, NH 03756, USA.
|
154
|
Environmental Sensing of Expert Knowledge in a Computational Evolution System for Complex Problem Solving in Human Genetics. GENETIC PROGRAMMING THEORY AND PRACTICE VII 2010. [DOI: 10.1007/978-1-4419-1626-6_2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/19/2023]
|
155
|
Moore JH. Detecting, characterizing, and interpreting nonlinear gene-gene interactions using multifactor dimensionality reduction. ADVANCES IN GENETICS 2010; 72:101-16. [PMID: 21029850 DOI: 10.1016/b978-0-12-380862-2.00005-9] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Indexed: 01/28/2023]
Abstract
Human health is a complex process that is dependent on many genes, many environmental factors and chance events that are perhaps not measurable with current technology or are simply unknowable. Success in the design and execution of population-based association studies to identify those genetic and environmental factors that play an important role in human disease will depend on our ability to embrace, rather than ignore, complexity in the genotype to phenotype mapping relationship for any given human ecology. We review here three general computational challenges that must be addressed. First, data mining and machine learning methods are needed to model nonlinear interactions between multiple genetic and environmental factors. Second, filter and wrapper methods are needed to identify attribute interactions in large and complex solution landscapes. Third, visualization methods are needed to help interpret computational models and results. We provide here an overview of the multifactor dimensionality reduction (MDR) method that was developed for addressing each of these challenges.
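The core pooling step of MDR can be sketched as follows, under simplifying assumptions: each two-locus genotype combination is labeled high risk when its case:control ratio is at least the overall case:control ratio, and the pair of loci whose high/low-risk labeling classifies the training data best is selected. The function names and toy data are hypothetical, and the cross-validation and permutation testing used in practice are omitted here.

```python
from collections import Counter
from itertools import combinations, product


def mdr_high_risk_cells(rows, labels, pair):
    """Return the genotype combinations at `pair` that are pooled as high risk."""
    i, j = pair
    cases, controls = Counter(), Counter()
    for row, y in zip(rows, labels):
        (cases if y == 1 else controls)[(row[i], row[j])] += 1
    n_case, n_ctrl = sum(cases.values()), sum(controls.values())
    # High risk if cases/controls >= n_case/n_ctrl; cross-multiplied so that
    # cells with zero controls do not cause a division by zero.
    return {
        cell
        for cell in set(cases) | set(controls)
        if cases[cell] * n_ctrl >= controls[cell] * n_case
    }


def mdr_accuracy(rows, labels, pair):
    """Training accuracy of the high/low-risk classifier built on `pair`."""
    i, j = pair
    high = mdr_high_risk_cells(rows, labels, pair)
    hits = sum(((r[i], r[j]) in high) == (y == 1) for r, y in zip(rows, labels))
    return hits / len(rows)


def best_pair(rows, labels):
    """Exhaustively search all pairs of loci for the best training accuracy."""
    n_loci = len(rows[0])
    return max(combinations(range(n_loci), 2),
               key=lambda p: mdr_accuracy(rows, labels, p))


# Toy data: disease status is the XOR of loci 0 and 1; locus 2 is noise.
rows = [list(g) for g in product([0, 1], repeat=3)]
labels = [g[0] ^ g[1] for g in rows]
print(best_pair(rows, labels), mdr_accuracy(rows, labels, (0, 1)))
```

In this toy example neither locus 0 nor locus 1 has a marginal effect, yet the pooled two-locus attribute separates cases from controls, which is the kind of nonlinear interaction the method targets.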
Affiliation(s)
- Jason H Moore
- Institute for Quantitative Biomedical Sciences, Departments of Genetics and Community and Family Medicine, Dartmouth Medical School, Lebanon, New Hampshire, USA
|
156
|
Bush WS, Haines J. Overview of linkage analysis in complex traits. CURRENT PROTOCOLS IN HUMAN GENETICS 2010; Chapter 1:Unit 1.9.1-18. [PMID: 20063263 DOI: 10.1002/0471142905.hg0109s64] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/27/2022]
Abstract
Linkage analysis is a well-established and powerful method for mapping disease genes. While linkage analysis has been most successful when applied to disorders with clear patterns of Mendelian inheritance, it can also be a useful technique for mapping susceptibility genes for common complex diseases. In this unit, we outline the key concepts of complex disease, and how linkage analysis for complex traits differs from simple Mendelian traits. Optimal genetic studies require careful study design, ascertainment strategy, and analysis methods. We describe how disease parameters such as prevalence, heritability estimates, and mode of inheritance should be considered before data is collected. Furthermore, we outline a general strategic approach for conducting linkage analysis of a complex disease, along with several design considerations that can optimize statistical power to detect disease loci and generally improve the quality of a study. Finally, we discuss the benefits and weaknesses of linkage analysis in contrast to genome-wide association studies.
Affiliation(s)
- William S Bush
- Vanderbilt University Medical Center, Nashville, Tennessee, USA
|
157
|
Günther F, Wawro N, Bammann K. Neural networks for modeling gene-gene interactions in association studies. BMC Genet 2009; 10:87. [PMID: 20030838 PMCID: PMC2817696 DOI: 10.1186/1471-2156-10-87] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 03/04/2009] [Accepted: 12/23/2009] [Indexed: 01/17/2023] Open
Abstract
Background: Our aim is to investigate the ability of neural networks to model different two-locus disease models. We conduct a simulation study to compare neural networks with two standard methods, namely logistic regression models and multifactor dimensionality reduction. One hundred data sets are generated for each of six two-locus disease models, which are considered in a low and in a high risk scenario. Two models represent independence, one is a multiplicative model, and three models are epistatic. For each data set, six neural networks (with up to five hidden neurons) and five logistic regression models (the null model, three main effect models, and the full model) with two different codings for the genotype information are fitted. Additionally, the multifactor dimensionality reduction approach is applied. Results: The results show that neural networks are more successful in modeling the structure of the underlying disease model than logistic regression models in most of the investigated situations. In our simulation study, neither logistic regression nor multifactor dimensionality reduction is able to correctly identify biological interaction. Conclusions: Neural networks are a promising tool to handle complex data situations. However, further research is necessary concerning the interpretation of their parameters.
Affiliation(s)
- Frauke Günther
- University of Bremen, Bremen Institute for Prevention Research and Social Medicine, Linzer Strasse 10, 28359 Bremen, Germany.
|
158
|
Günther F, Wawro N, Bammann K. Neural networks for modeling gene-gene interactions in association studies. BMC Genet 2009. [PMID: 20030838 DOI: 10.1186/1471-2156-10-87] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/22/2022] Open
|
159
|
Pattin KA, Moore JH. Genome-wide association studies for the identification of biomarkers in metabolic diseases. ACTA ACUST UNITED AC 2009; 4:39-51. [DOI: 10.1517/17530050903322245] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 02/04/2023]
|
160
|
He H, Oetting WS, Brott MJ, Basu S. Power of multifactor dimensionality reduction and penalized logistic regression for detecting gene-gene interaction in a case-control study. BMC MEDICAL GENETICS 2009; 10:127. [PMID: 19961594 PMCID: PMC2800840 DOI: 10.1186/1471-2350-10-127] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 01/15/2009] [Accepted: 12/04/2009] [Indexed: 11/13/2022]
Abstract
BACKGROUND: There is a growing awareness that interactions between multiple genes play an important role in the risk of common, complex multi-factorial diseases. Many common diseases are affected by certain genotype combinations (associated with some genes and their interactions). The identification and characterization of these susceptibility genes and gene-gene interactions have been limited by small sample sizes and the large number of potential interactions between genes. Several methods have been proposed to detect gene-gene interaction in a case-control study. Penalized logistic regression (PLR), a variant of logistic regression with L2 regularization, is a parametric approach to detect gene-gene interaction. On the other hand, Multifactor Dimensionality Reduction (MDR) is a nonparametric and genetic model-free approach to detect genotype combinations associated with disease risk. METHODS: We compared the power of MDR and PLR for detecting two-way and three-way interactions in a case-control study through extensive simulations. We generated several interaction models with different magnitudes of interaction effect. For each model, we simulated 100 datasets, each with 200 cases and 200 controls and 20 SNPs. We considered a wide variety of models such as models with just main effects, models with only interaction effects or models with both main and interaction effects. We also compared the performance of MDR and PLR in detecting gene-gene interaction associated with acute rejection (AR) in kidney transplant patients. RESULTS: In this paper, we have studied the power of MDR and PLR for detecting gene-gene interaction in a case-control study through extensive simulation. We have compared their performances for different two-way and three-way interaction models. We have studied the effect of different allele frequencies on these methods. We have also evaluated their performance on a real dataset.
As expected, neither method was consistently better for all data scenarios, but MDR generally outperformed PLR for more complex models. The ROC analysis suggests that MDR also outperforms PLR in detecting gene-gene interaction on the real dataset. CONCLUSION: As one might expect, the relative success of each method is context dependent. This study demonstrates the strengths and weaknesses of the methods to detect gene-gene interaction.
Affiliation(s)
- Hua He
- Division of Biostatistics, School of Public Health, University of Minnesota, Minnesota, USA
- William S Oetting
- Department of Experimental and Clinical Pharmacology, College of Pharmacy and Institute of Human Genetics, University of Minnesota, Minnesota, USA
- Marcia J Brott
- Department of Experimental and Clinical Pharmacology, College of Pharmacy and Institute of Human Genetics, University of Minnesota, Minnesota, USA
- Saonli Basu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minnesota, USA
|
161
|
A developmental systems perspective on epistasis: computational exploration of mutational interactions in model developmental regulatory networks. PLoS One 2009; 4:e6823. [PMID: 19738908 PMCID: PMC2734181 DOI: 10.1371/journal.pone.0006823] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/04/2009] [Accepted: 07/31/2009] [Indexed: 11/19/2022] Open
Abstract
The way in which the information contained in genotypes is translated into complex phenotypic traits (i.e. embryonic expression patterns) depends on its decoding by a multilayered hierarchy of biomolecular systems (regulatory networks). Each layer of this hierarchy displays its own regulatory schemes (i.e. operational rules such as +/− feedback) and associated control parameters, resulting in characteristic variational constraints. This process can be conceptualized as a mapping issue, and in the context of highly-dimensional genotype-phenotype mappings (GPMs) epistatic events have been shown to be ubiquitous, manifested in non-linear correspondences between changes in the genotype and their phenotypic effects. In this study I concentrate on epistatic phenomena pervading levels of biological organization above the genetic material, more specifically the realm of molecular networks. At this level, systems approaches to studying GPMs are especially suitable to shed light on the mechanistic basis of epistatic phenomena. To this aim, I constructed and analyzed ensembles of highly-modular (fully interconnected) networks with distinctive topologies, each displaying dynamic behaviors that were categorized as either arbitrary or functional according to early patterning processes in the Drosophila embryo. Spatio-temporal expression trajectories in virtual syncytial embryos were simulated via reaction-diffusion models. My in silico mutational experiments show that: 1) the average fitness decay tendency to successively accumulated mutations in ensembles of functional networks indicates the prevalence of positive epistasis, whereas in ensembles of arbitrary networks negative epistasis is the dominant tendency; and 2) the evaluation of epistatic coefficients of diverse interaction orders indicates that both positive and negative epistasis are more prevalent in functional networks than in arbitrary ones.
Overall, I conclude that the phenotypic and fitness effects of multiple perturbations are strongly conditioned by both the regulatory architecture (i.e. pattern of coupled feedback structures) and the dynamic nature of the spatio-temporal expression trajectories displayed by the simulated networks.
|
162
|
Turner SD, Crawford DC, Ritchie MD. Methods for optimizing statistical analyses in pharmacogenomics research. Expert Rev Clin Pharmacol 2009; 2:559-570. [PMID: 20221410 PMCID: PMC2835152 DOI: 10.1586/ecp.09.32] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 01/10/2023]
Abstract
Pharmacogenomics is a rapidly developing sector of human genetics research with arguably the highest potential for immediate benefit. There is a considerable body of evidence demonstrating that variability in drug-treatment response can be explained in part by genetic variation. Subsequently, much research has ensued and is ongoing to identify genetic variants associated with drug-response phenotypes. To reap the full benefits of the data we collect we must give careful consideration to the study population under investigation, the phenotype being examined and the statistical methodology used in data analysis. Here, we discuss principles of study design and optimizing statistical methods for pharmacogenomic studies when the outcome of interest is a continuous measure. We review traditional hypothesis testing procedures, as well as novel approaches that may be capable of accounting for more variance in a quantitative pharmacogenomic trait. We give examples of studies that have employed the analytical methodologies discussed here, as well as resources for acquiring software to run the analyses.
Affiliation(s)
- Stephen D Turner
- Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville TN, 37232, USA, Tel.: +1 615 343 6549, Fax: +1 615 322 6974
- Dana C Crawford
- Center for Human Genetics Research, Assistant Professor, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville TN, 37232, USA, Tel.: +1 615 343 7852, Fax: +1 615 322 6974
- Marylyn D Ritchie
- Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville TN, 37232, USA, Tel.: +1 615 343 5851, Fax: +1 615 322 6974
|
163
|
Combarros O, Cortina-Borja M, Smith AD, Lehmann DJ. Epistasis in sporadic Alzheimer's disease. Neurobiol Aging 2009; 30:1333-49. [PMID: 18206267 DOI: 10.1016/j.neurobiolaging.2007.11.027] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.4] [Received: 07/16/2007] [Revised: 11/30/2007] [Accepted: 11/30/2007] [Indexed: 10/22/2022]
|
164
|
Epistasis and its implications for personal genetics. Am J Hum Genet 2009; 85:309-20. [PMID: 19733727 DOI: 10.1016/j.ajhg.2009.08.006] [Citation(s) in RCA: 250] [Impact Index Per Article: 15.6] [Received: 06/18/2009] [Revised: 07/31/2009] [Accepted: 08/10/2009] [Indexed: 12/22/2022] Open
Abstract
The widespread availability of high-throughput genotyping technology has opened the door to the era of personal genetics, which brings to consumers the promise of using genetic variations to predict individual susceptibility to common diseases. Despite easy access to commercial personal genetics services, our knowledge of the genetic architecture of common diseases is still very limited and has not yet fulfilled the promise of accurately predicting most people at risk. This is partly because of the complexity of the mapping relationship between genotype and phenotype that is a consequence of epistasis (gene-gene interaction) and other phenomena such as gene-environment interaction and locus heterogeneity. Unfortunately, these aspects of genetic architecture have not been addressed in most of the genetic association studies that provide the knowledge base for interpreting large-scale genetic association results. We provide here an introductory review of how epistasis can affect human health and disease and how it can be detected in population-based studies. We provide some thoughts on the implications of epistasis for personal genetics and some recommendations for improving personal genetics in light of this complexity.
|
165
|
Combarros O, van Duijn CM, Hammond N, Belbin O, Arias-Vásquez A, Cortina-Borja M, Lehmann MG, Aulchenko YS, Schuur M, Kölsch H, Heun R, Wilcock GK, Brown K, Kehoe PG, Harrison R, Coto E, Alvarez V, Deloukas P, Mateo I, Gwilliam R, Morgan K, Warden DR, Smith AD, Lehmann DJ. Replication by the Epistasis Project of the interaction between the genes for IL-6 and IL-10 in the risk of Alzheimer's disease. J Neuroinflammation 2009; 6:22. [PMID: 19698145 PMCID: PMC2744667 DOI: 10.1186/1742-2094-6-22] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Received: 07/09/2009] [Accepted: 08/23/2009] [Indexed: 11/10/2022] Open
Abstract
Background: Chronic inflammation is a characteristic of Alzheimer's disease (AD). An interaction associated with the risk of AD has been reported between polymorphisms in the regulatory regions of the genes for the pro-inflammatory cytokine, interleukin-6 (IL-6, gene: IL6), and the anti-inflammatory cytokine, interleukin-10 (IL-10, gene: IL10). Methods: We examined this interaction in the Epistasis Project, a collaboration of 7 AD research groups, contributing DNA samples from 1,757 cases of AD and 6,295 controls. Results: We replicated the interaction. For IL6 rs2069837 AA × IL10 rs1800871 CC, the synergy factor (SF) was 1.63 (95% confidence interval: 1.10–2.41, p = 0.01), controlling for centre, age, gender and apolipoprotein E ε4 (APOEε4) genotype. Our results are consistent between North Europe (SF = 1.7, p = 0.03) and North Spain (SF = 2.0, p = 0.09). Further replication may require a meta-analysis. However, association due to linkage disequilibrium with other polymorphisms in the regulatory regions of these genes cannot be excluded. Conclusion: We suggest that dysregulation of both IL-6 and IL-10 in some elderly people, due in part to genetic variations in the two genes, contributes to the development of AD. Thus, inflammation facilitates the onset of sporadic AD.
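The abstract reports a synergy factor without stating its formula. The sketch below assumes the usual definition for binary factors, SF = OR12 / (OR1 × OR2), where each odds ratio is taken relative to the group carrying neither factor. The counts are hypothetical and the calculation is unadjusted, whereas the study controlled for centre, age, gender and APOEε4 genotype, so this does not reproduce the reported SF of 1.63.

```python
def odds_ratio(cases_exp, ctrl_exp, cases_ref, ctrl_ref):
    """Odds ratio of an exposed group relative to a reference group."""
    return (cases_exp * ctrl_ref) / (ctrl_exp * cases_ref)


def synergy_factor(counts):
    """Unadjusted synergy factor from a 2 x 4 table.

    `counts` maps (factor1, factor2) presence flags to (cases, controls).
    SF > 1 suggests synergy, SF < 1 antagonism, SF = 1 no interaction
    on the multiplicative (odds ratio) scale.
    """
    ref = counts[(0, 0)]
    or_1 = odds_ratio(*counts[(1, 0)], *ref)
    or_2 = odds_ratio(*counts[(0, 1)], *ref)
    or_12 = odds_ratio(*counts[(1, 1)], *ref)
    return or_12 / (or_1 * or_2)


# Hypothetical counts: neither factor alone changes risk, both together do.
counts = {
    (0, 0): (100, 100),
    (1, 0): (100, 100),
    (0, 1): (100, 100),
    (1, 1): (200, 100),
}
print(synergy_factor(counts))
```

An adjusted estimate would normally come from the interaction term of a logistic regression model rather than from raw counts as done here.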
Affiliation(s)
- Onofre Combarros
- Neurology Service and Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas, Marqués de Valdecilla University Hospital (University of Cantabria), 39008 Santander, Spain.
|
166
|
Greene CS, Penrod NM, Williams SM, Moore JH. Failure to replicate a genetic association may provide important clues about genetic architecture. PLoS One 2009; 4:e5639. [PMID: 19503614 PMCID: PMC2685469 DOI: 10.1371/journal.pone.0005639] [Citation(s) in RCA: 206] [Impact Index Per Article: 12.9] [Received: 02/11/2009] [Accepted: 05/01/2009] [Indexed: 11/18/2022] Open
Abstract
Replication has become the gold standard for assessing statistical results from genome-wide association studies. Unfortunately this replication requirement may cause real genetic effects to be missed. A real result can fail to replicate for numerous reasons including inadequate sample size or variability in phenotype definitions across independent samples. In genome-wide association studies the allele frequencies of polymorphisms may differ due to sampling error or population differences. We hypothesize that some statistically significant independent genetic effects may fail to replicate in an independent dataset when allele frequencies differ and the functional polymorphism interacts with one or more other functional polymorphisms. To test this hypothesis, we designed a simulation study in which case-control status was determined by two interacting polymorphisms with heritabilities ranging from 0.025 to 0.4 with replication sample sizes ranging from 400 to 1600 individuals. We show that the power to replicate the statistically significant independent main effect of one polymorphism can drop dramatically with a change of allele frequency of less than 0.1 at a second interacting polymorphism. We also show that differences in allele frequency can result in a reversal of allelic effects where a protective allele becomes a risk factor in replication studies. These results suggest that failure to replicate an independent genetic effect may provide important clues about the complexity of the underlying genetic architecture. We recommend that polymorphisms that fail to replicate be checked for interactions with other polymorphisms, particularly when samples are collected from groups with distinct ethnic backgrounds or different geographic regions.
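The reversal of allelic effects described above can be illustrated with a small calculation. The sketch assumes Hardy-Weinberg genotype frequencies at the second locus and uses a made-up two-locus penetrance table; neither the table nor the allele frequencies come from the paper's simulations. The marginal penetrance of each genotype at locus A is its penetrance averaged over the genotype distribution at the interacting locus B.

```python
def genotype_freqs(q):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of an allele with frequency q."""
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]


def marginal_penetrance(penetrance, q_b):
    """Penetrance of each locus-A genotype averaged over locus-B genotypes.

    penetrance[a][b] is the disease probability for a copies of the A allele
    and b copies of the B allele; q_b is the B allele frequency.
    """
    freqs_b = genotype_freqs(q_b)
    return [sum(f * pen for f, pen in zip(freqs_b, row)) for row in penetrance]


# Hypothetical purely interactive model: risk is raised only when the two
# loci carry opposite homozygous genotypes.
PENETRANCE = [
    [0.05, 0.05, 0.25],
    [0.05, 0.05, 0.05],
    [0.25, 0.05, 0.05],
]

for q_b in (0.4, 0.6):
    print(q_b, [round(m, 3) for m in marginal_penetrance(PENETRANCE, q_b)])
```

With these assumed values, two copies of the A allele look like the risk genotype when the B allele frequency is 0.4 and like the protective genotype when it is 0.6, even though the underlying penetrance table is unchanged.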
Affiliation(s)
- Casey S. Greene
- Department of Genetics, Dartmouth College, Lebanon, New Hampshire, United States of America
- Nadia M. Penrod
- Department of Genetics, Dartmouth College, Lebanon, New Hampshire, United States of America
- Scott M. Williams
- Vanderbilt University, Center for Human Genetics, Nashville, Tennessee, United States of America
- Jason H. Moore
- Department of Genetics, Dartmouth College, Lebanon, New Hampshire, United States of America
- Vanderbilt University, Center for Human Genetics, Nashville, Tennessee, United States of America
- Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, New Hampshire, United States of America
- Department of Computer Science, University of New Hampshire, Lebanon, New Hampshire, United States of America
- Department of Computer Science, University of Vermont, Burlington, Vermont, United States of America
- Translational Genomics Research Institute, Phoenix, Arizona, United States of America
|
167
|
Abstract
Following the identification of several disease-associated polymorphisms by genome-wide association (GWA) analysis, interest is now focusing on the detection of effects that, owing to their interaction with other genetic or environmental factors, might not be identified by using standard single-locus tests. In addition to increasing the power to detect associations, it is hoped that detecting interactions between loci will allow us to elucidate the biological and biochemical pathways that underpin disease. Here I provide a critical survey of the methods and related software packages currently used to detect the interactions between genetic loci that contribute to human genetic disease. I also discuss the difficulties in determining the biological relevance of statistical interactions.
Affiliation(s)
- Heather J Cordell
- Institute of Human Genetics, Newcastle University, International Centre for Life, Central Parkway, Newcastle upon Tyne NE1 3BZ, UK.
|
168
|
Kumar A, Ghosh B. Genetics of asthma: a molecular biologist perspective. Clin Mol Allergy 2009; 7:7. [PMID: 19419542 PMCID: PMC2684737 DOI: 10.1186/1476-7961-7-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 12/20/2008] [Accepted: 05/06/2009] [Indexed: 12/30/2022] Open
Abstract
Asthma belongs to the category of classical allergic diseases, which generally arise from IgE-mediated hypersensitivity to environmental triggers. Because its prevalence is very high in developed or urbanized societies, it is also referred to as a "disease of civilizations". Owing to its increased prevalence among related individuals, it has long been understood to be a genetic disorder. Well-designed epidemiological studies reinforced this view. The advent of modern biological technology further refined our understanding of the genetics of asthma and led to the realization that asthma is not a disorder with a simple Mendelian mode of inheritance but a multifactorial disorder of the airways brought about by complex interactions between genetic and environmental factors. Current asthma research has produced evidence that is compelling researchers to redefine asthma altogether. Although no consensus exists among researchers regarding its definition, it seems clear that several pathologies, all affecting the airways, have been grouped into one common category called asthma. Genetic studies have led the way in bringing about these transformations. Genomics, molecular biology, immunology and other interrelated disciplines have unearthed data that have changed the way we think about asthma. In this review, we center our discussion on the genetic basis of asthma and the molecular mechanisms involved in its pathogenesis. Taking our cue from the existing data, we briefly consider future directions that should improve our understanding of asthma pathogenesis.
Affiliation(s)
- Amrendra Kumar
- Molecular Immunogenetics Laboratory, Institute of Genomics and Integrative Biology, Mall Road, Delhi-110007, India.
|
169
|
Tang W, Wu X, Jiang R, Li Y. Epistatic module detection for case-control studies: a Bayesian model with a Gibbs sampling strategy. PLoS Genet 2009; 5:e1000464. [PMID: 19412524 PMCID: PMC2669883 DOI: 10.1371/journal.pgen.1000464] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.2] [Received: 10/17/2008] [Accepted: 03/30/2009] [Indexed: 12/16/2022]
Abstract
The detection of epistatic interactive effects of multiple genetic variants on the susceptibility of human complex diseases is a great challenge in genome-wide association studies (GWAS). Although methods have been proposed to identify such interactions, the lack of an explicit definition of epistatic effects, together with computational difficulties, makes the development of new methods indispensable. In this paper, we introduce epistatic modules to describe epistatic interactive effects of multiple loci on diseases. On the basis of this notion, we put forward a Bayesian marker partition model to explain observed case-control data, and we develop a Gibbs sampling strategy to facilitate the detection of epistatic modules. Comparisons of the proposed approach with three existing methods on seven simulated disease models demonstrate the superior performance of our approach. When applied to a genome-wide case-control data set for Age-related Macular Degeneration (AMD), the proposed approach successfully identifies two known susceptible loci and suggests that a combination of two other loci -- one in the gene SGCD and the other in SCAPER -- is associated with the disease. Further functional analysis supports the speculation that the interaction of these two genetic variants may be responsible for the susceptibility of AMD. When applied to a genome-wide case-control data set for Parkinson's disease, the proposed method identifies seven suspicious loci that may contribute independently to the disease.
Affiliation(s)
- Wanwan Tang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing, China
- Xuebing Wu
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing, China
- Rui Jiang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing, China
- * E-mail: (RJ); (YL)
- Yanda Li
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing, China
- * E-mail: (RJ); (YL)
|
170
|
Calle ML, Urrea V, Vellalta G, Malats N, Steen KV. Improving strategies for detecting genetic patterns of disease susceptibility in association studies. Stat Med 2009; 27:6532-46. [PMID: 18837071 DOI: 10.1002/sim.3431] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Indexed: 01/01/2023]
Abstract
The analysis of gene interactions and epistatic patterns of susceptibility is especially important for investigating complex diseases such as cancer characterized by the joint action of several genes. This work is motivated by a case-control study of bladder cancer, aimed at evaluating the role of both genetic and environmental factors in bladder carcinogenesis. In particular, the analysis of the inflammation pathway is of interest, for which information on a total of 282 SNPs in 108 genes involved in the inflammatory response is available. Detecting and interpreting interactions with such a large number of polymorphisms is a great challenge from both the statistical and the computational perspectives. In this paper we propose a two-stage strategy for identifying relevant interactions: (1) the use of a synergy measure among interacting genes and (2) the use of the model-based multifactor dimensionality reduction method (MB-MDR), a model-based version of the MDR method, which allows adjustment for confounders.
Affiliation(s)
- M L Calle
- Department of Systems Biology, Universitat de Vic, Carrer de la Sagrada Família, 7-08500 Vic, Spain.
|
171
|
Hardstone MC, Leichter CA, Scott JG. Multiplicative interaction between the two major mechanisms of permethrin resistance, kdr and cytochrome P450-monooxygenase detoxification, in mosquitoes. J Evol Biol 2009; 22:416-23. [PMID: 19196389 DOI: 10.1111/j.1420-9101.2008.01661.x] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
Abstract
Epistasis is the nonadditive interaction between different loci which contribute to a phenotype. Epistasis between independent loci conferring insecticide resistance is important to investigate as this phenomenon can shape the rate that resistance evolves and can dictate the level of resistance in the field. The evolution of insecticide resistance in mosquitoes is a growing and world-wide problem. The two major mechanisms that confer resistance to permethrin in Culex mosquitoes are target site insensitivity (i.e. kdr) and enhanced detoxification by cytochrome P450 monooxygenases. Using three strains of mosquitoes, and crosses between these strains, we assessed the relative contribution of the two independent loci conferring permethrin resistance, individually and when present together. We found that for all genotype combinations tested, Culex pipiens quinquefasciatus exhibited multiplicative interactions between kdr and P450 detoxification, whether the resistance alleles were homozygous or heterozygous. These results provide a basis for further analysis of the evolution and maintenance of insecticide resistance in mosquitoes.
Affiliation(s)
- M C Hardstone
- Department of Entomology, Comstock Hall, Cornell University, Ithaca, NY, USA
|
172
|
Tyler AL, Asselbergs FW, Williams SM, Moore JH. Shadows of complexity: what biological networks reveal about epistasis and pleiotropy. Bioessays 2009; 31:220-7. [PMID: 19204994 PMCID: PMC3159922 DOI: 10.1002/bies.200800022] [Citation(s) in RCA: 123] [Impact Index Per Article: 7.7] [Indexed: 12/16/2022]
Abstract
Pleiotropy, in which one mutation causes multiple phenotypes, has traditionally been seen as a deviation from the conventional observation in which one gene affects one phenotype. Epistasis, or gene-gene interaction, has also been treated as an exception to the Mendelian one gene-one phenotype paradigm. This simplified perspective belies the pervasive complexity of biology and hinders progress toward a deeper understanding of biological systems. We assert that epistasis and pleiotropy are not isolated occurrences, but ubiquitous and inherent properties of biomolecular networks. These phenomena should not be treated as exceptions, but rather as fundamental components of genetic analyses. A systems level understanding of epistasis and pleiotropy is, therefore, critical to furthering our understanding of human genetics and its contribution to common human disease. Finally, graph theory offers an intuitive and powerful set of tools with which to study the network bases of these important genetic phenomena.
Affiliation(s)
- Anna L. Tyler
- Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, NH, USA
- Folkert W. Asselbergs
- Department of Cardiology, University Medical Center Groningen, Groningen, The Netherlands
- Scott M. Williams
- Center for Human Genetics Research, Department of Medicine, Department of Molecular Physiology and Biophysics, Vanderbilt University Medical School, Nashville, TN, USA
- Jason H. Moore
- Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, NH, USA
- Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, NH, USA, Department of Computer Science, University of Vermont, Burlington, Vermont, USA, Department of Computer Science, University of New Hampshire, Durham, NH, USA, Translational Genomics Research Institute, Phoenix, AZ, USA
|
173
|
Woo D, Khoury J, Haverbusch MM, Sekar P, Flaherty ML, Kleindorfer DO, Kissela BM, Moomaw CJ, Deka R, Broderick JP. Smoking and family history and risk of aneurysmal subarachnoid hemorrhage. Neurology 2009; 72:69-72. [PMID: 19122033 PMCID: PMC2656254 DOI: 10.1212/01.wnl.0000338567.90260.46] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Indexed: 11/15/2022]
Abstract
OBJECTIVE Smoking and family history of aneurysmal subarachnoid hemorrhage (aSAH) are independent risk factors for aSAH. Using a population-based case-control study of hemorrhagic stroke, we hypothesized that having both a first-degree relative with a brain aneurysm or SAH (+FH) and current smoking interact to increase the risk of aSAH. METHODS Cases of aneurysmal SAH were prospectively recruited from all 17 hospitals in the five-county region around the University of Cincinnati. Controls were identified by random digit dialing. Controls were matched to cases of aSAH by age (+/-5 years), race, and sex. Conditional multiple logistic regression was used to identify independent risk factors. For deviation from the additive model, the interaction constant ratio test was used. RESULTS A total of 339 cases of aSAH were matched to 1,016 controls. Compared to current nonsmokers with no first-degree relatives with aSAH (-FH), the odds ratio (OR) for aSAH for current nonsmokers with +FH was 2.5 (95% confidence interval [CI] 0.9-6.9); for current smokers with -FH, OR = 3.1 (95% CI 2.2-4.4); and for current smokers with +FH, OR = 6.4 (95% CI 3.1-13.2). The interaction constant ratio, which measured the deviation from the additive model, was significant: 2.19 (95% CI 0.80-5.99). The lower bound of the 95% CI >0.5 signifies a departure from the additive model. CONCLUSION Evidence of a gene-environment interaction with smoking exists for aneurysmal subarachnoid hemorrhage. This finding is important to counseling family members and for screening of intracranial aneurysm (IA) as well as the design and interpretation of genetic epidemiology of IA studies.
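As a rough illustration of what a departure from additivity means for the odds ratios quoted in this abstract, the three point estimates can be compared against additive and multiplicative expectations. This is a minimal sketch that uses only the ORs reported above; it computes the relative excess risk due to interaction (RERI), a common additive-scale summary, and is not the interaction constant ratio test the authors used, so it is not expected to reproduce their value of 2.19. The variable names are illustrative.

```python
# Point-estimate odds ratios quoted in the abstract (reference group:
# current nonsmokers without a family history).
or_fh = 2.5      # +FH, current nonsmoker
or_smoke = 3.1   # -FH, current smoker
or_both = 6.4    # +FH, current smoker

# Joint OR expected if the two exposures simply add on the OR scale.
expected_additive = or_fh + or_smoke - 1.0
# Joint OR expected if the two exposures multiply.
expected_multiplicative = or_fh * or_smoke
# Relative excess risk due to interaction; a value above 0 points to
# a joint effect larger than additive.
reri = or_both - or_fh - or_smoke + 1.0

print(f"additive expectation:       {expected_additive:.2f}")
print(f"multiplicative expectation: {expected_multiplicative:.2f}")
print(f"RERI:                       {reri:.2f}")
```

On these point estimates the observed joint OR (6.4) lies above the additive expectation and below the multiplicative one. The wide confidence intervals in the abstract mean this arithmetic is illustrative only.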
Affiliation(s)
- D Woo
- Department of Neurology, University of Cincinnati College of Medicine, 260 Stetson Street ML 0525, Cincinnati, OH 45267-0525, USA.
|
174
|
Singmann P, Baumert J, Herder C, Meisinger C, Holzapfel C, Klopp N, Wichmann HE, Klingenspor M, Rathmann W, The KORA group, Illig T, Grallert H. Gene-gene interaction between APOA5 and USF1: two candidate genes for the metabolic syndrome. Obes Facts 2009; 2:235-42. [PMID: 20054229 PMCID: PMC2919429 DOI: 10.1159/000227288] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/19/2022]
Abstract
OBJECTIVE The metabolic syndrome, a major cluster of risk factors for cardiovascular diseases, shows increasing prevalence worldwide. Several studies have established associations of both apolipoprotein A5 (APOA5) gene variants and upstream stimulatory factor 1 (USF1) gene variants with blood lipid levels and metabolic syndrome. USF1 is a transcription factor for APOA5. METHODS We investigated a possible interaction between these two genes on the risk for the metabolic syndrome, using data from the German population-based KORA survey 4 (1,622 men and women aged 55-74 years). Seven APOA5 single nucleotide polymorphisms (SNPs) were analyzed in combination with six USF1 SNPs, applying logistic regression in an additive model adjusting for age and sex and the definition for metabolic syndrome from the National Cholesterol Education Program's Adult Treatment Panel III (NCEP (AIII)) including medication. RESULTS The overall prevalence for metabolic syndrome was 41%. Two SNP combinations showed a nominal gene-gene interaction (p values 0.024 and 0.047). The effect of one SNP was modified by the other SNP, with a lower risk for the metabolic syndrome with odds ratios (ORs) between 0.33 (95% CI = 0.13-0.83) and 0.40 (95% CI = 0.15-1.12) when the other SNP was homozygous for the minor allele. Nevertheless, none of the associations remained significant after correction for multiple testing. CONCLUSION Thus, there is an indication of an interaction between APOA5 and USF1 on the risk for metabolic syndrome.
Affiliation(s)
- Paula Singmann
- Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Jens Baumert
- Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Christian Herder
- Institute for Clinical Diabetes Research, German Diabetes Center, Leibniz Institute at Heinrich-Heine-University, Düsseldorf, Germany
- Christa Meisinger
- Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Christina Holzapfel
- Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Else Kröner-Fresenius Center for Nutritional Medicine, Technical University of Munich, Germany
- Norman Klopp
- Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- H.-Erich Wichmann
- Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- Chair of Epidemiology, IBE, Ludwig-Maximilians-University Munich, Germany
- Martin Klingenspor
- Molecular Nutritional Medicine, Else Kröner-Fresenius Center at Technical University of Munich, Germany
- Wolfgang Rathmann
- Institute of Biometrics and Epidemiology, German Diabetes Center, Leibniz Institute at Heinrich-Heine-University, Düsseldorf, Germany
- Thomas Illig
- Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Neuherberg, Germany
- *Dr. Thomas Illig, Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Harald Grallert
- Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Neuherberg, Germany
|
175
|
Pattin KA, White BC, Barney N, Gui J, Nelson HH, Kelsey KR, Andrew AS, Karagas MR, Moore JH. A computationally efficient hypothesis testing method for epistasis analysis using multifactor dimensionality reduction. Genet Epidemiol 2009; 33:87-94. [PMID: 18671250 PMCID: PMC2700860 DOI: 10.1002/gepi.20360] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.2] [Indexed: 11/08/2022]
Abstract
Multifactor dimensionality reduction (MDR) was developed as a nonparametric and model-free data mining method for detecting, characterizing, and interpreting epistasis in the absence of significant main effects in genetic and epidemiologic studies of complex traits such as disease susceptibility. The goal of MDR is to change the representation of the data using a constructive induction algorithm to make nonadditive interactions easier to detect using any classification method such as naïve Bayes or logistic regression. Traditionally, MDR constructed variables have been evaluated with a naïve Bayes classifier that is combined with 10-fold cross validation to obtain an estimate of predictive accuracy or generalizability of epistasis models. Traditionally, we have used permutation testing to statistically evaluate the significance of models obtained through MDR. The advantage of permutation testing is that it controls for false positives due to multiple testing. The disadvantage is that permutation testing is computationally expensive. This is an important issue that arises in the context of detecting epistasis on a genome-wide scale. The goal of the present study was to develop and evaluate several alternatives to large-scale permutation testing for assessing the statistical significance of MDR models. Using data simulated from 70 different epistasis models, we compared the power and type I error rate of MDR using a 1,000-fold permutation test with hypothesis testing using an extreme value distribution (EVD). We find that this new hypothesis testing method provides a reasonable alternative to the computationally expensive 1,000-fold permutation test and is 50 times faster. We then demonstrate this new method by applying it to a genetic epidemiology study of bladder cancer susceptibility that was previously analyzed using MDR and assessed using a 1,000-fold permutation test.
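The two ideas in this abstract, pooling multilocus genotypes into a single high-risk/low-risk attribute and judging the result by permutation, can be sketched in a few lines. This is a simplified illustration and not the authors' implementation: it scores a single two-locus model on the training data without the 10-fold cross-validation described above, it does not implement the extreme value distribution alternative, and the function names (`mdr_accuracy`, `permutation_p_value`) and the toy data are assumptions made for the example.

```python
import random
from collections import Counter


def mdr_accuracy(g1, g2, status):
    """Balanced accuracy of an MDR-style high/low-risk labelling.

    g1, g2: genotype codes (0, 1, 2) per subject; status: 1 = case, 0 = control.
    Scored on the same data used to build the labelling (no cross-validation).
    """
    n_case = sum(status)
    n_ctrl = len(status) - n_case
    cases, ctrls = Counter(), Counter()
    for a, b, s in zip(g1, g2, status):
        (cases if s else ctrls)[(a, b)] += 1
    threshold = n_case / n_ctrl
    # A genotype cell is high risk when its case:control ratio exceeds
    # the overall case:control ratio.
    high = {c for c in set(cases) | set(ctrls) if cases[c] > threshold * ctrls[c]}
    sensitivity = sum(cases[c] for c in high) / n_case
    specificity = sum(n for c, n in ctrls.items() if c not in high) / n_ctrl
    return 0.5 * (sensitivity + specificity)


def permutation_p_value(g1, g2, status, n_perm=1000, seed=0):
    """Shuffle case/control labels to build a null distribution of accuracy."""
    rng = random.Random(seed)
    observed = mdr_accuracy(g1, g2, status)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mdr_accuracy(g1, g2, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data with an XOR-like pattern: risk is elevated when exactly one of
# the two loci is heterozygous, so neither locus has a strong main effect.
rng = random.Random(1)
g1 = [rng.choice([0, 1, 2]) for _ in range(400)]
g2 = [rng.choice([0, 1, 2]) for _ in range(400)]
status = [
    1 if rng.random() < (0.8 if (a == 1) != (b == 1) else 0.2) else 0
    for a, b in zip(g1, g2)
]

observed, p_value = permutation_p_value(g1, g2, status, n_perm=200)
print(f"balanced accuracy = {observed:.3f}, permutation p = {p_value:.4f}")
```

Every permutation repeats the full model evaluation, which is why the cost grows quickly once many SNP combinations are screened; that cost is the motivation the abstract gives for seeking a cheaper test.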
Affiliation(s)
- Kristine A. Pattin
- Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, NH
- Bill C. White
- Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, NH
- Nate Barney
- Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, NH
- Jiang Gui
- Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, NH
- Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, NH
- Heather H. Nelson
- Department of Environmental Health, Harvard School of Public Health, Boston, MA
- Karl R. Kelsey
- Department of Community Health, Brown University, Providence, RI
- Angeline S. Andrew
- Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, NH
- Norris-Cotton Cancer Center, Dartmouth Medical School, Lebanon, NH
- Margaret R. Karagas
- Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, NH
- Norris-Cotton Cancer Center, Dartmouth Medical School, Lebanon, NH
- Jason H. Moore
- Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, NH
- Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, NH
- Norris-Cotton Cancer Center, Dartmouth Medical School, Lebanon, NH
- Department of Computer Science, University of New Hampshire, Durham, NH, Department of Computer Science, University of Vermont, Burlington, VT, Translational Genomics Research Institute, Phoenix, AZ
|
176
|
Edwards TL, Lewis K, Velez DR, Dudek S, Ritchie MD. Exploring the performance of Multifactor Dimensionality Reduction in large scale SNP studies and in the presence of genetic heterogeneity among epistatic disease models. Hum Hered 2008; 67:183-92. [PMID: 19077437 DOI: 10.1159/000181157] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 05/02/2008] [Accepted: 07/01/2008] [Indexed: 01/27/2023]
Abstract
BACKGROUND/AIMS In genetic studies of complex disease a consideration for the investigator is detection of joint effects. The Multifactor Dimensionality Reduction (MDR) algorithm searches for these effects with an exhaustive approach. Previously unknown aspects of MDR performance were the power to detect interactive effects given large numbers of non-model loci or varying degrees of heterogeneity among multiple epistatic disease models. METHODS To address the performance with many non-model loci, datasets of 500 cases and 500 controls with 100 to 10,000 SNPs were simulated for two-locus models, and one hundred 500-case/500-control datasets with 100 and 500 SNPs were simulated for three-locus models. Multiple levels of locus heterogeneity were simulated in several sample sizes. RESULTS These results show MDR is robust to locus heterogeneity when the definition of power is not as conservative as in previous simulation studies where all model loci were required to be found by the method. The results also indicate that MDR performance is related more strongly to broad-sense heritability than sample size and is not greatly affected by non-model loci. CONCLUSIONS A study in which a population with high heritability estimates is sampled predisposes the MDR study to success more than a larger ascertainment in a population with smaller estimates.
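To give a sense of the scale of the exhaustive search mentioned in this abstract, the number of candidate models at each simulated dataset size is simply the number of unordered SNP combinations. The short sketch below counts them with the standard library; pairing each SNP count with a model order follows the simulation settings quoted above, and nothing else about the study is assumed.

```python
from math import comb

# (number of SNPs, model order) pairs taken from the simulation settings
# described in the abstract.
settings = [(100, 2), (10_000, 2), (100, 3), (500, 3)]

for n_snps, order in settings:
    n_models = comb(n_snps, order)
    print(f"{n_snps:>6} SNPs, {order}-locus models: {n_models:>12,}")
```

An exhaustive two-locus scan of 10,000 SNPs therefore evaluates about 50 million models, and a three-locus scan of 500 SNPs about 20.7 million.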
Affiliation(s)
- Todd L Edwards
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, Tenn., USA
|
177
|
Abstract
The goal of this unit is to introduce gene-gene interactions (epistasis) as a significant complicating factor in the search for disease susceptibility genes. This unit begins with an overview of gene-gene interactions and why they are likely to be common. Then, it reviews several statistical and computational methods for detecting and characterizing genes with effects that are dependent on other genes. The focus of this unit is genetic association studies of discrete and quantitative traits because most of the methods for detecting gene-gene interactions have been developed specifically for these study designs.
Affiliation(s)
- Jason H Moore
- Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, New Hampshire, USA
|
178
|
Edwards TL, Wang X, Chen Q, Wormly B, Riley B, O’Neill FA, Walsh D, Ritchie MD, Kendler KS, Chen X. Interaction between interleukin 3 and dystrobrevin-binding protein 1 in schizophrenia. Schizophr Res 2008; 106:208-17. [PMID: 18804346 PMCID: PMC2746913 DOI: 10.1016/j.schres.2008.07.022] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 03/10/2008] [Revised: 07/25/2008] [Accepted: 07/28/2008] [Indexed: 10/21/2022]
Abstract
Schizophrenia is a common psychotic mental disorder that is believed to result from the effects of multiple genetic and environmental factors. In this study, we explored gene-gene interactions and main effects in both case-control (657 cases and 411 controls) and family-based (273 families, 1,350 subjects) datasets of English or Irish ancestry. Fifty three markers in 8 genes were genotyped in the family sample and 44 markers in 7 genes were genotyped in the case-control sample. The Multifactor Dimensionality Reduction Pedigree Disequilibrium Test (MDR-PDT) was used to examine epistasis in the family dataset and a 3-locus model was identified (permuted p=0.003). The 3-locus model involved the IL3 (rs2069803), RGS4 (rs2661319), and DTNBP1 (rs2619539) genes. We used MDR to analyze the case-control dataset containing the same markers typed in the RGS4, IL3 and DTNBP1 genes and found evidence of a joint effect between IL3 (rs31400) and DTNBP1 (rs760761) (cross-validation consistency 4/5, balanced prediction accuracy=56.84%, p=0.019). While this is not a direct replication, the results obtained from both the family and case-control samples collectively suggest that IL3 and DTNBP1 are likely to interact and jointly contribute to increase risk for schizophrenia. We also observed a significant main effect in DTNBP1, which survived correction for multiple comparisons, and numerous nominally significant effects in several genes.
Affiliation(s)
- Todd L Edwards
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37232 USA
- Center for Genetic Epidemiology and Statistical Genetics, Miami Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Xu Wang
- Department of Psychiatry and Virginia Institute for Psychiatric and Behavior Genetics, Virginia Commonwealth University, 800 E. Leigh Street, Richmond, VA 23298 USA
- Qi Chen
- Department of Psychiatry and Virginia Institute for Psychiatric and Behavior Genetics, Virginia Commonwealth University, 800 E. Leigh Street, Richmond, VA 23298 USA
- Brandon Wormly
- Department of Psychiatry and Virginia Institute for Psychiatric and Behavior Genetics, Virginia Commonwealth University, 800 E. Leigh Street, Richmond, VA 23298 USA
- Brien Riley
- Department of Psychiatry and Virginia Institute for Psychiatric and Behavior Genetics, Virginia Commonwealth University, 800 E. Leigh Street, Richmond, VA 23298 USA
- F. Anthony O’Neill
- The Department of Psychiatry, The Queens University, Belfast, Northern Ireland, UK
- Marylyn D. Ritchie
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37232 USA
- Kenneth S. Kendler
- Department of Psychiatry and Virginia Institute for Psychiatric and Behavior Genetics, Virginia Commonwealth University, 800 E. Leigh Street, Richmond, VA 23298 USA
- Xiangning Chen
- Department of Psychiatry and Virginia Institute for Psychiatric and Behavior Genetics, Virginia Commonwealth University, 800 E. Leigh Street, Richmond, VA 23298 USA
|
179
|
Motsinger-Reif AA, Reif DM, Fanelli TJ, Ritchie MD. A comparison of analytical methods for genetic association studies. Genet Epidemiol 2008; 32:767-78. [DOI: 10.1002/gepi.20345] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Indexed: 01/06/2023]
|
180
|
Phillips PC. Epistasis--the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet 2008; 9:855-67. [PMID: 18852697 PMCID: PMC2689140 DOI: 10.1038/nrg2452] [Citation(s) in RCA: 1006] [Impact Index Per Article: 59.2] [Indexed: 12/23/2022]
Abstract
Epistasis, or interactions between genes, has long been recognized as fundamentally important to understanding the structure and function of genetic pathways and the evolutionary dynamics of complex genetic systems. With the advent of high-throughput functional genomics and the emergence of systems approaches to biology, as well as a new-found ability to pursue the genetic basis of evolution down to specific molecular changes, there is a renewed appreciation both for the importance of studying gene interactions and for addressing these questions in a unified, quantitative manner.
Affiliation(s)
- Patrick C Phillips
- Center for Ecology and Evolution, University of Oregon, Eugene, Oregon 97403 USA.
|
181
|
Lou XY, Chen GB, Yan L, Ma JZ, Mangold JE, Zhu J, Elston RC, Li MD. A combinatorial approach to detecting gene-gene and gene-environment interactions in family studies. Am J Hum Genet 2008; 83:457-67. [PMID: 18834969 DOI: 10.1016/j.ajhg.2008.09.001] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.0] [Received: 07/19/2008] [Revised: 09/01/2008] [Accepted: 09/05/2008] [Indexed: 12/23/2022]
Abstract
Widespread multifactor interactions present a significant challenge in determining risk factors of complex diseases. Several combinatorial approaches, such as the multifactor dimensionality reduction (MDR) method, have emerged as a promising tool for better detecting gene-gene (G x G) and gene-environment (G x E) interactions. We recently developed a general combinatorial approach, namely the generalized multifactor dimensionality reduction (GMDR) method, which can entertain both qualitative and quantitative phenotypes and allows for both discrete and continuous covariates to detect G x G and G x E interactions in a sample of unrelated individuals. In this article, we report the development of an algorithm that can be used to study G x G and G x E interactions for family-based designs, called pedigree-based GMDR (PGMDR). Compared to the available method, our proposed method has several major improvements, including allowing for covariate adjustments and being applicable to arbitrary phenotypes, arbitrary pedigree structures, and arbitrary patterns of missing marker genotypes. Our Monte Carlo simulations provide evidence that the PGMDR method is superior in performance to identify epistatic loci compared to the MDR-pedigree disequilibrium test (PDT). Finally, we applied our proposed approach to a genetic data set on tobacco dependence and found a significant interaction between two taste receptor genes (i.e., TAS2R16 and TAS2R38) in affecting nicotine dependence.
Affiliation(s)
- Xiang-Yang Lou
- Department of Psychiatry and Neurobehavioral Sciences, University of Virginia, Charlottesville, VA 22911, USA
|
182
|
Beretta L, Cappiello F, Moore JH, Barili M, Greene CS, Scorza R. Ability of epistatic interactions of cytokine single-nucleotide polymorphisms to predict susceptibility to disease subsets in systemic sclerosis patients. Arthritis Rheum 2008; 59:974-83. [PMID: 18576303 DOI: 10.1002/art.23836] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Indexed: 01/17/2023]
Abstract
OBJECTIVE Gene-gene interaction, or epistasis, is considered a ubiquitous component of complex human diseases such as systemic sclerosis (SSc). Epistasis is difficult to model by traditional parametric approaches; therefore, nonparametric computational algorithms, such as multifactor dimensionality reduction (MDR), have been developed. METHODS A total of 242 consecutive unrelated Italian SSc patients and an equal number of well-matched healthy controls were genotyped for 22 cytokine single-nucleotide polymorphisms (SNPs; 13 cytokine genes). The distribution of the SNPs between controls and SSc patients, controls and limited cutaneous SSc (lcSSc) patients, and controls and diffuse cutaneous SSc (dcSSc) patients was tested by the MDR constructive induction algorithm and by focused interaction testing framework (FITF), a logistic regression-based approach. RESULTS None of the studied SNPs had main independent effects on SSc or disease subset susceptibility, therefore no epistatic interaction was detectable by FITF. The MDR analysis showed a significant epistatic interaction among the interleukin-2 (IL-2) G-330T, IL-6 C-174G, and interferon-gamma AUTR5644T SNPs and the IL-1 receptor Cpst1970T, IL-6 Ant565G, and IL-10 C-819T SNPs in lcSSc and dcSSc susceptibility, respectively. The relevance of the single multilocus attributes constructed by the MDR inductive algorithm was then confirmed by the parametric approach (P < 0.001 for both controls versus lcSSc patients and controls versus dcSSc patients). CONCLUSION We provide evidence for gene-gene interaction among cytokine SNPs in the context of SSc. The interaction among cytokine SNPs with a profibrotic or a regulatory function on profibrotic interleukins is relevant to the susceptibility to SSc subsets and it appears to be more important than the contribution of any single cytokine SNP.
Affiliation(s)
- Lorenzo Beretta
- Referral Center for Systemic Autoimmune Diseases, IRCCS Fondazione Policlinico-Mangiagalli-Regina Elena and University of Milan, Milan, Italy.
|
183
|
Abstract
Recent years have seen great advances in generating and analyzing data to identify the genetic architecture of biological traits. Human disease has understandably received intense research focus, and the genes responsible for most Mendelian diseases have successfully been identified. However, the same advances have shown a consistent if less satisfying pattern, in which complex traits are affected by variation in large numbers of genes, most of which have individually minor or statistically elusive effects, leaving the bulk of genetic etiology unaccounted for. This pattern applies to diverse and unrelated traits, not just disease, in basically all species, and is consistent with evolutionary expectations, raising challenging questions about the best way to approach and understand biological complexity.
Affiliation(s)
- Kenneth M Weiss
- Department of Anthropology and Integrated Biosciences Genetics Program, Pennsylvania State University, University Park, Pennsylvania 16802, USA.
|
184
|
Pattin KA, Moore JH. Exploiting the proteome to improve the genome-wide genetic analysis of epistasis in common human diseases. Hum Genet 2008; 124:19-29. [PMID: 18551320 PMCID: PMC2780579 DOI: 10.1007/s00439-008-0522-8] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Received: 03/22/2008] [Accepted: 05/26/2008] [Indexed: 11/24/2022]
Abstract
One of the central goals of human genetics is the identification of loci with alleles or genotypes that confer increased susceptibility. The availability of dense maps of single-nucleotide polymorphisms (SNPs) along with high-throughput genotyping technologies has set the stage for routine genome-wide association studies that are expected to significantly improve our ability to identify susceptibility loci. Before this promise can be realized, there are some significant challenges that need to be addressed. We address here the challenge of detecting epistasis or gene-gene interactions in genome-wide association studies. Discovering epistatic interactions in high dimensional datasets remains a challenge due to the computational complexity resulting from the analysis of all possible combinations of SNPs. One potential way to overcome the computational burden of a genome-wide epistasis analysis would be to devise a logical way to prioritize the many SNPs in a dataset so that the data may be analyzed more efficiently and yet still retain important biological information. One of the strongest demonstrations of the functional relationship between genes is protein-protein interaction. Thus, it is plausible that the expert knowledge extracted from protein interaction databases may allow for a more efficient analysis of genome-wide studies as well as facilitate the biological interpretation of the data. In this review we will discuss the challenges of detecting epistasis in genome-wide genetic studies and the means by which we propose to apply expert knowledge extracted from protein interaction databases to facilitate this process. We explore some of the fundamentals of protein interactions and the databases that are publicly available.
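The computational burden mentioned above can be made concrete with a short calculation. The SNP counts below (500,000 for a genome-wide panel and 5,000 for a prioritized subset) are hypothetical figures chosen for illustration and do not come from the review.

```python
from math import comb


def n_combinations(n_snps, order):
    """Number of distinct SNP combinations of a given interaction order."""
    return comb(n_snps, order)


# Hypothetical panel sizes, for illustration only.
genome_wide = 500_000
prioritized = 5_000

print(n_combinations(genome_wide, 2))  # all pairwise tests, full panel
print(n_combinations(prioritized, 2))  # all pairwise tests, filtered panel
print(n_combinations(genome_wide, 3))  # three-way tests grow far faster
```

Under these assumed sizes, filtering to one percent of the SNPs cuts the number of pairwise tests by roughly four orders of magnitude, which is the motivation for using prior knowledge such as protein-protein interactions to prioritize SNPs.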
Affiliation(s)
- Kristine A. Pattin
- Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, NH
- Jason H. Moore
- Computational Genetics Laboratory, Department of Genetics, Dartmouth Medical School, Lebanon, NH
- Department of Genetics and Community and Family Medicine, Norris-Cotton Cancer Center, Dartmouth Medical School, Lebanon, NH; Department of Computer Science, University of New Hampshire, Durham, NH; Department of Computer Science, University of Vermont, Burlington, VT; Translational Genomics Research Institute, Phoenix, AZ
|
185
|
Abstract
Motivation: Microbial phenotypes are typically due to the concerted action of multiple gene functions, yet the presence of each gene may have only a weak correlation with the observed phenotype. Hence, it may be more appropriate to examine co-occurrence between sets of genes and a phenotype (multiple-to-one) instead of pairwise relations between a single gene and the phenotype. Here, we propose an efficient class association rule mining algorithm, netCAR, in order to extract sets of COGs (clusters of orthologous groups of proteins) associated with a phenotype from COG phylogenetic profiles and a phenotype profile. netCAR takes into account the phylogenetic co-occurrence graph between COGs to restrict hypothesis space, and uses mutual information to evaluate the biconditional relation.
Results: We examined the mining capability of pairwise and multiple-to-one association by using netCAR to extract COGs relevant to six microbial phenotypes (aerobic, anaerobic, facultative, endospore, motility and Gram negative) from 11 969 unique COG profiles across 155 prokaryotic organisms. With the same level of false discovery rate, multiple-to-one association can extract about 10 times more relevant COGs than one-to-one association. We also reveal various topologies of association networks among COGs (modules) from extracted multiple-to-one correlation rules relevant to the six phenotypes, including a well-connected network for motility, a star-shaped network for aerobic and intermediate topologies for the other phenotypes. netCAR outperforms a standard CAR mining algorithm, CARapriori, while requiring several orders of magnitude less computational time for extracting 3-COG sets.
Availability: Source code of the Java implementation is available as Supplementary Material at the Bioinformatics online website, or upon request to the author.
Contact: makio323@gmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.
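The idea of scoring a gene set against a phenotype with mutual information can be sketched as follows. This is not the netCAR implementation; the function `mutual_information`, the toy presence/absence profiles, and the choice to treat a set as "present" only when every member COG is present are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete profiles."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * log2(c * n / (px[a] * py[b]))
    return mi


# Hypothetical presence/absence profiles across eight organisms.
cog1 = [1, 1, 1, 0, 1, 0, 1, 0]
cog2 = [1, 1, 0, 1, 1, 1, 1, 0]
phenotype = [1, 1, 0, 0, 1, 0, 1, 0]

# Assumed set profile: the set counts as present only if both COGs are present.
both = [a & b for a, b in zip(cog1, cog2)]

print(mutual_information(cog1, phenotype))
print(mutual_information(cog2, phenotype))
print(mutual_information(both, phenotype))
```

In this toy case each COG alone is only partly informative, while the two-COG set matches the phenotype exactly, which is the multiple-to-one situation the abstract describes.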
Affiliation(s)
- Makio Tamura
- Lawrence Livermore National Laboratory, Computing Applications and Research Department/Chemistry, Materials, Earth and Life Sciences Department, Microbial Systems Biology Group, Livermore, CA 94550, USA.
|
186
|
Chen SH, Sun J, Dimitrov L, Turner AR, Adams TS, Meyers DA, Chang BL, Zheng SL, Grönberg H, Xu J, Hsu FC. A support vector machine approach for detecting gene-gene interaction. Genet Epidemiol 2008; 32:152-67. [PMID: 17968988 DOI: 10.1002/gepi.20272] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.1] [Indexed: 01/17/2023]
Abstract
Although genetic factors play an important role in most human diseases, multiple genes or genes and environmental factors may influence individual risk. In order to understand the underlying biological mechanisms of complex diseases, it is important to understand the complex relationships that control the process. In this paper, we consider the perspectives of optimization, complexity analysis, and algorithmic design, which allow us to describe a reasonable and applicable computational framework for detecting gene-gene interactions. Accordingly, support vector machine and combinatorial optimization techniques (local search and genetic algorithm) were tailored to fit within this framework. Although the proposed approach is computationally expensive, our results indicate this is a promising tool for the identification and characterization of high order gene-gene and gene-environment interactions. We have demonstrated several advantages of this method, including the strong power for classification, less concern for overfitting, and the ability to handle unbalanced data and achieve more stable models. We would like to make the support vector machine and combinatorial optimization techniques more accessible to genetic epidemiologists, and to promote the use and extension of these powerful approaches.
Affiliation(s)
- Shyh-Huei Chen
- Department of Industrial Management, National Yunlin University of Science and Technology, Yunlin, Taiwan
|
187
|
Zhang Z, Zhang S, Wong MY, Wareham NJ, Sha Q. An ensemble learning approach jointly modeling main and interaction effects in genetic association studies. Genet Epidemiol 2008; 32:285-300. [PMID: 18205210 PMCID: PMC3572743 DOI: 10.1002/gepi.20304] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 12/14/2022]
Abstract
Complex diseases are presumed to be the results of interactions of several genes and environmental factors, with each gene only having a small effect on the disease. Thus, the methods that can account for gene-gene interactions to search for a set of marker loci in different genes or across genome and to analyze these loci jointly are critical. In this article, we propose an ensemble learning approach (ELA) to detect a set of loci whose main and interaction effects jointly have a significant association with the trait. In the ELA, we first search for "base learners" and then combine the effects of the base learners by a linear model. Each base learner represents a main effect or an interaction effect. The result of the ELA is easy to interpret. When the ELA is applied to analyze a data set, we can get a final model, an overall P-value of the association test between the set of loci involved in the final model and the trait, and an importance measure for each base learner and each marker involved in the final model. The final model is a linear combination of some base learners. We know which base learner represents a main effect and which one represents an interaction effect. The importance measure of each base learner or marker can tell us the relative importance of the base learner or marker in the final model. We used intensive simulation studies as well as a real data set to evaluate the performance of the ELA. Our simulation studies demonstrated that the ELA is more powerful than the single-marker test in all the simulation scenarios. The ELA also outperformed the other three existing multi-locus methods in almost all cases. 
In an application to a large-scale case-control study for Type 2 diabetes, the ELA identified 11 single nucleotide polymorphisms that have a significant multi-locus effect (P-value=0.01), while none of the single nucleotide polymorphisms showed significant marginal effects and none of the two-locus combinations showed significant two-locus interaction effects.
Affiliation(s)
- Zhaogong Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
- Heilongjiang University, Harbin, China
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
- Heilongjiang University, Harbin, China
- Man-Yu Wong
- Department of Mathematics, Hong Kong University of Sciences and Technology, Hong Kong, China
- Nicholas J. Wareham
- Department of Public Health and Primary Care, University of Cambridge Institute of Public Health, Cambridge, United Kingdom
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
|
188
|
Beretta L, Cappiello F, Moore JH, Scorza R. Interleukin-1 gene complex single nucleotide polymorphisms in systemic sclerosis: A further step ahead. Hum Immunol 2008; 69:187-92. [DOI: 10.1016/j.humimm.2007.12.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 10/01/2007] [Revised: 12/05/2007] [Accepted: 12/19/2007] [Indexed: 12/01/2022]
|
189
|
Heidema AG, Feskens EJM, Doevendans PAFM, Ruven HJT, van Houwelingen HC, Mariman ECM, Boer JMA. Analysis of multiple SNPs in genetic association studies: comparison of three multi-locus methods to prioritize and select SNPs. Genet Epidemiol 2007; 31:910-21. [PMID: 17615573 DOI: 10.1002/gepi.20251] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 11/11/2022]
Abstract
Nonparametric approaches have been developed that are able to analyze large numbers of single nucleotide polymorphisms (SNPs) in modest sample sizes. These approaches have different selection features and may not provide similar results when applied to the same dataset. Therefore, we compared the results of three approaches (set association, random forests and multifactor dimensionality reduction [MDR]) to select from a total of 93 candidate SNPs a subset of SNPs that are important in determining high-density lipoprotein (HDL)-cholesterol levels. The study population consisted of a random sample from a Dutch monitoring project for cardiovascular disease risk factors and was dichotomized into cases (low HDL-cholesterol, n = 533) and non-cases (high HDL-cholesterol, n = 545) based on gender-specific median values for HDL cholesterol. Clearly, all three approaches prioritized three SNPs as important (CETP Taq1B, CETP-629 C/A and LPL Ser447X). Two SNPs with weaker main effects were additionally prioritized by random forests (APOC3 3175 G/C and CCR2 Val62Ile), whereas MTHFR 677 C/T was selected in combination with CETP Taq1B as best model by MDR. Obtained p-values for the selected models were significant for the set association approach (p =.0019), random forests (p<.01) and MDR (p<.02). In conclusion, the application of a combination of multi-locus methods is a useful approach in genetic association studies to select a well-defined set of important SNPs for further statistical and epidemiological interpretation, providing increased confidence and more information compared with the application of only one method.
Affiliation(s)
- A Geert Heidema
- Centre for Nutrition and Health, National Institute for Public Health and the Environment, Bilthoven, The Netherlands.
|
190
|
Qi Y, Niu W, Zhu T, Zhou W, Qiu C. Synergistic effect of the genetic polymorphisms of the renin-angiotensin-aldosterone system on high-altitude pulmonary edema: a study from Qinghai-Tibet altitude. Eur J Epidemiol 2007; 23:143-52. [PMID: 17987391 DOI: 10.1007/s10654-007-9208-0] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Received: 12/11/2006] [Accepted: 10/30/2007] [Indexed: 11/26/2022]
Abstract
The pathogenesis of high-altitude pulmonary edema (HAPE) has been at least partially attributed to the local dysregulation of the renin-angiotensin-aldosterone system (RAAS) cascade. To address this issue, we conducted the largest nested case-control study to date to explore the association between variations in RAAS genes and HAPE in a Chinese population. We recruited 140 HAPE patients and 144 controls during the construction of the Qinghai-Tibet railway and genotyped 10 gene polymorphisms evenly interspersed in 5 RAAS candidate genes. The data were analyzed by haplotype and multifactor dimensionality reduction (MDR). The single-locus analysis showed that CYP11B2 C-344T and K173R and ACE A-240T polymorphisms were significantly associated with HAPE after Bonferroni correction (P<0.005). The linkage analysis constructed a linkage block including C-344T and K173R polymorphisms in complete linkage disequilibrium with each other, which occurred with significantly different frequencies between HAPE and control groups. The gene-gene interaction analysis found the overall best model including ACE A-240T and A2350G and CYP11B2 C-344T polymorphisms with strong synergistic effect. This model had a maximum testing accuracy of 68.61% and a maximum cross validation consistency of 9 out of 10 (P=0.004). The homozygous genotype combination of -240AA, 2350GG and -344TT conferred high genetic susceptibility to HAPE, which was further strengthened by haplotype analysis. Our results add evidence for synergistic effect of RAAS gene polymorphisms on HAPE susceptibility. Moreover, we proposed a promising data-mining analytical approach (MDR) for detecting and characterizing gene-gene interactions.
Affiliation(s)
- Yue Qi
- National Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences/Peking Union Medical College, No.5 Dong Dan San Tiao, Beijing 100005, China
|
191
|
|
192
|
Abstract
What makes some people neurotic or schizophrenic or right-handed or fearless? The challenge in answering this is to map from genotype to anatomical and physiological phenotypes and beyond to behavior and cognition.
Affiliation(s)
- Kevin J Mitchell
- Smurfit Institute of Genetics and Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland.
|
193
|
Briollais L, Wang Y, Rajendram I, Onay V, Shi E, Knight J, Ozcelik H. Methodological issues in detecting gene-gene interactions in breast cancer susceptibility: a population-based study in Ontario. BMC Med 2007; 5:22. [PMID: 17683639 PMCID: PMC1976420 DOI: 10.1186/1741-7015-5-22] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Received: 02/01/2007] [Accepted: 08/07/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: There is growing evidence that gene-gene interactions are ubiquitous in determining the susceptibility to common human diseases. The investigation of such gene-gene interactions presents new statistical challenges for studies with relatively small sample sizes as the number of potential interactions in the genome can be large. Breast cancer provides a useful paradigm to study genetically complex diseases because commonly occurring single nucleotide polymorphisms (SNPs) may additively or synergistically disturb the system-wide communication of the cellular processes leading to cancer development.
METHODS: In this study, we systematically studied SNP-SNP interactions among 19 SNPs from 18 key genes involved in major cancer pathways in a sample of 398 breast cancer cases and 372 controls from Ontario. We discuss the methodological issues associated with the detection of SNP-SNP interactions in this dataset by applying and comparing three commonly used methods: the logistic regression model, classification and regression trees (CART), and the multifactor dimensionality reduction (MDR) method.
RESULTS: Our analyses show evidence for several simple (two-way) and complex (multi-way) SNP-SNP interactions associated with breast cancer. For example, all three methods identified XPD-[Lys751Gln]*IL10-[G(-1082)A] as the most significant two-way interaction. CART and MDR identified the same critical SNPs participating in complex interactions. Our results suggest that the use of multiple statistical approaches (or an integrated approach) rather than a single methodology could be the best strategy to elucidate complex gene interactions that have generally very different patterns.
CONCLUSION: The strategy used here has the potential to identify complex biological relationships among breast cancer genes and processes. This will lead to the discovery of novel biological information, which will improve breast cancer risk management.
Affiliation(s)
- Laurent Briollais
- Prosserman Centre for Health Research, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, M5T 3L9, Canada
- Public Health Sciences Department, University of Toronto, Toronto, M5T 3M7, Canada
- Yuanyuan Wang
- Prosserman Centre for Health Research, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, M5T 3L9, Canada
- Isaac Rajendram
- Prosserman Centre for Health Research, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, M5T 3L9, Canada
- Fred A Litwin Centre for Cancer Genetics, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, M5T 3L9, Canada
- Venus Onay
- Prosserman Centre for Health Research, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, M5T 3L9, Canada
- Fred A Litwin Centre for Cancer Genetics, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, M5T 3L9, Canada
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Toronto, M5G 1X5, Canada
- Ellen Shi
- Ontario Cancer Genetics Network, Cancer Care Ontario, Toronto, M5G 2L9, Canada
- Julia Knight
- Prosserman Centre for Health Research, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, M5T 3L9, Canada
- Public Health Sciences Department, University of Toronto, Toronto, M5T 3M7, Canada
- Hilmi Ozcelik
- Fred A Litwin Centre for Cancer Genetics, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, M5T 3L9, Canada
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Toronto, M5G 1X5, Canada
- Ontario Cancer Genetics Network, Cancer Care Ontario, Toronto, M5G 2L9, Canada
|
194
|
Velez DR, White BC, Motsinger AA, Bush WS, Ritchie MD, Williams SM, Moore JH. A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genet Epidemiol 2007; 31:306-15. [PMID: 17323372 DOI: 10.1002/gepi.20211] [Citation(s) in RCA: 224] [Impact Index Per Article: 12.4] [Indexed: 11/08/2022]
Abstract
Multifactor dimensionality reduction (MDR) was developed as a method for detecting statistical patterns of epistasis. The overall goal of MDR is to change the representation space of the data to make interactions easier to detect. It is well known that machine learning methods may not provide robust models when the class variable (e.g. case-control status) is imbalanced and accuracy is used as the fitness measure. This is because most methods learn patterns that are relevant for the larger of the two classes. The goal of this study was to evaluate three different strategies for improving the power of MDR to detect epistasis in imbalanced datasets. The methods evaluated were: (1) over-sampling that resamples with replacement the smaller class until the data are balanced, (2) under-sampling that randomly removes subjects from the larger class until the data are balanced, and (3) balanced accuracy [(sensitivity+specificity)/2] as the fitness function with and without an adjusted threshold. These three methods were compared using simulated data with two-locus epistatic interactions of varying heritability (0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4) and minor allele frequency (0.2, 0.4) that were embedded in 100 replicate datasets of varying sample sizes (400, 800, 1600). Each dataset was generated with different ratios of cases to controls (1 : 1, 1 : 2, 1 : 4). We found that the balanced accuracy function with an adjusted threshold significantly outperformed both over-sampling and under-sampling and fully recovered the power. These results suggest that balanced accuracy should be used instead of accuracy for the MDR analysis of epistasis in imbalanced datasets.
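The difference between accuracy and balanced accuracy, defined in the abstract as (sensitivity+specificity)/2, can be shown with a small worked example. The confusion-matrix counts below are hypothetical and correspond to a 1:4 case-control ratio; the function names are illustrative and not taken from the MDR software.

```python
def accuracy(tp, fn, tn, fp):
    """Fraction of all subjects classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity, as defined in the abstract."""
    sensitivity = tp / (tp + fn)  # true-positive rate among cases
    specificity = tn / (tn + fp)  # true-negative rate among controls
    return (sensitivity + specificity) / 2


# Hypothetical 1:4 dataset (100 cases, 400 controls) and a degenerate
# model that labels every subject as a control.
tp, fn, tn, fp = 0, 100, 400, 0

print(accuracy(tp, fn, tn, fp))
print(balanced_accuracy(tp, fn, tn, fp))
```

Plain accuracy rewards the degenerate model simply for following the majority class, whereas balanced accuracy scores it at chance level, which is why the latter is the more suitable fitness function for imbalanced data.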
Affiliation(s)
- Digna R Velez
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, Tennessee
|
195
|
Gjuvsland AB, Hayes BJ, Meuwissen THE, Plahte E, Omholt SW. Nonlinear regulation enhances the phenotypic expression of trans-acting genetic polymorphisms. BMC Syst Biol 2007; 1:32. [PMID: 17651484 PMCID: PMC1994684 DOI: 10.1186/1752-0509-1-32] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 03/14/2007] [Accepted: 07/25/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Genetic variation explains a considerable part of observed phenotypic variation in gene expression networks. This variation has been shown to be located both locally (cis) and distally (trans) to the genes being measured. Here we explore to what degree the phenotypic manifestation of local and distant polymorphisms is a dynamic feature of regulatory design.
RESULTS: By combining mathematical models of gene expression networks with genetic maps and linkage analysis we find that very different network structures and regulatory motifs give similar cis/trans linkage patterns. However, when the shape of the cis-regulatory input functions is more nonlinear or threshold-like, we observe for all networks a dramatic increase in the phenotypic expression of distant compared to local polymorphisms under otherwise equal conditions.
CONCLUSION: Our findings indicate that genetic variation affecting the form of cis-regulatory input functions may reshape the genotype-phenotype map by changing the relative importance of cis and trans variation. Our approach combining nonlinear dynamic models with statistical genetics opens the way to a systematic investigation of how functional genetic variation is translated into phenotypic variation under various systemic conditions.
Affiliation(s)
- Arne B Gjuvsland
- Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences, Ås, Norway
- Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, Norway
- Ben J Hayes
- Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences, Ås, Norway
- Animal Genetics and Genomics, Department of Primary Industries, Attwood, Victoria, Australia
- Theo HE Meuwissen
- Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences, Ås, Norway
- Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, Norway
- Erik Plahte
- Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences, Ås, Norway
- Department of Chemistry, Biotechnology, and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Stig W Omholt
- Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences, Ås, Norway
- Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, Norway
|
196
|
Sepúlveda N, Paulino CD, Carneiro J, Penha-Gonçalves C. Allelic penetrance approach as a tool to model two-locus interaction in complex binary traits. Heredity (Edinb) 2007; 99:173-84. [PMID: 17551528 DOI: 10.1038/sj.hdy.6800979] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 12/20/2022] Open
Abstract
Many binary phenotypes do not follow a classical Mendelian inheritance pattern. Interaction between genetic and environmental factors is thought to contribute to the incomplete penetrance phenomena often observed in these complex binary traits. Several two-locus models for penetrance have been proposed to aid the genetic dissection of binary traits. Such models assume linear genetic effects of both loci in different mathematical scales of penetrance, resembling the analytical framework of quantitative traits. However, changes in phenotypic scale are difficult to envisage in binary traits and limited genetic interpretation is extractable from current modeling of penetrance. To overcome this limitation, we derived an allelic penetrance approach that attributes incomplete penetrance to the stochastic expression of the alleles controlling the phenotype, the genetic background and environmental factors. We applied this approach to formulate dominance and recessiveness in a single diallelic locus and to model different genetic mechanisms for the joint action of two diallelic loci. We fit the models to data on the genetic susceptibility of mice following infections with Listeria monocytogenes and Plasmodium berghei. These models gain in genetic interpretation, because they specify the alleles that are responsible for the genetic (inter)action and their genetic nature (dominant or recessive), and predict genotypic combinations determining the phenotype. Further, we show via computer simulations that the proposed models produce penetrance patterns not captured by traditional two-locus models. This approach provides a new analysis framework for dissecting mechanisms of interlocus joint action in binary traits using genetic crosses.
Affiliation(s)
- N Sepúlveda
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
|
197
|
Lou XY, Chen GB, Yan L, Ma JZ, Zhu J, Elston RC, Li MD. A generalized combinatorial approach for detecting gene-by-gene and gene-by-environment interactions with application to nicotine dependence. Am J Hum Genet 2007; 80:1125-37. [PMID: 17503330 PMCID: PMC1867100 DOI: 10.1086/518312] [Citation(s) in RCA: 453] [Impact Index Per Article: 25.2] [Received: 01/16/2007] [Accepted: 03/21/2007] [Indexed: 11/04/2022] Open
Abstract
The determination of gene-by-gene and gene-by-environment interactions has long been one of the greatest challenges in genetics. The traditional methods are typically inadequate because of the problem referred to as the "curse of dimensionality." Recent combinatorial approaches, such as the multifactor dimensionality reduction (MDR) method, the combinatorial partitioning method, and the restricted partition method, have a straightforward correspondence to the concept of the phenotypic landscape that unifies biological, statistical genetics, and evolutionary theories. However, the existing approaches have several limitations, such as not allowing for covariates, that restrict their practical use. In this study, we report a generalized MDR (GMDR) method that permits adjustment for discrete and quantitative covariates and is applicable to both dichotomous and continuous phenotypes in various population-based study designs. Computer simulations indicated that the GMDR method has superior performance in its ability to identify epistatic loci, compared with current methods in the literature. We applied our proposed method to a genetics study of four genes that were reported to be associated with nicotine dependence and found significant joint action between CHRNB4 and NTRK2. Moreover, our example illustrates that the newly proposed GMDR approach can increase prediction ability, suggesting that its use is justified in practice. In summary, GMDR serves the purpose of identifying contributors to population variation better than do the other existing methods.
Affiliation(s)
- Xiang-Yang Lou
- Department of Psychiatry and Neurobehavioral Sciences, University of Virginia, Charlottesville, VA 22911, USA
|
198
|
Alvarez-Castro JM, Carlborg O. A unified model for functional and statistical epistasis and its application in quantitative trait loci analysis. Genetics 2007; 176:1151-67. [PMID: 17409082 PMCID: PMC1894581 DOI: 10.1534/genetics.106.067348] [Citation(s) in RCA: 128] [Impact Index Per Article: 7.1] [Received: 10/25/2006] [Accepted: 03/20/2007] [Indexed: 11/18/2022] Open
Abstract
Interaction between genes, or epistasis, is found to be common and it is a key concept for understanding adaptation and evolution of natural populations, response to selection in breeding programs, and determination of complex disease. Currently, two independent classes of models are used to study epistasis. Statistical models focus on maintaining desired statistical properties for detection and estimation of genetic effects and for the decomposition of genetic variance using average effects of allele substitutions in populations as parameters. Functional models focus on the evolutionary consequences of the attributes of the genotype-phenotype map using natural effects of allele substitutions as parameters. Here we provide a new, general and unified model framework: the natural and orthogonal interactions (NOIA) model. NOIA implements tools for transforming genetic effects measured in one population to the ones of other populations (e.g., between two experimental designs for QTL) and parameters of statistical and functional epistasis into each other (thus enabling us to obtain functional estimates of QTL), as demonstrated numerically. We develop graphical interpretations of functional and statistical models as regressions of the genotypic values on the gene content, which illustrates the difference between the models--the constraint on the slope of the functional regression--and when the models are equivalent. Furthermore, we use our theoretical foundations to conceptually clarify functional and statistical epistasis, discuss the advantages of NOIA over previous theory, and stress the importance of linking functional and statistical models.
|
199
|
Jasnos L, Korona R. Epistatic buffering of fitness loss in yeast double deletion strains. Nat Genet 2007; 39:550-4. [PMID: 17322879 DOI: 10.1038/ng1986] [Citation(s) in RCA: 92] [Impact Index Per Article: 5.1] [Received: 09/20/2006] [Accepted: 01/27/2007] [Indexed: 11/09/2022]
Abstract
Interactions between deleterious mutations have been insufficiently studied, despite the fact that their strength and direction are critical for understanding the evolution of genetic recombination and the buildup of mutational load in populations. We compiled a list of 758 yeast gene deletions causing growth defects (from the Munich Information Center for Protein Sequences database and ref. 7). Using BY4741 and BY4742 single-deletion strains, we carried out 639 random crosses and assayed growth curves of the resulting progeny. We show that the maximum growth rate averaged over strains lacking deletions and those with double deletions is higher than that of strains with single deletions, indicating a positive epistatic effect. This tendency is shared by genes belonging to a variety of functional classes. Based on our data and former theoretical work, we suggest that epistasis is likely to diminish the negative effects of mutations when the ability to produce biomass at high rates contributes significantly to fitness.
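One reading of the comparison described in the abstract is an additive-scale epistasis score: the mean growth rate of the no-deletion and double-deletion strains minus the mean of the two single-deletion strains. The sketch below assumes that reading; the function name `additive_epistasis` and the growth-rate values are hypothetical and are not data from the study.

```python
def additive_epistasis(w_wt, w_a, w_b, w_ab):
    """Return mean(wild type, double deletion) - mean(single deletions).

    A positive value means the double-deletion strain grows faster than
    expected if the two single-deletion losses simply added up.
    """
    return (w_wt + w_ab) / 2 - (w_a + w_b) / 2

# Hypothetical maximum growth rates, relative to wild type = 1.0.
w_wt, w_a, w_b, w_ab = 1.0, 0.8, 0.7, 0.6
eps = additive_epistasis(w_wt, w_a, w_b, w_ab)

# Under pure additivity the double deletion would be 1.0 - 0.2 - 0.3 = 0.5;
# the assumed 0.6 is higher, so eps comes out positive.
print(f"epistasis = {eps:.3f}")
```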
Affiliation(s)
- Lukasz Jasnos
- Institute of Environmental Sciences, Jagiellonian University, Gronostajowa 7, 30-387 Krakow, Poland
|
200
|
Abstract
The workhorse of modern genetic analysis is the parametric linear model. The advantages of the linear modeling framework are many and include a mathematical understanding of the model fitting process and ease of interpretation. However, an important limitation is that linear models make assumptions about the nature of the data being modeled. These assumptions may not be realistic for complex biological systems such as disease susceptibility, where nonlinearities in the genotype-to-phenotype mapping that result from epistasis, plastic reaction norms, locus heterogeneity, and phenocopy, for example, are the norm rather than the exception. We have previously developed a flexible modeling approach called symbolic discriminant analysis (SDA) that makes no assumptions about the patterns in the data. Rather, SDA lets the data dictate the size, shape, and complexity of a symbolic discriminant function that could include any set of mathematical functions from a list of candidates supplied by the user. Here, we outline a new five-step process for symbolic model discovery that uses genetic programming (GP) for coarse-grained stochastic searching, experimental design for parameter optimization, graphical modeling for generating expert knowledge, and estimation of distribution algorithms for fine-grained stochastic searching. Finally, we introduce function mapping as a new method for interpreting symbolic discriminant functions. We show that function mapping, when combined with measures of interaction information, facilitates statistical interpretation by providing a graphical approach to decomposing complex models to highlight synergistic, redundant, and independent effects of polymorphisms and their composite functions. We illustrate this five-step SDA modeling process with a real case-control dataset.
Affiliation(s)
- Jason H Moore
- Computational Genetics Laboratory, Norris-Cotton Cancer Center, Dartmouth Medical School, Lebanon, NH 03756, USA
|