1
Herrera-Luis E, Benke K, Volk H, Ladd-Acosta C, Wojcik GL. Gene-environment interactions in human health. Nat Rev Genet 2024:10.1038/s41576-024-00731-z. [PMID: 38806721 DOI: 10.1038/s41576-024-00731-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/03/2024] [Indexed: 05/30/2024]
Abstract
Gene-environment interactions (G × E), the interplay of genetic variation with environmental factors, have a pivotal impact on human complex traits and diseases. Statistically, G × E can be assessed by determining the deviation from expectation of predictive models based solely on the phenotypic effects of genetics or environmental exposures. Despite the unprecedented, widespread and diverse use of G × E analytical frameworks, heterogeneity in their application and reporting hinders their applicability in public health. In this Review, we discuss study design considerations as well as G × E analytical frameworks to assess polygenic liability dependent on the environment, to identify specific genetic variants exhibiting G × E, and to characterize environmental context for these dynamics. We conclude with recommendations to address the most common challenges and pitfalls in the conceptualization, methodology and reporting of G × E studies, as well as future directions.
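The idea of measuring G × E as a deviation from what genetics-only and environment-only effects would predict can be illustrated with a minimal sketch. It assumes a binary genotype indicator, a binary exposure, and an additive scale for the expectation; the function name `interaction_contrast` and the toy data are hypothetical and are not taken from the Review.

```python
from statistics import mean


def interaction_contrast(records):
    """Additive-scale G x E contrast from (g, e, y) records.

    g and e are 0/1 indicators (carrier status, exposure); y is the phenotype.
    Every one of the four (g, e) cells must contain at least one record,
    otherwise statistics.mean raises StatisticsError.
    """
    cells = {(0, 0): [], (0, 1): [], (1, 0): [], (1, 1): []}
    for g, e, y in records:
        cells[(g, e)].append(y)
    m = {cell: mean(values) for cell, values in cells.items()}

    # Expected mean for exposed carriers if the two effects simply add up.
    genetic_effect = m[(1, 0)] - m[(0, 0)]
    exposure_effect = m[(0, 1)] - m[(0, 0)]
    expected_both = m[(0, 0)] + genetic_effect + exposure_effect

    # Deviation of the observed joint mean from the additive expectation.
    return m[(1, 1)] - expected_both


toy = [(0, 0, 1.0), (1, 0, 2.0), (0, 1, 3.0), (1, 1, 6.0)]
print(interaction_contrast(toy))
```

A contrast of zero corresponds to purely additive effects on this scale. A different scale (for example multiplicative) would define the expectation differently, so the choice of scale is itself part of the study design.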
Affiliation(s)
- Esther Herrera-Luis
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Kelly Benke
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Heather Volk
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Christine Ladd-Acosta
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Genevieve L Wojcik
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
2
Guo X, Chatterjee N, Dutta D. Subset-based method for cross-tissue transcriptome-wide association studies improves power and interpretability. HGG Advances 2024; 5:100283. [PMID: 38491773 PMCID: PMC10999697 DOI: 10.1016/j.xhgg.2024.100283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 03/09/2024] [Accepted: 03/09/2024] [Indexed: 03/18/2024]
Abstract
Integrating results from genome-wide association studies (GWASs) and studies of molecular phenotypes such as gene expression can improve our understanding of the biological functions of trait-associated variants and can help prioritize candidate genes for downstream analysis. Using reference expression quantitative trait locus (eQTL) studies, several methods have been proposed to identify gene-trait associations, primarily based on gene expression imputation. To increase statistical power by leveraging substantial eQTL sharing across tissues, meta-analysis methods that aggregate such gene-based test results across multiple tissues or contexts have been developed as well. However, most existing meta-analysis methods have limited power to identify associations when the gene has weaker associations in only a few tissues, and they cannot identify the subset of tissues in which the gene is "activated." To address this, we developed a cross-tissue subset-based transcriptome-wide association study (CSTWAS) meta-analysis method that improves power under such scenarios and can extract the set of potentially associated tissues. To improve applicability, CSTWAS uses only GWAS summary statistics and pre-computed correlation matrices to identify a subset of tissues that have the maximal evidence of gene-trait association. Through numerical simulations, we found that CSTWAS maintains a well-calibrated type-I error rate, improves power especially when there is a small number of associated tissues for a gene-trait association, and identifies an accurate associated tissue set. By analyzing GWAS summary statistics of three complex traits and diseases, we demonstrate that CSTWAS can identify biologically meaningful signals while providing an interpretation of disease etiology by extracting a set of potentially associated tissues.
Affiliation(s)
- Xinyu Guo
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90007, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21218, USA; Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD 21218, USA
- Diptavo Dutta
- Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology & Genetics, National Cancer Institute, Rockville, MD 20850, USA.
3
Sanchez-Roige S, Kember RL, Agrawal A. Substance use and common contributors to morbidity: A genetics perspective. EBioMedicine 2022; 83:104212. [PMID: 35970022 PMCID: PMC9399262 DOI: 10.1016/j.ebiom.2022.104212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Revised: 07/27/2022] [Accepted: 07/28/2022] [Indexed: 11/23/2022]
Abstract
Excessive substance use and substance use disorders (SUDs) are common, serious and relapsing medical conditions. They frequently co-occur with other diseases that are leading contributors to disability worldwide. While heavy substance use may potentiate the course of some of these illnesses, there is accumulating evidence suggesting common genetic architectures. In this narrative review, we focus on four heritable medical conditions (cardiometabolic disease, chronic pain, depression and COVID-19) that commonly overlap with, but are not necessarily a direct consequence of, SUDs. We find persuasive evidence of underlying genetic liability that predisposes to both SUDs and chronic pain, depression, and COVID-19. For cardiometabolic disease, there is greater support for a potential causal influence of problematic substance use. Our review encourages de-stigmatization of SUDs and the assessment of substance use in clinical settings. We assert that identifying shared pathways of risk has high translational potential, allowing tailoring of treatments for multiple medical conditions. Funding: SSR acknowledges T29KT0526, T32IR5226 and DP1DA054394; RLK acknowledges AA028292; AA acknowledges DA054869 & K02DA032573. The funders had no role in the conceptualization or writing of the paper.
4
Dutta D, VandeHaar P, Fritsche LG, Zöllner S, Boehnke M, Scott LJ, Lee S. A powerful subset-based method identifies gene set associations and improves interpretation in UK Biobank. Am J Hum Genet 2021; 108:669-681. [PMID: 33730541 DOI: 10.1016/j.ajhg.2021.02.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/23/2020] [Accepted: 02/19/2021] [Indexed: 02/06/2023]
Abstract
Tests of association between a phenotype and a set of genes in a biological pathway can provide insights into the genetic architecture of complex phenotypes beyond those obtained from single-variant or single-gene association analysis. However, most existing gene set tests have limited power to detect gene set-phenotype association when a small fraction of the genes are associated with the phenotype and cannot identify the potentially "active" genes that might drive a gene set-based association. To address these issues, we have developed Gene set analysis Association Using Sparse Signals (GAUSS), a method for gene set association analysis that requires only GWAS summary statistics. For each significantly associated gene set, GAUSS identifies the subset of genes that have the maximal evidence of association and can best account for the gene set association. Using pre-computed correlation structure among test statistics from a reference panel, our p value calculation is substantially faster than other permutation- or simulation-based approaches. In simulations with varying proportions of causal genes, we find that GAUSS effectively controls type 1 error rate and has greater power than several existing methods, particularly when a small proportion of genes account for the gene set signal. Using GAUSS, we analyzed UK Biobank GWAS summary statistics for 10,679 gene sets and 1,403 binary phenotypes. We found that GAUSS is scalable and identified 13,466 phenotype and gene set association pairs. Within these gene sets, we identify an average of 17.2 (max = 405) genes that underlie these gene set associations.
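To make the notion of "the subset of genes that have the maximal evidence of association" concrete, here is a toy sketch. It assumes, purely for illustration, that the evidence for a subset is the sum of its gene-level z-scores divided by the square root of the subset size; under that assumption the best subset of any given size is the set of top-ranked genes, so a single pass over the sorted list is enough. The function `best_subset`, the scoring rule, and the gene labels are hypothetical, and the sketch omits the correlation-aware p value calculation described in the abstract. It should not be read as the GAUSS implementation.

```python
import math


def best_subset(z_by_gene):
    """Return (genes, score) for the subset maximizing sum(z) / sqrt(size).

    z_by_gene maps a gene label to a one-sided z-score (larger = stronger).
    For a fixed subset size k, the sum is largest for the k top-ranked genes,
    so only the prefixes of the ranked list need to be scored.
    An empty input returns ([], -inf).
    """
    ranked = sorted(z_by_gene.items(), key=lambda item: item[1], reverse=True)
    best_score, best_size, running_sum = -math.inf, 0, 0.0
    for size, (_, z) in enumerate(ranked, start=1):
        running_sum += z
        score = running_sum / math.sqrt(size)
        if score > best_score:
            best_score, best_size = score, size
    return [gene for gene, _ in ranked[:best_size]], best_score


# Hypothetical gene-level z-scores for one gene set.
genes, score = best_subset({"A": 4.0, "B": 3.0, "C": 0.0, "D": -1.0})
print(genes, round(score, 3))
```

In this toy input, adding the two weak genes dilutes the normalized sum, so the search stops at the two strong genes. That is the behavior the abstract describes for gene sets in which only a small fraction of genes carry signal.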
Affiliation(s)
- Diptavo Dutta
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
- Peter VandeHaar
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Lars G Fritsche
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Sebastian Zöllner
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Michael Boehnke
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Laura J Scott
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Seunggeun Lee
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Graduate School of Data Science, Seoul National University, Seoul 08826, Republic of Korea.
5
Majumdar A, Burch KS, Haldar T, Sankararaman S, Pasaniuc B, Gauderman WJ, Witte JS. A two-step approach to testing overall effect of gene-environment interaction for multiple phenotypes. Bioinformatics 2021; 36:5640-5648. [PMID: 33453114 DOI: 10.1093/bioinformatics/btaa1083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2020] [Revised: 12/09/2020] [Accepted: 12/17/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION: While gene-environment (GxE) interactions contribute importantly to many different phenotypes, detecting such interactions requires well-powered studies and has proven difficult. To address this, we combine two approaches to improve GxE power: simultaneously evaluating multiple phenotypes and using a two-step analysis approach. Previous work shows that the power to identify a main genetic effect can be improved by simultaneously analyzing multiple related phenotypes. For a univariate phenotype, two-step methods produce higher power for detecting a GxE interaction compared to single-step analysis. Therefore, we propose a two-step approach to test for an overall GxE effect for multiple phenotypes. RESULTS: Using simulations, we demonstrate that, when more than one phenotype has a GxE effect (i.e., GxE pleiotropy), our approach offers a substantial gain in power (18%-43%) to detect an aggregate-level GxE effect for a multivariate phenotype compared to an analogous two-step method to identify a GxE effect for a univariate phenotype. We applied the proposed approach to simultaneously analyze three lipids (LDL, HDL and triglycerides) with the frequency of alcohol consumption as the environmental factor in the UK Biobank. The method identified two loci with an overall GxE effect on the vector of lipids, one of which was missed by the competing approaches. AVAILABILITY: We provide an R package, MPGE, implementing the proposed approach, which is available from CRAN: https://cran.r-project.org/web/packages/MPGE/index.html.
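The general logic of a two-step GxE analysis can be sketched as follows. The sketch assumes the simplest variant: a screening test applied to every variant, followed by a GxE test only on the variants that pass, with a Bonferroni correction over the reduced set. The function `two_step_gxe`, the thresholds, and the variant labels are hypothetical; the p values are taken as given, and the procedure implemented in the MPGE package may differ in both steps.

```python
def two_step_gxe(pvalues, screen_alpha=0.05, family_alpha=0.05):
    """Two-step filter over precomputed p values.

    pvalues maps a variant label to (p_screen, p_gxe), where p_screen comes
    from a step-1 screening test and p_gxe from the step-2 interaction test.
    Returns the variants declared significant in step 2.
    """
    # Step 1: keep only variants that pass the screening threshold.
    passed = [v for v, (p_screen, _) in pvalues.items() if p_screen < screen_alpha]
    if not passed:
        return []

    # Step 2: Bonferroni-correct over the variants that survived step 1,
    # which is a smaller multiple-testing burden than the full variant set.
    threshold = family_alpha / len(passed)
    return [v for v in passed if pvalues[v][1] < threshold]


# Hypothetical p values for three variants.
example = {
    "variant_1": (0.001, 0.01),   # passes screening, passes step 2
    "variant_2": (0.5, 0.0001),   # filtered out at screening
    "variant_3": (0.01, 0.04),    # passes screening, fails step 2
}
print(two_step_gxe(example))
```

The power gain comes from the smaller step-2 threshold denominator. For the overall error rate to be controlled, the screening statistic needs to be independent of the interaction statistic under the null, which is why the choice of step-1 test matters in practice.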
Affiliation(s)
- Arunabha Majumdar
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
- Kathryn S Burch
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Sriram Sankararaman
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- W James Gauderman
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- John S Witte
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA