1
Wang P, Lin Z, Xue H, Pan W. Collider bias correction for multiple covariates in GWAS using robust multivariable Mendelian randomization. PLoS Genet 2024; 20:e1011246. [PMID: 38648211 PMCID: PMC11065275 DOI: 10.1371/journal.pgen.1011246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 05/02/2024] [Accepted: 04/02/2024] [Indexed: 04/25/2024] Open
Abstract
Genome-wide association studies (GWAS) have identified many genetic loci associated with complex traits and diseases in the past 20 years. Multiple heritable covariates may be added into GWAS regression models to estimate direct effects of genetic variants on a focal trait, or to improve power by accounting for environmental effects and other sources of trait variation. When one or more covariates are causally affected by both genetic variants and hidden confounders, adjusting for them in GWAS will produce biased estimates of SNP effects, known as collider bias. Several approaches have been developed to correct collider bias by estimating the bias with Mendelian randomization (MR). However, these methods work for only one covariate, and some of them rely on MR methods with relatively strong assumptions; both the single-covariate setting and those assumptions may fail to hold in practice. In this paper, we extend the bias-correction approaches in two aspects: first, we derive an analytical expression for the collider bias in the presence of multiple covariates; then we propose estimating the bias using a robust multivariable MR (MVMR) method based on constrained maximum likelihood (called MVMR-cML), allowing the presence of invalid instrumental variables (IVs) and correlated pleiotropy. We also established the estimation consistency and asymptotic normality of the new bias-corrected estimator. We conducted simulations to show that all methods mitigated collider bias under various scenarios. In real data analyses, we applied the methods to two GWAS examples, the first a GWAS of waist-hip ratio with adjustment for only one covariate, body-mass index (BMI), and the second a GWAS of BMI adjusting for metabolomic principal components as multiple covariates, illustrating the effectiveness of bias correction.
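The collider-bias mechanism described in this abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' MVMR-cML procedure: it assumes one SNP `g`, one hidden confounder `u`, one heritable covariate `x`, and a trait `y` on which the SNP has no effect at all. All effect sizes and names are illustrative choices, not values from the paper.

```python
import numpy as np

def simulate_collider_bias(n=200_000, seed=0):
    """Return (marginal SNP effect, covariate-adjusted SNP effect, covariate slope)."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n).astype(float)   # SNP genotype (0/1/2), assumed MAF 0.3
    u = rng.normal(size=n)                           # hidden confounder of x and y
    x = 0.5 * g + u + rng.normal(size=n)             # covariate: affected by SNP and confounder
    y = u + rng.normal(size=n)                       # trait: no SNP effect, no covariate effect
    ones = np.ones(n)
    # GWAS model without the covariate: y ~ g
    marginal = np.linalg.lstsq(np.column_stack([ones, g]), y, rcond=None)[0][1]
    # GWAS model adjusting for the heritable covariate: y ~ g + x
    coef = np.linalg.lstsq(np.column_stack([ones, g, x]), y, rcond=None)[0]
    return marginal, coef[1], coef[2]

marginal, adjusted_g, slope_x = simulate_collider_bias()
print(f"SNP effect without adjustment: {marginal:+.3f}")
print(f"SNP effect adjusting for x:    {adjusted_g:+.3f}")
print(f"slope on x:                    {slope_x:+.3f}")
```

In this toy model the unadjusted SNP effect should be close to zero, while the adjusted effect should be close to -0.5 × 0.5 = -0.25, that is, minus the SNP-covariate effect times the confounding-induced slope on `x`. The bias is linear in the SNP's effect on the covariate, which is the structure that MR-based corrections exploit.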
Affiliation(s)
- Peiyao Wang
  - Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota, United States of America
- Zhaotong Lin
  - Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota, United States of America
  - Department of Statistics, Florida State University, Tallahassee, Florida, United States of America
- Haoran Xue
  - Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota, United States of America
  - Department of Biostatistics, City University of Hong Kong, Hong Kong, China
- Wei Pan
  - Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota, United States of America
2
Xue F, Tang X, Kim G, Koenen KC, Martin CL, Galea S, Wildman D, Uddin M, Qu A. Heterogeneous Mediation Analysis on Epigenomic PTSD and Traumatic Stress in a Predominantly African American Cohort. J Am Stat Assoc 2022; 117:1669-1683. [PMID: 36875798 PMCID: PMC9980467 DOI: 10.1080/01621459.2022.2089572] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
Abstract
DNA methylation (DNAm) has been suggested to play a critical role in post-traumatic stress disorder (PTSD) by mediating the relationship between trauma and PTSD. However, this underlying mechanism of PTSD in African Americans remains unknown. To fill this gap, we investigate how DNAm mediates the effects of traumatic experiences on PTSD symptoms in the Detroit Neighborhood Health Study (DNHS) (2008-2013), which involves primarily African American adults. To achieve this, we develop a new mediation analysis approach for high-dimensional potential DNAm mediators. A key novelty of our method is that we consider heterogeneity in mediation effects across subpopulations. Specifically, mediators in different subpopulations could have opposite effects on the outcome, and thus could be difficult to identify under a traditional homogeneous model framework. In contrast, the proposed method can estimate heterogeneous mediation effects and identify subpopulations in which individuals share similar effects. Simulation studies demonstrate that the proposed method outperforms existing methods for both homogeneous and heterogeneous data. We also present our mediation analysis results for a dataset with 125 participants and more than 450,000 CpG sites from the DNHS. The proposed method finds three subgroups of subjects and identifies DNAm mediators corresponding to genes such as HSP90AA1 and NFATC1, which have been linked to PTSD symptoms in the literature. Our findings could be useful in future finer-grained investigation of PTSD mechanisms and in the development of new treatments for PTSD.
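The point that opposite mediator effects across subpopulations can hide mediation under a homogeneous model can be illustrated with a small simulation. This is a sketch under assumptions, not the authors' method: it uses a single mediator, two known subgroups of equal size, and a plain product-of-coefficients estimate; the helper `indirect_effect` and all effect sizes are hypothetical.

```python
import numpy as np

def indirect_effect(exposure, mediator, outcome):
    """Product-of-coefficients estimate a*b from two least-squares fits."""
    ones = np.ones_like(exposure)
    # a: effect of the exposure on the mediator
    a = np.linalg.lstsq(np.column_stack([ones, exposure]), mediator, rcond=None)[0][1]
    # b: effect of the mediator on the outcome, holding the exposure fixed
    b = np.linalg.lstsq(
        np.column_stack([ones, exposure, mediator]), outcome, rcond=None
    )[0][2]
    return a * b

rng = np.random.default_rng(1)
n = 20_000                                    # subjects per subgroup (assumed)
group = np.repeat([0, 1], n)
sign = np.where(group == 0, 1.0, -1.0)        # mediator effect is +1 in group 0, -1 in group 1
exposure = rng.normal(size=2 * n)
mediator = exposure + rng.normal(size=2 * n)
outcome = sign * mediator + rng.normal(size=2 * n)

pooled = indirect_effect(exposure, mediator, outcome)
by_group = [
    indirect_effect(exposure[group == k], mediator[group == k], outcome[group == k])
    for k in (0, 1)
]
print(f"pooled indirect effect:  {pooled:+.3f}")
print(f"group 0 indirect effect: {by_group[0]:+.3f}")
print(f"group 1 indirect effect: {by_group[1]:+.3f}")
```

Here the pooled estimate should be near zero because the two subgroup effects cancel, while the within-group estimates should be near +1 and -1. In practice the subgroup labels are unknown, which is the harder problem the abstract addresses.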
Affiliation(s)
- Fei Xue
  - Purdue University, West Lafayette, IN
- Xiwei Tang
  - University of Virginia, Charlottesville, VA
- Grace Kim
  - University of Illinois College of Medicine, Chicago, IL
- Chantel L Martin
  - The University of North Carolina at Chapel Hill, Chapel Hill, NC
- Annie Qu
  - University of California Irvine, Irvine, CA
3
Cerutti J, Lussier AA, Zhu Y, Liu J, Dunn EC. Associations between indicators of socioeconomic position and DNA methylation: a scoping review. Clin Epigenetics 2021; 13:221. [PMID: 34906220 PMCID: PMC8672601 DOI: 10.1186/s13148-021-01189-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 06/02/2021] [Accepted: 10/21/2021] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND: Socioeconomic position (SEP) is a major determinant of health across the life course. Yet, little is known about the biological mechanisms explaining this relationship. One possibility widely pursued in the scientific literature is that SEP becomes biologically embedded through epigenetic processes such as DNA methylation (DNAm), wherein the socioeconomic environment causes no alteration in the DNA sequence but modifies gene activity in ways that shape health.
METHODS: To understand the evidence supporting a potential SEP-DNAm link, we performed a scoping review of published empirical findings on the association between SEP assessed from prenatal development to adulthood and DNAm measured across the life course, with an emphasis on exploring how the developmental timing, duration, and type of SEP exposure influenced DNAm.
RESULTS: Across the 37 identified studies, we found that: (1) SEP-related DNAm signatures varied across the timing, duration, and type of SEP indicator; (2) however, longitudinal studies examining repeated SEP and DNAm measures are generally lacking; and (3) prior studies are conceptually and methodologically diverse, limiting the interpretability of findings across studies with respect to these three SEP features.
CONCLUSIONS: Given the complex relationship between SEP and DNAm across the lifespan, these findings underscore the importance of analyzing SEP features, including timing, duration, and type. To guide future research, we highlight additional research gaps and propose four recommendations to further unravel the relationship between SEP and DNAm.
Affiliation(s)
- Janine Cerutti
  - Department of Psychology, University of Vermont, 2 Colchester Ave, Burlington, VT, USA
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, 185 Cambridge Street, Simches Research Building 6th Floor, Boston, MA, 02114, USA
- Alexandre A Lussier
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, 185 Cambridge Street, Simches Research Building 6th Floor, Boston, MA, 02114, USA
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
  - Stanley Center for Psychiatric Research, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Yiwen Zhu
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, 185 Cambridge Street, Simches Research Building 6th Floor, Boston, MA, 02114, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Jiaxuan Liu
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, 185 Cambridge Street, Simches Research Building 6th Floor, Boston, MA, 02114, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Erin C Dunn
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, 185 Cambridge Street, Simches Research Building 6th Floor, Boston, MA, 02114, USA
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
  - Stanley Center for Psychiatric Research, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
4
Schaid DJ, Dikilitas O, Sinnwell JP, Kullo IJ. Penalized mediation models for multivariate data. Genet Epidemiol 2021; 46:32-50. [PMID: 34664742 DOI: 10.1002/gepi.22433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2021] [Revised: 08/04/2021] [Accepted: 10/04/2021] [Indexed: 11/11/2022]
Abstract
Statistical methods to integrate multiple layers of data, from exposures to intermediate traits to outcome variables, are needed to guide interpretation of complex data sets in which variables are likely to contribute along a causal pathway from exposure to outcome. Statistical mediation analysis based on structural equation models provides a general modeling framework, yet it can be difficult to apply to high-dimensional data and is not automated to select the best-fitting model. To overcome these limitations, we developed novel algorithms and software to simultaneously evaluate multiple exposure variables, multiple intermediate traits, and multiple outcome variables. Our penalized mediation models are computationally efficient, and simulations demonstrate that they produce reliable results for large data sets. Application of our methods to a study of vascular disease demonstrates their utility in identifying novel direct effects of single-nucleotide polymorphisms (SNPs) on coronary heart disease and peripheral artery disease, while disentangling the effects of SNPs on intermediate risk factors including lipids, cigarette smoking, systolic blood pressure, and type 2 diabetes.
Affiliation(s)
- Daniel J Schaid
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, USA
- Ozan Dikilitas
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Jason P Sinnwell
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, USA
- Iftikhar J Kullo
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota, USA
5
Haralambieva IH, Eberhard KG, Ovsyannikova IG, Grill DE, Schaid DJ, Kennedy RB, Poland GA. Transcriptional signatures associated with rubella virus-specific humoral immunity after a third dose of MMR vaccine in women of childbearing age. Eur J Immunol 2021; 51:1824-1838. [PMID: 33818775 PMCID: PMC9841595 DOI: 10.1002/eji.202049054] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/04/2020] [Revised: 03/03/2021] [Accepted: 12/17/2020] [Indexed: 01/19/2023]
Abstract
Multiple factors linked to host genetics/inherent biology play a role in interindividual variability in immune response outcomes after rubella vaccination. In order to identify these factors, we conducted a study of rubella-specific humoral immunity before (Baseline) and after (Day 28) a third dose of MMR-II vaccine in a cohort of 109 women of childbearing age. We performed mRNA-Seq profiling of PBMCs after rubella virus in vitro stimulation to delineate genes associated with post-vaccination rubella humoral immunity and to define genes mediating the association between prior immune response status (high or low antibody) and subsequent immune response outcome. Our study identified novel genes that mediated the association between prior immune response and neutralizing antibody titer after a third MMR vaccine dose. These genes included the following: CDC34; CSNK1D; APOBEC3F; RAD18; AAAS; SLC37A1; FAS; and JAK2. The encoded proteins are involved in innate antiviral response, IFN/cytokine signaling, B cell repertoire generation, the clonal selection of B lymphocytes in germinal centers, and somatic hypermutation/antibody affinity maturation to promote optimal antigen-specific B cell immune function. These data advance our understanding of how subjects' prior immune status and/or genetic propensity to respond to rubella/MMR vaccination ultimately affects innate immunity and humoral immune outcomes after vaccination.
Affiliation(s)
- Diane E. Grill
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
- Daniel J. Schaid
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
- Richard B. Kennedy
  - Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, MN 55905, USA
- Gregory A. Poland
  - Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, MN 55905, USA
6
Zeng P, Shao Z, Zhou X. Statistical methods for mediation analysis in the era of high-throughput genomics: Current successes and future challenges. Comput Struct Biotechnol J 2021; 19:3209-3224. [PMID: 34141140 PMCID: PMC8187160 DOI: 10.1016/j.csbj.2021.05.042] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 02/10/2021] [Revised: 05/21/2021] [Accepted: 05/21/2021] [Indexed: 12/12/2022] Open
Abstract
Mediation analysis investigates the intermediate mechanism through which an exposure exerts its influence on the outcome of interest. Mediation analysis is becoming increasingly popular in high-throughput genomics studies where a common goal is to identify molecular-level traits, such as gene expression or methylation, which actively mediate the genetic or environmental effects on the outcome. Mediation analysis in genomics studies is particularly challenging, however, owing to the large number of potential mediators measured in these studies as well as the composite null nature of the mediation effect hypothesis. Indeed, while the standard univariate and multivariate mediation methods are well established for analyzing one or multiple mediators, they are not well suited to genomics studies with a large number of mediators and often yield conservative p-values and limited power. Consequently, over the past few years many new high-dimensional mediation methods have been developed for analyzing the large number of potential mediators collected in high-throughput genomics studies. In this work, we present a thorough review of these important recent methodological advances in high-dimensional mediation analysis. Specifically, we describe in detail more than ten high-dimensional mediation methods, focusing on their motivations, basic modeling ideas, specific modeling assumptions, practical successes, methodological limitations, as well as future directions. We hope our review will serve as useful guidance for statisticians and computational biologists who develop methods of high-dimensional mediation analysis as well as for analysts who apply mediation methods to high-throughput genomics studies.
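The remark about the composite null and conservative p-values can be checked numerically for one standard test. The sketch below uses the Sobel statistic as an assumed example of a "standard univariate" method (the abstract does not name a specific test) and works directly with the z-scores of the exposure-to-mediator path (`z_a`) and the mediator-to-outcome path (`z_b`), which are treated as independent standard normal draws under the null.

```python
import numpy as np

def sobel_z(z_a, z_b):
    """Sobel statistic expressed through the two path z-scores.

    a*b / sqrt(a^2 se_b^2 + b^2 se_a^2) equals z_a*z_b / sqrt(z_a^2 + z_b^2)
    after dividing numerator and denominator by se_a * se_b.
    """
    return z_a * z_b / np.sqrt(z_a**2 + z_b**2)

rng = np.random.default_rng(2)
draws = 200_000
critical = 1.96                      # nominal two-sided 5% threshold

# Case 1: both paths are null (a = 0 and b = 0).
z_a = rng.normal(size=draws)
z_b = rng.normal(size=draws)
rate_both_null = np.mean(np.abs(sobel_z(z_a, z_b)) > critical)

# Case 2: only the first path is null; the second path is strong (assumed mean z of 10).
z_b_strong = rng.normal(loc=10.0, size=draws)
rate_one_null = np.mean(np.abs(sobel_z(z_a, z_b_strong)) > critical)

print(f"rejection rate, both paths null: {rate_both_null:.4f}")
print(f"rejection rate, one path null:   {rate_one_null:.4f}")
```

When both paths are null, the rejection rate should fall far below the nominal 5%, whereas it stays near 5% when only one path is null. In genomics scans most candidate mediators are in the first situation, which is one way to see why such tests lose power there.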
Affiliation(s)
- Ping Zeng
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
  - Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Zhonghe Shao
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor 48109, MI, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor 48109, MI, USA
7
Xue H, Pan W. Inferring causal direction between two traits in the presence of horizontal pleiotropy with GWAS summary data. PLoS Genet 2020; 16:e1009105. [PMID: 33137120 PMCID: PMC7660933 DOI: 10.1371/journal.pgen.1009105] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/18/2020] [Revised: 11/12/2020] [Accepted: 09/08/2020] [Indexed: 01/14/2023] Open
Abstract
Orienting the causal relationship between pairs of traits is a fundamental task in scientific research with significant implications in practice, such as in prioritizing molecular targets and modifiable risk factors for developing therapeutic and interventional strategies for complex diseases. A recent method, called Steiger’s method, using a single SNP as an instrument variable (IV) in the framework of Mendelian randomization (MR), has since been widely applied. We report the following new contributions. First, we propose a single SNP-based alternative, overcoming a severe limitation of Steiger’s method in simply assuming, instead of inferring, the existence of a causal relationship. We also clarify a condition necessary for the validity of the methods in the presence of hidden confounding. Second, to improve statistical power, we propose combining the results from multiple, and possibly correlated, SNPs as multiple instruments. Third, we develop three goodness-of-fit tests to check modeling assumptions, including those required for valid IVs. Fourth, by relaxing one of the three IV assumptions in MR, we propose several methods, including an Egger regression-like approach and its multivariable version (analogous to multivariable MR), to account for horizontal pleiotropy of the SNPs/IVs, which is often unavoidable in practice. All our methods can simultaneously infer both the existence and (if so) the direction of a causal relationship, largely expanding their applicability over that of Steiger’s method. Although we focus on uni-directional causal relationships, we also briefly discuss an extension to bi-directional relationships. Through extensive simulations and an application to infer the causal directions between low density lipoprotein (LDL) cholesterol, or high density lipoprotein (HDL) cholesterol, and coronary artery disease (CAD), we demonstrate the superior performance and advantage of our proposed methods over Steiger’s method and bi-directional MR. 
In particular, after accounting for horizontal pleiotropy, our method confirmed the well known causal direction from LDL to CAD, while other methods, including bi-directional MR, might fail.
In spite of its importance, due to technical challenges, orienting causal relationships between pairs of traits has been largely under-studied. Mendelian randomization (MR) Steiger’s method has become increasingly used in the last two years. Here we point out several limitations with MR Steiger’s method and propose alternative approaches. First, MR Steiger’s method is based on using only one single SNP as the instrument variable (IV), for which we propose a correlation ratio-based method, called Causal Direction-Ratio, or simply CD-Ratio. An advantage of CD-Ratio is its inference of both the existence and (if so) the direction of a causal relationship, in contrast to MR Steiger’s prior assumption of the existence and its poor performance if the assumption is violated. Furthermore, CD-Ratio can be extended to combine the results from multiple, possibly correlated, SNPs with improved statistical power. Second, we propose two methods, called CD-Egger and CD-GLS, for multiple and possibly correlated SNPs while allowing horizontal pleiotropy. Third, we propose three goodness-of-fit tests to check modeling assumptions for the three proposed methods. Finally, we introduce multivariable CD-Egger, analogous to multivariable MR, as a more robust approach, and an extension of CD-Ratio to cases with possibly bi-directional causal relationships. Our numerical studies demonstrated superior performance of our proposed methods over MR Steiger and bi-directional MR. Our proposed methods, along with freely available software, are expected to be useful in practice for causal inference.
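The single-SNP idea behind Steiger's method, as discussed above, can be sketched as a comparison of correlations. This is a deliberately simplified illustration and not the paper's CD-Ratio, CD-Egger, or CD-GLS procedures: it assumes a valid instrument `g`, no horizontal pleiotropy, and that a causal effect truly exists (the very assumption the abstract criticizes). The function name and all effect sizes are hypothetical.

```python
import numpy as np

def steiger_style_direction(g, x, y):
    """Pick the direction whose upstream trait is more strongly correlated with the SNP."""
    r_gx = np.corrcoef(g, x)[0, 1]
    r_gy = np.corrcoef(g, y)[0, 1]
    # If g acts on y only through x, its correlation with y is attenuated relative to x.
    direction = "X -> Y" if abs(r_gx) > abs(r_gy) else "Y -> X"
    return direction, r_gx, r_gy

rng = np.random.default_rng(3)
n = 50_000
g = rng.binomial(2, 0.3, size=n).astype(float)   # SNP genotype, assumed MAF 0.3
u = rng.normal(size=n)                           # hidden confounder of x and y
x = 0.3 * g + u + rng.normal(size=n)             # exposure: directly affected by the SNP
y = 0.5 * x + u + rng.normal(size=n)             # outcome: affected by the SNP only via x

direction, r_gx, r_gy = steiger_style_direction(g, x, y)
print(direction, f"r_gx={r_gx:.3f}", f"r_gy={r_gy:.3f}")
```

Because this comparison always returns one of the two directions, it cannot report "no causal relationship", which is the limitation the proposed methods are designed to overcome.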
Affiliation(s)
- Haoran Xue
  - School of Statistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Wei Pan
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America