1
Xu H, Mu S, Bao J, Davatzikos C, Shou H, Shen L. High-dimensional mediation analysis reveals the mediating role of physical activity patterns in genetic pathways leading to AD-like brain atrophy. BioData Min 2025; 18:24. [PMID: 40128806 PMCID: PMC11931790 DOI: 10.1186/s13040-025-00432-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2024] [Accepted: 02/07/2025] [Indexed: 03/26/2025] Open
Abstract
BACKGROUND: Alzheimer's disease (AD) is a complex disorder that affects multiple biological systems including cognition, behavior and physical health. Unfortunately, the pathogenic mechanisms behind AD are not yet clear and the treatment options are still limited. Despite the increasing number of studies examining the pairwise relationships between genetic factors, physical activity (PA), and AD, few have successfully integrated all three domains of data, an integration that may help reveal the mechanisms and impact of these genomic and phenomic factors on AD. We use high-dimensional mediation analysis as an integrative framework to study the relationships among genetic factors, PA and AD-like brain atrophy quantified by spatial patterns of brain atrophy.
RESULTS: We integrate data from genetics, PA and neuroimaging measures collected from 13,425 UK Biobank samples to unveil the complex relationship among genetic risk factors, behavior and brain signatures in the context of aging and AD. Specifically, we used a composite imaging marker, Spatial Pattern of Abnormality for Recognition of Early AD (SPARE-AD), which characterizes AD-like brain atrophy, as an outcome variable to represent AD risk. Through GWAS, we identified single nucleotide polymorphisms (SNPs) that are significantly associated with SPARE-AD as exposure variables. We employed conventional summary statistics and functional principal component analysis to extract patterns of PA as mediators. After constructing these variables, we utilized a high-dimensional mediation analysis method, Bayesian Mediation Analysis (BAMA), to estimate potential mediating pathways between SNPs, multivariate PA signatures and SPARE-AD. BAMA incorporates a Bayesian continuous shrinkage prior to select the active mediators from a large pool of candidates. We identified a total of 22 mediation pathways, indicating how genetic variants can influence SPARE-AD by altering physical activity. By comparing the results with those obtained using univariate mediation analysis, we demonstrate the advantages of high-dimensional mediation analysis methods over univariate mediation analysis.
CONCLUSION: Through integrative analysis of multi-omics data, we identified several mediation pathways of physical activity between genetic factors and SPARE-AD. These findings contribute to a better understanding of the pathogenic mechanisms of AD. Moreover, our research demonstrates the potential of the high-dimensional mediation analysis method in revealing the mechanisms of disease.
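For readers unfamiliar with the univariate baseline the abstract compares against, the following is a minimal sketch of a single-mediator, product-of-coefficients mediation analysis on simulated data. It is not the BAMA method used by the authors; the function name `mediation_effects`, the simulated effect sizes, and the SNP-like exposure coding are all assumptions made for illustration.

```python
import random


def _mean(v):
    return sum(v) / len(v)


def _cov(u, v):
    # Population covariance of two equal-length sequences.
    mu, mv = _mean(u), _mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / len(u)


def mediation_effects(x, m, y):
    """Least-squares estimates for one exposure x, one mediator m, one outcome y.

    Mediator model: m = a0 + a*x + e1
    Outcome model:  y = b0 + c*x + b*m + e2
    Returns (a, b, direct effect c, indirect effect a*b).
    """
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx  # slope of m on x
    # Solve the 2x2 normal equations of the outcome model on centered data.
    det = sxx * smm - sxm ** 2
    c = (sxy * smm - smy * sxm) / det
    b = (smy * sxx - sxy * sxm) / det
    return a, b, c, a * b


# Hypothetical toy data: a 0/1/2 genotype-like exposure, one mediator, one outcome.
rng = random.Random(0)
n = 5000
x = [float((rng.random() < 0.3) + (rng.random() < 0.3)) for _ in range(n)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]                         # true a = 0.5
y = [0.2 * xi + 0.8 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]  # true c = 0.2, b = 0.8

a, b, direct, indirect = mediation_effects(x, m, y)
print(f"a={a:.3f} b={b:.3f} direct={direct:.3f} indirect={indirect:.3f}")
```

With many candidate mediators, fitting this model one mediator at a time ignores correlations among them, which is the limitation that joint high-dimensional methods are designed to address.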
Affiliation(s)
- Hanxiang Xu
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Department of Family Medicine and Public Health, University of California, San Diego, CA, 92093, USA
- Shizhuo Mu
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Jingxuan Bao
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Christos Davatzikos
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Haochang Shou
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Li Shen
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
2
Liu Z, Liu ZA, Hosni A, Kim J, Jiang B, Saarela O. A Bayesian joint model for mediation analysis with matrix-valued mediators. Biometrics 2024; 80:ujae143. [PMID: 39671276 DOI: 10.1093/biomtc/ujae143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Revised: 09/19/2024] [Accepted: 12/11/2024] [Indexed: 12/15/2024]
Abstract
Unscheduled treatment interruptions may lead to reduced quality of care in radiation therapy (RT). Identifying the RT prescription dose effects on the outcome of treatment interruptions, mediated through doses distributed into different organs at risk (OARs), can inform future treatment planning. The radiation exposure to OARs can be summarized by a matrix of dose-volume histograms (DVH) for each patient. Although various methods for high-dimensional mediation analysis have been proposed recently, few studies investigated how matrix-valued data can be treated as mediators. In this paper, we propose a novel Bayesian joint mediation model for high-dimensional matrix-valued mediators. In this joint model, latent features are extracted from the matrix-valued data through an adaptation of probabilistic multilinear principal components analysis (MPCA), retaining the inherent matrix structure. We derive and implement a Gibbs sampling algorithm to jointly estimate all model parameters, and introduce a Varimax rotation method to identify active indicators of mediation among the matrix-valued data. Our simulation study finds that the proposed joint model has higher efficiency in estimating causal decomposition effects compared to an alternative two-step method, and demonstrates that the mediation effects can be identified and visualized in the matrix form. We apply the method to study the effect of prescription dose on treatment interruptions in anal canal cancer patients.
Affiliation(s)
- Zijin Liu
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario M5T 3M7, Canada
- Zhihui Amy Liu
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario M5T 3M7, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario M5G 2M9, Canada
- Ali Hosni
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario M5G 2M9, Canada
- John Kim
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario M5G 2M9, Canada
- Bei Jiang
  - Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Alberta T6G 2G1, Canada
- Olli Saarela
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario M5T 3M7, Canada
3
Wang JX, Lu ZH, Reddick WE, Conklin HM, Glass JO, Jacola L, Onar-Thomas A, Jeha S, Cheng C, Zhou X, Li Y. High-Dimensional Mediation Analysis with Network Mediators: Applications to Pediatric Acute Lymphoblastic Leukemia. bioRxiv 2024:2024.09.23.614601. [PMID: 39386504 PMCID: PMC11463675 DOI: 10.1101/2024.09.23.614601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Acute lymphoblastic leukemia (ALL) is the most common childhood cancer, with survivors frequently experiencing long-term neurocognitive morbidities. Here, we utilize the TOTXVI clinical trial data to elucidate the mechanisms underlying treatment-related neurocognitive side effects in pediatric ALL patients by incorporating brain connectivity network data. To enable such analysis, we propose a high-dimensional mediation analysis method with a novel network mediation structural shrinkage (NMSS) prior, which is particularly suited for analyzing high-dimensional brain structural connectivity network data that serve as mediators. Our method is capable of addressing the structural dependencies of brain connectivity networks including sparsity, effective degrees of nodes, and modularity, yielding accurate estimates of the high-dimensional coefficients and mediation effects. We demonstrate the effectiveness and superiority of the proposed NMSS method through simulation studies and apply it to the TOTXVI data, revealing significant mediation effects of brain connectivity on visual processing speed directed by IT intensity. The findings shed light on the potential of targeted interventions to mitigate neurocognitive deficits in pediatric ALL survivors.
4
Domingo-Relloso A, Tellez-Plaza M, Valeri L. Methods for the Analysis of Multiple Epigenomic Mediators in Environmental Epidemiology. Curr Environ Health Rep 2024; 11:109-117. [PMID: 38386268 DOI: 10.1007/s40572-024-00436-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/12/2024] [Indexed: 02/23/2024]
Abstract
PURPOSE OF REVIEW: Epigenetic changes can be highly influenced by environmental factors and have in turn been proposed to influence chronic disease. Quantifying the extent to which epigenomic processes mediate the association between environmental exposures and diseases is of interest for epidemiologic research. In this review, we summarize the proposed mediation analysis methods with applications to epigenomic data.
RECENT FINDINGS: The ultra-high dimensionality and high correlations that characterize omics data have hindered the precise quantification of mediated effects. Several methods have been proposed to deal with mediation in high-dimensional settings, including methods that incorporate dimensionality reduction techniques into the mediation algorithm. Although important methodological advances have been made in recent years, key challenges such as the development of sensitivity analyses, dealing with mediator-mediator interactions, including environmental mixtures as exposures, or the integration of different omic data should be the focus of future methodological developments for epigenomic mediation analysis.
Affiliation(s)
- Arce Domingo-Relloso
  - Department of Biostatistics, Columbia University Mailman School of Public Health, 722 West 168th Street, New York, NY, 10032, USA
- Maria Tellez-Plaza
  - Department of Chronic Diseases Epidemiology, National Center for Epidemiology, Carlos III Health Institute, Madrid, Spain
- Linda Valeri
  - Department of Biostatistics, Columbia University Mailman School of Public Health, 722 West 168th Street, New York, NY, 10032, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
5
Cai Q, Fu Y, Lyu C, Wang Z, Rao S, Alvarez JA, Bai Y, Kang J, Yu T. A new framework for exploratory network mediator analysis in omics data. Genome Res 2024; 34:642-654. [PMID: 38719472 PMCID: PMC11146592 DOI: 10.1101/gr.278684.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 04/11/2024] [Indexed: 06/01/2024]
Abstract
Omics methods are widely used in basic biology and translational medicine research. More and more omics data are collected to explain the impact of certain risk factors on clinical outcomes. To explain the mechanism of the risk factors, a core question is how to find the genes/proteins/metabolites that mediate their effects on the clinical outcome. Mediation analysis is a modeling framework to study the relationship between risk factors and pathological outcomes, via mediator variables. However, high-dimensional omics data are far more challenging than traditional data: (1) From tens of thousands of genes, can we overcome the curse of dimensionality to reliably select a set of mediators? (2) How do we ensure that the selected mediators are functionally consistent? (3) Many biological mechanisms contain nonlinear effects. How do we include nonlinear effects in the high-dimensional mediation analysis? (4) How do we consider multiple risk factors at the same time? To meet these challenges, we propose a new exploratory mediation analysis framework, medNet, which focuses on finding mediators through predictive modeling. We propose new definitions for predictive exposure, predictive mediator, and predictive network mediator, using a statistical hypothesis testing framework to identify predictive exposures and mediators. Additionally, two heuristic search algorithms are proposed to identify network mediators, essentially subnetworks in the genome-scale biological network that mediate the effects of single or multiple exposures. We applied medNet on a breast cancer data set and a metabolomics data set combined with food intake questionnaire data. It identified functionally consistent network mediators for the exposures' impact on the outcome, facilitating data interpretation.
Affiliation(s)
- Qingpo Cai
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia 30322, USA
- Yinghao Fu
  - Shenzhen Research Institute of Big Data, School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Guangdong 518172, P.R. China
  - School of Medicine, the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Guangdong 518172, P.R. China
- Cheng Lyu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia 30322, USA
- Zihe Wang
  - Shenzhen Research Institute of Big Data, School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Guangdong 518172, P.R. China
- Shun Rao
  - Shenzhen Research Institute of Big Data, School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Guangdong 518172, P.R. China
- Jessica A Alvarez
  - Department of Medicine, Emory University, Atlanta, Georgia 30322, USA
- Yun Bai
  - School of Medicine, the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Guangdong 518172, P.R. China
- Jian Kang
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Tianwei Yu
  - Shenzhen Research Institute of Big Data, School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Guangdong 518172, P.R. China
6
Clark-Boucher D, Zhou X, Du J, Liu Y, Needham BL, Smith JA, Mukherjee B. Methods for mediation analysis with high-dimensional DNA methylation data: Possible choices and comparisons. PLoS Genet 2023; 19:e1011022. [PMID: 37934796 PMCID: PMC10655967 DOI: 10.1371/journal.pgen.1011022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 11/17/2023] [Accepted: 10/18/2023] [Indexed: 11/09/2023] Open
Abstract
Epigenetic researchers often evaluate DNA methylation as a potential mediator of the effect of social/environmental exposures on a health outcome. Modern statistical methods for jointly evaluating many mediators have not been widely adopted. We compare seven methods for high-dimensional mediation analysis with continuous outcomes through both diverse simulations and analysis of DNAm data from a large multi-ethnic cohort in the United States, while providing an R package for their seamless implementation and adoption. Among the considered choices, the best-performing methods for detecting active mediators in simulations are the Bayesian sparse linear mixed model (BSLMM) and high-dimensional mediation analysis (HDMA); while the preferred methods for estimating the global mediation effect are high-dimensional linear mediation analysis (HILMA) and principal component mediation analysis (PCMA). We provide guidelines for epigenetic researchers on choosing the best method in practice and offer suggestions for future methodological development.
Affiliation(s)
- Dylan Clark-Boucher
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health; Boston, Massachusetts, United States of America
- Xiang Zhou
  - Department of Biostatistics, University of Michigan; Ann Arbor, Michigan, United States of America
- Jiacong Du
  - Department of Biostatistics, University of Michigan; Ann Arbor, Michigan, United States of America
- Yongmei Liu
  - Department of Medicine, Divisions of Cardiology and Neurology, Duke University Medical Center; Durham, North Carolina, United States of America
- Belinda L. Needham
  - Department of Epidemiology, University of Michigan; Ann Arbor, Michigan, United States of America
- Jennifer A. Smith
  - Department of Epidemiology, University of Michigan; Ann Arbor, Michigan, United States of America
  - Survey Research Center, Institute for Social Research, University of Michigan; Ann Arbor, Michigan, United States of America
- Bhramar Mukherjee
  - Department of Biostatistics, University of Michigan; Ann Arbor, Michigan, United States of America
  - Department of Epidemiology, University of Michigan; Ann Arbor, Michigan, United States of America
7
Fu J, Koslovsky MD, Neophytou AM, Vannucci M. A Bayesian joint model for compositional mediation effect selection in microbiome data. Stat Med 2023. [PMID: 37173609 DOI: 10.1002/sim.9764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Revised: 04/17/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023]
Abstract
Analyzing multivariate count data generated by high-throughput sequencing technology in microbiome research studies is challenging due to the high-dimensional and compositional structure of the data and overdispersion. In practice, researchers are often interested in investigating how the microbiome may mediate the relation between an assigned treatment and an observed phenotypic response. Existing approaches designed for compositional mediation analysis are unable to simultaneously determine the presence of direct effects, relative indirect effects, and overall indirect effects, while quantifying their uncertainty. We propose a formulation of a Bayesian joint model for compositional data that allows for the identification, estimation, and uncertainty quantification of various causal estimands in high-dimensional mediation analysis. We conduct simulation studies and compare our method's mediation effects selection performance with existing methods. Finally, we apply our method to a benchmark data set investigating the sub-therapeutic antibiotic treatment effect on body weight in early-life mice.
Affiliation(s)
- Jingyan Fu
  - Department of Statistics, Rice University, Houston, Texas, USA
- Matthew D Koslovsky
  - Department of Statistics, Colorado State University, Fort Collins, Colorado, USA
- Andreas M Neophytou
  - Department of Environmental & Radiological Health Sciences, Colorado State University, Fort Collins, Colorado, USA
- Marina Vannucci
  - Department of Statistics, Rice University, Houston, Texas, USA
8
Shang L, Zhao W, Wang YZ, Li Z, Choi JJ, Kho M, Mosley TH, Kardia SLR, Smith JA, Zhou X. meQTL mapping in the GENOA study reveals genetic determinants of DNA methylation in African Americans. Nat Commun 2023; 14:2711. [PMID: 37169753 PMCID: PMC10175543 DOI: 10.1038/s41467-023-37961-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 05/31/2022] [Accepted: 04/07/2023] [Indexed: 05/13/2023] Open
Abstract
Identifying genetic variants that are associated with variation in DNA methylation, an analysis commonly referred to as methylation quantitative trait locus (meQTL) mapping, is an important first step towards understanding the genetic architecture underlying epigenetic variation. Most existing meQTL mapping studies have focused on individuals of European ancestry, leaving other populations underrepresented, with a particular absence of large studies in populations with African ancestry. We fill this critical knowledge gap by performing a large-scale cis-meQTL mapping study in 961 African Americans from the Genetic Epidemiology Network of Arteriopathy (GENOA) study. We identify a total of 4,565,687 cis-acting meQTLs in 320,965 meCpGs. We find that 45% of meCpGs harbor multiple independent meQTLs, suggesting potential polygenic genetic architecture underlying methylation variation. A large percentage of the cis-meQTLs also colocalize with cis-expression QTLs (eQTLs) in the same population. Importantly, the identified cis-meQTLs explain a substantial proportion (median = 24.6%) of methylation variation. In addition, the cis-meQTL associated CpG sites mediate a substantial proportion (median = 24.9%) of SNP effects underlying gene expression. Overall, our results represent an important step toward revealing the co-regulation of methylation and gene expression, facilitating the functional interpretation of epigenetic and gene regulation underlying common diseases in African Americans.
Affiliation(s)
- Lulu Shang
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Wei Zhao
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Yi Zhe Wang
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Zheng Li
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Jerome J Choi
  - Population Health Sciences, University of Wisconsin-Madison School of Medicine and Public Health, Madison, WI, 53726, USA
- Minjung Kho
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Thomas H Mosley
  - Memory Impairment and Neurodegenerative Dementia (MIND) Center, University of Mississippi Medical Center, Jackson, MS, 39126, USA
- Sharon L R Kardia
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Jennifer A Smith
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Xiang Zhou
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
9
Genetical analysis of mastitis and reproductive traits in first-parity Holstein cows using standard and structural equation modelling. Animal 2023; 17:100777. [PMID: 37043934 DOI: 10.1016/j.animal.2023.100777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Revised: 03/03/2023] [Accepted: 03/07/2023] [Indexed: 03/17/2023] Open
Abstract
The present study aimed to investigate the causal relationships between clinical mastitis and several reproductive traits, including success at first insemination (SFI), the number of inseminations to pregnancy (INS), the interval from calving to first service (CTFS), first and last service interval (IFL), and open days (OD) in first-parity Holstein cows. For this purpose, the records of 58 281 first-parity Holstein cows were analysed. These data sets were collected from 17 large dairy herds from 2008 to 2017. Recursive Mixed Models (RMMs) were applied and compared with the estimations under Standard Mixed Models. Then, one trivariate and three bivariate Gaussian-threshold models were used for the analyses. Recursive models were applied, considering that clinical mastitis can influence fertility traits. Mastitis was treated as a covariate for the reproductive traits to determine their causal relationship. The results of this study indicated that the causal effects of mastitis on SFI (on the observed scale, %), CTFS, IFL, OD, and INS were -5.7%, 3.3 days, 12.27 days, 7 days, and 0.26 services, respectively. The estimated structural coefficients of the recursive models in the first parity imply that mastitis significantly lengthened the fertility interval and decreased the conception rate. In addition, genetic, residual, and phenotypic correlations between mastitis and the reproductive traits under both models were statistically significant. The genetic correlations between mastitis and fertility traits suggest that a higher incidence of mastitis during lactation is related to delays in showing heat and to the pregnancy rate after insemination. In summary, considering the causal effects under RMMs may help to better understand the complicated relationships between complex traits.
10
Clark-Boucher D, Zhou X, Du J, Liu Y, Needham BL, Smith JA, Mukherjee B. Methods for Mediation Analysis with High-Dimensional DNA Methylation Data: Possible Choices and Comparison. medRxiv 2023:2023.02.10.23285764. [PMID: 36824903 PMCID: PMC9949196 DOI: 10.1101/2023.02.10.23285764] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 02/16/2023]
Abstract
Epigenetic researchers often evaluate DNA methylation as a mediator between social/environmental exposures and disease, but modern statistical methods for jointly evaluating many mediators have not been widely adopted. We compare seven methods for high-dimensional mediation analysis with continuous outcomes through both diverse simulations and analysis of DNAm data from a large national cohort in the United States, while providing an R package for their implementation. Among the considered choices, the best-performing methods for detecting active mediators in simulations are the Bayesian sparse linear mixed model by Song et al. (2020) and high-dimensional mediation analysis by Gao et al. (2019); while the superior methods for estimating the global mediation effect are high-dimensional linear mediation analysis by Zhou et al. (2021) and principal component mediation analysis by Huang and Pan (2016). We provide guidelines for epigenetic researchers on choosing the best method in practice and offer suggestions for future methodological development.
Affiliation(s)
- Dylan Clark-Boucher
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI
- Jiacong Du
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI
- Yongmei Liu
  - Department of Medicine, Divisions of Cardiology and Neurology, Duke University Medical Center, Durham, NC
- Jennifer A Smith
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI
  - Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI
- Bhramar Mukherjee
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI
11
Bayesian joint modeling for causal mediation analysis with a binary outcome and a binary mediator: Exploring the role of obesity in the association between cranial radiation therapy for childhood acute lymphoblastic leukemia treatment and the long-term risk of insulin resistance. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2022.107586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
12
Han Q, Wang Y, Sun N, Chu J, Hu W, Shen Y. Mediation analysis method review of high throughput data. Stat Appl Genet Mol Biol 2023; 22:sagmb-2023-0031. [PMID: 38015771 DOI: 10.1515/sagmb-2023-0031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 11/11/2023] [Indexed: 11/30/2023]
Abstract
High-throughput technologies have made high-dimensional settings increasingly common, providing opportunities for the development of high-dimensional mediation methods. We aimed to provide useful guidance for researchers using high-dimensional mediation analysis, and ideas for biostatisticians developing it, by summarizing and discussing recent advances in the field. Many challenges arise when single and multiple mediation analyses are extended to high-dimensional settings. High-dimensional mediation methods attempt to address these issues by screening for true mediators, estimating mediation effects through variable selection, reducing the dimension of the mediators to resolve correlations between variables, and testing mediation effects under a composite null hypothesis. Although these problems have been solved to some extent, some challenges remain. First, correlations between mediators are rarely considered when variables are selected for mediation. Second, dimension reduction without incorporating prior biological knowledge makes the results difficult to interpret. In addition, a method of sensitivity analysis for the strict sequential ignorability assumption in high-dimensional mediation analysis is still lacking. Analysts need to consider the applicability of each method when using it, while biostatisticians could consider extensions and improvements in the methodology.
Affiliation(s)
- Qiang Han
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, China
- Yu Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, China
- Na Sun
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, China
- Jiadong Chu
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, China
- Wei Hu
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, China
- Yueping Shen
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, China
13
Wang YZ, Zhao W, Ammous F, Song Y, Du J, Shang L, Ratliff SM, Moore K, Kelly KM, Needham BL, Diez Roux AV, Liu Y, Butler KR, Kardia SLR, Mukherjee B, Zhou X, Smith JA. DNA Methylation Mediates the Association Between Individual and Neighborhood Social Disadvantage and Cardiovascular Risk Factors. Front Cardiovasc Med 2022; 9:848768. [PMID: 35665255 PMCID: PMC9162507 DOI: 10.3389/fcvm.2022.848768] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/05/2022] [Accepted: 04/29/2022] [Indexed: 12/14/2022] Open
Abstract
Low socioeconomic status (SES) and living in a disadvantaged neighborhood are associated with poor cardiovascular health. Multiple lines of evidence have linked DNA methylation to both cardiovascular risk factors and social disadvantage indicators. However, limited research has investigated the role of DNA methylation in mediating the associations of individual- and neighborhood-level disadvantage with multiple cardiovascular risk factors in large, multi-ethnic, population-based cohorts. We examined whether disadvantage at the individual level (childhood and adult SES) and neighborhood level (summary neighborhood SES as assessed by Census data and social environment as assessed by perceptions of aesthetic quality, safety, and social cohesion) were associated with 11 cardiovascular risk factors including measures of obesity, diabetes, lipids, and hypertension in 1,154 participants from the Multi-Ethnic Study of Atherosclerosis (MESA). For significant associations, we conducted epigenome-wide mediation analysis to identify methylation sites mediating the relationship between individual/neighborhood disadvantage and cardiovascular risk factors using the JT-Comp method that assesses sparse mediation effects under a composite null hypothesis. In models adjusting for age, sex, race/ethnicity, smoking, medication use, and genetic principal components of ancestry, epigenetic mediation was detected for the associations of adult SES with body mass index (BMI), insulin, and high-density lipoprotein cholesterol (HDL-C), as well as for the association between neighborhood socioeconomic disadvantage and HDL-C at FDR q < 0.05. The 410 CpG mediators identified for the SES-BMI association were enriched for CpGs associated with gene expression (expression quantitative trait methylation loci, or eQTMs), and corresponding genes were enriched in antigen processing and presentation pathways. 
For cardiovascular risk factors other than BMI, most of the epigenetic mediators lost significance after controlling for BMI. However, 43 methylation sites showed evidence of mediating the neighborhood socioeconomic disadvantage and HDL-C association after BMI adjustment. The identified mediators were enriched for eQTMs, and corresponding genes were enriched in inflammatory and apoptotic pathways. Our findings support the hypothesis that DNA methylation acts as a mediator between individual- and neighborhood-level disadvantage and cardiovascular risk factors, and shed light on the potential underlying epigenetic pathways. Future studies are needed to fully elucidate the biological mechanisms that link social disadvantage to poor cardiovascular health.
Affiliation(s)
- Yi Zhe Wang
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, United States
- Wei Zhao
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, United States
- Farah Ammous
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, United States
- Yanyi Song
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, United States
- Jiacong Du
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, United States
- Lulu Shang
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, United States
- Scott M. Ratliff
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, United States
- Kari Moore
- Urban Health Collaborative, Drexel University, Philadelphia, PA, United States
- Kristen M. Kelly
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, United States
- Belinda L. Needham
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, United States
- Ana V. Diez Roux
- Department of Epidemiology and Biostatistics, Dornsife School of Public Health, Drexel University, Philadelphia, PA, United States
- Yongmei Liu
- Division of Cardiology, Department of Medicine, Duke University School of Medicine, Durham, NC, United States
- Kenneth R. Butler
- Department of Medicine, Division of Geriatrics, University of Mississippi Medical Center, Jackson, MS, United States
- Sharon L. R. Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, United States
- Bhramar Mukherjee
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, United States
- Xiang Zhou
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, United States
- Jennifer A. Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, United States
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, United States
|
14
|
Song Y, Zhou X, Kang J, Aung MT, Zhang M, Zhao W, Needham BL, Kardia SLR, Liu Y, Meeker JD, Smith JA, Mukherjee B. Bayesian hierarchical models for high-dimensional mediation analysis with coordinated selection of correlated mediators. Stat Med 2021; 40:6038-6056. [PMID: 34404112 PMCID: PMC9257993 DOI: 10.1002/sim.9168] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/24/2020] [Revised: 07/30/2021] [Accepted: 08/05/2021] [Indexed: 01/18/2023]
Abstract
We consider Bayesian high-dimensional mediation analysis to identify among a large set of correlated potential mediators the active ones that mediate the effect from an exposure variable to an outcome of interest. Correlations among mediators are commonly observed in modern data analysis; examples include the activated voxels within connected regions in brain image data, regulatory signals driven by gene networks in genome data, and correlated exposure data from the same source. When correlations are present among active mediators, mediation analysis that fails to account for such correlation can be suboptimal and may lead to a loss of power in identifying active mediators. Building upon a recent high-dimensional mediation analysis framework, we propose two Bayesian hierarchical models, one with a Gaussian mixture prior that enables correlated mediator selection and the other with a Potts mixture prior that accounts for the correlation among active mediators in mediation analysis. We develop efficient sampling algorithms for both methods. Various simulations demonstrate that our methods enable effective identification of correlated active mediators, which could be missed by using existing methods that assume prior independence among active mediators. The proposed methods are applied to the LIFECODES birth cohort and the Multi-Ethnic Study of Atherosclerosis (MESA) and identified new active mediators with important biological implications.
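For orientation, the linear structural model commonly used in this literature can be written as below. The notation is assumed here for illustration and is not taken from the abstract: A_i is the exposure, M_ij the j-th of p candidate mediators, and Y_i the outcome for subject i, with covariates omitted for brevity. The Gaussian mixture and Potts mixture priors that the authors place on the coefficients are not shown.

```latex
\begin{aligned}
M_{ij} &= \alpha_j A_i + \varepsilon_{M,ij}, \qquad j = 1,\dots,p,\\
Y_i &= \beta_A A_i + \sum_{j=1}^{p} \beta_j M_{ij} + \varepsilon_{Y,i},\\
\text{mediator } j \text{ is active} &\iff \alpha_j \beta_j \neq 0 .
\end{aligned}
```

Under this formulation, correlation among mediators enters through the joint distribution of the mediator errors, which is why selection methods that treat mediators as independent a priori can lose power when active mediators are correlated.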
Affiliation(s)
- Yanyi Song
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan USA
- Jian Kang
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan USA
- Max T. Aung
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan USA
- Min Zhang
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan USA
- Wei Zhao
- Department of Epidemiology, University of Michigan, Ann Arbor, Michigan USA
- Belinda L. Needham
- Department of Epidemiology, University of Michigan, Ann Arbor, Michigan USA
- Yongmei Liu
- Division of Cardiology, Department of Medicine, Duke University School of Medicine, Durham, North Carolina USA
- John D. Meeker
- Department of Environmental Health Sciences, University of Michigan, Ann Arbor, Michigan USA
- Jennifer A. Smith
- Department of Epidemiology, University of Michigan, Ann Arbor, Michigan USA
- Bhramar Mukherjee
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan USA
|