1
Castelletti F. Learning Bayesian Networks: A Copula Approach for Mixed-Type Data. Psychometrika 2024; 89:658-686. [PMID: 38609693 DOI: 10.1007/s11336-024-09969-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Accepted: 03/14/2024] [Indexed: 04/14/2024]
Abstract
Estimating dependence relationships between variables is a crucial issue in many applied domains and in particular psychology. When several variables are entertained, these can be organized into a network which encodes their set of conditional dependence relations. Typically however, the underlying network structure is completely unknown or can be partially drawn only; accordingly it should be learned from the available data, a process known as structure learning. In addition, data arising from social and psychological studies are often of different types, as they can include categorical, discrete and continuous measurements. In this paper, we develop a novel Bayesian methodology for structure learning of directed networks which applies to mixed data, i.e., possibly containing continuous, discrete, ordinal and binary variables simultaneously. Whenever available, our method can easily incorporate known dependence structures among variables represented by paths or edge directions that can be postulated in advance based on the specific problem under consideration. We evaluate the proposed method through extensive simulation studies, with appreciable performances in comparison with current state-of-the-art alternative methods. Finally, we apply our methodology to well-being data from a social survey promoted by the United Nations, and mental health data collected from a cohort of medical students. R code implementing the proposed methodology is available at https://github.com/FedeCastelletti/bayes_networks_mixed_data.
Affiliation(s)
- Federico Castelletti
- Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
2
Castelletti F, Consonni G. Bayesian graphical modeling for heterogeneous causal effects. Stat Med 2023; 42:15-32. [PMID: 36317356 DOI: 10.1002/sim.9599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Revised: 09/08/2022] [Accepted: 10/15/2022] [Indexed: 12/24/2022]
Abstract
There is a growing interest in current medical research to develop personalized treatments using a molecular-based approach. The broad goal is to implement a more precise and targeted decision-making process, relative to traditional treatments based primarily on clinical diagnoses. Specifically, we consider patients affected by Acute Myeloid Leukemia (AML), a hematological cancer characterized by uncontrolled proliferation of hematopoietic stem cells in the bone marrow. Because AML responds poorly to chemotherapeutic treatments, the development of targeted therapies is essential to improve patients' prospects. In particular, the dataset we analyze contains the levels of proteins involved in cell cycle regulation and linked to the progression of the disease. We evaluate treatment effects within a causal framework represented by a Directed Acyclic Graph (DAG) model, whose vertices are the protein levels in the network. A major obstacle in implementing the above program is represented by individual heterogeneity. We address this issue through a Dirichlet Process (DP) mixture of Gaussian DAG-models where both the graphical structure and the allied model parameters are regarded as uncertain. Our procedure determines a clustering structure of the units reflecting the underlying heterogeneity, and produces subject-specific estimates of causal effects based on Bayesian Model Averaging (BMA). With reference to the AML dataset, we identify different effects of protein regulation among individuals; moreover, our method clusters patients into groups that exhibit only mild similarities with traditional categories based on morphological features.
Affiliation(s)
- Federico Castelletti
- Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
- Guido Consonni
- Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
3
Choi S, Kim Y, Park G. Densely connected sub-Gaussian linear structural equation model learning via ℓ1- and ℓ2-regularized regressions. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2023.107691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
4
Samanta S, Khare K, Michailidis G. A generalized likelihood-based Bayesian approach for scalable joint regression and covariance selection in high dimensions. Statistics and Computing 2022; 32:47. [PMID: 36713060 PMCID: PMC9881595 DOI: 10.1007/s11222-022-10102-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2021] [Accepted: 04/27/2022] [Indexed: 06/05/2023]
Abstract
The paper addresses joint sparsity selection in the regression coefficient matrix and the error precision (inverse covariance) matrix for high-dimensional multivariate regression models in the Bayesian paradigm. The selected sparsity patterns are crucial to help understand the network of relationships between the predictor and response variables, as well as the conditional relationships among the latter. While Bayesian methods have the advantage of providing natural uncertainty quantification through posterior inclusion probabilities and credible intervals, current Bayesian approaches either restrict to specific sub-classes of sparsity patterns and/or are not scalable to settings with hundreds of responses and predictors. Bayesian approaches which only focus on estimating the posterior mode are scalable, but do not generate samples from the posterior distribution for uncertainty quantification. Using a bi-convex regression based generalized likelihood and spike-and-slab priors, we develop an algorithm called Joint Regression Network Selector (JRNS) for joint regression and covariance selection which (a) can accommodate general sparsity patterns, (b) provides posterior samples for uncertainty quantification, and (c) is scalable and orders of magnitude faster than the state-of-the-art Bayesian approaches providing uncertainty quantification. We demonstrate the statistical and computational efficacy of the proposed approach on synthetic data and through the analysis of selected cancer data sets. We also establish high-dimensional posterior consistency for one of the developed algorithms.
5
Castelletti F, Consonni G, La Rocca L. Discussion to: Bayesian graphical models for modern biological applications by Y. Ni, V. Baladandayuthapani, M. Vannucci and F.C. Stingo. Stat Method Appl-Ger 2022. [DOI: 10.1007/s10260-021-00601-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
6
Lee K, Cao X. Bayesian joint inference for multiple directed acyclic graphs. J Multivariate Anal 2022. [DOI: 10.1016/j.jmva.2022.105003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
7
Affiliation(s)
- Stefano Peluso
- Department of Statistics and Quantitative Methods, Università degli Studi di Milano-Bicocca, Milan
8
Zareifard H, Rezaei Tabar V, Plewczynski D. A Gibbs sampler for learning DAG: a unification for discrete and Gaussian domains. J Stat Comput Sim 2021. [DOI: 10.1080/00949655.2021.1909026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Vahid Rezaei Tabar
- Department of Statistics, Allameh Tabataba'i University, Tehran, Iran
- School of Biological Sciences, IPM, Tehran, Iran
- Dariusz Plewczynski
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
9
Ghosh S, Khare K, Michailidis G. Strong selection consistency of Bayesian vector autoregressive models based on a pseudo-likelihood approach. Ann Stat 2021. [DOI: 10.1214/20-aos1992] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
Affiliation(s)
- Satyajit Ghosh
- Department of Statistics and Biostatistics, Rutgers University
10
Niu Y, Pati D, Mallick BK. Bayesian graph selection consistency under model misspecification. Bernoulli 2021; 27:637-672. [PMID: 34305432 PMCID: PMC8300537 DOI: 10.3150/20-bej1253] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
Abstract
Gaussian graphical models are a popular tool to learn the dependence structure in the form of a graph among variables of interest. Bayesian methods have gained in popularity in the last two decades due to their ability to simultaneously learn the covariance and the graph. There is a wide variety of model-based methods to learn the underlying graph assuming various forms of the graphical structure. Although for scalability of the Markov chain Monte Carlo algorithms, decomposability is commonly imposed on the graph space, its possible implication on the posterior distribution of the graph is not clear. An open problem in Bayesian decomposable structure learning is whether the posterior distribution is able to select a meaningful decomposable graph that is "close" to the true non-decomposable graph, when the dimension of the variables increases with the sample size. In this article, we explore specific conditions on the true precision matrix and the graph, which result in an affirmative answer to this question with a commonly used hyper-inverse Wishart prior on the covariance matrix and a suitable complexity prior on the graph space. In absence of structural sparsity assumptions, our strong selection consistency holds in a high-dimensional setting where p = O(n^α) for α < 1/3. We show when the true graph is non-decomposable, the posterior distribution concentrates on a set of graphs that are minimal triangulations of the true graph.
Affiliation(s)
- Yabo Niu
- Department of Statistics, Texas A&M University, College Station, TX, USA
- Debdeep Pati
- Department of Statistics, Texas A&M University, College Station, TX, USA
- Bani K Mallick
- Department of Statistics, Texas A&M University, College Station, TX, USA
11
Affiliation(s)
- Xuan Cao
- Department of Mathematical Sciences, University of Cincinnati
12
Zhang H, Huang X, Han S, Rezwan FI, Karmaus W, Arshad H, Holloway JW. Gaussian Bayesian network comparisons with graph ordering unknown. Comput Stat Data Anal 2020; 157. [PMID: 33408431 DOI: 10.1016/j.csda.2020.107156] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
Abstract
A Bayesian approach is proposed that unifies Gaussian Bayesian network constructions and comparisons between two networks (identical or differential) for data with graph ordering unknown. When sampling graph ordering, to escape from local maximums, an adjusted single queue equi-energy algorithm is applied. The conditional posterior probability mass function for network differentiation is derived and its asymptotic proposition is theoretically assessed. Simulations are used to demonstrate the approach and compare with existing methods. Based on epigenetic data at a set of DNA methylation sites (CpG sites), the proposed approach is further examined on its ability to detect network differentiations. Findings from theoretical assessment, simulations, and real data applications support the efficacy and efficiency of the proposed method for network comparisons.
Affiliation(s)
- Hongmei Zhang
- Division of Epidemiology, Biostatistics, and Environmental Health, School of Public Health, University of Memphis, Memphis, TN, USA
- Xianzheng Huang
- Department of Statistics, University of South Carolina, Columbia, SC, USA
- Shengtong Han
- Joseph J. Zilber School of Public Health, University of Wisconsin, Milwaukee, WI, USA
- Faisal I Rezwan
- School of Water, Energy and Environment, Cranfield University, Cranfield, Bedfordshire, UK
- Wilfried Karmaus
- Division of Epidemiology, Biostatistics, and Environmental Health, School of Public Health, University of Memphis, Memphis, TN, USA
- Hasan Arshad
- Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, Southampton, UK
- David Hide Asthma and Allergy Research Centre, Isle of Wight, UK
- John W Holloway
- Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK
13
Cao X, Lee K, Huang Q. Bayesian variable selection in logistic regression with application to whole-brain functional connectivity analysis for Parkinson's disease. Stat Methods Med Res 2020; 30:826-842. [PMID: 33308007 DOI: 10.1177/0962280220978990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Parkinson's disease is a progressive, chronic, and neurodegenerative disorder that is primarily diagnosed by clinical examinations and magnetic resonance imaging (MRI). In this paper, we propose a Bayesian model to predict Parkinson's disease employing a functional MRI (fMRI) based radiomics approach. We consider a spike and slab prior for variable selection in high-dimensional logistic regression models, and present an approximate Gibbs sampler by replacing a logistic distribution with a t-distribution. Under mild conditions, we establish model selection consistency of the induced posterior and show through simulation studies that the proposed method outperforms existing state-of-the-art methods. In fMRI analysis, 6216 whole-brain functional connectivity features are extracted for 50 healthy controls along with 70 Parkinson's disease patients. We apply our method to the resulting dataset and further show its benefits with a higher average prediction accuracy of 0.83 compared to other contenders based on 10 random splits. The model fitting procedure also reveals the most discriminative brain regions for Parkinson's disease. These findings demonstrate that the proposed Bayesian variable selection method has the potential to support radiological diagnosis for patients with Parkinson's disease.
Affiliation(s)
- Xuan Cao
- Division of Statistics and Data Science, Department of Mathematical Sciences, University of Cincinnati
- Kyoungjae Lee
- Department of Statistics, Inha University, Incheon, Korea
- Qingling Huang
- Department of Radiology, Affiliated Brain Hospital of Nanjing Medical University, Nanjing, China
14
Cao X, Khare K, Ghosh M. Consistent Bayesian sparsity selection for high-dimensional Gaussian DAG models with multiplicative and beta-mixture priors. J Multivariate Anal 2020. [DOI: 10.1016/j.jmva.2020.104628] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
15
Castelletti F. Bayesian Model Selection of Gaussian Directed Acyclic Graph Structures. Int Stat Rev 2020. [DOI: 10.1111/insr.12379] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Affiliation(s)
- Federico Castelletti
- Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milano, Italy
16
Lee K, Cao X. Bayesian group selection in logistic regression with application to MRI data analysis. Biometrics 2020; 77:391-400. [PMID: 32365231 DOI: 10.1111/biom.13290] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/25/2019] [Revised: 04/24/2020] [Accepted: 04/27/2020] [Indexed: 12/22/2022]
Abstract
We consider Bayesian logistic regression models with group-structured covariates. In high-dimensional settings, it is often assumed that only a small portion of groups are significant, and thus, consistent group selection is of significant importance. While consistent frequentist group selection methods have been proposed, theoretical properties of Bayesian group selection methods for logistic regression models have not been investigated yet. In this paper, we consider a hierarchical group spike and slab prior for logistic regression models in high-dimensional settings. Under mild conditions, we establish strong group selection consistency of the induced posterior, which is the first theoretical result in the Bayesian literature. Through simulation studies, we demonstrate that the proposed method outperforms existing state-of-the-art methods in various settings. We further apply our method to a magnetic resonance imaging data set for predicting Parkinson's disease and show its benefits over other contenders.
Affiliation(s)
- Kyoungjae Lee
- Department of Statistics, Inha University, Incheon, South Korea
- Xuan Cao
- Department of Mathematical Sciences, University of Cincinnati, Cincinnati, Ohio
17
Castelletti F, Consonni G. Bayesian inference of causal effects from observational data in Gaussian graphical models. Biometrics 2020; 77:136-149. [PMID: 32294233 DOI: 10.1111/biom.13281] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/16/2019] [Revised: 03/25/2020] [Accepted: 03/30/2020] [Indexed: 11/30/2022]
Abstract
We assume that multivariate observational data are generated from a distribution whose conditional independencies are encoded in a Directed Acyclic Graph (DAG). For any given DAG, the causal effect of a variable onto another one can be evaluated through intervention calculus. A DAG is typically not identifiable from observational data alone. However, its Markov equivalence class (a collection of DAGs) can be estimated from the data. As a consequence, for the same intervention a set of causal effects, one for each DAG in the equivalence class, can be evaluated. In this paper, we propose a fully Bayesian methodology to make inference on the causal effects of any intervention in the system. Main features of our method are: (a) both uncertainty on the equivalence class and the causal effects are jointly modeled; (b) priors on the parameters of the modified Cholesky decomposition of the precision matrices across all DAG models are constructively assigned starting from a unique prior on the complete (unrestricted) DAG; (c) an efficient algorithm to sample from the posterior distribution on graph space is adopted; (d) an objective Bayes approach, requiring virtually no user specification, is used throughout. We demonstrate the merits of our methodology in simulation studies, wherein comparisons with current state-of-the-art procedures turn out to be highly satisfactory. Finally we examine a real data set of gene expressions for Arabidopsis thaliana.
Affiliation(s)
- Federico Castelletti
- Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
- Guido Consonni
- Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
18
Córdoba I, Bielza C, Larrañaga P. A review of Gaussian Markov models for conditional independence. J Stat Plan Inference 2020. [DOI: 10.1016/j.jspi.2019.09.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]
19
Cao X, Ding L, Mersha TB. Joint variable selection and network modeling for detecting eQTLs. Stat Appl Genet Mol Biol 2020; 19. [PMID: 32078577 DOI: 10.1515/sagmb-2019-0032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In this study, we conduct a comparison of the three most recent statistical methods for joint variable selection and covariance estimation, with application to detecting expression quantitative trait loci (eQTL) and gene network estimation, and introduce a new hierarchical Bayesian method to be included in the comparison. Unlike the traditional univariate regression approach in eQTL, all four methods correlate phenotypes and genotypes by multivariate regression models that incorporate the dependence information among phenotypes, and use Bayesian multiplicity adjustment to avoid multiple testing burdens raised by traditional multiple testing correction methods. We presented the performance of three methods (MSSL - Multivariate Spike and Slab Lasso, SSUR - Sparse Seemingly Unrelated Bayesian Regression, and OBFBF - Objective Bayes Fractional Bayes Factor), along with the proposed JDAG (Joint estimation via a Gaussian Directed Acyclic Graph model) method, through simulation experiments and publicly available HapMap real data, taking asthma as an example. Compared with existing methods, JDAG identified networks with higher sensitivity and specificity under row-wise sparse settings. JDAG requires less execution in small-to-moderate dimensions, but is not currently applicable to high dimensional data. The eQTL analysis in asthma data showed a number of known gene regulations such as STARD3, IKZF3 and PGAP3, all reported in asthma studies. The code of the proposed method is freely available at GitHub (https://github.com/xuan-cao/Joint-estimation-for-eQTL).
Affiliation(s)
- Xuan Cao
- Division of Statistics and Data Science, Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH 45221, USA
- Lili Ding
- Division of Biostatistics and Epidemiology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH 45229, USA
- Tesfaye B Mersha
- Division of Asthma Research, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH 45229, USA
20
Peluso S, Consonni G. Compatible priors for model selection of high-dimensional Gaussian DAGs. Electron J Stat 2020. [DOI: 10.1214/20-ejs1768] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
21
Lee K, Lee J, Lin L. Minimax posterior convergence rates and model selection consistency in high-dimensional DAG models based on sparse Cholesky factors. Ann Stat 2019. [DOI: 10.1214/18-aos1783] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/19/2022]
22
Castelletti F, Consonni G. Objective Bayes model selection of Gaussian interventional essential graphs for the identification of signaling pathways. Ann Appl Stat 2019. [DOI: 10.1214/19-aoas1275] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/19/2022]
23
Cao X, Zhang S. A permutation-based Bayesian approach for inverse covariance estimation. Commun Stat-Theor M 2019. [DOI: 10.1080/03610926.2019.1590601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Xuan Cao
- Department of Mathematical Sciences, University of Cincinnati, Cincinnati, Ohio, USA
- Shaojun Zhang
- Department of Statistics, University of Florida, Gainesville, Florida, USA