1
Castelletti F, Consonni G. Bayesian graphical modeling for heterogeneous causal effects. Stat Med 2023; 42:15-32. [PMID: 36317356 DOI: 10.1002/sim.9599] [Received: 02/04/2022] [Revised: 09/08/2022] [Accepted: 10/15/2022]
Abstract
There is growing interest in current medical research in developing personalized treatments using a molecular-based approach. The broad goal is to implement a more precise and targeted decision-making process, relative to traditional treatments based primarily on clinical diagnoses. Specifically, we consider patients affected by Acute Myeloid Leukemia (AML), a hematological cancer characterized by uncontrolled proliferation of hematopoietic stem cells in the bone marrow. Because AML responds poorly to chemotherapeutic treatments, the development of targeted therapies is essential to improve patients' prospects. In particular, the dataset we analyze contains the levels of proteins involved in cell cycle regulation and linked to the progression of the disease. We evaluate treatment effects within a causal framework represented by a Directed Acyclic Graph (DAG) model, whose vertices are the protein levels in the network. A major obstacle in implementing the above program is individual heterogeneity. We address this issue through a Dirichlet Process (DP) mixture of Gaussian DAG models where both the graphical structure and the allied model parameters are regarded as uncertain. Our procedure determines a clustering structure of the units reflecting the underlying heterogeneity, and produces subject-specific estimates of causal effects based on Bayesian Model Averaging (BMA). With reference to the AML dataset, we identify different effects of protein regulation among individuals; moreover, our method clusters patients into groups that exhibit only mild similarities with traditional categories based on morphological features.
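As a rough illustration of the two ingredients named in this abstract, the sketch below computes, for a single linear Gaussian DAG, the causal effect of X on Y by regressing Y on X and the parents of X (a back-door adjustment that assumes Y is not itself a parent of X), and then averages such per-DAG effects with weights proportional to posterior DAG probabilities, which is the Bayesian Model Averaging step. It assumes the candidate DAGs and their log posterior probabilities are already available; the function names and the simulated data are hypothetical, and the Dirichlet Process mixture that yields subject-specific estimates in the paper is not reproduced.

```python
import numpy as np


def parent_adjusted_effect(data, x, y, parents):
    """Causal effect of column x on column y in a linear Gaussian DAG.

    Regresses y on x plus the parents of x (back-door adjustment) and
    returns the coefficient of x.
    """
    covariates = [x] + list(parents)
    design = np.column_stack([np.ones(data.shape[0])] + [data[:, c] for c in covariates])
    coef, *_ = np.linalg.lstsq(design, data[:, y], rcond=None)
    return float(coef[1])  # coef[0] is the intercept


def bma_effect(effects, log_post):
    """Average per-DAG effects with weights proportional to exp(log_post)."""
    log_post = np.asarray(log_post, dtype=float)
    weights = np.exp(log_post - log_post.max())  # subtract the max for numerical stability
    weights /= weights.sum()
    return float(np.dot(weights, effects))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50_000
    z = rng.normal(size=n)                          # confounder: Z -> X and Z -> Y
    x = 0.8 * z + rng.normal(size=n)
    y = 2.0 * x + 1.5 * z + rng.normal(size=n)      # true effect of X on Y is 2.0
    data = np.column_stack([z, x, y])

    # Hypothetical posterior over two DAGs: pa(X) = {Z} versus pa(X) = {}.
    effects = [parent_adjusted_effect(data, 1, 2, [0]),
               parent_adjusted_effect(data, 1, 2, [])]
    print(effects, bma_effect(effects, log_post=[0.0, -2.0]))
```

In the simulated example the unadjusted regression absorbs the confounding path through Z, so the two candidate DAGs give noticeably different effects and the BMA estimate lies between them, closer to the DAG with the higher posterior weight.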
Affiliation(s)
- Federico Castelletti
  - Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
- Guido Consonni
  - Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
2
Bernardini D, Paterlini S, Taufer E. A 2-stage elastic net algorithm for estimation of sparse networks with heavy-tailed data. J Stat Comput Simul 2022. [DOI: 10.1080/00949655.2022.2124992]
Affiliation(s)
- Davide Bernardini
  - Department of Economics and Management, University of Trento, Trento, Italy
- Sandra Paterlini
  - Department of Economics and Management, University of Trento, Trento, Italy
- Emanuele Taufer
  - Department of Economics and Management, University of Trento, Trento, Italy
3
Ogawa M, Nakamoto K, Sei T. On the fractional moments of a truncated centered multivariate normal distribution. Commun Stat Simul Comput 2022. [DOI: 10.1080/03610918.2020.1725821]
Affiliation(s)
- Mitsunori Ogawa
  - Interfaculty Initiative in Information Studies, Graduate School of Interdisciplinary Information Studies, The University of Tokyo, Tokyo, Japan
- Kazuki Nakamoto
  - Department of Mathematics, Faculty of Science and Technology, Keio University, Kanagawa, Japan
- Tomonari Sei
  - Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
4
Dimension-wise scaled normal mixtures with application to finance and biometry. J Multivar Anal 2022. [DOI: 10.1016/j.jmva.2022.105020]
5
Rejoinder to the discussion of “Bayesian graphical models for modern biological applications”. Stat Methods Appl 2022. [DOI: 10.1007/s10260-022-00634-5]
6
Abstract
Graphical models are powerful tools that are regularly used to investigate complex dependence structures in high-throughput biomedical datasets. They allow for a holistic, systems-level view of the various biological processes, supporting intuitive and rigorous understanding and interpretation. In the context of large networks, Bayesian approaches are particularly suitable because they encourage sparsity of the graphs, incorporate prior information, and, most importantly, account for uncertainty in the graph structure. These features are particularly important in applications with limited sample size, including genomics and imaging studies. In this paper, we review several recently developed techniques for the analysis of large networks under non-standard settings, including, but not limited to, multiple graphs for data observed from multiple related subgroups, graphical regression approaches used for the analysis of networks that change with covariates, and other complex sampling and structural settings. We also illustrate the practical utility of some of these methods using examples in cancer genomics and neuroimaging.
7
Wang JH, Chen YH. Network-adjusted Kendall's Tau Measure for Feature Screening with Application to High-dimensional Survival Genomic Data. Bioinformatics 2021; 37:2150-2156. [PMID: 33595070 DOI: 10.1093/bioinformatics/btab064] [Received: 10/23/2020] [Revised: 12/17/2020] [Accepted: 01/26/2021]
Abstract
MOTIVATION In high-dimensional genetic/genomic data, the identification of genes related to clinical survival traits is a challenging and important issue. In particular, right-censored survival outcomes and contaminated biomarker data make the relevant feature screening difficult. Several independence screening methods have been developed, but they fail to account for gene-gene dependency information and may be sensitive to outlying feature data. RESULTS We improve the inverse probability-of-censoring weighted (IPCW) Kendall's tau statistic by using Google's PageRank Markov matrix to incorporate feature dependency network information. Also, to tackle outlying feature data, the nonparanormal approach, which transforms the feature data into multivariate normal variates, is used in the graphical lasso procedure to estimate the network structure in the feature data. Simulation studies under various scenarios show that the proposed network-adjusted weighted Kendall's tau approach leads to more accurate feature selection and survival prediction than methods that do not account for feature dependency network information and outlying feature data. The applications to the clinical survival outcome data of diffuse large B-cell lymphoma and of The Cancer Genome Atlas lung adenocarcinoma patients clearly demonstrate the advantages of the new proposal over the alternative methods. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
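The nonparanormal step can be sketched as a rank-based transform: each feature is replaced by normal quantiles of its empirical CDF, truncated away from 0 and 1, so an outlying value can influence the transformed data only through its rank. The NumPy sketch below assumes the truncation level δ_n = 1/(4 n^{1/4} √(π log n)) commonly used for the Winsorized nonparanormal estimator and omits the optional rescaling to each feature's original mean and standard deviation; the function name is hypothetical, and the output would then be handed to a graphical lasso solver, which is not shown.

```python
import math
from statistics import NormalDist

import numpy as np


def nonparanormal(data):
    """Map each column of an (n, p) array to truncated normal scores.

    Ties are broken by order of appearance; requires n >= 2.
    """
    n, p = data.shape
    delta = 1.0 / (4.0 * n ** 0.25 * math.sqrt(math.pi * math.log(n)))
    inv_cdf = NormalDist().inv_cdf
    scores = np.empty((n, p))
    for j in range(p):
        ranks = data[:, j].argsort().argsort() + 1      # ranks 1..n
        ecdf = np.clip(ranks / n, delta, 1.0 - delta)   # Winsorize away from 0 and 1
        scores[:, j] = [inv_cdf(float(u)) for u in ecdf]
    return scores


if __name__ == "__main__":
    clean = np.array([[1.0], [2.0], [3.0], [4.0]])
    contaminated = np.array([[1.0], [2.0], [3.0], [1e9]])  # one extreme outlier
    print(np.allclose(nonparanormal(clean), nonparanormal(contaminated)))
```

In the example, replacing the largest value by 1e9 leaves the transformed column unchanged, because only ranks enter the transform.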
Affiliation(s)
- Jie-Huei Wang
  - Department of Statistics, Feng Chia University, Seatwen, Taichung 40724, Taiwan
- Yi-Hau Chen
  - Institute of Statistical Science, Academia Sinica, Nankang, Taipei 11529, Taiwan
8
Tang P, Jiang H, Kim H, Deng X. Robust estimation of sparse precision matrix using adaptive weighted graphical lasso approach. J Nonparametr Stat 2021. [DOI: 10.1080/10485252.2021.1931688]
Affiliation(s)
- Peng Tang
  - AI, Cruise LLC, San Francisco, CA, USA
- Huijing Jiang
  - Global Biometrics and Data Sciences, Bristol Myers Squibb, Berkeley Heights, NJ, USA
- Heeyoung Kim
  - Department of Industrial & Systems Engineering, KAIST, Daejeon, South Korea
- Xinwei Deng
  - Department of Statistics, Virginia Tech, Blacksburg, VA, USA
9
Zhang C, Tian GL, Yuen KC, Liu P, Tang ML. A new multivariate t distribution with variant tail weights and its application in robust regression analysis. J Appl Stat 2021; 49:2629-2656. [DOI: 10.1080/02664763.2021.1913106]
Affiliation(s)
- Chi Zhang
  - College of Economics, Shenzhen University, Shenzhen, Guangdong Province, People's Republic of China
- Guo-Liang Tian
  - Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen, Guangdong Province, People's Republic of China
- Kam Chuen Yuen
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, People's Republic of China
- Pengyi Liu
  - School of Statistics and Mathematics, Yunnan University of Finance and Economics, Kunming, Yunnan Province, People's Republic of China
- Man-Lai Tang
  - Department of Mathematics, Statistics and Insurance, School of Decision Sciences, The Hang Seng University of Hong Kong, Hong Kong, People's Republic of China
10
Cremaschi A, Argiento R, Shoemaker K, Peterson C, Vannucci M. Hierarchical Normalized Completely Random Measures for Robust Graphical Modeling. Bayesian Anal 2019; 14:1271-1301. [PMID: 32431780 PMCID: PMC7237071 DOI: 10.1214/19-ba1153]
Abstract
Gaussian graphical models are useful tools for exploring network structures in multivariate normal data. In this paper we are interested in situations where data show departures from Gaussianity, therefore requiring alternative modeling distributions. The multivariate t-distribution, obtained by dividing each component of the data vector by the square root of a gamma random variable, is a straightforward generalization to accommodate deviations from normality such as heavy tails. Since different groups of variables may be contaminated to a different extent, Finegold and Drton (2014) introduced the Dirichlet t-distribution, where the divisors are clustered using a Dirichlet process. In this work, we consider a more general class of nonparametric distributions as the prior on the divisor terms, namely the class of normalized completely random measures (NormCRMs). To improve the effectiveness of the clustering, we propose modeling the dependence among the divisors through a nonparametric hierarchical structure, which allows for the sharing of parameters across the samples in the data set. This desirable feature enables us to cluster together different components of multivariate data in a parsimonious way. We demonstrate through simulations that this approach provides accurate graphical model inference, and apply it to a case study examining the dependence structure in radiomics data derived from The Cancer Imaging Atlas.
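The t-type constructions contrasted in this abstract differ only in how the gamma divisors are shared across coordinates. The sketch below, an illustration under stated assumptions rather than the authors' sampler, draws Gaussian vectors and divides coordinate j by the square root of a Gamma(ν/2, rate ν/2) variable attached to group `groups[j]`: a single group gives the classical multivariate t, one group per coordinate gives the alternative t of Finegold and Drton, and a fixed partition mimics one draw of clustered (Dirichlet t) divisors. The function name and the `groups` argument are hypothetical, group labels are assumed to be 0, 1, ..., K-1, and the Dirichlet process or NormCRM prior over the partition is not modeled here.

```python
import numpy as np


def sample_grouped_t(mean, cov, nu, groups, n, rng):
    """Draw n vectors whose coordinates share gamma divisors within groups.

    groups[j] is the (hypothetical) cluster label of coordinate j; all
    coordinates with the same label are divided by the same variable.
    """
    mean = np.asarray(mean, dtype=float)
    groups = np.asarray(groups)
    z = rng.multivariate_normal(np.zeros(mean.size), cov, size=n)
    n_groups = groups.max() + 1
    # Gamma(shape=nu/2, rate=nu/2), i.e. scale = 2/nu; one draw per observation and group.
    tau = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=(n, n_groups))
    return mean + z / np.sqrt(tau[:, groups])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    cov = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.0]])
    classical = sample_grouped_t(np.zeros(3), cov, 4.0, [0, 0, 0], 1000, rng)
    alternative = sample_grouped_t(np.zeros(3), cov, 4.0, [0, 1, 2], 1000, rng)
    clustered = sample_grouped_t(np.zeros(3), cov, 4.0, [0, 0, 1], 1000, rng)
    print(classical.shape, alternative.shape, clustered.shape)
```

Sharing one divisor per observation lets a single extreme draw inflate every coordinate at once, whereas separate divisors let contamination hit some coordinates and not others, which is the motivation for clustering them.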
Affiliation(s)
- Andrea Cremaschi
  - Department of Cancer Immunology, Institute of Cancer Research, Oslo University Hospital, Oslo, Norway
  - Oslo Centre for Biostatistics and Epidemiology (OCBE), University of Oslo, Oslo, Norway
- Raffaele Argiento
  - ESOMAS Department, University of Torino, Torino, Italy
  - Collegio Carlo Alberto, Torino, Italy
- Katherine Shoemaker
  - Department of Statistics, Rice University, Houston, TX, USA
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Christine Peterson
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
11
Chun H, Lee MH, Kim S, Oh J. Robust precision matrix estimation via weighted median regression with regularization. Can J Stat 2018. [DOI: 10.1002/cjs.11356]
Affiliation(s)
- Hyonho Chun
  - Department of Statistics, Purdue University, West Lafayette, IN 47906, U.S.A
- Myung Hee Lee
  - Center for Global Health, Weill Cornell Medicine, New York, NY 10065, U.S.A
- Sung‐Ho Kim
  - Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Korea
- Jihwan Oh
  - Department of Statistics, Purdue University, West Lafayette, IN 47906, U.S.A
12
Katayama S, Fujisawa H, Drton M. Robust and sparse Gaussian graphical modelling under cell-wise contamination. Stat (Int Stat Inst) 2018. [DOI: 10.1002/sta4.181]
Affiliation(s)
- Shota Katayama
  - Department of Industrial Engineering and Economics, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8552, Japan
- Hironori Fujisawa
  - The Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
  - Graduate School of Medicine, Nagoya University, 65 Tsurumai-cho, Showa-ku, Nagoya 466-8550, Japan
- Mathias Drton
  - Department of Statistics, University of Washington, Seattle, WA 98195-4322, USA
13
Noor A, Ahmad A, Serpedin E. SparseNCA: Sparse Network Component Analysis for Recovering Transcription Factor Activities with Incomplete Prior Information. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:387-395. [PMID: 26529780 DOI: 10.1109/tcbb.2015.2495224]
Abstract
Network component analysis (NCA) is an important method for inferring transcriptional regulatory networks (TRNs) and recovering transcription factor activities (TFAs) using gene expression data and prior information about the connectivity matrix. The algorithms currently available crucially depend on the completeness of this prior information. However, inaccuracies in the measurement process may leave the available knowledge about the connectivity matrix incomplete. Hence, computationally efficient algorithms are needed to overcome the possible incompleteness in the available data. We present a sparse network component analysis algorithm (sparseNCA), which incorporates the effect of incompleteness in the estimation of TRNs by imposing an additional sparsity constraint using the norm, which results in greater estimation accuracy. In order to improve computational efficiency, an iterative re-weighted method is proposed for the NCA problem which not only promotes sparsity but is hundreds of times faster than the norm based solution. The performance of sparseNCA is rigorously compared to that of FastNCA and NINCA using synthetic data as well as real data. It is shown that sparseNCA outperforms the existing state-of-the-art algorithms both in terms of estimation accuracy and consistency, with the added advantage of low computational complexity. The performance advantage of sparseNCA over its predecessors is particularly pronounced in the case of incomplete prior information about the sparsity of the network. Subnetwork analysis performed on the E. coli data confirms the superior consistency of the proposed algorithm.
14
Bhadra A, Rao A, Baladandayuthapani V. Inferring network structure in non-normal and mixed discrete-continuous genomic data. Biometrics 2018; 74:185-195. [PMID: 28437848 PMCID: PMC5654714 DOI: 10.1111/biom.12711] [Received: 12/01/2015] [Revised: 02/01/2017] [Accepted: 03/01/2017]
Abstract
Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations when these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when a part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, the existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework is able to handle continuous data that are not normal and data that are of mixed continuous and discrete nature, while still being able to infer a sparse conditional sign independence structure among the observed data. Extensive performance comparison in simulations with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach.
Affiliation(s)
- Anindya Bhadra
  - Department of Statistics, Purdue University, 250 N. University Street, West Lafayette, Indiana 47907, U.S.A
- Arvind Rao
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, 1400 Pressler Dr., Houston, Texas 77030, U.S.A
- Veerabhadran Baladandayuthapani
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 1400 Pressler Dr., Houston, Texas 77030, U.S.A
15
Yang E, Lozano AC, Aravkin A. A general family of trimmed estimators for robust high-dimensional data analysis. Electron J Stat 2018. [DOI: 10.1214/18-ejs1470]
16
Loh PL, Tan XL. High-dimensional robust precision matrix estimation: Cellwise corruption under $\epsilon$-contamination. Electron J Stat 2018. [DOI: 10.1214/18-ejs1427]
21
Lin L, Drton M, Shojaie A. Estimation of High-Dimensional Graphical Models Using Regularized Score Matching. Electron J Stat 2016; 10:806-854. [PMID: 28638498 PMCID: PMC5476334 DOI: 10.1214/16-ejs1126]
Abstract
Graphical models are widely used to model stochastic dependences among large collections of variables. We introduce a new method of estimating undirected conditional independence graphs based on the score matching loss, introduced by Hyvärinen (2005), and subsequently extended in Hyvärinen (2007). The regularized score matching method we propose applies to settings with continuous observations and allows for computationally efficient treatment of possibly non-Gaussian exponential family models. In the well-explored Gaussian setting, regularized score matching avoids issues of asymmetry that arise when applying the technique of neighborhood selection, and compared to existing methods that directly yield symmetric estimates, the score matching approach has the advantage that the considered loss is quadratic and gives piecewise linear solution paths under ℓ1 regularization. Under suitable irrepresentability conditions, we show that ℓ1-regularized score matching is consistent for graph estimation in sparse high-dimensional settings. Through numerical experiments and an application to RNAseq data, we confirm that regularized score matching achieves state-of-the-art performance in the Gaussian case and provides a valuable tool for computationally efficient estimation in non-Gaussian graphical models.
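For the Gaussian case, the score matching loss referred to above has a simple closed form in the precision matrix K: with sample covariance Σ̂ it is J(K) = ½ tr(K Σ̂ K) − tr(K), whose gradient ½(Σ̂K + KΣ̂) − I vanishes at K = Σ̂⁻¹. Adding an ℓ1 penalty on the off-diagonal entries gives a lasso-type problem with a quadratic loss. The sketch below solves it with plain proximal gradient descent (ISTA), a swapped-in solver chosen for brevity; the paper's own algorithm, its treatment of non-Gaussian exponential families, and its tuning are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def score_matching_glasso(sigma_hat, lam, n_iter=5000):
    """l1-penalized Gaussian score matching via proximal gradient (ISTA).

    Minimizes 0.5 * tr(K S K) - tr(K) + lam * sum_{i != j} |K_ij| over symmetric K,
    where S = sigma_hat.
    """
    p = sigma_hat.shape[0]
    # The Hessian map D -> (S D + D S) / 2 has largest eigenvalue lambda_max(S).
    step = 1.0 / np.linalg.eigvalsh(sigma_hat).max()
    off_diag = ~np.eye(p, dtype=bool)
    K = np.eye(p)
    for _ in range(n_iter):
        grad = 0.5 * (sigma_hat @ K + K @ sigma_hat) - np.eye(p)
        K = K - step * grad
        # Soft-threshold off-diagonal entries only; the diagonal is unpenalized.
        K[off_diag] = np.sign(K[off_diag]) * np.maximum(np.abs(K[off_diag]) - step * lam, 0.0)
    return K
```

Setting `lam = 0` recovers the inverse of `sigma_hat` when it is invertible, while a large `lam` zeroes every off-diagonal entry and leaves diagonal entries equal to 1 / sigma_hat[i, i].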
Affiliation(s)
- Lina Lin
  - Department of Statistics, University of Washington, Seattle, WA 98195, U.S.A
- Mathias Drton
  - Department of Statistics, University of Washington, Seattle, WA 98195, U.S.A
- Ali Shojaie
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, U.S.A
22
Wang X, Alshawaqfeh M, Dang X, Wajid B, Noor A, Qaraqe M, Serpedin E. An Overview of NCA-Based Algorithms for Transcriptional Regulatory Network Inference. Microarrays (Basel) 2015; 4:596-617. [PMID: 27600242 PMCID: PMC4996402 DOI: 10.3390/microarrays4040596] [Received: 09/01/2015] [Revised: 10/07/2015] [Accepted: 11/11/2015]
Abstract
In systems biology, the regulation of gene expression involves a complex network of regulators. Transcription factors (TFs) represent an important component of this network: they are proteins that control which genes are turned on or off in the genome by binding to specific DNA sequences. Transcriptional regulatory networks (TRNs) describe gene expression as a function of regulatory inputs specified by interactions between proteins and DNA. A complete understanding of TRNs helps to predict a variety of biological processes and to diagnose, characterize and eventually develop more efficient therapies. Recent advances in high-throughput biological technologies, such as DNA microarrays and next-generation sequencing (NGS), have made the inference of transcription factor activities (TFAs) and TF-gene regulations possible. Network component analysis (NCA) is an efficient computational framework for TRN inference from the information provided by microarrays, ChIP-on-chip and the prior information about TF-gene regulation. However, NCA suffers from several shortcomings, and several algorithms based on the NCA framework have recently been proposed to overcome them. This paper first reviews the computational principles behind NCA and then surveys the state-of-the-art NCA-based algorithms proposed in the literature for TRN reconstruction.
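NCA models an expression matrix E (genes × samples) as E ≈ A P, where the connectivity matrix A (genes × TFs) may be nonzero only where prior information allows a TF-gene link, and P holds the TF activities. A minimal alternating-least-squares sketch under that assumption follows. It is not FastNCA, ROBNCA or any other algorithm surveyed in the paper; it does not check identifiability conditions, it ignores the per-TF scaling ambiguity between A and P, and the function name and simulated data are hypothetical.

```python
import numpy as np


def nca_als(E, mask, n_iter=500, seed=0):
    """Alternating least squares for E ~ A @ P with A constrained to zero outside mask."""
    rng = np.random.default_rng(seed)
    mask = np.asarray(mask, dtype=bool)
    A = rng.normal(size=mask.shape) * mask
    for _ in range(n_iter):
        # P-step: unconstrained least squares for the TF activities given A.
        P, *_ = np.linalg.lstsq(A, E, rcond=None)
        # A-step: each gene is regressed only on the TFs allowed to regulate it.
        for i in range(mask.shape[0]):
            allowed = np.flatnonzero(mask[i])
            A[i] = 0.0
            if allowed.size:
                coef, *_ = np.linalg.lstsq(P[allowed].T, E[i], rcond=None)
                A[i, allowed] = coef
    return A, P


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical prior: genes 0-1 regulated by TF 0, genes 2-3 by TF 1, genes 4-5 by both.
    mask = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1]], dtype=bool)
    A_true = rng.normal(size=mask.shape) * mask
    P_true = rng.normal(size=(2, 12))
    E = A_true @ P_true
    A, P = nca_als(E, mask)
    print(np.linalg.norm(E - A @ P) / np.linalg.norm(E))
```

Because A and P are only identified up to a rescaling of each TF, any comparison with known activities should normalize the rows of P first.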
Affiliation(s)
- Xu Wang
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Mustafa Alshawaqfeh
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Xuan Dang
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Bilal Wajid
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Amina Noor
  - Institute of Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA.
- Marwa Qaraqe
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Erchin Serpedin
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
23
Peterson CB, Stingo FC, Vannucci M. Joint Bayesian variable and graph selection for regression models with network-structured predictors. Stat Med 2015; 35:1017-31. [PMID: 26514925 DOI: 10.1002/sim.6792] [Received: 12/01/2014] [Revised: 10/12/2015] [Accepted: 10/14/2015]
Abstract
In this work, we develop a Bayesian approach to perform selection of predictors that are linked within a network. We achieve this by combining a sparse regression model relating the predictors to a response variable with a graphical model describing conditional dependencies among the predictors. The proposed method is well-suited for genomic applications because it allows the identification of pathways of functionally related genes or proteins that impact an outcome of interest. In contrast to previous approaches for network-guided variable selection, we infer the network among predictors using a Gaussian graphical model and do not assume that network information is available a priori. We demonstrate that our method outperforms existing methods in identifying network-structured predictors in simulation settings and illustrate our proposed model with an application to inference of proteins relevant to glioblastoma survival.
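One common way to let a graph among predictors inform variable selection, used here purely as an illustrative assumption rather than as a description of this paper's exact prior, is a Markov random field (Ising-type) prior on the inclusion indicators γ ∈ {0,1}^p: log p(γ | G) = a·Σ_j γ_j + b·γᵀGγ + const, where G is the adjacency matrix of the predictor graph. With b > 0, including two connected predictors together is favoured. For small p the normalizing constant can be computed by enumeration, as in the sketch below; the function names and the hyperparameters a and b are hypothetical.

```python
import itertools

import numpy as np


def mrf_log_prior_unnormalized(gamma, adjacency, a, b):
    """a * (number of included predictors) + b * gamma^T G gamma."""
    gamma = np.asarray(gamma, dtype=float)
    return float(a * gamma.sum() + b * (gamma @ adjacency @ gamma))


def mrf_prior_table(adjacency, a, b):
    """Exact prior probabilities of all 2^p inclusion vectors (small p only)."""
    p = adjacency.shape[0]
    states = list(itertools.product([0, 1], repeat=p))
    logs = np.array([mrf_log_prior_unnormalized(s, adjacency, a, b) for s in states])
    probs = np.exp(logs - logs.max())  # stabilize before normalizing
    probs /= probs.sum()
    return dict(zip(states, probs))


if __name__ == "__main__":
    G = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])  # predictors 0 and 1 are linked
    table = mrf_prior_table(G, a=-2.0, b=1.0)
    # Joint inclusion of the linked pair versus an unlinked pair.
    print(table[(1, 1, 0)] > table[(1, 0, 1)])
```

Here γᵀGγ counts each included edge twice, so b effectively rewards each edge by 2b; other parameterizations halve this term.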
Affiliation(s)
- Christine B Peterson
  - Department of Health Research and Policy, Stanford University, Stanford, CA, 94305, U.S.A
- Francesco C Stingo
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, U.S.A
- Marina Vannucci
  - Department of Statistics, Rice University, Houston, TX, 77005, U.S.A
25
Abstract
We consider the problem of estimating the parameters in a pairwise graphical model in which the distribution of each node, conditioned on the others, may have a different exponential family form. We identify restrictions on the parameter space required for the existence of a well-defined joint density, and establish the consistency of the neighbourhood selection approach for graph reconstruction in high dimensions when the true underlying graph is sparse. Motivated by our theoretical results, we investigate the selection of edges between nodes whose conditional distributions take different parametric forms, and show that efficiency can be gained if edge estimates obtained from the regressions of particular nodes are used to reconstruct the graph. These results are illustrated with examples of Gaussian, Bernoulli, Poisson and exponential distributions. Our theoretical findings are corroborated by evidence from simulation studies.
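For Gaussian nodes, neighbourhood selection regresses each variable on all the others with an ℓ1 penalty, reads candidate edges off the nonzero coefficients, and combines the two regressions for each pair with an AND or OR rule. The sketch below covers only the all-Gaussian case with a plain coordinate-descent lasso; the paper's mixed exponential-family regressions and its rule for deciding which node's regression to use for a given edge are not reproduced, and the function names and penalty level are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]                 # remove j's current contribution
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]                 # add back the updated contribution
    return beta


def neighbourhood_selection(X, lam, rule="and"):
    """Estimate an undirected graph (boolean adjacency) from node-wise lasso fits."""
    X = X - X.mean(axis=0)                             # center so no intercept is needed
    p = X.shape[1]
    coef = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        coef[j, others] = lasso_cd(X[:, others], X[:, j], lam)
    selected = coef != 0
    return (selected & selected.T) if rule == "and" else (selected | selected.T)
```

For three variables generated as a chain X0 → X1 → X2, the AND rule with a moderate penalty is expected to recover the edges 0-1 and 1-2 and omit 0-2, since X0 and X2 are conditionally independent given X1.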
Affiliation(s)
- Shizhe Chen
  - Department of Biostatistics, University of Washington, Box 357232, Seattle, Washington 98195, U.S.A
- Daniela M. Witten
  - Department of Biostatistics, University of Washington, Box 357232, Seattle, Washington 98195, U.S.A
- Ali Shojaie
  - Department of Biostatistics, University of Washington, Box 357232, Seattle, Washington 98195, U.S.A
26
Gadaleta F, Van Steen K. Discovering main genetic interactions with LABNet LAsso-based network inference. PLoS One 2014; 9:e110451. [PMID: 25369052 PMCID: PMC4219691 DOI: 10.1371/journal.pone.0110451] [Received: 07/03/2014] [Accepted: 09/04/2014]
Abstract
Genome-wide association studies can potentially unravel the mechanisms behind complex traits and common genetic diseases. Despite the valuable results produced thus far, many questions remain unanswered: for instance, which specific genetic compounds are linked to the risk of the disease under investigation, what biological mechanism they act through, or how they interact with environmental and other external factors. The driving force of computational biology is the constantly growing amount of big data generated by high-throughput technologies. Networks provide a practical framework that can deal with this abundance of information and enable the discovery of genetic associations and interactions. Unfortunately, high dimensionality, the presence of noise and the geometry of the data can make this problem extremely challenging. We propose a penalised linear regression approach that can deal with these issues affecting genetic data. We analyse the gene expression profiles of individuals with a common trait to infer the network structure of interactions among genes. The permutation-based approach leads to more stable and reliable networks inferred from synthetic microarray data. We show that a higher number of permutations determines the number of predicted edges, improves the overall sensitivity and controls the number of false positives.
27
Vogel D, Tyler DE. Robust estimators for nondecomposable elliptical graphical models. Biometrika 2014. [DOI: 10.1093/biomet/asu041]
28
Affiliation(s)
- Martin Bilodeau
  - Department of Mathematics and Statistics, University of Montreal, P. O. Box 6128 Station Centre-ville, Montreal, Canada H3C 3J7
30
Noor A, Ahmad A, Serpedin E, Nounou M, Nounou H. ROBNCA: robust network component analysis for recovering transcription factor activities. Bioinformatics 2013; 29:2410-8. [PMID: 23940252 DOI: 10.1093/bioinformatics/btt433]
Abstract
MOTIVATION Network component analysis (NCA) is an efficient method of reconstructing the transcription factor activity (TFA), which makes use of the gene expression data and prior information available about transcription factor (TF)-gene regulations. Most contemporary algorithms either exhibit the drawback of inconsistency and poor reliability, or suffer from prohibitive computational complexity. In addition, the existing algorithms do not possess the ability to counteract the presence of outliers in the microarray data. Hence, robust and computationally efficient algorithms are needed to enable practical applications. RESULTS We propose ROBust Network Component Analysis (ROBNCA), a novel iterative algorithm that explicitly models the possible outliers in the microarray data. An attractive feature of the ROBNCA algorithm is the derivation of a closed-form solution for estimating the connectivity matrix, which was not available in prior contributions. The ROBNCA algorithm is compared with FastNCA and the non-iterative NCA (NI-NCA). ROBNCA estimates the TF activity profiles as well as the TF-gene control strength matrix with a much higher degree of accuracy than FastNCA and NI-NCA, irrespective of varying noise, correlation and/or amount of outliers in the case of synthetic data. The ROBNCA algorithm is also tested on Saccharomyces cerevisiae data and Escherichia coli data, and it is observed to outperform the existing algorithms. The run time of the ROBNCA algorithm is comparable with that of FastNCA, and is hundreds of times faster than NI-NCA. AVAILABILITY The ROBNCA software is available at http://people.tamu.edu/~amina/ROBNCA
Affiliation(s)
- Amina Noor
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- Corporate Research and Development, Qualcomm Technologies Inc., San Diego, CA 92121, USA
- Department of Chemical Engineering and Department of Electrical Engineering, Texas A&M University at Qatar, Doha, Qatar
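The NCA model behind this entry factors an expression matrix E (genes × samples) as E ≈ AP, where the connectivity matrix A may be nonzero only where a TF-gene regulation is known a priori, and P holds the TF activities. The sketch below is a plain alternating-least-squares fit of that constrained factorization, offered only as an illustration of the model; it is not ROBNCA (it has no outlier model and no closed-form connectivity update), and the function name `nca_als` and the synthetic data are hypothetical. As in NCA generally, A and P are recovered at best up to a diagonal rescaling of the TFs.

```python
import numpy as np

def nca_als(E, mask, n_iter=50, seed=0):
    """Hypothetical alternating least squares for the NCA model E ~ A @ P.

    E    : (genes x samples) expression matrix
    mask : (genes x TFs) boolean prior connectivity; A[i, j] stays zero
           wherever mask[i, j] is False
    Returns A, P and the squared residual recorded after every sweep.
    """
    rng = np.random.default_rng(seed)
    A = np.where(mask, rng.standard_normal(mask.shape), 0.0)
    history = []
    for _ in range(n_iter):
        # P-step: unconstrained least squares for the TF activities, A fixed
        P = np.linalg.lstsq(A, E, rcond=None)[0]
        # A-step: each gene is fit only on the TFs allowed to regulate it
        for i in range(E.shape[0]):
            support = np.flatnonzero(mask[i])
            if support.size:
                A[i, support] = np.linalg.lstsq(P[support].T, E[i], rcond=None)[0]
        history.append(float(np.sum((E - A @ P) ** 2)))
    return A, P, history

# Synthetic example: 30 genes, 4 TFs, 20 samples, small Gaussian noise
rng = np.random.default_rng(1)
mask = rng.random((30, 4)) < 0.5
A_true = np.where(mask, rng.standard_normal(mask.shape), 0.0)
P_true = rng.standard_normal((4, 20))
E = A_true @ P_true + 0.01 * rng.standard_normal((30, 20))
A_hat, P_hat, history = nca_als(E, mask)
```

Because each half-step is an exact least-squares minimization, the residual in `history` cannot increase from one sweep to the next; it can still stall at a poor local solution, which is one reason dedicated NCA algorithms exist.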
31
32
He Y, Jia J, Yu B. Reversible MCMC on Markov equivalence classes of sparse directed acyclic graphs. Ann Stat 2013. [DOI: 10.1214/13-aos1125] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022]
33
A unified framework for association analysis with multiple related phenotypes. PLoS One 2013; 8:e65245. [PMID: 23861737 PMCID: PMC3702528 DOI: 10.1371/journal.pone.0065245] [Citation(s) in RCA: 157] [Impact Index Per Article: 14.3] [Received: 12/24/2012] [Accepted: 04/25/2013] [Indexed: 02/06/2023]
Abstract
We consider the problem of assessing associations between multiple related outcome variables and a single explanatory variable of interest. This problem arises in many settings, including genetic association studies, where the explanatory variable is the genotype at a genetic variant. We outline a framework for conducting this type of analysis, based on Bayesian model comparison and model averaging for multivariate regressions. This framework unifies several common approaches to this problem and includes both standard univariate and standard multivariate association tests as special cases. The framework also unifies the problems of testing for associations and explaining associations, that is, identifying which outcome variables are associated with genotype. This provides an alternative to the usual, but conceptually unsatisfying, approach of resorting to univariate tests when explaining and interpreting significant multivariate findings. The method is computationally tractable genome-wide for modest numbers of phenotypes (e.g. 5–10) and can be applied to summary data, without access to raw genotype and phenotype data. We illustrate the methods on simulated examples and on a genome-wide association study of blood lipid traits, in which we identify 18 potential novel genetic associations that were not identified by univariate analyses of the same data.
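The abstract rests on Bayesian model comparison and model averaging computed from summary data. As a minimal single-outcome illustration, the sketch below swaps in Wakefield's approximate Bayes factor, which compares H1: β ~ N(0, W) against H0: β = 0 using only an estimated effect and its standard error, and then averages the Bayes factor over a small grid of prior standard deviations. This is not the paper's multivariate Bayes factor; the prior grid, the equal grid weights, the prior probability of association, and the summary statistics are all assumptions chosen for illustration.

```python
import math

def approx_bayes_factor(beta_hat, se, prior_sd):
    """Approximate BF for H1 (beta ~ N(0, prior_sd^2)) vs H0 (beta = 0).

    Assumes beta_hat ~ N(beta, se^2), so beta_hat ~ N(0, se^2) under H0
    and beta_hat ~ N(0, se^2 + prior_sd^2) under H1; the BF is the ratio
    of those two normal densities evaluated at beta_hat.
    """
    V = se ** 2
    W = prior_sd ** 2
    z = beta_hat / se
    return math.sqrt(V / (V + W)) * math.exp(0.5 * z * z * W / (V + W))

def averaged_bayes_factor(beta_hat, se, prior_sds=(0.1, 0.2, 0.4)):
    """Average the BF over prior effect sizes (equal weights, an assumption)."""
    bfs = [approx_bayes_factor(beta_hat, se, s) for s in prior_sds]
    return sum(bfs) / len(bfs)

def posterior_prob_association(bf, prior_prob=1e-4):
    """Turn a Bayes factor into a posterior probability of association."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    post_odds = bf * prior_odds
    return post_odds / (1.0 + post_odds)

# Hypothetical summary statistics for one variant and one trait
bf_null_like = averaged_bayes_factor(beta_hat=0.0, se=0.05)
bf_strong = averaged_bayes_factor(beta_hat=0.5, se=0.05)
pp_strong = posterior_prob_association(bf_strong)
```

With no observed effect the Bayes factor falls below 1 and mildly favours H0, while a large z-score yields overwhelming support for H1 even under a small prior probability of association.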
34
Wagaman A. Efficient k-NN graph construction for graphs on variables. Stat Anal Data Min 2013. [DOI: 10.1002/sam.11186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
35
Abstract
Gaussian graphical models have been widely used to study the conditional independence structure among genes and to construct genetic networks. However, gene expression data typically have heavier tails or more outlying observations than the standard Gaussian distribution, and such outliers can lead to wrong inference about the dependency structure among the genes. We propose an ℓ1-penalized estimation procedure for sparse Gaussian graphical models that is robust to possible outliers. The likelihood function is weighted according to how much each observation deviates, where the deviation is measured by the observation's own likelihood. An efficient computational algorithm based on the coordinate gradient descent method is developed to obtain the minimizer of the negative penalized robustified likelihood, where the nonzero elements of the concentration matrix represent the graphical links among the genes. After the graphical structure is obtained, we re-estimate the positive definite concentration matrix using an iterative proportional fitting algorithm. Through simulations, we demonstrate that, when outliers are present, the proposed robust method performs much better than the graphical Lasso for Gaussian graphical models in terms of both graph structure selection and estimation. We apply the robust estimation procedure to an analysis of yeast gene expression data and show that the resulting graph has a better biological interpretation than that obtained from the graphical Lasso.
Affiliation(s)
- Hokeun Sun
- Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA.
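The robustification in this abstract weights each observation by its own likelihood, so that outlying samples contribute little. The sketch below shows only that weighting idea: weights proportional to the fitted Gaussian density raised to a power γ, iterated together with a weighted mean and covariance. It omits the ℓ1 penalty, the coordinate gradient descent, and the iterative proportional fitting step, so it is an illustration under stated assumptions rather than the authors' estimator; the name `robust_weights`, the value γ = 0.5, the (1 + γ) consistency factor, and the synthetic data are choices made here.

```python
import numpy as np

def _weighted_moments(X, w, gamma):
    """Weighted mean and covariance. Under an exact Gaussian, density-power
    weights shrink the covariance by 1 / (1 + gamma); the factor undoes that."""
    mu = w @ X
    centered = X - mu
    S = (1.0 + gamma) * (centered * w[:, None]).T @ centered
    return mu, S

def robust_weights(X, gamma=0.5, n_iter=20):
    """Hypothetical reweighting of the rows of X by the Gaussian density ** gamma.

    The density's normalizing constant is shared by every row, so after
    renormalization only the Mahalanobis distance affects the weights.
    """
    n = X.shape[0]
    w = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        mu, S = _weighted_moments(X, w, gamma)
        centered = X - mu
        # Squared Mahalanobis distance of every row under the current fit
        d2 = np.sum(centered * np.linalg.solve(S, centered.T).T, axis=1)
        log_w = -0.5 * gamma * d2
        w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
        w /= w.sum()
    mu, S = _weighted_moments(X, w, gamma)
    return w, mu, S

# Synthetic data: 50 standard-normal rows, one replaced by a gross outlier
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
X[0] = [20.0, 20.0, 20.0]
w, mu, S = robust_weights(X)
```

The returned covariance could then be passed, in place of the ordinary sample covariance, to an ℓ1-penalized precision-matrix solver such as a graphical lasso implementation; that step is not shown here.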