1. Martínez A, Cuesta MJ, Peralta V. Dependence Graphs Based on Association Rules to Explore Delusional Experiences. Multivariate Behav Res 2022; 57:458-477. [PMID: 33538621 DOI: 10.1080/00273171.2020.1870912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Methods to estimate dependence graphs among variables have quickly gained popularity in psychopathology research. Multiple methods have been proposed to date, but recent studies report several drawbacks that affect the validity of the conclusions, arguing that the assumptions and conditions underlying the commonly used methods are not aligned with the nature of the data. A particularly important issue is that the methods focus on variables rather than individuals, so underlying dynamics that may be present in heterogeneous datasets are disregarded. This work also argues that the networks may lack relevant components because current methods ignore connections beyond pairwise interactions between individual symptoms. This study addresses these issues with a novel method for constructing dependence graphs by applying Association Rules to binary records, a common record type in the psychopathology domain. To demonstrate its benefits, we examine 12 delusional experiences in a sample of 1423 subjects with psychotic disorders. We show that extracting Association Rules with the apriori algorithm not only facilitates intuitive interpretation but also reveals previously unseen relevant dependencies arising from higher-order interactions among psychotic experiences in subgroups of patients.
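The following is a minimal pure-Python sketch of the apriori step. It finds frequent symptom sets in binary records and derives rules of the form "if these experiences are present, that one tends to be present too".

It is not the authors' implementation. The symptom names, the records and the thresholds (`min_support=0.4`, `min_confidence=0.9`) are hypothetical. The step that turns rules into a dependence graph is not shown. A rule whose left-hand side has two items illustrates the higher-order (beyond pairwise) dependencies mentioned above.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    current = [frozenset([item]) for item in sorted({i for t in sets for i in t})]
    frequent = {}
    k = 1
    while current:
        # Support = fraction of records containing every item of the candidate.
        level = {}
        for cand in current:
            support = sum(1 for t in sets if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates; prune any candidate with
        # an infrequent k-subset (the apriori, or downward-closure, property).
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(frozenset(s) in level for s in combinations(union, k - 1)):
                candidates.add(union)
        current = list(candidates)
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules lhs -> rhs with confidence = support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


# Hypothetical binary records: the set of delusional experiences present per patient.
records = [
    {"persecution", "reference"},
    {"persecution", "reference", "control"},
    {"persecution", "reference", "control"},
    {"grandiosity"},
    {"persecution", "control"},
]
frequent = apriori(records, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.9)
for lhs, rhs, support, confidence in rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={support:.2f} confidence={confidence:.2f}")
```

With these toy records, three rules pass the thresholds. The rule with `control` and `reference` jointly implying `persecution` is one that no single pairwise link expresses.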
Affiliation(s)
- Manuel J Cuesta
- Psychiatry Service, Complejo Hospitalario de Navarra
- Navarrabiomed and Instituto de Investigación Sanitaria de Navarra (IdISNa)
- Victor Peralta
- Mental Health Department, Servicio Navarro de Salud
- Navarrabiomed and Instituto de Investigación Sanitaria de Navarra (IdISNa)
2. Kuismin M, Dodangeh F, Sillanpää MJ. Gap-com: general model selection criterion for sparse undirected gene networks with nontrivial community structure. G3 (Bethesda) 2022; 12:jkab437. [PMID: 35100338 PMCID: PMC9210289 DOI: 10.1093/g3journal/jkab437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 12/06/2021] [Indexed: 06/14/2023]
Abstract
We introduce a new model selection criterion for sparse, complex gene network models in which gene co-expression relationships are estimated from data. The criterion is a novel formulation of the gap statistic and can be used for the optimal choice of a regularization parameter in graphical models. It favors gene network structures that differ from a trivial interaction structure generated entirely at random. We call the criterion the gap-com statistic (gap community statistic). The gap-com statistic examines the difference between the observed and the expected counts of communities (clusters), where the expected counts are evaluated using either data permutations or resampling of a reference graph (the Erdős-Rényi graph). The latter represents a trivial gene network structure determined by chance. We emphasize complex network inference because the structure of gene networks is usually nontrivial: for example, some genes may cluster together, and some may act as hub genes. We evaluate the performance of the gap-com statistic in graphical model selection and compare it with some existing methods using simulated and real biological data.
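The gap-com idea can be sketched as follows. A community count on the estimated network is compared with the average count on Erdős-Rényi reference graphs that have the same numbers of nodes and edges.

This is a simplified illustration under stated assumptions, not the authors' statistic:

- Connected components stand in for a community-detection algorithm.
- The gap is taken as observed minus expected, with no further transformation.
- The two-clique network is hypothetical.
- Only the reference-graph variant is shown, not the permutation variant.

```python
import random
from statistics import mean


def count_components(n_nodes, edges):
    """Count connected components with union-find (a crude stand-in for a community count)."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
    return len({find(i) for i in range(n_nodes)})


def random_graph_edges(n_nodes, n_edges, rng):
    """Sample an Erdős-Rényi G(n, m) graph: n_edges distinct node pairs chosen uniformly."""
    pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    return rng.sample(pairs, n_edges)


def community_gap(n_nodes, edges, n_reference=200, seed=0):
    """Observed count minus the mean count over size-matched random reference graphs."""
    rng = random.Random(seed)
    observed = count_components(n_nodes, edges)
    expected = mean(
        count_components(n_nodes, random_graph_edges(n_nodes, len(edges), rng))
        for _ in range(n_reference)
    )
    return observed - expected


# Hypothetical estimated network: two 5-node cliques (nodes 0-4 and 5-9), 20 edges.
clique_edges = [(i, j) for block in (range(0, 5), range(5, 10))
                for i in block for j in block if i < j]
print(community_gap(10, clique_edges))
```

To choose a regularization parameter in the spirit of the gap statistic, one would compute the gap for the network estimated at each candidate value and keep the value with the largest gap. The paper's exact selection rule may differ.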
Affiliation(s)
- Markku Kuismin
- Research Unit of Mathematical Sciences, University of Oulu, Oulu FI-90014, Finland
- Biocenter Oulu, University of Oulu, Oulu FI-90014, Finland
- School of Computing, University of Eastern Finland, Joensuu FI-80101, Finland
- Fatemeh Dodangeh
- Research Unit of Mathematical Sciences, University of Oulu, Oulu FI-90014, Finland
- Mikko J Sillanpää
- Research Unit of Mathematical Sciences, University of Oulu, Oulu FI-90014, Finland
- Biocenter Oulu, University of Oulu, Oulu FI-90014, Finland
- Infotech Oulu, University of Oulu, Oulu FI-90014, Finland
3. Artner R, Wellingerhof PP, Lafit G, Loossens T, Vanpaemel W, Tuerlinckx F. The shape of partial correlation matrices. Commun Stat Theory Methods 2020. [DOI: 10.1080/03610926.2020.1811338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- Richard Artner
- KU Leuven - Faculty of Psychology and Educational Sciences, Leuven, Belgium
- Paul P. Wellingerhof
- Eberhard Karls Universität Tübingen - Department of Psychology, Tübingen, Germany
- Ginette Lafit
- KU Leuven - Faculty of Psychology and Educational Sciences, Leuven, Belgium
- Tim Loossens
- KU Leuven - Faculty of Psychology and Educational Sciences, Leuven, Belgium
- Wolf Vanpaemel
- KU Leuven - Faculty of Psychology and Educational Sciences, Leuven, Belgium
- Francis Tuerlinckx
- KU Leuven - Faculty of Psychology and Educational Sciences, Leuven, Belgium
4. Williams DR, Rast P. Back to the basics: Rethinking partial correlation network methodology. Br J Math Stat Psychol 2020; 73:187-212. [PMID: 31206621 PMCID: PMC8572131 DOI: 10.1111/bmsp.12173] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 03/23/2018] [Revised: 03/02/2019] [Indexed: 05/08/2023]
Abstract
The Gaussian graphical model (GGM) is an increasingly popular technique used in psychology to characterize relationships among observed variables. These relationships are represented as elements of the precision matrix. Standardizing the precision matrix and reversing the sign yields the corresponding partial correlations, which represent pairwise dependencies after the effects of all other variables have been controlled for. The graphical lasso (glasso) has emerged as the default estimation method; it uses ℓ1-based regularization. The glasso was developed and optimized for high-dimensional settings where the number of variables (p) exceeds the number of observations (n), which is uncommon in psychological applications. Here we propose to go 'back to the basics', wherein the precision matrix is first estimated with non-regularized maximum likelihood and then Fisher Z transformed confidence intervals are used to determine non-zero relationships. We first show the exact correspondence between the confidence level and specificity, which follows because 1 minus specificity is the false positive rate (i.e., α). With simulations in low-dimensional settings (p ≪ n), we then demonstrate that the proposed method outperforms the glasso in detecting non-zero effects. Further, our results indicate that the glasso is inconsistent for the purpose of model selection and does not control the false discovery rate, whereas the proposed method converges on the true model and directly controls error rates. We end by discussing implications for estimating GGMs in psychology.
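The two ingredients of the "back to the basics" approach can be sketched in pure Python:

1. Partial correlations from the standardized, sign-reversed precision matrix.
2. A Fisher-z confidence interval for each edge.

This is a sketch, not the authors' code. The covariance matrix is a hypothetical AR(1)-style example supplied directly, rather than a maximum-likelihood estimate from data. The standard error 1/sqrt(n - s - 3), with s = p - 2 conditioning variables, is our assumption of the usual Fisher-z form for a partial correlation.

```python
import math
from statistics import NormalDist


def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """rho_ij = -omega_ij / sqrt(omega_ii * omega_jj), with omega the precision matrix."""
    omega = invert(cov)
    p = len(cov)
    return [[1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(p)] for i in range(p)]


def fisher_z_edges(pcor, n, alpha=0.05):
    """Keep edge (i, j) if the Fisher-z confidence interval for rho_ij excludes 0."""
    p = len(pcor)
    se = 1.0 / math.sqrt(n - (p - 2) - 3)  # assumed SE when conditioning on p - 2 variables
    zcrit = NormalDist().inv_cdf(1 - alpha / 2)
    edges = []
    for i in range(p):
        for j in range(i + 1, p):
            z = math.atanh(pcor[i][j])
            if z - zcrit * se > 0 or z + zcrit * se < 0:
                edges.append((i, j))
    return edges


# Hypothetical covariance matrix (AR(1) structure with coefficient 0.5).
cov = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
pcor = partial_correlations(cov)
print(fisher_z_edges(pcor, n=200))
```

This covariance has a tridiagonal inverse, so the partial correlation between the first and third variables is zero. With n = 200, only the edges (0, 1) and (1, 2) are retained.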
5. Lafit G, Tuerlinckx F, Myin-Germeys I, Ceulemans E. A Partial Correlation Screening Approach for Controlling the False Positive Rate in Sparse Gaussian Graphical Models. Sci Rep 2019; 9:17759. [PMID: 31780817 PMCID: PMC6882820 DOI: 10.1038/s41598-019-53795-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/11/2019] [Accepted: 11/05/2019] [Indexed: 12/28/2022]
Abstract
Gaussian Graphical Models (GGMs) are extensively used in many research areas, such as genomics, proteomics, neuroimaging, and psychology, to study the partial correlation structure of a set of variables. This structure is visualized by drawing an undirected network, in which the variables constitute the nodes and the partial correlations the edges. In many applications, it makes sense to impose sparsity (i.e., some of the partial correlations are forced to zero) because sparsity is theoretically meaningful, improves the predictive accuracy of the fitted model, or both. However, as we will show by means of extensive simulations, state-of-the-art estimation approaches for imposing sparsity on GGMs, such as the graphical lasso, ℓ1 regularized nodewise regression, and joint sparse regression, fall short because they often yield too many false positives (i.e., partial correlations that are not properly set to zero). In this paper we present a new estimation approach that allows better control of the false positive rate. Our approach consists of two steps. First, we estimate an undirected network using one of the three state-of-the-art estimation approaches. Second, we try to detect the false positives by flagging the partial correlations that are smaller in absolute value than a given threshold, which is determined through cross-validation; the flagged correlations are set to zero. Applying this new approach to the same simulated data shows that it indeed performs better. We also illustrate our approach by using it to estimate (1) a gene regulatory network for breast cancer data, (2) a symptom network of patients with a diagnosis within the nonaffective psychotic spectrum and (3) a symptom network of patients with PTSD.
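The second (screening) step can be sketched as a hard threshold applied to a step-1 estimate. The estimated matrix and the threshold `tau = 0.1` are hypothetical. The cross-validation that the authors use to choose the threshold is not reproduced.

```python
def threshold_partial_correlations(pcor, tau):
    """Zero out off-diagonal partial correlations with |rho_ij| < tau.

    Returns a thresholded copy of the matrix and the flagged (i, j) pairs, i < j.
    """
    p = len(pcor)
    screened = [list(row) for row in pcor]  # copy so the input is left unchanged
    flagged = []
    for i in range(p):
        for j in range(i + 1, p):
            if abs(pcor[i][j]) < tau:
                screened[i][j] = screened[j][i] = 0.0
                flagged.append((i, j))
    return screened, flagged


# Hypothetical step-1 estimate (e.g. from the graphical lasso) and hypothetical tau.
pcor_hat = [
    [1.0, 0.05, 0.40],
    [0.05, 1.0, -0.08],
    [0.40, -0.08, 1.0],
]
screened, flagged = threshold_partial_correlations(pcor_hat, tau=0.1)
print(flagged)
```

Because the threshold acts on absolute values, small negative estimates such as -0.08 are flagged just like small positive ones.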
Affiliation(s)
- Ginette Lafit
- Research Group on Quantitative Psychology and Individual Differences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
- Center for Contextual Psychiatry, KU Leuven-University of Leuven, Leuven, 3000, Belgium
- Francis Tuerlinckx
- Research Group on Quantitative Psychology and Individual Differences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
- Inez Myin-Germeys
- Center for Contextual Psychiatry, KU Leuven-University of Leuven, Leuven, 3000, Belgium
- Eva Ceulemans
- Research Group on Quantitative Psychology and Individual Differences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
6. Williams DR, Rhemtulla M, Wysocki AC, Rast P. On Nonregularized Estimation of Psychological Networks. Multivariate Behav Res 2019; 54:719-750. [PMID: 30957629 PMCID: PMC6736701 DOI: 10.1080/00273171.2019.1575716] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Indexed: 05/08/2023]
Abstract
An important goal for psychological science is developing methods to characterize relationships between variables. Customary approaches use structural equation models to connect latent factors to a number of observed measurements, or test causal hypotheses between observed variables. More recently, regularized partial correlation networks have been proposed as an alternative approach for characterizing relationships among variables through off-diagonal elements in the precision matrix. While the graphical Lasso (glasso) has emerged as the default network estimation method, it was optimized in fields outside of psychology with very different needs, such as high-dimensional data where the number of variables (p) exceeds the number of observations (n). In this article, we describe the glasso method in the context of the fields where it was developed, and then we demonstrate that the advantages of regularization diminish in settings where psychological networks are often fitted (p ≪ n). We first show that improvements in properties of the precision matrix, such as eigenvalue estimation, and in cross-validated predictive accuracy are not always appreciable. We then introduce nonregularized methods based on multiple regression and a nonparametric bootstrap strategy, after which we characterize performance with extensive simulations. Our results demonstrate that the nonregularized methods can be used to reduce the false-positive rate compared to glasso, and they appear to provide consistent performance across sparsity levels, sample composition (p/n), and partial correlation size. We end by reviewing recent findings in the statistics literature suggesting that alternative methods often outperform glasso, as well as suggesting areas for future research in psychology. The nonregularized methods have been implemented in the R package GGMnonreg.
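A nonparametric bootstrap for nonregularized partial correlations might look like the following sketch. Rows are resampled with replacement and partial correlations are recomputed from the inverse sample covariance. An edge is kept when the simple percentile interval excludes zero.

It is an illustration under assumptions (percentile intervals, 300 resamples, simulated data). It is not the GGMnonreg implementation, whose functions and defaults are not reproduced here.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(rows):
    """Nonregularized partial correlations from the inverse sample covariance."""
    omega = invert(covariance(rows))
    p = len(omega)
    return [[-omega[i][j] / math.sqrt(omega[i][i] * omega[j][j]) for j in range(p)]
            for i in range(p)]


def bootstrap_edges(rows, n_boot=300, alpha=0.05, seed=1):
    """Keep edge (i, j) if the percentile bootstrap interval for rho_ij excludes 0."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    draws = {(i, j): [] for i in range(p) for j in range(i + 1, p)}
    for _ in range(n_boot):
        resample = [rows[rng.randrange(n)] for _ in range(n)]  # rows drawn with replacement
        pc = partial_correlations(resample)
        for i, j in draws:
            draws[(i, j)].append(pc[i][j])
    edges = []
    for (i, j), values in draws.items():
        values.sort()
        lower = values[int(n_boot * alpha / 2)]
        upper = values[int(n_boot * (1 - alpha / 2)) - 1]
        if lower > 0 or upper < 0:
            edges.append((i, j))
    return edges


# Hypothetical data: x1 and x2 are strongly related; x3 is independent noise.
rng = random.Random(42)
data = []
for _ in range(150):
    x1 = rng.gauss(0, 1)
    data.append([x1, x1 + 0.5 * rng.gauss(0, 1), rng.gauss(0, 1)])
print(bootstrap_edges(data))
```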
Affiliation(s)
- Donald R Williams
- Department of Psychology, University of California , Davis , CA , USA
| | - Mijke Rhemtulla
- Department of Psychology, University of California , Davis , CA , USA
| | - Anna C Wysocki
- Department of Psychology, University of California , Davis , CA , USA
| | - Philippe Rast
- Department of Psychology, University of California , Davis , CA , USA
7. Mejia AF, Nebel MB, Barber AD, Choe AS, Pekar JJ, Caffo BS, Lindquist MA. Improved estimation of subject-level functional connectivity using full and partial correlation with empirical Bayes shrinkage. Neuroimage 2018; 172:478-491. [PMID: 29391241 PMCID: PMC5957759 DOI: 10.1016/j.neuroimage.2018.01.029] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 10/09/2017] [Revised: 01/07/2018] [Accepted: 01/12/2018] [Indexed: 02/04/2023]
Abstract
Reliability of subject-level resting-state functional connectivity (FC) is determined in part by the statistical techniques employed in its estimation. Methods that pool information across subjects to inform estimation of subject-level effects (e.g., Bayesian approaches) have been shown to enhance reliability of subject-level FC. However, fully Bayesian approaches are computationally demanding, while empirical Bayesian approaches typically rely on repeated measures to estimate the variance components in the model. Here, we avoid the need for repeated measures by proposing a novel measurement error model for FC that describes the different sources of variance and error, which we use to perform empirical Bayes shrinkage of subject-level FC towards the group average. In addition, since the traditional intra-class correlation coefficient (ICC) is inappropriate for biased estimates, we propose a new reliability measure, the mean squared error intra-class correlation coefficient (ICC_MSE), to properly assess the reliability of the resulting (biased) estimates. We apply the proposed techniques to test-retest resting-state fMRI data on 461 subjects from the Human Connectome Project to estimate connectivity between 100 regions identified through independent components analysis (ICA). We consider both correlation and partial correlation as the measure of FC and assess the benefit of shrinkage for each measure, as well as the effects of scan duration. We find that shrinkage estimates of subject-level FC exhibit substantially greater reliability than traditional estimates across various scan durations, even for the most reliable connections and regardless of connectivity measure. Additionally, we find partial correlation reliability to be highly sensitive to the choice of penalty term, and to be generally worse than that of full correlations except for certain connections and a narrow range of penalty values. This suggests that the penalty needs to be chosen carefully when using partial correlations.
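Shrinkage toward the group average can be illustrated with the textbook normal-normal empirical Bayes estimator. Each subject's estimate is pulled toward the group mean by a weight equal to the share of total variance attributed to noise.

This is a hedged sketch rather than the authors' model:

- The noise variance is simply supplied. The paper instead derives its variance components from its measurement error model, without repeated measures.
- Population variance is used for the across-subject variance.
- The FC values are hypothetical.

```python
from statistics import mean, pvariance


def shrink_toward_group(subject_fc, noise_var):
    """Empirical-Bayes-style shrinkage of one connection's FC across subjects.

    Total variance = signal (between-subject) variance + noise (within-subject) variance;
    lam = noise_var / total_var is the weight placed on the group mean.
    """
    group_mean = mean(subject_fc)
    total_var = pvariance(subject_fc)
    signal_var = max(total_var - noise_var, 0.0)  # truncate negative estimates at 0
    lam = 1.0 if total_var == 0 else 1.0 - signal_var / total_var
    shrunk = [lam * group_mean + (1.0 - lam) * x for x in subject_fc]
    return shrunk, lam


# Hypothetical raw FC estimates for one connection in four subjects and a
# hypothetical noise variance.
raw_fc = [0.2, 0.4, 0.6, 0.8]
shrunk, lam = shrink_toward_group(raw_fc, noise_var=0.025)
print(round(lam, 3), [round(v, 3) for v in shrunk])
```

Here half of the across-subject variance is attributed to noise, so each estimate moves halfway toward the group mean of 0.5. When the supplied noise variance exceeds the observed variance, every estimate collapses onto the group mean.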
Affiliation(s)
- Mary Beth Nebel
- Center for Neurodevelopmental and Imaging Research, Kennedy Krieger Institute, USA; Department of Neurology, Johns Hopkins University, USA
- Anita D Barber
- Center for Psychiatric Neuroscience, Feinstein Institute for Medical Research, USA
- Ann S Choe
- Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, USA; F.M. Kirby Research Center for Functional Brain Imaging, Kennedy Krieger Institute, USA
- James J Pekar
- Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, USA; F.M. Kirby Research Center for Functional Brain Imaging, Kennedy Krieger Institute, USA
- Brian S Caffo
- Department of Biostatistics, Johns Hopkins University, USA
8. Tissier R, Houwing-Duistermaat J, Rodríguez-Girondo M. Improving stability of prediction models based on correlated omics data by using network approaches. PLoS One 2018; 13:e0192853. [PMID: 29462177 PMCID: PMC5819809 DOI: 10.1371/journal.pone.0192853] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/30/2017] [Accepted: 01/31/2018] [Indexed: 12/13/2022]
Abstract
Building prediction models based on complex omics datasets such as transcriptomics, proteomics, and metabolomics remains a challenge in bioinformatics and biostatistics. Regularized regression techniques are typically used to deal with the high dimensionality of these datasets. However, because of the correlation present in these datasets, it is difficult to select the best model, and applying these methods yields unstable results. We propose a novel strategy for model selection in which the obtained models also perform well in terms of overall predictability. Several three-step approaches are considered, where the steps are 1) network construction, 2) clustering to empirically derive modules or pathways, and 3) building a prediction model incorporating the information on the modules. For the first step, we use weighted correlation networks and Gaussian graphical modelling. Identification of groups of features is performed by hierarchical clustering. The grouping information is included in the prediction model by using group-based variable selection or group-specific penalization. We compare the performance of our new approaches with standard regularized regression via simulations. Based on these results, we provide recommendations for selecting a strategy for building a prediction model given the specific goal of the analysis and the sizes of the datasets. Finally, we illustrate the advantages of our approach by applying the methodology to two problems: prediction of body mass index in the DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome study (DILGOM), and prediction of the response of each breast cancer cell line to treatment with specific drugs using a breast cancer cell lines pharmacogenomics dataset.
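Steps 1 and 2 of such a three-step approach can be sketched as follows:

1. Build a weighted correlation network with soft-thresholded adjacencies |r|^beta.
2. Run average-linkage hierarchical clustering on the dissimilarity 1 - adjacency, and cut the tree at a fixed height.

Everything specific here is a hypothetical choice for illustration: the power `beta=6`, the cut height of 0.5, the absence of a topological-overlap transform, and the toy features. Step 3 (group-based selection or group-specific penalization) is not shown. The resulting module labels would be its input.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def weighted_adjacency(features, beta=6):
    """Step 1: soft-thresholded correlation network, a_ij = |r_ij| ** beta."""
    p = len(features)
    return [[1.0 if i == j else abs(pearson(features[i], features[j])) ** beta
             for j in range(p)] for i in range(p)]


def average_linkage_modules(adjacency, cut_height):
    """Step 2: agglomerative clustering on d_ij = 1 - a_ij.

    Repeatedly merge the two clusters with the smallest average pairwise distance,
    stopping once that distance exceeds cut_height (i.e. cutting the dendrogram).
    """
    p = len(adjacency)
    dist = [[1.0 - adjacency[i][j] for j in range(p)] for i in range(p)]
    clusters = [[i] for i in range(p)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b]) / (
                    len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cut_height:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(clusters)


# Hypothetical features (one list of 8 samples per feature): 0/1 and 2/3 are near-duplicates.
features = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 7.2, 7.9],
    [3, 1, 4, 1, 5, 9, 2, 6],
    [3.1, 0.9, 4.2, 1.1, 4.8, 9.1, 2.0, 6.2],
]
modules = average_linkage_modules(weighted_adjacency(features), cut_height=0.5)
print(modules)
```

With these toy features, the two near-duplicate pairs form the modules [0, 1] and [2, 3]. Raising correlations to the power beta pushes the moderate cross-pair correlations (about 0.5) toward zero adjacency, which keeps the modules apart.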
Affiliation(s)
- Renaud Tissier
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, Leiden, The Netherlands
- Developmental and Educational Psychology, Universiteit Leiden Faculteit Sociale Wetenschappen, Leiden, The Netherlands
- Mar Rodríguez-Girondo
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, Leiden, The Netherlands
9. Kuismin MO, Sillanpää MJ. Estimation of covariance and precision matrix, network structure, and a view toward systems biology. WIREs Comput Stat 2017. [DOI: 10.1002/wics.1415] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 12/30/2022]
Affiliation(s)
- Markku O. Kuismin
- Department of Mathematical Sciences, University of Oulu, Oulu, Finland
- Mikko J. Sillanpää
- Department of Mathematical Sciences, University of Oulu, Oulu, Finland
- Biocenter Oulu, University of Oulu, Oulu, Finland
10. Bartzis G, Deelen J, Maia J, Ligterink W, Hilhorst HWM, Houwing-Duistermaat JJ, van Eeuwijk F, Uh HW. Estimation of metabolite networks with regard to a specific covariable: applications to plant and human data. Metabolomics 2017; 13:129. [PMID: 28989335 PMCID: PMC5610247 DOI: 10.1007/s11306-017-1263-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/19/2017] [Accepted: 08/30/2017] [Indexed: 12/17/2022]
Abstract
INTRODUCTION: In systems biology, where a main goal is acquiring knowledge of biological systems, one of the challenges is inferring biochemical interactions among different molecular entities such as metabolites. In this area, the metabolome holds a unique place for reflecting "true exposure" because it is sensitive to variation arising from genetics, time, and environmental stimuli. Although the metabolome is influenced by many different reactions, research interest often needs to focus on variation from a certain source, i.e. a certain covariable [Formula: see text].
OBJECTIVE: Here, we use network analysis methods to recover a set of metabolite relationships by finding metabolites that share a similar relation to [Formula: see text]. Metabolite values carry information on individuals' [Formula: see text] status, which might interact with other covariables.
METHODS: As an alternative to using the original metabolite values, the total information is decomposed with a linear regression model, and the part relevant to [Formula: see text] is used for further analysis. For two datasets, two different network estimation methods are considered. The first is weighted gene co-expression network analysis based on correlation coefficients. The second is the graphical LASSO based on partial correlations.
RESULTS: We observed that when the parts related to the specific covariable of interest are used, the resulting estimated networks display higher interconnectedness. Additionally, several groups of biologically associated metabolites (very large density lipoproteins, lipoproteins, etc.) were identified in the human data example.
CONCLUSIONS: This work demonstrates how information on the study design can be incorporated to estimate metabolite networks. As a result, sets of interconnected metabolites can be clustered together with respect to their relation to a covariable of interest.
Affiliation(s)
- Georgios Bartzis
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Einthovenweg 20, 2300 RC Leiden, The Netherlands
- Joris Deelen
- Department of Biological Mechanisms of Ageing, Max Planck Institute for Biology of Aging, Joseph-Stelzmann-Strasse 9b, 50931 Cologne, Germany
- Julio Maia
- São Paulo State University, FCA/UNESP, Botucatu, SP CEP 18610-307 Brazil
- Wilco Ligterink
- Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Henk W. M. Hilhorst
- Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Jeanine-J. Houwing-Duistermaat
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Einthovenweg 20, 2300 RC Leiden, The Netherlands
- Department of Statistics, School of Mathematics, University of Leeds, Leeds, LS2 9JT UK
- Fred van Eeuwijk
- Biometris, Wageningen University, P.O. Box 16, 6700 AC Wageningen, The Netherlands
- Hae-Won Uh
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Einthovenweg 20, 2300 RC Leiden, The Netherlands