1
Zhang B, Wiedermann W. Covariate selection in causal learning under non-Gaussianity. Behav Res Methods 2024; 56:4019-4037. [PMID: 37704788; DOI: 10.3758/s13428-023-02217-y] [Accepted: 08/04/2023]
Abstract
Understanding causal mechanisms is a central goal in the behavioral, developmental, and social sciences. When estimating and probing causal effects using observational data, covariate adjustment is a crucial element to remove dependencies between focal predictors and the error term. Covariate selection, however, constitutes a challenging task because availability alone is not an adequate criterion to decide whether a covariate should be included in the statistical model. The present study introduces a non-Gaussian method for covariate selection and provides a forward selection algorithm for linear models (i.e., non-Gaussian forward selection; nGFS) to select appropriate covariates from a set of potential control variables to avoid inconsistent and biased estimators of the causal effect of interest. Further, we demonstrate that the forward selection algorithm has properties compatible with principles of direction of dependence, i.e., probing whether the causal target model is correctly specified with respect to the causal direction of effects. Results of a Monte Carlo simulation study suggest that the selection algorithm performs well, in particular when sample sizes are large (i.e., n ≥ 250) and data strongly deviate from Gaussianity (e.g., distributions with skewness beyond 1.5). An empirical example is given for illustrative purposes.
Affiliation(s)
- Bixi Zhang
  - Department of Educational Psychology, CUNY Graduate Center, New York, NY, USA
- Wolfgang Wiedermann
  - Department of Educational, School, and Counseling Psychology, University of Missouri, Columbia, MO, USA
2
Yu Q, Liu R. A consistent test of independence and goodness-of-fit in linear regression models. Commun Stat Simul Comput 2022. [DOI: 10.1080/03610918.2020.1728316]
Affiliation(s)
- Qiqing Yu
  - Department of Mathematical Sciences, SUNY, Binghamton, New York, USA
- Ruiqi Liu
  - Department of Mathematical Sciences, SUNY, Binghamton, New York, USA
3
Morikawa K, Kim JK. Semiparametric optimal estimation with nonignorable nonresponse data. Ann Stat 2021. [DOI: 10.1214/21-aos2070]
5
Fan J, Feng Y, Xia L. A projection-based conditional dependence measure with applications to high-dimensional undirected graphical models. J Econom 2020; 218:119-139. [PMID: 33208987; PMCID: PMC7668417; DOI: 10.1016/j.jeconom.2019.12.016]
Abstract
Measuring conditional dependence is an important topic in econometrics with broad applications including graphical models. Under a factor model setting, a new conditional dependence measure based on projection is proposed. The corresponding conditional independence test is developed with the asymptotic null distribution unveiled where the number of factors could be high-dimensional. It is also shown that the new test has control over the asymptotic type I error and can be calculated efficiently. A generic method for building dependency graphs without Gaussian assumption using the new test is elaborated. We show the superiority of the new method, implemented in the R package pgraph, through simulation and real data studies.
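The following is a minimal Python/NumPy sketch of the projection idea. It assumes that X and Y are each regressed on the observed factors by least squares and that the dependence left in the two residual vectors is measured with the sample distance correlation. This is an illustration, not necessarily the paper's exact statistic. The function names (`projection_residual`, `distance_correlation`, `projected_dependence`) are hypothetical. The asymptotic null distribution, the high-dimensional factor setting, and the pgraph implementation are not reproduced here.

```python
import numpy as np


def _centered_distances(v):
    """Double-centered matrix of pairwise Euclidean distances."""
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        v = v[:, None]
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic version), in [0, 1]."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()                    # squared distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))


def projection_residual(v, factors):
    """Residual of v after least-squares projection onto [1, factors]."""
    design = np.column_stack([np.ones(len(v)), factors])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


def projected_dependence(x, y, factors):
    """Hypothetical helper: dependence of X and Y after removing the factors."""
    return distance_correlation(projection_residual(x, factors),
                                projection_residual(y, factors))
```

Suppose two variables are driven by a common factor plus independent noise. Their raw distance correlation is clearly positive, while the projected version should be close to zero. A nonlinear link between the residuals keeps the projected value large.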
Affiliation(s)
- Jianqing Fan
  - Department of Operations Research & Financial Engineering, Princeton University, Princeton, NJ 08544, USA
- Yang Feng
  - Department of Biostatistics, College of Global Public Health, New York University, New York, NY 10003, USA
- Lucy Xia
  - Department of ISOM, School of Business and Management, Hong Kong University of Science and Technology, Hong Kong
7
Confounder detection in linear mediation models: Performance of kernel-based tests of independence. Behav Res Methods 2020; 52:342-359. [PMID: 30891713; DOI: 10.3758/s13428-019-01230-4]
Abstract
It is well-known that the identification of direct and indirect effects in mediation analysis requires strong unconfoundedness assumptions. Even when the predictor is under experimental control, unconfoundedness assumptions must be imposed on the mediator-outcome relation in order to guarantee valid indirect-effect identification. Researchers are therefore advised to test for unconfoundedness when estimating mediation effects. Significance tests to evaluate unconfoundedness usually rely on an instrumental variable (IV), that is, a variable that is nonindependent of the explanatory variable and, at the same time, independent of all exogenous factors that affect the outcome when the explanatory variable is held constant. Because IVs may be hard to come by, the present study shows that confounders of the mediator-outcome relation can be detected without making use of IVs when variables are nonnormal. We show that kernel-based tests of independence are able to detect confounding under nonnormality. Results from a simulation study are presented that suggest that these tests perform well in terms of Type I error protection and statistical power, independent of the distribution or measurement level of the confounder. A real-world data example from the Job Search Intervention Study (JOBS II) illustrates how the presented approach can be used to minimize the risk of obtaining biased indirect-effect estimates. The data requirements and role of unconfoundedness tests as diagnostic tools are discussed. A Monte Carlo-based power analysis tool for sample size planning is also provided.
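The abstract does not fix a particular kernel test. The Python sketch below therefore assumes the Hilbert-Schmidt independence criterion (HSIC) with Gaussian kernels, a median-distance bandwidth, and a permutation p-value. It applies the test to the mediator and to the residual of a simple regression of the outcome on the mediator. OLS makes that residual uncorrelated with the mediator by construction, but with a non-normal confounder the two remain dependent. The function names are illustrative assumptions, as is the simplified setting: a skewed confounder and a single mediator, with the upstream predictor omitted.

```python
import numpy as np


def gaussian_gram(v):
    """Gaussian-kernel Gram matrix; bandwidth = median pairwise distance (heuristic)."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    sq = (v - v.T) ** 2
    positive = sq[sq > 0]
    bandwidth = np.median(np.sqrt(positive)) if positive.size else 1.0
    return np.exp(-sq / (2.0 * bandwidth ** 2))


def hsic_permutation_test(x, y, n_perm=200, seed=0):
    """Biased HSIC statistic trace(K H L H) / n^2 and its permutation p-value."""
    rng = np.random.default_rng(seed)
    k = gaussian_gram(x)
    l = gaussian_gram(y)
    n = k.shape[0]
    h = np.eye(n) - 1.0 / n                # centering matrix I - 11'/n
    kc = h @ k @ h                          # trace(K H L H) = sum(HKH * L) for symmetric L
    stat = float((kc * l).sum()) / n ** 2
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)              # permuting y breaks any x-y dependence
        exceed += float((kc * l[np.ix_(p, p)]).sum()) / n ** 2 >= stat
    return stat, (exceed + 1) / (n_perm + 1)


def ols_residual(outcome, predictor):
    """Residual of a simple linear regression (with intercept) of outcome on predictor."""
    design = np.column_stack([np.ones(len(predictor)), predictor])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return outcome - design @ coef
```

A small p-value from `hsic_permutation_test(mediator, ols_residual(outcome, mediator))` flags residual dependence consistent with mediator-outcome confounding. With jointly normal variables, zero correlation already implies independence, so the check would have no power. This is why the approach relies on nonnormality.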
8
Brunes LC, Baldi F, Lopes FB, Lôbo RB, Espigolan R, Costa MFO, Stafuzza NB, Magnabosco CU. Weighted single-step genome-wide association study and pathway analyses for feed efficiency traits in Nellore cattle. J Anim Breed Genet 2020; 138:23-44. [PMID: 32654373; DOI: 10.1111/jbg.12496] [Received: 04/08/2020] [Revised: 06/11/2020] [Accepted: 06/13/2020]
Abstract
The aim was to conduct a weighted single-step genome-wide association study to detect genomic regions and putative candidate genes related to residual feed intake, dry matter intake, feed efficiency (FE), feed conversion ratio, residual body weight gain, residual intake and weight gain in Nellore cattle. Several protein-coding genes were identified within the genomic regions that explain more than 0.5% of the additive genetic variance for these traits. These genes were associated with insulin, leptin, glucose, protein and lipid metabolisms; energy balance; heat and oxidative stress; bile secretion; satiety; feed behaviour; salivation; digestion; and nutrient absorption. Enrichment analysis revealed functional pathways (p-value < .05) such as neuropeptide signalling (GO:0007218), negative regulation of canonical Wingless/Int-1 (Wnt) signalling (GO:0090090), bitter taste receptor activity (GO:0033038), neuropeptide hormone activity (GO:0005184), bile secretion (bta04976), taste transduction (bta04742) and glucagon signalling pathway (bta04922). The identification of these genes, pathways and their respective functions should contribute to a better understanding of the genetic and physiological mechanisms regulating Nellore FE-related traits.
Affiliation(s)
- Ludmilla C Brunes
  - Department of Animal Science, Federal University of Goiás (UFG), Goiânia, Brazil; Embrapa Rice and Beans, Santo Antônio de Goiás, Brazil
- Fernando Baldi
  - Department of Animal Science, São Paulo State University (UNESP), Jaboticabal, Brazil
- Raysildo B Lôbo
  - National Association of Breeders and Researchers (ANCP), Ribeirão Preto, Brazil
- Rafael Espigolan
  - Department of Veterinary Medicine, Faculty of Animal Science and Food Engineering, University of Sao Paulo, Pirassununga, Brazil
- Nedenia B Stafuzza
  - Beef Cattle Research Center, Animal Science Institute, Sertãozinho, Brazil
9
Wiedermann W, Sebastian J. Direction dependence analysis in the presence of confounders: Applications to linear mediation models using observational data. Multivariate Behav Res 2020; 55:495-515. [PMID: 30977403; DOI: 10.1080/00273171.2018.1528542]
Abstract
Statistical methods to identify mis-specifications of linear regression models with respect to the direction of dependence (i.e. whether x→y or y→x better approximates the data-generating mechanism) have received considerable attention. Direction dependence analysis (DDA) constitutes such a statistical tool and makes use of higher-moment information of variables to derive statements concerning directional model mis-specifications in observational data. Previous studies on direction of dependence mainly focused on statistical inference and guidelines for the selection from the two directionally competing candidate models (x→y versus y→x) while assuming the absence of unobserved common causes. The present study describes properties of DDA when confounders are present and extends existing DDA methodology by incorporating the confounder model as a possible explanation. We show that all three explanatory models can be uniquely identified under standard DDA assumptions. Further, we discuss the proposed approach in the context of testing competing mediation models and evaluate an organizational model proposing a mediational relation between school leadership and student achievement via school safety using observational data from an urban school district. Overall, DDA provides strong empirical support that school safety has indeed a causal effect on student achievement but suggests that important confounders are present in the school leadership-safety relation.
Affiliation(s)
- Wolfgang Wiedermann
  - Statistics, Measurement, and Evaluation in Education, Department of Educational, School, and Counseling Psychology, College of Education, University of Missouri
- James Sebastian
  - Educational Leadership and Policy Analysis, College of Education, University of Missouri
11
Freeman NLB, Jiang X, Leete OE, Luckett DJ, Pokaprakarn TB, Kosorok MR. Comment: Models as approximations. Stat Sci 2019; 34:572-574. [PMID: 34526734; DOI: 10.1214/19-sts724]
Affiliation(s)
- Nikki L B Freeman
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Xiaotong Jiang
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Owen E Leete
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Daniel J Luckett
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Teeranan Ben Pokaprakarn
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Michael R Kosorok
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
12
Abstract
We propose a test of independence of two multivariate random vectors, given a sample from the underlying population. Our approach is based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently developed efficient entropy estimators derived from nearest neighbour distances. The proposed critical values may be obtained by simulation in the case where an approximation to one marginal is available or by permuting the data otherwise. This facilitates size guarantees, and we provide local power analyses, uniformly over classes of densities whose mutual information satisfies a lower bound. Our ideas may be extended to provide new goodness-of-fit tests for normal linear models based on assessing the independence of our vector of covariates and an appropriately defined notion of an error vector. The theory is supported by numerical studies on both simulated and real data.
Affiliation(s)
- T B Berrett
  - Statistical Laboratory, University of Cambridge, Wilberforce Road, Cambridge CB3 0WB, UK
- R J Samworth
  - Statistical Laboratory, University of Cambridge, Wilberforce Road, Cambridge CB3 0WB, UK
13
Chakraborty S, Zhang X. Distance metrics for measuring joint dependence with application to causal inference. J Am Stat Assoc 2019. [DOI: 10.1080/01621459.2018.1513364]
14
Direction dependence analysis: A framework to test the direction of effects in linear models with an implementation in SPSS. Behav Res Methods 2019; 50:1581-1601. [PMID: 29663299; DOI: 10.3758/s13428-018-1031-x]
Abstract
In nonexperimental data, at least three possible explanations exist for the association of two variables x and y: (1) x is the cause of y, (2) y is the cause of x, or (3) an unmeasured confounder is present. Statistical tests that identify which of the three explanatory models fits best would be a useful adjunct to the use of theory alone. The present article introduces one such statistical method, direction dependence analysis (DDA), which assesses the relative plausibility of the three explanatory models on the basis of higher-moment information about the variables (i.e., skewness and kurtosis). DDA involves the evaluation of three properties of the data: (1) the observed distributions of the variables, (2) the residual distributions of the competing models, and (3) the independence properties of the predictors and residuals of the competing models. When the observed variables are nonnormally distributed, we show that DDA components can be used to uniquely identify each explanatory model. Statistical inference methods for model selection are presented, and macros to implement DDA in SPSS are provided. An empirical example is given to illustrate the approach. Conceptual and empirical considerations are discussed for best-practice applications in psychological data, and sample size recommendations based on previous simulation studies are provided.
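The first two DDA components can be illustrated with moment-based skewness. Suppose x → y holds with y = b·x + e and a normal error e, and let ρ be the correlation of x and y. Then skew(y) = ρ³·skew(x), so |skew(y)| < |skew(x)|. The residual of the correctly specified model is normal, whereas the residual of the reverse model y → x has skewness (1 - ρ²)^(3/2)·skew(x). The Python sketch below computes these quantities. It covers only the descriptive comparison, not the significance tests, the independence component, or the SPSS macros. The function names are hypothetical.

```python
import numpy as np


def skewness(v):
    """Moment-based sample skewness m3 / m2**1.5."""
    d = np.asarray(v, dtype=float) - np.mean(v)
    return float((d ** 3).mean() / (d ** 2).mean() ** 1.5)


def ols_residual(outcome, predictor):
    """Residual of a simple linear regression (with intercept) of outcome on predictor."""
    design = np.column_stack([np.ones(len(predictor)), predictor])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return outcome - design @ coef


def dda_skewness_summary(x, y):
    """Observed and residual skewness for the competing models x -> y and y -> x."""
    return {
        "skew_x": skewness(x),
        "skew_y": skewness(y),
        "skew_resid_x_to_y": skewness(ols_residual(y, x)),  # residual of y regressed on x
        "skew_resid_y_to_x": skewness(ols_residual(x, y)),  # residual of x regressed on y
    }


def favoured_direction(summary):
    """Heuristic from the observed distributions: under x -> y, |skew(y)| < |skew(x)|."""
    return "x->y" if abs(summary["skew_x"]) > abs(summary["skew_y"]) else "y->x"
```

As an example, take x exponential (skewness 2) and y = 0.6x + e with e ~ N(0, 1). The population values are then ρ ≈ 0.51, skew(y) ≈ 0.27, and a reverse-residual skewness of about 1.26, so both components point to x → y.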
15
Teran Hidalgo SJ, Wu MC, Engel SM, Kosorok MR. Goodness-of-fit test for nonparametric regression models: Smoothing spline ANOVA models as example. Comput Stat Data Anal 2018; 122:135-155. [PMID: 29867285; PMCID: PMC5983390; DOI: 10.1016/j.csda.2018.01.004]
Abstract
Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, few diagnostic statistics are available for them compared with their parametric counterparts. We propose a goodness-of-fit test for nonparametric regression models with linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can consider two sources of lack-of-fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well. The proposed method derives estimated residuals from the model. Then, statistical dependence is assessed between the estimated residuals and the covariates using the Hilbert-Schmidt independence criterion (HSIC). If dependence exists, the model does not capture all the variability in the outcome associated with the covariates; otherwise, the model fits the data well. The bootstrap is used to obtain p-values. Application of the method is demonstrated with a neonatal mental development data analysis. We demonstrate correct type I error as well as power performance through simulations.
Affiliation(s)
- Michael C. Wu
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, U.S.A
- Stephanie M. Engel
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
- Michael R. Kosorok
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
16
Wiedermann W, Merkle EC, von Eye A. Direction of dependence in measurement error models. Br J Math Stat Psychol 2018; 71:117-145. [PMID: 28872673; DOI: 10.1111/bmsp.12111] [Received: 06/03/2016] [Revised: 05/24/2017]
Abstract
Methods to determine the direction of a regression line, that is, to determine the direction of dependence in reversible linear regression models (e.g., x→y vs. y→x), have experienced rapid development within the last decade. However, previous research largely rested on the assumption that the true predictor is measured without measurement error. The present paper extends the direction dependence principle to measurement error models. First, we discuss asymmetric representations of the reliability coefficient in terms of higher moments of variables and the attenuation of skewness and excess kurtosis due to measurement error. Second, we identify conditions where direction dependence decisions are biased due to measurement error and suggest method of moments (MOM) estimation as a remedy. Third, we address data situations in which the true outcome exhibits both regression and measurement error, and propose a sensitivity analysis approach to determining the robustness of direction dependence decisions against unreliably measured outcomes. Monte Carlo simulations were performed to assess the performance of MOM-based direction dependence measures and their robustness to violated measurement error assumptions (i.e., non-independence and non-normality). An empirical example from subjective well-being research is presented. The plausibility of model assumptions and links to modern causal inference methods for observational data are discussed.
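The attenuation described in the abstract follows from cumulants. Adding independent normal error δ to a true score x leaves its third and fourth cumulants unchanged while inflating the variance. With reliability λ = Var(x)/Var(x + δ), the observed skewness is therefore λ^(3/2)·skew(x) and the observed excess kurtosis is λ²·kurt(x). The short Python sketch below assumes normal, independent measurement error; the function names are hypothetical.

```python
import numpy as np


def attenuated_moments(skew_true, excess_kurt_true, reliability):
    """Skewness and excess kurtosis of x + delta, with delta normal and independent of x.

    The third and fourth cumulants are unchanged by delta, while the variance grows
    by the factor 1 / reliability.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return skew_true * reliability ** 1.5, excess_kurt_true * reliability ** 2


def sample_skewness(v):
    """Moment-based sample skewness m3 / m2**1.5."""
    d = np.asarray(v, dtype=float) - np.mean(v)
    return float((d ** 3).mean() / (d ** 2).mean() ** 1.5)
```

Consider an exponential true score (skewness 2, excess kurtosis 6) measured with reliability 0.5. The observed skewness drops to about 0.71 and the excess kurtosis to 1.5. A simulation that adds N(0, 1) error to an exponential(1) score reproduces the skewness figure closely for large samples.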
Affiliation(s)
- Wolfgang Wiedermann
  - Department of Educational, School, and Counseling Psychology, University of Missouri, Columbia, Missouri, USA
- Edgar C Merkle
  - Department of Psychological Sciences, University of Missouri, Columbia, Missouri, USA
- Alexander von Eye
  - Department of Psychology, Michigan State University, East Lansing, Michigan, USA
17
Wiedermann W. A note on fourth moment-based direction dependence measures when regression errors are non normal. Commun Stat Theory Methods 2017. [DOI: 10.1080/03610926.2017.1388403]
Affiliation(s)
- Wolfgang Wiedermann
  - Department of Educational, School, and Counseling Psychology, University of Missouri, Columbia, MO, USA
18
Wiedermann W, Artner R, von Eye A. Heteroscedasticity as a basis of direction dependence in reversible linear regression models. Multivariate Behav Res 2017; 52:222-241. [PMID: 28128999; DOI: 10.1080/00273171.2016.1275498]
Abstract
Heteroscedasticity is a well-known issue in linear regression modeling. When heteroscedasticity is observed, researchers are advised to remedy possible model misspecification of the explanatory part of the model (e.g., considering alternative functional forms and/or omitted variables). The present contribution discusses another source of heteroscedasticity in observational data: Directional model misspecifications in the case of nonnormal variables. Directional misspecification refers to situations where alternative models are equally likely to explain the data-generating process (e.g., x → y versus y → x). It is shown that the homoscedasticity assumption is likely to be violated in models that erroneously treat true nonnormal predictors as response variables. Recently, Direction Dependence Analysis (DDA) has been proposed as a framework to empirically evaluate the direction of effects in linear models. The present study links the phenomenon of heteroscedasticity with DDA and describes visual diagnostics and nine homoscedasticity tests that can be used to make decisions concerning the direction of effects in linear models. Results of a Monte Carlo simulation that demonstrate the adequacy of the approach are presented. An empirical example is provided, and applicability of the methodology in cases of violated assumptions is discussed.
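The abstract does not list the nine tests. The Python sketch below uses one standard choice, Koenker's studentized Breusch-Pagan test, to contrast a correctly specified model with its directional reverse. The test regresses the squared OLS residuals on the predictor and refers n·R² to a chi-square distribution with 1 df. The simulated setting (exponential x, y = x + N(0, 1) error) and the function names are assumptions for illustration.

```python
import math

import numpy as np


def ols_residual(outcome, predictor):
    """Residual of a simple linear regression (with intercept) of outcome on predictor."""
    design = np.column_stack([np.ones(len(predictor)), predictor])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return outcome - design @ coef


def breusch_pagan(outcome, predictor):
    """Koenker's studentized Breusch-Pagan test for a single predictor.

    Squared OLS residuals are regressed on the predictor; n * R^2 is compared with
    a chi-square distribution with 1 degree of freedom under homoscedasticity.
    """
    e2 = ols_residual(outcome, predictor) ** 2
    aux = ols_residual(e2, predictor)
    r2 = 1.0 - (aux ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()
    lm = max(float(len(e2) * r2), 0.0)
    p_value = math.erfc(math.sqrt(lm / 2.0))  # upper tail of chi-square(1) at lm
    return lm, p_value
```

Under these assumptions, the reverse regression of x on y tends to show heteroscedasticity because the conditional spread of a non-normal x given y changes with y. By contrast, the residuals of y regressed on x are homoscedastic by construction.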
20
Sen B, Meyer M. Testing against a linear regression model using ideas from shape-restricted estimation. J R Stat Soc Series B Stat Methodol 2016. [DOI: 10.1111/rssb.12178]
Affiliation(s)
- Mary Meyer
  - Colorado State University, Fort Collins, USA