1
Ma W, Tu F, Liu H. Regression analysis for covariate-adaptive randomization: A robust and efficient inference perspective. Stat Med 2022; 41:5645-5661. [PMID: 36134688 DOI: 10.1002/sim.9585] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/09/2022] [Revised: 07/05/2022] [Accepted: 09/09/2022] [Indexed: 11/08/2022]
Abstract
Linear regression is arguably the most fundamental statistical model; however, the validity of its use in randomized clinical trials, despite being common practice, has never been crystal clear, particularly when stratified or covariate-adaptive randomization is used. In this article, we investigate several of the most intuitive and commonly used regression models for estimating and inferring the treatment effect in randomized clinical trials. By allowing the regression model to be arbitrarily misspecified, we demonstrate that all these regression-based estimators robustly estimate the treatment effect, albeit with possibly different efficiency. We also propose consistent non-parametric variance estimators and compare their performances to those of the model-based variance estimators that are readily available in standard statistical software. Based on the results and taking into account both theoretical efficiency and practical feasibility, we make recommendations for the effective use of regression under various scenarios. For equal allocation, it suffices to use the regression adjustment for the stratum covariates and additional baseline covariates, if available, with the usual ordinary-least-squares variance estimator. For unequal allocation, regression with treatment-by-covariate interactions should be used, together with our proposed variance estimators. These recommendations apply to simple and stratified randomization, and minimization, among others. We hope this work helps to clarify and promote the usage of regression in randomized clinical trials.
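The recommendation for unequal allocation (regression with treatment-by-covariate interactions) can be illustrated with a minimal sketch. It assumes a single baseline covariate and a binary treatment; the function names `ols_line` and `interacted_ate` and the toy data are hypothetical, and the variance estimators proposed in the article are not reproduced here. With one covariate, fitting a separate least-squares line in each arm and comparing the two fitted values at the overall covariate mean gives the same point estimate as the treatment coefficient in the fully interacted model with a centered covariate.

```python
from statistics import mean


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y ~ x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def interacted_ate(a, x, y):
    """Point estimate of the treatment effect from y ~ a + x + a:x.

    Fits one line per arm, then contrasts the two fitted values at the
    overall mean of x. Illustrative only: no variance estimate is returned.
    """
    x_bar = mean(x)
    predicted = {}
    for arm in (0, 1):
        xs = [xi for ai, xi in zip(a, x) if ai == arm]
        ys = [yi for ai, yi in zip(a, y) if ai == arm]
        intercept, slope = ols_line(xs, ys)
        predicted[arm] = intercept + slope * x_bar
    return predicted[1] - predicted[0]


# Toy data with 2:1 allocation (assumed for illustration only).
a = [1, 1, 1, 1, 0, 0]
x = [0.0, 1.0, 2.0, 3.0, 1.0, 4.0]
# Control outcomes follow 1 + 2x; treated outcomes follow 4 + 2.5x.
y = [1 + 2 * xi + ai * (3 + 0.5 * xi) for ai, xi in zip(a, x)]

print(interacted_ate(a, x, y))
```

Because the toy outcomes are exactly linear within each arm, the estimate equals the arm contrast evaluated at the mean covariate value, 3 + 0.5 times the mean of `x`.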
Affiliation(s)
- Wei Ma
  - Institute of Statistics and Big Data, Renmin University of China, Beijing, China
- Fuyi Tu
  - Institute of Statistics and Big Data, Renmin University of China, Beijing, China
- Hanzhong Liu
  - Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China
2
Smith MJ, Mansournia MA, Maringe C, Zivich PN, Cole SR, Leyrat C, Belot A, Rachet B, Luque-Fernandez MA. Introduction to computational causal inference using reproducible Stata, R, and Python code: A tutorial. Stat Med 2022; 41:407-432. [PMID: 34713468 DOI: 10.1002/sim.9234] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/03/2021] [Revised: 10/08/2021] [Accepted: 10/11/2021] [Indexed: 11/09/2022]
Abstract
The main purpose of many medical studies is to estimate the effects of a treatment or exposure on an outcome. However, it is not always possible to randomize the study participants to a particular treatment, therefore observational study designs may be used. There are major challenges with observational studies; one of which is confounding. Controlling for confounding is commonly performed by direct adjustment of measured confounders; although, sometimes this approach is suboptimal due to modeling assumptions and misspecification. Recent advances in the field of causal inference have dealt with confounding by building on classical standardization methods. However, these recent advances have progressed quickly with a relative paucity of computational-oriented applied tutorials contributing to some confusion in the use of these methods among applied researchers. In this tutorial, we show the computational implementation of different causal inference estimators from a historical perspective where new estimators were developed to overcome the limitations of the previous estimators (ie, nonparametric and parametric g-formula, inverse probability weighting, double-robust, and data-adaptive estimators). We illustrate the implementation of different methods using an empirical example from the Connors study based on intensive care medicine, and most importantly, we provide reproducible and commented code in Stata, R, and Python for researchers to adapt in their own observational study. The code can be accessed at https://github.com/migariane/Tutorial_Computational_Causal_Inference_Estimators.
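Two of the estimators named in this abstract, the nonparametric g-formula and inverse probability weighting, can be sketched in a few lines. This is an illustration under stated assumptions and not the tutorial's own code (which is in the linked repository): it assumes a binary treatment, a single discrete confounder, and at least one treated and one untreated subject in every confounder stratum. The function names and the eight-row data set are hypothetical.

```python
from collections import defaultdict


def g_formula_ate(rows):
    """Nonparametric g-formula (standardization).

    rows: iterable of (treatment 0/1, confounder level, outcome).
    Averages the stratum-specific mean differences over the observed
    distribution of the confounder.
    """
    rows = list(rows)
    strata = defaultdict(lambda: {0: [], 1: []})
    for a, w, y in rows:
        strata[w][a].append(y)
    ate = 0.0
    for arms in strata.values():
        n_w = len(arms[0]) + len(arms[1])
        diff = sum(arms[1]) / len(arms[1]) - sum(arms[0]) / len(arms[0])
        ate += diff * n_w / len(rows)
    return ate


def ipw_ate(rows):
    """Inverse probability weighting with stratum-specific propensities."""
    rows = list(rows)
    counts = defaultdict(lambda: [0, 0])  # level -> [n untreated, n treated]
    for a, w, _ in rows:
        counts[w][a] += 1
    total = 0.0
    for a, w, y in rows:
        p_treated = counts[w][1] / (counts[w][0] + counts[w][1])
        total += y / p_treated if a == 1 else -y / (1 - p_treated)
    return total / len(rows)


# Hypothetical data: treatment is rarer when w == 0 and outcomes are lower there.
rows = [
    (1, 0, 5.0), (0, 0, 2.0), (0, 0, 3.0), (0, 0, 4.0),
    (1, 1, 8.0), (1, 1, 9.0), (1, 1, 10.0), (0, 1, 6.0),
]

print(g_formula_ate(rows), ipw_ate(rows))
```

With saturated (fully stratified) models the two estimators agree, while the crude difference in means on these data is larger because the confounder is unevenly distributed across treatment groups.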
Affiliation(s)
- Matthew J Smith
  - Inequalities in Cancer Outcomes Network, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
- Mohammad A Mansournia
  - Department of Epidemiology and Biostatistics, Tehran University of Medical Sciences, Tehran, Iran
- Camille Maringe
  - Inequalities in Cancer Outcomes Network, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
- Paul N Zivich
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Stephen R Cole
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Clémence Leyrat
  - Inequalities in Cancer Outcomes Network, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
- Aurélien Belot
  - Inequalities in Cancer Outcomes Network, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
- Bernard Rachet
  - Inequalities in Cancer Outcomes Network, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
- Miguel A Luque-Fernandez
  - Inequalities in Cancer Outcomes Network, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
  - Non-communicable Disease and Cancer Epidemiology Group, Instituto de Investigacion Biosanitaria de Granada (ibs.GRANADA), Andalusian School of Public Health, University of Granada, Granada, Spain
  - Biomedical Network Research Centers of Epidemiology and Public Health (CIBERESP), Madrid, Spain
3
Abstract
Objective: To define confounding bias in difference-in-difference studies and compare regression- and matching-based estimators designed to correct bias due to observed confounders.
Data sources: We simulated data from linear models that incorporated different confounding relationships: time-invariant covariates with a time-varying effect on the outcome, time-varying covariates with a constant effect on the outcome, and time-varying covariates with a time-varying effect on the outcome. We considered a simple setting that is common in the applied literature: treatment is introduced at a single time point and there is no unobserved treatment effect heterogeneity.
Study design: We compared the bias and root mean squared error of treatment effect estimates from six model specifications, including simple linear regression models and matching techniques.
Data collection: Simulation code is provided for replication.
Principal findings: Confounders in difference-in-differences are covariates that change differently over time in the treated and comparison group or have a time-varying effect on the outcome. When such a confounding variable is measured, appropriately adjusting for this confounder (ie, including the confounder in a regression model that is consistent with the causal model) can provide unbiased estimates with optimal SE. However, when a time-varying confounder is affected by treatment, recovering an unbiased causal effect using difference-in-differences is difficult.
Conclusions: Confounding in difference-in-differences is more complicated than in cross-sectional settings, from which techniques and intuition to address observed confounding cannot be imported wholesale. Instead, analysts should begin by postulating a causal model that relates covariates, both time-varying and those with time-varying effects on the outcome, to treatment. This causal model will then guide the specification of an appropriate analytical model (eg, using regression or matching) that can produce unbiased treatment effect estimates. We emphasize the importance of thoughtful incorporation of covariates to address confounding bias in difference-in-difference studies.
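One of the confounding patterns described here, a time-invariant covariate with a time-varying effect on the outcome, can be made concrete with a small sketch. It is not the authors' simulation code: the data-generating rule in `make_unit`, the function names, and the use of stratification (instead of the regression and matching specifications the study compares) are all assumptions chosen to keep the example short. Every unit's outcome rises by 1 + 2x between periods, and treated units gain a further constant effect of 1.

```python
from statistics import mean


def did(units):
    """Two-group, two-period difference-in-differences on unit-level changes."""
    def mean_change(group):
        return mean(u["post"] - u["pre"] for u in units if u["treated"] == group)
    return mean_change(1) - mean_change(0)


def stratified_did(units, key="x"):
    """DiD within each level of a discrete covariate, weighted by treated share."""
    n_treated = sum(u["treated"] for u in units)
    estimate = 0.0
    for level in sorted({u[key] for u in units}):
        stratum = [u for u in units if u[key] == level]
        weight = sum(u["treated"] for u in stratum) / n_treated
        estimate += weight * did(stratum)
    return estimate


def make_unit(treated, x, baseline, effect=1.0):
    """Hypothetical outcome model: the time trend depends on x."""
    post = baseline + 1.0 + 2.0 * x + effect * treated
    return {"treated": treated, "x": x, "pre": baseline, "post": post}


# x = 1 is more common among treated units, so the groups trend differently.
units = (
    [make_unit(1, x, b) for x, b in [(1, 5.0), (1, 6.0), (1, 4.0), (0, 3.0)]]
    + [make_unit(0, x, b) for x, b in [(0, 2.0), (0, 3.0), (0, 1.0), (1, 6.0)]]
)

print(did(units), stratified_did(units))
```

On these data the unadjusted estimate is 2.0 while the stratified estimate recovers the built-in effect of 1.0, because parallel trends hold only within levels of `x`.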
Affiliation(s)
- Bret Zeldow
  - Department of Mathematics and Statistics, Colby College, Waterville, Maine, USA
- Laura A Hatfield
  - Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts, USA
4
Huang R, Xu R, Dulai PS. Sensitivity analysis of treatment effect to unmeasured confounding in observational studies with survival and competing risks outcomes. Stat Med 2020; 39:3397-3411. [PMID: 32677758 DOI: 10.1002/sim.8672] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/04/2019] [Revised: 05/30/2020] [Accepted: 06/03/2020] [Indexed: 11/09/2022]
Abstract
No unmeasured confounding is often assumed in estimating treatment effects in observational data, whether using classical regression models or approaches such as propensity scores and inverse probability weighting. However, in many such studies collection of confounders cannot possibly be exhaustive in practice, and it is crucial to examine the extent to which the resulting estimate is sensitive to the unmeasured confounders. We consider this problem for survival and competing risks data. Due to the complexity of models for such data, we adapt the simulated potential confounder approach of Carnegie et al (2016), which provides a general tool for sensitivity analysis due to unmeasured confounding. More specifically, we specify one sensitivity parameter to quantify the association between an unmeasured confounder and the exposure or treatment received, and another set of parameters to quantify the association between the confounder and the time-to-event outcomes. By varying the magnitudes of the sensitivity parameters, we estimate the treatment effect of interest using the stochastic expectation-maximization (EM) and the EM algorithms. We demonstrate the performance of our methods on simulated data, and apply them to a comparative effectiveness study in inflammatory bowel disease. An R package "survSens" is available on CRAN that implements the proposed methodology.
Affiliation(s)
- Rong Huang
  - Department of Mathematics, University of California San Diego, La Jolla, California, USA
- Ronghui Xu
  - Department of Mathematics, University of California San Diego, La Jolla, California, USA
  - Department of Family Medicine and Public Health, University of California San Diego, La Jolla, California, USA
- Parambir S Dulai
  - Department of Medicine, University of California San Diego, La Jolla, California, USA
5
Mao H, Li L. Flexible regression approach to propensity score analysis and its relationship with matching and weighting. Stat Med 2020; 39:2017-2034. [PMID: 32185801 DOI: 10.1002/sim.8526] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/29/2019] [Revised: 01/21/2020] [Accepted: 02/22/2020] [Indexed: 11/10/2022]
Abstract
In propensity score analysis, the frequently used regression adjustment involves regressing the outcome on the estimated propensity score and treatment indicator. This approach can be highly efficient when model assumptions are valid, but can lead to biased results when the assumptions are violated. We extend the simple regression adjustment to a varying coefficient regression model that allows for nonlinear association between outcome and propensity score. We discuss its connection with some propensity score matching and weighting methods, and show that the proposed analytical framework can shed light on the intrinsic connection among some mainstream propensity score approaches (stratification, regression, kernel matching, and inverse probability weighting) and handle commonly used causal estimands. We derive analytic point and variance estimators that properly take into account the sampling variability in the estimated propensity score. Extensive simulations show that the proposed approach possesses desired finite sample properties and demonstrates competitive performance in comparison with other methods estimating the same causal estimand. The proposed methodology is illustrated with a study on right heart catheterization.
Affiliation(s)
- Huzhang Mao
  - Department of Biostatistics and Data Science, University of Texas School of Public Health, Houston, TX, USA
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Liang Li
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
6
Sakurai R, Ueki M, Makino S, Hozawa A, Kuriyama S, Takai-Igarashi T, Kinoshita K, Yamamoto M, Tamiya G. Outlier detection for questionnaire data in biobanks. Int J Epidemiol 2020; 48:1305-1315. [PMID: 30848787 DOI: 10.1093/ije/dyz012] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Accepted: 02/14/2019] [Indexed: 11/12/2022]
Abstract
BACKGROUND: Biobanks increasingly collect, process and store omics with more conventional epidemiologic information necessitating considerable effort in data cleaning. An efficient outlier detection method that reduces manual labour is highly desirable.
METHOD: We develop an unsupervised machine-learning method for outlier detection, namely kurPCA, that uses principal component analysis combined with kurtosis to ascertain the existence of outliers. In addition, we propose a novel regression adjustment approach to improve detection, namely the regression adjustment for data by systematic missing patterns (RAMP).
RESULT: Application to epidemiological record data in a large-scale biobank (Tohoku Medical Megabank Organization, Japan) shows that a combination of kurPCA and RAMP effectively detects known errors or inconsistent patterns.
CONCLUSIONS: We confirm through the results of the simulation and the application that our methods showed good performance. The proposed methods are useful for many practical analysis scenarios.
Affiliation(s)
- Rieko Sakurai
  - Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
- Masao Ueki
  - Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
- Satoshi Makino
  - Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  - Graduate School of Medicine, Tohoku University, Sendai, Japan
- Atsushi Hozawa
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
  - Graduate School of Medicine, Tohoku University, Sendai, Japan
- Shinichi Kuriyama
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
  - Graduate School of Medicine, Tohoku University, Sendai, Japan
  - International Research Institute of Disaster Science (IRIDeS), Tohoku University, Sendai, Japan
- Takako Takai-Igarashi
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
  - Graduate School of Medicine, Tohoku University, Sendai, Japan
- Kengo Kinoshita
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
  - Graduate School of Information Sciences, Tohoku University, Sendai, Japan
- Masayuki Yamamoto
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
  - Graduate School of Medicine, Tohoku University, Sendai, Japan
- Gen Tamiya
  - Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
7
Abstract
The estimation of causal effects in nonrandomized studies should comprise two distinct phases: design, with no outcome data available; and analysis of the outcome data according to a specified protocol. Here, we review and compare point and interval estimates of common statistical procedures for estimating causal effects (i.e. matching, subclassification, weighting, and model-based adjustment) with a scalar continuous covariate and a scalar continuous outcome. We show, using an extensive simulation, that some highly advocated methods have poor operating characteristics. In many conditions, matching for the point estimate combined with within-group matching for sampling variance estimation, with or without covariance adjustment, appears to be the most efficient valid method of those evaluated. These results provide new conclusions and advice regarding the merits of currently used procedures.
Affiliation(s)
- R Gutman
  - Department of Biostatistics, Brown University, Providence, RI, USA
- DB Rubin
  - Department of Statistics, Harvard University, Cambridge, MA, USA
8
Abstract
We study the problem of treatment effect estimation in randomized experiments with high-dimensional covariate information and show that essentially any risk-consistent regression adjustment can be used to obtain efficient estimates of the average treatment effect. Our results considerably extend the range of settings where high-dimensional regression adjustments are guaranteed to provide valid inference about the population average treatment effect. We then propose cross-estimation, a simple method for obtaining finite-sample-unbiased treatment effect estimates that leverages high-dimensional regression adjustments. Our method can be used when the regression model is estimated using the lasso, the elastic net, subset selection, etc. Finally, we extend our analysis to allow for adaptive specification search via cross-validation and flexible nonparametric regression adjustments with machine-learning methods such as random forests or neural networks.
Affiliation(s)
- Stefan Wager
  - Department of Statistics, Stanford University, Stanford, CA 94305
  - Operations, Information & Technology, Stanford Graduate School of Business, Stanford University, Stanford, CA 94305
- Wenfei Du
  - Department of Statistics, Stanford University, Stanford, CA 94305
- Jonathan Taylor
  - Department of Statistics, Stanford University, Stanford, CA 94305
- Robert J Tibshirani
  - Department of Statistics, Stanford University, Stanford, CA 94305
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305
9
Linden A, Uysal SD, Ryan A, Adams JL. Estimating causal effects for multivalued treatments: a comparison of approaches. Stat Med 2015; 35:534-52. [PMID: 26482211 DOI: 10.1002/sim.6768] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 04/13/2015] [Revised: 07/25/2015] [Accepted: 09/28/2015] [Indexed: 11/10/2022]
Abstract
Interventions with multivalued treatments are common in medical and health research, such as when comparing the efficacy of competing drugs or interventions, or comparing between various doses of a particular drug. In recent years, there has been a growing interest in the development of multivalued treatment effect estimators using observational data. In this paper, we compare the performance of commonly used regression-based methods that estimate multivalued treatment effects based on the unconfoundedness assumption. These estimation methods fall into three general categories: (i) estimators based on a model for the outcome variable using conventional regression adjustment; (ii) weighted estimators based on a model for the treatment assignment; and (iii) 'doubly-robust' estimators that model both the treatment assignment and outcome variable within the same framework. We assess the performance of these models using Monte Carlo simulation and demonstrate their application with empirical data. Our results show that (i) when models estimating both the treatment and outcome are correctly specified, all adjustment methods provide similar unbiased estimates; (ii) when the outcome model is misspecified, regression adjustment performs poorly, while all the weighting methods provide unbiased estimates; (iii) when the treatment model is misspecified, methods based solely on modeling the treatment perform poorly, while regression adjustment and the doubly robust models provide unbiased estimates; and (iv) when both the treatment and outcome models are misspecified, all methods perform poorly. Given that researchers will rarely know which of the two models is misspecified, our results support the use of doubly robust estimation.
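The double-robustness property summarized in findings (ii) and (iii) can be checked on a toy example. The sketch below uses the augmented inverse probability weighting (AIPW) form of a doubly robust estimator for a three-level treatment and one binary covariate; this choice of estimator, the helper names, and the data are assumptions for illustration and are not taken from the paper. It evaluates the estimator once with a deliberately wrong outcome model but the empirical propensities, and once with correct cell means but a deliberately wrong constant propensity.

```python
from collections import Counter


def aipw_mean(rows, level, outcome_model, propensity):
    """AIPW estimate of the mean outcome if everyone received `level`.

    rows: (treatment level, covariate, outcome) triples.
    outcome_model(level, x) -> predicted outcome under `level`.
    propensity(level, x)    -> P(treatment == level | x).
    """
    total = 0.0
    for a, x, y in rows:
        m = outcome_model(level, x)
        total += m
        if a == level:
            total += (y - m) / propensity(level, x)
    return total / len(rows)


def empirical_propensity(rows):
    """Treatment shares within each covariate level (a saturated model)."""
    n_x = Counter(x for _, x, _ in rows)
    n_ax = Counter((a, x) for a, x, _ in rows)
    return lambda level, x: n_ax[(level, x)] / n_x[x]


def cell_mean_model(rows):
    """Mean outcome within each (treatment, covariate) cell."""
    sums, counts = Counter(), Counter()
    for a, x, y in rows:
        sums[(a, x)] += y
        counts[(a, x)] += 1
    return lambda level, x: sums[(level, x)] / counts[(level, x)]


# Hypothetical data: higher treatment levels are more common when x == 1.
rows = [
    (0, 0, 1.0), (0, 0, 2.0), (0, 0, 3.0), (1, 0, 4.0), (1, 0, 6.0), (2, 0, 7.0),
    (0, 1, 5.0), (1, 1, 7.0), (1, 1, 9.0), (2, 1, 10.0), (2, 1, 11.0), (2, 1, 12.0),
]

wrong_outcome = lambda level, x: 0.0          # misspecified outcome model
wrong_propensity = lambda level, x: 1.0 / 3   # misspecified treatment model

for level in (0, 1, 2):
    print(
        level,
        aipw_mean(rows, level, wrong_outcome, empirical_propensity(rows)),
        aipw_mean(rows, level, cell_mean_model(rows), wrong_propensity),
    )
```

In both configurations the estimates coincide with the covariate-standardized means for each treatment level, which is the behavior the abstract attributes to doubly robust estimation when only one of the two models is misspecified.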
Affiliation(s)
- Ariel Linden
  - Linden Consulting Group, LLC, Ann Arbor, MI, U.S.A.
  - Department of Health Management & Policy, University of Michigan School of Public Health, Ann Arbor, MI, U.S.A.
- S Derya Uysal
  - Department of Economics and Finance, IHS, Vienna, Austria
- Andrew Ryan
  - Department of Health Management & Policy, University of Michigan School of Public Health, Ann Arbor, MI, U.S.A.
- John L Adams
  - Kaiser Permanente, Center for Effectiveness and Safety Research, Pasadena, CA, U.S.A.