1
Ma W, Chen C, Gong Y, Chan NY, Jiang M, Mak CHK, Abrigo JM, Dou Q. Causal Effect Estimation on Imaging and Clinical Data for Treatment Decision Support of Aneurysmal Subarachnoid Hemorrhage. IEEE Trans Med Imaging 2024; 43:2778-2789. [PMID: 38635381 DOI: 10.1109/tmi.2024.3390812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/20/2024]
Abstract
Aneurysmal subarachnoid hemorrhage is a medical emergency of the brain that has high mortality and poor prognosis. Causal effect estimation of treatment strategies on patient outcomes is crucial for aneurysmal subarachnoid hemorrhage treatment decision-making. However, most existing studies on treatment decision-making support for this disease are unable to simultaneously compare the potential outcomes of different treatments for a patient. Furthermore, these studies fail to harmoniously integrate the imaging data with non-imaging clinical data, both of which are useful in clinical scenarios. In this paper, we estimate the causal effect of various treatments on patients with aneurysmal subarachnoid hemorrhage by integrating plain CT with non-imaging clinical data, which is represented using structured tabular data. Specifically, we first propose a novel scheme that uses multi-modality confounders distillation architecture to predict the treatment outcome and treatment assignment simultaneously. With these distilled confounder features, we design an imaging and non-imaging interaction representation learning strategy to use the complementary information extracted from different modalities to balance the feature distribution of different treatment groups. We have conducted extensive experiments using a clinical dataset of 656 subarachnoid hemorrhage cases, which was collected from the Hospital Authority Data Collaboration Laboratory in Hong Kong. Our method shows consistent improvements on the evaluation metrics of treatment effect estimation, achieving state-of-the-art results over strong competitors. Code is released at https://github.com/med-air/TOP-aSAH.
2
Wang T, Zhao H, Yang S, Tang S, Cui Z, Li L, Faries DE. Propensity score matching for estimating a marginal hazard ratio. Stat Med 2024; 43:2783-2810. [PMID: 38705726 PMCID: PMC11178458 DOI: 10.1002/sim.10103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Revised: 01/31/2024] [Accepted: 04/24/2024] [Indexed: 05/07/2024]
Abstract
Propensity score matching is commonly used to draw causal inference from observational survival data. However, its asymptotic properties have yet to be established, and variance estimation is still open to debate. We derive the statistical properties of the propensity score matching estimator of the marginal causal hazard ratio based on matching with replacement and a fixed number of matches. We also propose a double-resampling technique for variance estimation that takes into account the uncertainty due to propensity score estimation prior to matching.
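The matching step in this abstract is matching with replacement using a fixed number of matches. One way to picture it is nearest-neighbour matching on an estimated propensity score. The sketch below shows that step only. It assumes the propensity scores are already estimated, and the function name `match_with_replacement` is hypothetical. It does not estimate the marginal hazard ratio, and it does not implement the authors' double-resampling variance estimator.

```python
def match_with_replacement(ps, treated, m=1):
    """For each treated unit, return the indices of the m control units whose
    estimated propensity scores are closest. Controls may be reused across
    treated units (matching with replacement); ties are broken by index."""
    controls = [j for j, t in enumerate(treated) if t == 0]
    matches = {}
    for i, t in enumerate(treated):
        if t == 1:
            ranked = sorted(controls, key=lambda j: (abs(ps[i] - ps[j]), j))
            matches[i] = ranked[:m]
    return matches

ps = [0.2, 0.8, 0.25, 0.7, 0.5]   # estimated propensity scores (illustrative)
treated = [1, 1, 0, 0, 0]
print(match_with_replacement(ps, treated, m=2))   # {0: [2, 4], 1: [3, 4]}
```

Control unit 4 is matched to both treated units. This reuse is what "with replacement" permits.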
Affiliation(s)
- Honghe Zhao
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
- Shu Yang
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
- Shuhan Tang
- Eli Lilly and Company, Indianapolis, Indiana, USA
- Zhanglin Cui
- Eli Lilly and Company, Indianapolis, Indiana, USA
- Li Li
- Eli Lilly and Company, Indianapolis, Indiana, USA
3
Orihara S, Goto A, Taguri M. Valid instrumental variable selection method using negative control outcomes and constructing efficient estimator. Biom J 2024; 66:e2300113. [PMID: 38801216 DOI: 10.1002/bimj.202300113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Revised: 02/29/2024] [Accepted: 03/18/2024] [Indexed: 05/29/2024]
Abstract
In observational studies, instrumental variable (IV) methods are commonly applied when there are unmeasured covariates. In Mendelian randomization, an allele score is often constructed from many single nucleotide polymorphisms; however, including invalid IVs in the score risks biased causal effect estimates. Invalid IVs are those IV candidates that are associated with unobserved variables. To solve this problem, we developed a novel strategy using negative control outcomes (NCOs) as auxiliary variables. Using NCOs, we are able to select only valid IVs and exclude invalid IVs without knowing which of the instruments are invalid. We also developed a new two-step estimation procedure and proved the semiparametric efficiency of our estimator. In simulations, our proposed method outperformed some previous methods. Subsequently, we applied the proposed method to the UK Biobank dataset. Our results demonstrate that the use of an auxiliary variable, such as an NCO, enables the selection of valid IVs with assumptions different from those used in previous methods.
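The problem this paper addresses can be set up with the basic allele-score IV estimator. The score is the sum of allele counts, and the causal effect is estimated as cov(score, Y) / cov(score, X). The sketch below is a minimal, assumption-laden illustration of that ratio estimator on simulated data. It treats every SNP as a valid instrument. It does not perform the paper's NCO-based selection of valid instruments, and it does not implement the efficient two-step estimator. The function names and simulation settings are hypothetical.

```python
import random

def sample_cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

def allele_score_iv(snps, exposure, outcome):
    """Ratio IV estimate using an unweighted allele score.

    snps: one list of allele counts (0, 1 or 2) per person. Every SNP is
    assumed to be a valid instrument, which is exactly what can fail in practice.
    """
    score = [sum(row) for row in snps]
    return sample_cov(score, outcome) / sample_cov(score, exposure)

# Toy simulation: u is an unmeasured confounder of exposure x and outcome y.
random.seed(1)
n, true_effect = 20000, 1.5
snps, xs, ys = [], [], []
for _ in range(n):
    g = [sum(random.random() < 0.3 for _ in range(2)) for _ in range(3)]
    u = random.gauss(0, 1)
    x = 0.5 * sum(g) + u + random.gauss(0, 1)
    y = true_effect * x + 2 * u + random.gauss(0, 1)
    snps.append(g)
    xs.append(x)
    ys.append(y)

naive = sample_cov(xs, ys) / sample_cov(xs, xs)   # confounded regression slope
iv = allele_score_iv(snps, xs, ys)                # should be near true_effect
print(f"naive slope {naive:.2f}, IV estimate {iv:.2f}")
```

Adding a SNP that also affects `y` directly would bias `iv`. That is the invalid-IV situation the NCO strategy is designed to detect.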
Affiliation(s)
- Shunichiro Orihara
- Department of Health Data Science, Tokyo Medical University, Tokyo, Japan
- Graduate School of Data Science, Yokohama City University, Kanagawa, Japan
- Atsushi Goto
- Graduate School of Data Science, Yokohama City University, Kanagawa, Japan
- Masataka Taguri
- Department of Health Data Science, Tokyo Medical University, Tokyo, Japan
- Graduate School of Data Science, Yokohama City University, Kanagawa, Japan
4
Yang S, Gao C, Zeng D, Wang X. Elastic integrative analysis of randomised trial and real-world data for treatment heterogeneity estimation. J R Stat Soc Series B Stat Methodol 2023; 85:575-596. [PMID: 37521165 PMCID: PMC10376438 DOI: 10.1093/jrsssb/qkad017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2021] [Revised: 05/14/2022] [Accepted: 02/28/2023] [Indexed: 08/01/2023]
Abstract
We propose a test-based elastic integrative analysis of the randomised trial and real-world data to estimate treatment effect heterogeneity with a vector of known effect modifiers. When the real-world data are not subject to bias, our approach combines the trial and real-world data for efficient estimation. Utilising the trial design, we construct a test to decide whether or not to use real-world data. We characterise the asymptotic distribution of the test-based estimator under local alternatives. We provide a data-adaptive procedure to select the test threshold that promises the smallest mean square error and an elastic confidence interval with a good finite-sample coverage property.
Affiliation(s)
- Shu Yang
- Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Chenyin Gao
- Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Donglin Zeng
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Xiaofei Wang
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
5
Lee D, Yang S, Dong L, Wang X, Zeng D, Cai J. Improving trial generalizability using observational studies. Biometrics 2023; 79:1213-1225. [PMID: 34862966 PMCID: PMC9166225 DOI: 10.1111/biom.13609] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 11/19/2020] [Revised: 11/06/2021] [Accepted: 11/22/2021] [Indexed: 11/29/2022]
Abstract
Complementary features of randomized controlled trials (RCTs) and observational studies (OSs) can be used jointly to estimate the average treatment effect of a target population. We propose a calibration weighting estimator that enforces covariate balance between the RCT and OS, thereby improving the trial-based estimator's generalizability. Exploiting semiparametric efficiency theory, we propose a doubly robust augmented calibration weighting estimator that achieves the efficiency bound derived under the identification assumptions. A nonparametric sieve method is provided as an alternative to the parametric approach, which enables the robust approximation of the nuisance functions and data-adaptive selection of outcome predictors for calibration. We establish asymptotic results and confirm the finite sample performances of the proposed estimators by simulation experiments and an application on the estimation of the treatment effect of adjuvant chemotherapy for early-stage non-small-cell lung cancer patients after surgery.
Affiliation(s)
- Dasom Lee
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
- Shu Yang
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
- Lin Dong
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
- Xiaofei Wang
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
- Donglin Zeng
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Jianwen Cai
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
6
Yang S, Zhang Y. Multiply robust matching estimators of average and quantile treatment effects. Scand Stat Theory Appl 2023; 50:235-265. [PMID: 36844478 PMCID: PMC9949738 DOI: 10.1111/sjos.12585] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
Propensity score matching has been a long-standing tradition for handling confounding in causal inference, but it requires stringent model assumptions. In this article, we propose novel double score matching (DSM) utilizing both the propensity score and prognostic score. To protect against possible model misspecification, we posit multiple candidate models for each score. We show that the de-biasing DSM estimator achieves the multiple robustness property in that it is consistent if any one of the score models is correctly specified. We characterize the asymptotic distribution for the DSM estimator requiring only one correct model specification based on the martingale representations of the matching estimators and theory for local Normal experiments. We also provide a two-stage replication method for variance estimation and extend DSM for quantile estimation. Simulation demonstrates that DSM outperforms single score matching and prevailing multiply robust weighting estimators in the presence of extreme propensity scores.
Affiliation(s)
- Shu Yang
- Department of Statistics, North Carolina State University
- Yunshu Zhang
- Department of Statistics, North Carolina State University
7
Liu Y, Fan Y. Biased-sample empirical likelihood weighting for missing data problems: an alternative to inverse probability weighting. J R Stat Soc Series B Stat Methodol 2023. [DOI: 10.1093/jrsssb/qkac006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2023]
Abstract
Inverse probability weighting (IPW) is widely used in many areas when data are subject to unrepresentativeness, missingness, or selection bias. An inevitable challenge with the use of IPW is that the IPW estimator can be remarkably unstable if some probabilities are very close to zero. To overcome this problem, at least three remedies have been developed in the literature: stabilizing, thresholding, and trimming. However, the final estimators are still IPW-type estimators, and inevitably inherit certain weaknesses of the naive IPW estimator: they may still be unstable or biased. We propose a biased-sample empirical likelihood weighting (ELW) method to serve the same general purpose as IPW, while completely overcoming the instability of IPW-type estimators by circumventing the use of inverse probabilities. The ELW weights are always well defined and easy to implement. We show theoretically that the ELW estimator is asymptotically normal and more efficient than the IPW estimator and its stabilized version for missing data problems. Our simulation results and a real data analysis indicate that the ELW estimator is shift-equivariant, nearly unbiased, and usually outperforms the IPW-type estimators in terms of mean square error.
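The IPW-type estimators named in this abstract can be compared on a tiny missing-data example. The sketch shows the naive (Horvitz-Thompson) IPW mean, its stabilized (normalized) version, and a version that raises small probabilities to a floor before normalizing. The floor is one common form of thresholding; the floor value and all function names are assumptions. Response probabilities are treated as known here. The paper's ELW estimator itself is not implemented. The example also checks shift-equivariance, a property the abstract attributes to ELW. Adding 10 to every outcome shifts the stabilized estimate by exactly 10, but shifts the naive IPW estimate by a different amount.

```python
def ht_mean(y, r, p):
    """Naive (Horvitz-Thompson) IPW mean: sum of y/p over respondents, divided by n."""
    return sum(yi / pi for yi, ri, pi in zip(y, r, p) if ri) / len(y)

def hajek_mean(y, r, p):
    """Stabilized (normalized) IPW mean: the inverse-probability weights sum to one."""
    num = sum(yi / pi for yi, ri, pi in zip(y, r, p) if ri)
    den = sum(1 / pi for ri, pi in zip(r, p) if ri)
    return num / den

def floored_hajek_mean(y, r, p, floor=0.05):
    """Stabilized IPW mean after raising probabilities below `floor` to `floor`."""
    return hajek_mean(y, r, [max(pi, floor) for pi in p])

y = [1.0, 2.0, None, 4.0]    # None marks a missing outcome
r = [1, 1, 0, 1]             # response indicators
p = [0.5, 0.5, 0.5, 0.25]    # response probabilities (assumed known)
shifted = [None if yi is None else yi + 10 for yi in y]
print(ht_mean(y, r, p), ht_mean(shifted, r, p))        # 5.5 25.5
print(hajek_mean(y, r, p), hajek_mean(shifted, r, p))  # 2.75 12.75
```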
Affiliation(s)
- Yukun Liu
- KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China
- Yan Fan
- School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai, China
8
Wu L, Yang S. Transfer learning of individualized treatment rules from experimental to real-world data. J Comput Graph Stat 2022; 32:1036-1045. [PMID: 37997592 PMCID: PMC10664843 DOI: 10.1080/10618600.2022.2141752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2021] [Accepted: 10/04/2022] [Indexed: 11/06/2022]
Abstract
Individualized treatment effect lies at the heart of precision medicine. Interpretable individualized treatment rules (ITRs) are desirable for clinicians or policymakers due to their intuitive appeal and transparency. The gold-standard approach to estimating the ITRs is randomized experiments, where subjects are randomized to different treatment groups and the confounding bias is minimized to the extent possible. However, experimental studies are limited in external validity because of their selection restrictions, and therefore the underlying study population is not representative of the target real-world population. Conventional learning methods of optimal interpretable ITRs for a target population based only on experimental data are biased. On the other hand, real-world data (RWD) are becoming popular and provide a representative sample of the target population. To learn the generalizable optimal interpretable ITRs, we propose an integrative transfer learning method based on weighting schemes to calibrate the covariate distribution of the experiment to that of the RWD. Theoretically, we establish the risk consistency for the proposed ITR estimator. Empirically, we evaluate the finite-sample performance of the transfer learner through simulations and apply it to a real data application of a job training program.
Affiliation(s)
- Lili Wu
- Department of Statistics, North Carolina State University
- Shu Yang
- Department of Statistics, North Carolina State University
9
Gochanour B, Chen S, Beebe L. Multiply robust Bayesian procedures for causal inference problems. Commun Stat Simul Comput 2022. [DOI: 10.1080/03610918.2022.2101065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
Affiliation(s)
- Sixia Chen
- Department of Biostatistics and Epidemiology, The University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, USA
- Laura Beebe
- Department of Biostatistics and Epidemiology, The University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, USA
10
Zhao H, Yang S. Outcome-adjusted balance measure for generalized propensity score model selection. J Stat Plan Inference 2022. [DOI: 10.1016/j.jspi.2022.04.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]
11
Reich BJ, Yang S, Guan Y, Giffin AB, Miller MJ, Rappold A. A Review of Spatial Causal Inference Methods for Environmental and Epidemiological Applications. Int Stat Rev 2021; 89:605-634. [PMID: 37197445 PMCID: PMC10187770 DOI: 10.1111/insr.12452] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/07/2020] [Accepted: 04/30/2021] [Indexed: 11/30/2022]
Abstract
The scientific rigor and computational methods of causal inference have had great impacts on many disciplines but have only recently begun to take hold in spatial applications. Spatial causal inference poses analytic challenges due to complex correlation structures and interference between the treatment at one location and the outcomes at others. In this paper, we review the current literature on spatial causal inference and identify areas of future work. We first discuss methods that exploit spatial structure to account for unmeasured confounding variables. We then discuss causal analysis in the presence of spatial interference including several common assumptions used to reduce the complexity of the interference patterns under consideration. These methods are extended to the spatiotemporal case where we compare and contrast the potential outcomes framework with Granger causality and to geostatistical analyses involving spatial random fields of treatments and responses. The methods are introduced in the context of observational environmental and epidemiological studies and are compared using both a simulation study and analysis of the effect of ambient air pollution on COVID-19 mortality rate. Code to implement many of the methods using the popular Bayesian software OpenBUGS is provided.
Affiliation(s)
- Brian J Reich
- Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Shu Yang
- Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Yawen Guan
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
- Andrew B Giffin
- Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Matthew J Miller
- Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Ana Rappold
- US Environmental Protection Agency, Research Triangle Park, NC 27709, USA
12
Qiu H, Luedtke A, Carone M. Universal sieve-based strategies for efficient estimation using machine learning tools. Bernoulli 2021; 27:2300-2336. [PMID: 34733110 PMCID: PMC8561841 DOI: 10.3150/20-bej1309] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
Abstract
Suppose that we wish to estimate a finite-dimensional summary of one or more function-valued features of an underlying data-generating mechanism under a nonparametric model. One approach to estimation is by plugging in flexible estimates of these features. Unfortunately, in general, such estimators may not be asymptotically efficient, which often makes these estimators difficult to use as a basis for inference. Though there are several existing methods to construct asymptotically efficient plug-in estimators, each such method either can only be derived using knowledge of efficiency theory or is only valid under stringent smoothness assumptions. Among existing methods, sieve estimators stand out as particularly convenient because efficiency theory is not required in their construction, their tuning parameters can be selected data adaptively, and they are universal in the sense that the same fits lead to efficient plug-in estimators for a rich class of estimands. Inspired by these desirable properties, we propose two novel universal approaches for estimating function-valued features that can be analyzed using sieve estimation theory. Compared to traditional sieve estimators, these approaches are valid under more general conditions on the smoothness of the function-valued features by utilizing flexible estimates that can be obtained, for example, using machine learning.
Affiliation(s)
- Hongxiang Qiu
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Alex Luedtke
- Department of Statistics, University of Washington, Seattle, WA, USA
- Marco Carone
- Department of Biostatistics, University of Washington, Seattle, WA, USA
13
Garès V, Chauvet G, Hajage D. Variance estimators for weighted and stratified linear dose-response function estimators using generalized propensity score. Biom J 2021; 64:33-56. [PMID: 34327720 DOI: 10.1002/bimj.202000267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2020] [Revised: 05/07/2021] [Accepted: 06/12/2021] [Indexed: 11/10/2022]
Abstract
Propensity score methods are widely used in observational studies for evaluating marginal treatment effects. The generalized propensity score (GPS) is an extension of the propensity score framework, historically developed in the case of binary exposures, for use with quantitative or continuous exposures. In this paper, we proposed variance estimators for treatment effect estimators on continuous outcomes. Dose-response functions (DRFs) were estimated through weighting on the inverse of the GPS, or using stratification. Variance estimators were evaluated using Monte Carlo simulations. Despite the use of stabilized weights, the variability of the weighted estimator of the DRF was particularly high, and none of the variance estimators (a bootstrap-based estimator, a closed-form estimator especially developed to take into account the estimation step of the GPS, and a sandwich estimator) were able to adequately capture this variability, resulting in coverages below the nominal value, particularly when the proportion of the variation in the quantitative exposure explained by the covariates was large. The stratified estimator was more stable, and variance estimators (a bootstrap-based estimator, a pooled linearized estimator, and a pooled model-based estimator) more efficient at capturing the empirical variability of the parameters of the DRF. The pooled variance estimators tended to overestimate the variance, whereas the bootstrap estimator, which intrinsically takes into account the estimation step of the GPS, resulted in correct variance estimations and coverage rates. These methods were applied to a real data set with the aim of assessing the effect of maternal body mass index on newborn birth weight.
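Below is a minimal sketch of the weighted estimator described in this abstract, under several stated assumptions. The exposure model is normal linear in a single covariate, so the GPS is a normal density. The stabilized weights are f(A) / f(A | X), with f(A) also taken to be normal. The linear dose-response function is then fitted by weighted least squares of Y on A. All names and simulation settings are hypothetical. The stratified estimator and the variance estimators compared in the paper are not implemented.

```python
import math
import random

def normal_pdf(v, mean, sd):
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def wls_line(x, y, w=None):
    """Intercept and slope of a (weighted) least-squares line y ~ x."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def stabilized_gps_weights(a, x):
    """Weights f(A) / f(A | X), with normal models for both densities."""
    n = len(a)
    b0, b1 = wls_line(x, a)                                  # exposure model A ~ X
    resid = [ai - (b0 + b1 * xi) for ai, xi in zip(a, x)]
    sd_cond = math.sqrt(sum(e * e for e in resid) / (n - 2))
    mean_a = sum(a) / n
    sd_marg = math.sqrt(sum((ai - mean_a) ** 2 for ai in a) / (n - 1))
    return [normal_pdf(ai, mean_a, sd_marg) / normal_pdf(ai, b0 + b1 * xi, sd_cond)
            for ai, xi in zip(a, x)]

random.seed(7)
n = 20000
xs = [random.gauss(0, 1) for _ in range(n)]                                 # confounder
a_s = [0.5 * xi + random.gauss(0, 1) for xi in xs]                         # continuous exposure
ys = [2.0 * ai + 3.0 * xi + random.gauss(0, 1) for ai, xi in zip(a_s, xs)]  # true DRF slope 2

w = stabilized_gps_weights(a_s, xs)
print("unweighted slope:", round(wls_line(a_s, ys)[1], 2))     # confounded, about 3.2
print("GPS-weighted slope:", round(wls_line(a_s, ys, w)[1], 2))
```

In this simulation only part of the variation in the exposure is explained by the covariate. When that proportion is made large, the weights become extreme and the weighted slope becomes very unstable. This is consistent with the behaviour the abstract reports for the weighted DRF estimator.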
Affiliation(s)
- Valérie Garès
- Univ Rennes, INSA, CNRS, IRMAR - UMR 6625, F-35000, Rennes, France
- David Hajage
- Sorbonne Université, INSERM, Institut Pierre Louis d'Epidémiologie et de Santé Publique, AP-HP, Hôpital Pitié-Salpêtrière, Département de Santé Publique, Centre de Pharmacoépidémiologie, Paris, France
14
Yang S, Kim JK. Asymptotic theory and inference of predictive mean matching imputation using a superpopulation model framework. Scand Stat Theory Appl 2021; 47:839-861. [PMID: 34305262 DOI: 10.1111/sjos.12429] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/27/2022]
Abstract
Predictive mean matching imputation is popular for handling item nonresponse in survey sampling. In this article, we study the asymptotic properties of the predictive mean matching estimator for finite-population inference using a superpopulation model framework. We also clarify conditions for its robustness. For variance estimation, the conventional bootstrap inference is invalid for matching estimators with a fixed number of matches due to the nonsmooth nature of the matching estimator. We propose a new replication variance estimator, which is asymptotically valid. The key strategy is to construct replicates directly based on the linear terms of the martingale representation for the matching estimator, instead of individual records of variables. Simulation studies confirm that the proposed method provides valid inference.
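The imputation step can be sketched under simple assumptions: one covariate, a linear regression fitted on respondents, and a fixed number of matches equal to one. Each missing value is filled with the observed outcome of the respondent whose predicted mean is closest. The function name `pmm_impute` is hypothetical. The paper's replication variance estimator is not implemented. As the abstract notes, a naive bootstrap of this estimator would not give valid variance estimates.

```python
def pmm_impute(x, y):
    """Predictive mean matching with a single donor (fixed number of matches k = 1).

    x: covariate for every unit; y: outcome, with None marking nonrespondents.
    Each missing y is replaced by the observed y of the respondent whose
    predicted mean is closest (ties broken by the lower index)."""
    obs = [i for i, yi in enumerate(y) if yi is not None]
    n = len(obs)
    mx = sum(x[i] for i in obs) / n
    my = sum(y[i] for i in obs) / n
    sxx = sum((x[i] - mx) ** 2 for i in obs)
    sxy = sum((x[i] - mx) * (y[i] - my) for i in obs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    pred = [b0 + b1 * xi for xi in x]           # predicted means for all units
    completed = list(y)
    for i, yi in enumerate(y):
        if yi is None:
            donor = min(obs, key=lambda j: (abs(pred[i] - pred[j]), j))
            completed[i] = y[donor]             # impute an actually observed value
    return completed

x = [1.0, 2.0, 3.2, 4.0, 5.0]
y = [1.1, 1.9, None, 4.2, None]
filled = pmm_impute(x, y)
print(filled)                      # [1.1, 1.9, 4.2, 4.2, 4.2]
print(sum(filled) / len(filled))   # PMM estimate of the population mean
```

Every imputed value is an outcome that was actually observed. This is the main practical appeal of predictive mean matching over imputing the regression prediction itself.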
Affiliation(s)
- Shu Yang
- Department of Statistics, North Carolina State University
15
Abstract
Inverse probability weighting is an important propensity score weighting method to estimate the average treatment effect. Recent literature shows that it can be easily combined with covariate balancing constraints to reduce the detrimental effects of excessively large weights and improve balance. Other methods are available to derive weights that balance covariate distributions between the treatment groups without the involvement of propensity scores. We conducted comprehensive Monte Carlo experiments to study whether the use of covariate balancing constraints circumvents the need for correct propensity score model specification, and whether the use of a propensity score model further improves the estimation performance among methods that use similar covariate balancing constraints. We compared simple inverse probability weighting, two propensity score weighting methods with balancing constraints (covariate balancing propensity score, covariate balancing scoring rule), and two weighting methods with balancing constraints but without using the propensity scores (entropy balancing and kernel balancing). We observed that correct specification of the propensity score model remains important even when the constraints effectively balance the covariates. We also observed evidence suggesting that, with similar covariate balance constraints, the use of a propensity score model improves the estimation performance when the dimension of covariates is large. These findings suggest that it is important to develop flexible data-driven propensity score models that satisfy covariate balancing conditions.
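Entropy balancing, one of the propensity-score-free methods compared here, chooses control weights that stay as close as possible to uniform in Kullback-Leibler divergence while matching a target covariate mean exactly. The solution has the exponential-tilting form w_j proportional to exp(lam * x_j). The sketch below assumes a single covariate and the treated-group mean as the target. It finds `lam` by bisection, which works because the tilted mean increases with `lam`. The function names are hypothetical. Multivariate constraints, as used in practice, would need a multivariate solver instead of bisection.

```python
import math

def tilted_weights(x, lam):
    """Normalized weights proportional to exp(lam * x_j)."""
    z = [lam * xi for xi in x]
    zmax = max(z)                                 # shift for numerical stability
    e = [math.exp(zi - zmax) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def weighted_mean(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def entropy_balance(x_control, target, tol=1e-12):
    """Maximum-entropy control weights whose weighted covariate mean equals
    `target` (single covariate only)."""
    if not min(x_control) < target < max(x_control):
        raise ValueError("target must lie strictly inside the range of x_control")
    lo, hi = -1.0, 1.0
    # The tilted mean increases with lam: widen the bracket, then bisect.
    while weighted_mean(tilted_weights(x_control, lo), x_control) > target:
        lo *= 2
    while weighted_mean(tilted_weights(x_control, hi), x_control) < target:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if weighted_mean(tilted_weights(x_control, mid), x_control) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return tilted_weights(x_control, (lo + hi) / 2)

x_treated = [1.5, 2.0, 2.5]
x_control = [0.0, 1.0, 2.0, 3.0]
target = sum(x_treated) / len(x_treated)          # treated covariate mean, 2.0
w = entropy_balance(x_control, target)
print([round(wi, 3) for wi in w], weighted_mean(w, x_control))
```

Balance holds by construction, so no propensity score model is involved. The abstract's finding is that such balance alone does not remove the importance of a correctly specified propensity score model.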
Affiliation(s)
- Yan Li
- The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, USA
- Liang Li
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
16
Zhou Y, Matsouaka RA, Thomas L. Propensity score weighting under limited overlap and model misspecification. Stat Methods Med Res 2020; 29:3721-3756. [PMID: 32693715 DOI: 10.1177/0962280220940334] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 11/15/2022]
Abstract
Propensity score weighting methods are often used in non-randomized studies to adjust for confounding and assess treatment effects. The most popular among them, the inverse probability weighting, assigns weights that are proportional to the inverse of the conditional probability of a specific treatment assignment, given observed covariates. A key requirement for inverse probability weighting estimation is the positivity assumption, i.e. the propensity score must be bounded away from 0 and 1. In practice, violations of the positivity assumption often manifest by the presence of limited overlap in the propensity score distributions between treatment groups. When these practical violations occur, a small number of highly influential inverse probability weights may lead to unstable inverse probability weighting estimators, with biased estimates and large variances. To mitigate these issues, a number of alternative methods have been proposed, including inverse probability weighting trimming, overlap weights, matching weights, and entropy weights. Because overlap weights, matching weights, and entropy weights target the population for whom there is equipoise (and with adequate overlap) and their estimands depend on the true propensity score, a common criticism is that these estimators may be more sensitive to misspecifications of the propensity score model. In this paper, we conduct extensive simulation studies to compare the performances of inverse probability weighting and inverse probability weighting trimming against those of overlap weights, matching weights, and entropy weights under limited overlap and misspecified propensity score models. Across the wide range of scenarios we considered, overlap weights, matching weights, and entropy weights consistently outperform inverse probability weighting in terms of bias, root mean squared error, and coverage probability.
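The contrast between inverse probability weights and overlap weights can be seen directly from their formulas. Treated units get 1/e and controls get 1/(1 - e) under IPW. Under overlap weights, treated units get 1 - e and controls get e. The overlap weights are therefore bounded by 1, even when e is close to 0 or 1. The sketch below uses illustrative propensity scores, which are assumed known. It computes a normalized weighted difference in means for both schemes; the function names are hypothetical. As the abstract notes, the two schemes target different populations, so on real data their estimates answer different questions.

```python
def weighted_mean_difference(y, t, w):
    """Normalized weighted outcome mean of treated minus that of controls."""
    s1 = sum(wi * yi for yi, ti, wi in zip(y, t, w) if ti == 1)
    d1 = sum(wi for ti, wi in zip(t, w) if ti == 1)
    s0 = sum(wi * yi for yi, ti, wi in zip(y, t, w) if ti == 0)
    d0 = sum(wi for ti, wi in zip(t, w) if ti == 0)
    return s1 / d1 - s0 / d0

def ipw_weights(e, t):
    """Inverse probability weights: 1/e for treated, 1/(1 - e) for controls."""
    return [1 / ei if ti == 1 else 1 / (1 - ei) for ei, ti in zip(e, t)]

def overlap_weights(e, t):
    """Overlap weights: 1 - e for treated, e for controls (always in [0, 1])."""
    return [1 - ei if ti == 1 else ei for ei, ti in zip(e, t)]

e = [0.5, 0.5, 0.02, 0.9]     # propensity scores; 0.02 signals limited overlap
t = [1, 0, 1, 0]
y = [3.0, 1.0, 3.0, 1.0]      # constant treatment effect of 2
print(ipw_weights(e, t))      # the unit with e = 0.02 gets a weight of about 50
print(overlap_weights(e, t))
print(weighted_mean_difference(y, t, ipw_weights(e, t)),
      weighted_mean_difference(y, t, overlap_weights(e, t)))
```

With a constant treatment effect, both schemes recover the same value. When the effect varies, the estimates differ because each scheme weights the units differently.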
Affiliation(s)
- Yunji Zhou
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA; Duke Global Health Institute, Duke University, Durham, NC, USA
- Roland A Matsouaka
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA; Program for Comparative Effectiveness Methodology, Duke Clinical Research Institute, Durham, NC, USA
- Laine Thomas
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA; Program for Comparative Effectiveness Methodology, Duke Clinical Research Institute, Durham, NC, USA
17
A Pareto-smoothing method for causal inference using generalized Pareto distribution. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.09.095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
18
Nethery RC, Mealli F, Dominici F. Estimating population average causal effects in the presence of non-overlap: the effect of natural gas compressor station exposure on cancer mortality. Ann Appl Stat 2019; 13:1242-1267. [PMID: 31346355 PMCID: PMC6658123 DOI: 10.1214/18-aoas1231] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/29/2022]
Abstract
Most causal inference studies rely on the assumption of overlap to estimate population or sample average causal effects. When data suffer from non-overlap, estimation of these estimands requires reliance on model specifications, due to poor data support. All existing methods to address non-overlap, such as trimming or down-weighting data in regions of poor data support, change the estimand so that inference cannot be made on the sample or the underlying population. In environmental health research settings, where study results are often intended to influence policy, population-level inference may be critical, and changes in the estimand can diminish the impact of the study results, because estimates may not be representative of effects in the population of interest to policymakers. Researchers may be willing to make additional, minimal modeling assumptions in order to preserve the ability to estimate population average causal effects. We seek to make two contributions on this topic. First, we propose a flexible, data-driven definition of propensity score overlap and non-overlap regions. Second, we develop a novel Bayesian framework to estimate population average causal effects with minor model dependence and appropriately large uncertainties in the presence of non-overlap and causal effect heterogeneity. In this approach, the tasks of estimating causal effects in the overlap and non-overlap regions are delegated to two distinct models, suited to the degree of data support in each region. Tree ensembles are used to non-parametrically estimate individual causal effects in the overlap region, where the data can speak for themselves. In the non-overlap region, where insufficient data support means reliance on model specification is necessary, individual causal effects are estimated by extrapolating trends from the overlap region via a spline model. The promising performance of our method is demonstrated in simulations. 
Finally, we utilize our method to perform a novel investigation of the causal effect of natural gas compressor station exposure on cancer outcomes. Code and data to implement the method and reproduce all simulations and analyses are available on GitHub (https://github.com/rachelnethery/overlap).
Affiliation(s)
- Rachel C Nethery
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Fabrizia Mealli
- Department of Statistics, Informatics, Applications, University of Florence, Florence, Italy
- Francesca Dominici
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA