1. Practical Considerations for Sandwich Variance Estimation in 2-Stage Regression Settings. Am J Epidemiol 2024; 193:798-810. [PMID: 38012109; DOI: 10.1093/aje/kwad234] [Received: 09/25/2022] [Revised: 11/09/2023] [Accepted: 11/16/2023] [Indexed: 11/29/2023]
Abstract
In this paper, we present a practical approach for computing the sandwich variance estimator in 2-stage regression model settings. As a motivating example for 2-stage regression, we consider regression calibration, a popular approach for addressing covariate measurement error. The sandwich variance approach has rarely been applied in regression calibration, despite its requiring less computation time than popular resampling approaches for variance estimation, specifically the bootstrap. This is probably because it requires specialized statistical coding. Here we first outline the steps needed to compute the sandwich variance estimator. We then develop a convenient method of computation in R for sandwich variance estimation, which leverages standard regression model outputs and existing R functions and can be applied in the case of a simple random sample or complex survey design. We use a simulation study to compare the sandwich estimator to a resampling variance approach for both settings. Finally, we further compare these 2 variance estimation approaches in data examples from the Women's Health Initiative (1993-2005) and the Hispanic Community Health Study/Study of Latinos (2008-2011). In our simulations, the sandwich variance estimator typically had good numerical performance, but simple Wald bootstrap confidence intervals were unstable or overcovered in certain settings, particularly when there was high correlation between covariates or large measurement error.
2. M-estimation for common epidemiological measures: introduction and applied examples. Int J Epidemiol 2024; 53:dyae030. [PMID: 38423105; PMCID: PMC10904145; DOI: 10.1093/ije/dyae030] [Received: 06/20/2023] [Accepted: 02/13/2024] [Indexed: 03/02/2024]
Abstract
M-estimation is a statistical procedure that is particularly advantageous for some common epidemiological analyses, including approaches to estimate an adjusted marginal risk contrast (i.e., inverse probability weighting and g-computation) and data fusion. In such settings, maximum likelihood variance estimates are not consistent. Thus, epidemiologists often resort to the bootstrap to estimate the variance. In contrast, M-estimation allows for consistent variance estimates in these settings without requiring the computational complexity of the bootstrap. In this paper, we introduce M-estimation and provide four illustrative examples of implementation along with software code in multiple languages. M-estimation is a flexible and computationally efficient estimation procedure that is a powerful addition to the epidemiologist's toolbox.
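As a rough illustration of the stacked estimating-equation idea, the sketch below computes an empirical sandwich variance for a risk difference. It stacks three estimating functions: the risk among the exposed, the risk among the unexposed, and their difference. It is a minimal pure-Python sketch and not the paper's code. The helper names (`sandwich`, `mat_inv`, `psi_risk_difference`) and the toy data are hypothetical, and the bread matrix is approximated by central finite differences rather than analytic derivatives.

```python
import math


def mat_inv(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def sandwich(psi, data, theta, h=1e-6):
    """Empirical sandwich covariance B^-1 M B^-T / n for a stacked estimating function psi."""
    n, p = len(data), len(theta)

    def mean_psi(t):
        rows = [psi(d, t) for d in data]
        return [sum(r[j] for r in rows) / n for j in range(p)]

    # Bread: minus the average derivative of psi, by central finite differences.
    bread = [[0.0] * p for _ in range(p)]
    for k in range(p):
        up, dn = list(theta), list(theta)
        up[k] += h
        dn[k] -= h
        f_up, f_dn = mean_psi(up), mean_psi(dn)
        for j in range(p):
            bread[j][k] = -(f_up[j] - f_dn[j]) / (2 * h)

    # Meat: average outer product of psi evaluated at the estimates.
    meat = [[0.0] * p for _ in range(p)]
    for d in data:
        v = psi(d, theta)
        for i in range(p):
            for j in range(p):
                meat[i][j] += v[i] * v[j] / n

    b_inv = mat_inv(bread)
    b_inv_t = [list(col) for col in zip(*b_inv)]
    cov = matmul(matmul(b_inv, meat), b_inv_t)
    return [[c / n for c in row] for row in cov]


def psi_risk_difference(d, theta):
    """Stacked estimating functions for (risk if exposed, risk if unexposed, risk difference)."""
    a, y = d
    mu1, mu0, delta = theta
    return [a * (y - mu1), (1 - a) * (y - mu0), (mu1 - mu0) - delta]


# Hypothetical toy data: (exposure a, binary outcome y).
data = [(1, 1), (1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0), (0, 0), (0, 0)]

# Solve the estimating equations (closed form here): group-specific means.
n1 = sum(a for a, _ in data)
n0 = len(data) - n1
mu1 = sum(y for a, y in data if a == 1) / n1
mu0 = sum(y for a, y in data if a == 0) / n0
theta_hat = [mu1, mu0, mu1 - mu0]

cov = sandwich(psi_risk_difference, data, theta_hat)
se_rd = math.sqrt(cov[2][2])
print(f"RD = {theta_hat[2]:.3f}, sandwich SE = {se_rd:.4f}")
```

For this simple problem, the sandwich variance of the risk difference reduces to p1(1 - p1)/n1 + p0(1 - p0)/n0, which provides a quick check on the numerical bread. The same `sandwich` helper accepts any stacked `psi`, and that generality is where M-estimation pays off for IPW or g-computation.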
3. Estimating Subgroup Effects in Generalizability and Transportability Analyses. Am J Epidemiol 2024; 193:149-158. [PMID: 35225329; DOI: 10.1093/aje/kwac036] [Received: 06/08/2021] [Revised: 02/17/2022] [Accepted: 02/23/2022] [Indexed: 11/13/2022]
Abstract
Methods for extending (generalizing or transporting) inferences from a randomized trial to a target population involve conditioning on a large set of covariates that is sufficient for rendering the randomized and nonrandomized groups exchangeable. Yet, decision makers are often interested in examining treatment effects in subgroups of the target population defined in terms of only a few discrete covariates. Here, we propose methods for estimating subgroup-specific potential outcome means and average treatment effects in generalizability and transportability analyses, using outcome model-based (g-formula), weighting, and augmented weighting estimators. We consider estimating subgroup-specific average treatment effects in the target population and its nonrandomized subset, and we provide methods that are appropriate both for nested and non-nested trial designs. As an illustration, we apply the methods to data from the Coronary Artery Surgery Study (North America, 1975-1996) to compare the effect of surgery plus medical therapy versus medical therapy alone for chronic coronary artery disease in subgroups defined by history of myocardial infarction.
4. Universal Difference-in-Differences for Causal Inference in Epidemiology. Epidemiology 2024; 35:16-22. [PMID: 38032801; PMCID: PMC10683972; DOI: 10.1097/ede.0000000000001676] [Received: 02/01/2023] [Accepted: 09/21/2023] [Indexed: 12/02/2023]
Abstract
Difference-in-differences is undoubtedly one of the most widely used methods for evaluating the causal effect of an intervention in observational (i.e., nonrandomized) settings. The approach is typically used when pre- and postexposure outcome measurements are available, and one can reasonably assume that the association of the unobserved confounder with the outcome has the same absolute magnitude in the two exposure arms and is constant over time, the so-called parallel trends assumption. The parallel trends assumption may not be credible in many practical settings, for example, if the outcome is binary, a count, or polytomous, as well as when an uncontrolled confounder exhibits nonadditive effects on the distribution of the outcome, even if such effects are constant over time. We introduce an alternative approach that replaces the parallel trends assumption with an odds ratio equi-confounding assumption under which an association between treatment and the potential outcome under no treatment is identified with a well-specified generalized linear model relating the pre-exposure outcome and the exposure. Because the proposed method identifies any causal effect that is conceivably identified in the absence of confounding bias, including nonlinear effects such as quantile treatment effects, the approach is aptly called universal difference-in-differences. We describe and illustrate both fully parametric and more robust semiparametric universal difference-in-differences estimators in a real-world application concerning the causal effects of a Zika virus outbreak on birth rate in Brazil. A supplementary digital video is available at: http://links.lww.com/EDE/C90.
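For contrast with the approach proposed above, the sketch below shows the conventional additive difference-in-differences estimate that relies on the parallel trends assumption. This is the standard textbook 2x2 estimator, not the universal difference-in-differences method. The function name and toy numbers are hypothetical.

```python
from statistics import mean


def additive_did(treated_pre, treated_post, control_pre, control_post):
    """Classic 2x2 difference-in-differences on group means.

    Under parallel trends, the treated group's untreated post-period mean is
    imputed as its pre-period mean plus the control group's change over time.
    """
    control_change = mean(control_post) - mean(control_pre)
    counterfactual_post = mean(treated_pre) + control_change
    return mean(treated_post) - counterfactual_post


# Hypothetical outcome values by group and period.
effect = additive_did([1.0, 2.0], [4.0, 5.0], [1.0, 1.0], [2.0, 2.0])
print(effect)  # 2.0
```

Because the imputation adds a constant shift, it is tied to the additive scale of the outcome. For a binary outcome, the imputed counterfactual mean can even fall outside [0, 1], which is one symptom of the limitation described above.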
5. Causally interpretable meta-analysis: Clearly defined causal effects and two case studies. Res Synth Methods 2024; 15:61-72. [PMID: 37696604; DOI: 10.1002/jrsm.1671] [Received: 01/29/2022] [Revised: 08/30/2023] [Accepted: 08/31/2023] [Indexed: 09/13/2023]
Abstract
Meta-analysis is commonly used to combine results from multiple clinical trials, but traditional meta-analysis methods do not refer explicitly to a population of individuals to whom the results apply, and it is not clear how to use their results to assess a treatment's effect for a population of interest. We describe recently introduced causally interpretable meta-analysis methods and apply their treatment effect estimators to two individual-participant data sets. These estimators transport estimated treatment effects from studies in the meta-analysis to a specified target population using the individuals' potentially effect-modifying covariates. We consider different regression and weighting methods within this approach and compare the results to traditional aggregated-data meta-analysis methods. In our applications, certain versions of the causally interpretable methods performed somewhat better than the traditional methods, but the latter generally did well. The causally interpretable methods offer the most promise when covariates modify treatment effects, and our results suggest that traditional methods work well when there is little effect heterogeneity. The causally interpretable approach gives meta-analysis an appealing theoretical framework by relating an estimator directly to a specific population and lays a solid foundation for future developments.
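A common traditional aggregated-data comparator is the inverse-variance weighted (fixed-effect) pooled estimate. The minimal sketch below assumes that study-level effect estimates and their variances are available. The function name and numbers are hypothetical, and a random-effects model would add a between-study variance term that is not shown.

```python
import math


def inverse_variance_pool(estimates, variances):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one variance per estimate")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


pooled, se = inverse_variance_pool([1.0, 3.0], [1.0, 1.0])
print(pooled, se)
```

The pooled value is not tied to any explicit target population, and that is the gap the causally interpretable methods address.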
6. A robust method to improve the regression accuracy of LIBS data: determination of heavy metal Cu in Tegillarca granosa. Analytical Methods: Advancing Methods and Applications 2023; 15:6460-6467. [PMID: 37982179; DOI: 10.1039/d3ay01411h] [Indexed: 11/21/2023]
Abstract
Tegillarca granosa (T. granosa) is susceptible to contamination by heavy metals, which poses potential health risks for consumers. Laser-induced breakdown spectroscopy (LIBS) combined with the classical partial least squares (PLS) model has shown promise in determining heavy metal concentrations in T. granosa. However, outliers in the calibration set can compromise the model's integrity and diminish its predictive capability. To address this issue, we propose using RSIMPLS, a robust partial least squares method, to improve the accuracy of Cu prediction in T. granosa. The RSIMPLS algorithm was employed to analyze and process the high-dimensional LIBS data, and diagnostic plots were used to identify various types of outliers. By selectively eliminating certain outliers, a robust calibration was achieved. The results showed that LIBS has the potential to predict Cu in T. granosa, with a coefficient of determination (Rp²) of 0.79 and a root mean square error of prediction (RMSEP) of 11.28. RSIMPLS significantly improved the prediction accuracy of Cu concentrations, decreasing RMSEP by 43% compared with PLS. These findings validate the effectiveness of combining LIBS data with the RSIMPLS algorithm for predicting Cu concentrations in T. granosa.
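The two accuracy metrics reported above can be computed from held-out predictions as sketched below. The sketch assumes the common definitions: RMSEP is the square root of the mean squared prediction error, and Rp² is 1 - SS_res/SS_tot on the prediction set. Some chemometrics software instead reports the squared correlation between measured and predicted values, so the paper's exact definition may differ. Function names and data are hypothetical.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a held-out set."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_prediction(y_true, y_pred):
    """Coefficient of determination on the prediction set: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def percent_decrease(old, new):
    """Relative reduction in percent, e.g. of RMSEP when switching from PLS to a robust variant."""
    return 100.0 * (old - new) / old


measured = [10.0, 20.0, 30.0, 40.0]   # hypothetical reference concentrations
predicted = [12.0, 18.0, 33.0, 37.0]  # hypothetical model predictions
print(rmsep(measured, predicted), r2_prediction(measured, predicted))
```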
7. Reweighting estimators to extend the external validity of clinical trials: methodological considerations. J Biopharm Stat 2023; 33:515-543. [PMID: 36688658; DOI: 10.1080/10543406.2022.2162067] [Received: 01/10/2022] [Accepted: 12/10/2022] [Indexed: 01/24/2023]
Abstract
Methods to extend the strong internal validity of randomized controlled trials to reliably estimate treatment effects in target populations are gaining attention. This paper enumerates steps recommended for undertaking such extended inference, discusses currently viable choices for each one, and provides recommendations. We demonstrate a complete extended inference from a clinical trial studying a pharmaceutical treatment for Alzheimer's disease (AD) to a realistic target population of European residents diagnosed with AD. This case study highlights approaches to overcoming practical difficulties and demonstrates limitations of reliably extending inference from a trial to a real-world population.
8. Sensitivity analysis using bias functions for studies extending inferences from a randomized trial to a target population. Stat Med 2023; 42:2029-2043. [PMID: 36847107; PMCID: PMC10219839; DOI: 10.1002/sim.9550] [Received: 09/10/2021] [Revised: 05/20/2022] [Accepted: 07/21/2022] [Indexed: 03/01/2023]
Abstract
Extending (i.e., generalizing or transporting) causal inferences from a randomized trial to a target population requires assumptions that randomized and nonrandomized individuals are exchangeable conditional on baseline covariates. These assumptions are made on the basis of background knowledge, which is often uncertain or controversial, and need to be subjected to sensitivity analysis. We present simple methods for sensitivity analyses that directly parameterize violations of the assumptions using bias functions and do not require detailed background knowledge about specific unknown or unmeasured determinants of the outcome or modifiers of the treatment effect. We show how the methods can be applied to non-nested trial designs, where the trial data are combined with a separately obtained sample of nonrandomized individuals, as well as to nested trial designs, where the trial is embedded within a cohort sampled from the target population.
9. Illustration of 2 Fusion Designs and Estimators. Am J Epidemiol 2023; 192:467-474. [PMID: 35388406; PMCID: PMC10372880; DOI: 10.1093/aje/kwac067] [Received: 05/10/2021] [Revised: 03/25/2022] [Accepted: 03/31/2022] [Indexed: 11/12/2022]
Abstract
"Fusion" study designs combine data from different sources to answer questions that could not be answered (as well) by subsets of the data. Studies that augment main study data with validation data, as in measurement-error correction studies or generalizability studies, are examples of fusion designs. Fusion estimators, here solutions to stacked estimating functions, produce consistent answers to identified research questions using data from fusion designs. In this paper, we describe a pair of examples of fusion designs and estimators, one where we generalize a proportion to a target population and one where we correct measurement error in a proportion. For each case, we present an example motivated by human immunodeficiency virus research and summarize results from simulation studies. Simulations demonstrate that the fusion estimators provide approximately unbiased results with appropriate 95% confidence interval coverage. Fusion estimators can be used to appropriately combine data in answering important questions that benefit from multiple sources of information.
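For the measurement-error example, one standard way to correct a misclassified proportion with validation data is the Rogan-Gladen estimator. It combines the main-study proportion with sensitivity (Se) and specificity (Sp) estimated in the validation sample. The sketch below shows only the point estimate and assumes that misclassification behaves the same way in both samples. A fusion estimator of the kind described above would stack these pieces as estimating functions to also obtain a variance. Names and counts are hypothetical.

```python
def sens_spec(validation):
    """Sensitivity and specificity from (true_status, measured_status) pairs."""
    pos = [m for t, m in validation if t == 1]
    neg = [m for t, m in validation if t == 0]
    return sum(pos) / len(pos), sum(1 - m for m in neg) / len(neg)


def rogan_gladen(p_observed, se, sp):
    """Misclassification-corrected proportion (p* + Sp - 1) / (Se + Sp - 1)."""
    denom = se + sp - 1.0
    if denom <= 0:
        raise ValueError("Se + Sp must exceed 1 for the correction to be defined")
    return (p_observed + sp - 1.0) / denom


# Hypothetical validation data: 10 true positives (9 detected), 10 true negatives (8 correct).
validation = [(1, 1)] * 9 + [(1, 0)] + [(0, 0)] * 8 + [(0, 1)] * 2
se, sp = sens_spec(validation)
p_obs = 41 / 100  # hypothetical main-study proportion measured positive
p_corrected = rogan_gladen(p_obs, se, sp)
print(se, sp, round(p_corrected, 3))
```

In small samples the corrected value can fall outside [0, 1], so it is often truncated in practice.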
10. Potential Effects of Prolonged Water-Only Fasting Followed by a Whole-Plant-Food Diet on Salty and Sweet Taste Sensitivity and Perceived Intensity, Food Liking, and Dietary Intake. Cureus 2022; 14:e24689. [PMID: 35663685; PMCID: PMC9161620; DOI: 10.7759/cureus.24689] [Accepted: 05/02/2022] [Indexed: 11/22/2022]
Abstract
The overconsumption of calorie-dense foods high in added salt, sugar, and fat is a major contributor to current rates of obesity, and methods to reduce consumption are needed. Prolonged water-only fasting followed by an exclusively whole-plant-food diet free of added salt, oil, and sugar may reduce the consumption of these hyper-palatable foods, but such effects have not been quantified. Therefore, we conducted a preliminary study to estimate the effects of this intervention on salty and sweet taste detection and recognition thresholds and perceived taste intensity after at least five days of fasting and at refeed day three. We also assessed the effects on sweet, salty, and fatty food preference and overall dietary consumption 30 days after the day-three refeed visit. Based on these data, we estimated that 10 days after the start of fasting, salty taste recognition, sweet taste detection, and sweet taste recognition thresholds decreased significantly, salty taste intensity ratings increased significantly, and sweet taste intensity ratings decreased significantly. Preliminary data also suggest that prolonged water-only fasting followed by refeeding on an exclusively whole-plant-food diet may reduce salty/fatty and sweet/fatty food liking, reduce sugar intake, and increase vegetable intake. These results support further research into the effects of fasting and diet on taste function, food liking, and consumption.
11.
Abstract
BACKGROUND: With many disease-modifying therapies currently approved for the management of multiple sclerosis, there is a growing need to evaluate the comparative effectiveness and safety of those therapies from real-world data sources. Propensity score methods have recently gained popularity in multiple sclerosis research to generate real-world evidence. Recent evidence suggests, however, that the conduct and reporting of propensity score analyses are often suboptimal in multiple sclerosis studies.
OBJECTIVES: To provide practical guidance to clinicians and researchers on the use of propensity score methods within the context of multiple sclerosis research.
METHODS: We summarize recommendations on the use of propensity score matching and weighting based on the current methodological literature, and provide examples of good practice.
RESULTS: Step-by-step recommendations are presented, starting with covariate selection and propensity score estimation, followed by guidance on the assessment of covariate balance and implementation of propensity score matching and weighting. Finally, we focus on treatment effect estimation and sensitivity analyses.
CONCLUSION: This comprehensive set of recommendations highlights key elements that require careful attention when using propensity score methods.
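Covariate balance after matching or weighting is commonly summarized with the standardized mean difference (SMD). The sketch below computes a weighted SMD for one covariate. Its denominator is a pooled standard deviation built from the weighted group variances, which is only one of several conventions in use; some software uses unweighted variances from the original sample instead. The function names and data are hypothetical.

```python
import math


def weighted_mean_var(x, w):
    """Weighted mean and reliability-weighted (unbiased) variance."""
    sw = sum(w)
    m = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ss = sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x))
    return m, ss * sw / (sw ** 2 - sum(wi ** 2 for wi in w))


def standardized_mean_difference(x, treated, w=None):
    """Weighted SMD between treated (1) and comparison (0) groups for covariate x."""
    w = w if w is not None else [1.0] * len(x)
    g1 = [(xi, wi) for xi, ti, wi in zip(x, treated, w) if ti == 1]
    g0 = [(xi, wi) for xi, ti, wi in zip(x, treated, w) if ti == 0]
    m1, v1 = weighted_mean_var(*zip(*g1))
    m0, v0 = weighted_mean_var(*zip(*g0))
    return (m1 - m0) / math.sqrt((v1 + v0) / 2.0)


x_cov = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]  # hypothetical covariate values
treat = [1, 1, 1, 0, 0, 0]
smd = standardized_mean_difference(x_cov, treat)
print(smd)
```

Absolute values near zero indicate balance. A cutoff such as 0.1 is a common rule of thumb rather than a formal test.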
12. Novel disease associations with schizophrenia genetic risk revealed in ~400,000 UK Biobank participants. Mol Psychiatry 2022; 27:1448-1454. [PMID: 34799693; PMCID: PMC9106855; DOI: 10.1038/s41380-021-01387-5] [Received: 06/01/2021] [Revised: 10/18/2021] [Accepted: 10/28/2021] [Indexed: 01/09/2023]
Abstract
Schizophrenia is a serious mental disorder with considerable somatic and psychiatric morbidity. It is unclear whether comorbid health conditions predominantly arise due to shared genetic risk or as a consequence of having schizophrenia. To explore the contribution of genetic risk for schizophrenia, we analysed the effect of schizophrenia polygenic risk scores (PRS) on a broad range of health problems in 406,929 individuals with no schizophrenia diagnosis from the UK Biobank. Diagnoses were derived from linked health data, including primary care, hospital inpatient records, and registers with information on cancer and deaths. Schizophrenia PRS were generated and tested for associations with general health conditions, 16 ICD-10 main chapters, and 603 diseases using linear and logistic regressions. Higher schizophrenia PRS was significantly associated with poorer overall health ratings, more hospital inpatient diagnoses, and more unique illnesses. It was also significantly positively associated with 4 ICD-10 chapters (mental disorders; respiratory diseases; digestive diseases; and pregnancy, childbirth and the puerperium) but negatively associated with musculoskeletal disorders. Thirty-one specific phenotypes were significantly associated with schizophrenia PRS, and the 19 novel findings include several musculoskeletal diseases, respiratory diseases, digestive diseases, varicose veins, pituitary hyperfunction, and other peripheral nerve disorders. These findings extend knowledge of the pleiotropic effect of genetic risk for schizophrenia and offer insight into how some conditions often comorbid with schizophrenia arise. Additional studies incorporating the genetic basis of hormone regulation and involvement of immune mechanisms in the pathophysiology of schizophrenia may further elucidate the biological mechanisms underlying schizophrenia and its comorbid conditions.
13. Power and sample size for observational studies of point exposure effects. Biometrics 2022; 78:388-398. [PMID: 33226116; PMCID: PMC8141060; DOI: 10.1111/biom.13405] [Received: 03/11/2020] [Revised: 09/12/2020] [Accepted: 11/11/2020] [Indexed: 11/29/2022]
Abstract
Inverse probability of treatment weights (IPTWs) are commonly used to control for confounding when estimating causal effects of point exposures from observational data. When planning a study that will be analyzed with IPTWs, determining the required sample size for a given level of statistical power is challenging because of the effect of weighting on the variance of the estimated causal means. This paper considers the utility of the design effect to quantify the effect of weighting on the precision of causal estimates. The design effect is defined as the ratio of the variance of the causal mean estimator to the variance of a naïve estimator that would apply if, counter to fact, no confounding had been present and weights were not needed. A simple, closed-form approximation of the design effect is derived that is outcome invariant and can be estimated during the study design phase. Once the design effect is approximated for each treatment group, sample size calculations are conducted as for a randomized trial, but with variances inflated by the design effects to account for weighting. Simulations demonstrate the accuracy of the design effect approximation, and practical considerations are discussed.
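The sample-size step described above (a randomized-trial calculation with each group's variance inflated by its design effect) can be sketched as follows. The closed-form approximation derived in the paper is not given in the abstract, so this sketch substitutes Kish's familiar weight-based design effect, n·Σw²/(Σw)², purely for illustration. Kish's quantity is different and requires the weights to be known, which is typically not the case at the design stage. The function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def kish_design_effect(weights):
    """Kish's approximate design effect for unequal weights: n * sum(w^2) / (sum w)^2."""
    total = sum(weights)
    return len(weights) * sum(w * w for w in weights) / total ** 2


def n_per_group(delta, sd1, sd0, deff1=1.0, deff0=1.0, alpha=0.05, power=0.80):
    """Equal-allocation sample size for a difference in means, variances inflated by deff."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(z ** 2 * (sd1 ** 2 * deff1 + sd0 ** 2 * deff0) / delta ** 2)


deff = kish_design_effect([1.0, 3.0])  # hypothetical weights
print(deff, n_per_group(0.5, 1.0, 1.0), n_per_group(0.5, 1.0, 1.0, deff, deff))
```

With no weighting (design effect 1), this reproduces the usual two-sample calculation. A design effect of 1.25 in both groups raises the per-group requirement accordingly.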
14. On Variance of the Treatment Effect in the Treated When Estimated by Inverse Probability Weighting. Am J Epidemiol 2022; 191:1092-1097. [PMID: 35106534; PMCID: PMC9271225; DOI: 10.1093/aje/kwac014] [Received: 02/22/2021] [Revised: 01/09/2022] [Accepted: 01/21/2022] [Indexed: 02/04/2023]
Abstract
In the analysis of observational studies, inverse probability weighting (IPW) is commonly used to consistently estimate the average treatment effect (ATE) or the average treatment effect in the treated (ATT). The variance of the IPW ATE estimator is often estimated by assuming that the weights are known and then using the so-called "robust" (Huber-White) sandwich estimator, which results in conservative standard errors (SEs). Here we show that using such an approach when estimating the variance of the IPW ATT estimator does not necessarily result in conservative SE estimates. That is, assuming the weights are known, the robust sandwich estimator may be either conservative or anticonservative. Thus, confidence intervals for the ATT using the robust SE estimate will not be valid, in general. Instead, stacked estimating equations which account for the weight estimation can be used to compute a consistent, closed-form variance estimator for the IPW ATT estimator. The 2 variance estimators are compared via simulation studies and in a data analysis of the association between smoking and gene expression.
15. Inverse probability weighted estimators of vaccine effects accommodating partial interference and censoring. Biometrics 2021; 78:777-788. [PMID: 33768557; DOI: 10.1111/biom.13459] [Received: 05/19/2020] [Revised: 11/10/2020] [Accepted: 03/12/2021] [Indexed: 12/01/2022]
Abstract
Estimating population-level effects of a vaccine is challenging because there may be interference, that is, the outcome of one individual may depend on the vaccination status of another individual. Partial interference occurs when individuals can be partitioned into groups such that interference occurs only within groups. In the absence of interference, inverse probability weighted (IPW) estimators are commonly used to draw inference about causal effects of an exposure or treatment. Tchetgen Tchetgen and VanderWeele proposed a modified IPW estimator for causal effects in the presence of partial interference. Motivated by a cholera vaccine study in Bangladesh, this paper considers an extension of the Tchetgen Tchetgen and VanderWeele IPW estimator to the setting where the outcome is subject to right censoring using inverse probability of censoring weights (IPCW). Censoring weights are estimated using proportional hazards frailty models. The large sample properties of the IPCW estimators are derived, and simulation studies are presented demonstrating the estimators' performance in finite samples. The methods are then used to analyze data from the cholera vaccine study.
16. Assessing exposure effects on gene expression. Genet Epidemiol 2020; 44:601-610. [PMID: 32511796; PMCID: PMC7429346; DOI: 10.1002/gepi.22324] [Received: 10/15/2019] [Revised: 04/09/2020] [Accepted: 05/19/2020] [Indexed: 12/26/2022]
Abstract
In observational genomics data sets, there is often confounding of the effect of an exposure on gene expression. To adjust for confounding when estimating the exposure effect, a common approach involves including potential confounders as covariates with the exposure in a regression model of gene expression. However, when the exposure and confounders interact to influence gene expression, the fitted regression model does not necessarily estimate the overall effect of the exposure. In these instances, inverse probability weighting (IPW) and the parametric g-formula are straightforward to apply and yield consistent effect estimates. IPW can readily be integrated into a genomics data analysis pipeline with upstream data processing and normalization, while the g-formula can be implemented by making simple alterations to the regression model. The regression, IPW, and g-formula approaches to exposure effect estimation are compared herein using simulations, and the advantages and disadvantages of each approach are explored. The methods are applied to a case study estimating the effect of current smoking on gene expression in adipose tissue.
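The gap between a regression coefficient and the overall exposure effect under exposure-confounder interaction can be seen in a small example. The sketch below uses the nonparametric (stratified, equivalently saturated-model) g-formula for a binary exposure and a single binary confounder, rather than the parametric version described in the paper. The data are hypothetical and noise-free, generated from Y = 1 + 2A + 3L + 4AL.

```python
from collections import defaultdict


def g_formula_ate(records):
    """Standardize stratum-specific exposure effects to the confounder distribution.

    records: list of (a, l, y) with binary exposure a and discrete confounder l.
    Every (a, l) cell must be non-empty (positivity).
    """
    sums, counts, l_counts = defaultdict(float), defaultdict(int), defaultdict(int)
    for a, l, y in records:
        sums[(a, l)] += y
        counts[(a, l)] += 1
        l_counts[l] += 1
    n = len(records)
    ate = 0.0
    for l, n_l in l_counts.items():
        effect_l = sums[(1, l)] / counts[(1, l)] - sums[(0, l)] / counts[(0, l)]
        ate += (n_l / n) * effect_l
    return ate


def outcome(a, l):
    """Hypothetical noise-free outcome model with an exposure-confounder interaction."""
    return 1 + 2 * a + 3 * l + 4 * a * l


cells = [(1, 0)] * 3 + [(0, 0)] * 3 + [(1, 1), (0, 1)]
records = [(a, l, outcome(a, l)) for a, l in cells]
ate = g_formula_ate(records)
print(ate)  # 3.0
```

In the interaction model, the coefficient on A equals 2, which is only the effect when L = 0. The standardized effect is 2 + 4 × P(L = 1) = 3, which shows why the fitted coefficient need not estimate the overall exposure effect.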