1. Rabideau DJ, Li F, Wang R. Multiply robust generalized estimating equations for cluster randomized trials with missing outcomes. Stat Med 2024; 43:1458-1474. [PMID: 38488532 DOI: 10.1002/sim.10027] [Received: 11/17/2022] [Revised: 01/04/2024] [Accepted: 01/18/2024] [Indexed: 03/19/2024]
Abstract
Generalized estimating equations (GEEs) provide a useful framework for estimating marginal regression parameters based on data from cluster randomized trials (CRTs), but they can result in inaccurate parameter estimates when some outcomes are informatively missing. Existing techniques to handle missing outcomes in CRTs rely on correct specification of a propensity score model, a covariate-conditional mean outcome model, or require at least one of these two models to be correct, which can be challenging in practice. In this article, we develop new weighted GEEs to simultaneously estimate the marginal mean, scale, and correlation parameters in CRTs with missing outcomes, allowing for multiple propensity score models and multiple covariate-conditional mean models to be specified. The resulting estimators are consistent provided that any one of these models is correct. An iterative algorithm is provided for implementing this more robust estimator and practical considerations for specifying multiple models are discussed. We evaluate the performance of the proposed method through Monte Carlo simulations and apply the proposed multiply robust estimator to analyze the Botswana Combination Prevention Project, a large HIV prevention CRT designed to evaluate whether a combination of HIV-prevention measures can reduce HIV incidence.
Affiliation(s)
- Dustin J Rabideau
  - Biostatistics, Massachusetts General Hospital, Boston, Massachusetts, USA
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA
- Fan Li
  - Department of Biostatistics, Yale University School of Public Health, New Haven, Connecticut, USA
  - Center for Methods in Implementation and Prevention Science, Yale University, New Haven, Connecticut, USA
- Rui Wang
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA

2. Offorha BC, Walters SJ, Jacques RM. Analysing cluster randomised controlled trials using GLMM, GEE1, GEE2, and QIF: results from four case studies. BMC Med Res Methodol 2023; 23:293. [PMID: 38093221 PMCID: PMC10717070 DOI: 10.1186/s12874-023-02107-z] [Received: 08/30/2022] [Accepted: 11/17/2023] [Indexed: 12/17/2023]
Abstract
BACKGROUND: Using four case studies, we aim to provide practical guidance and recommendations for the analysis of cluster randomised controlled trials.
METHODS: Four modelling approaches (Generalized Linear Mixed Models with parameters estimated by maximum likelihood/restricted maximum likelihood; Generalized Linear Models with parameters estimated by Generalized Estimating Equations (first order or second order) or by the Quadratic Inference Function) for analysing correlated individual participant level outcomes in cluster randomised controlled trials were identified after we reviewed the literature. We systematically searched the online bibliography databases of MEDLINE, EMBASE, PsycINFO (via OVID), CINAHL (via EBSCO), and SCOPUS. We applied the four identified approaches to four case studies of cluster randomised controlled trials with the number of clusters ranging from 10 to 100, and individual participants ranging from 748 to 9,207. Results were obtained for both continuous and binary outcomes using R and SAS statistical packages.
RESULTS: The intracluster correlation coefficient (ICC) estimates for the case studies were less than 0.05 and are consistent with the observed ICC values commonly reported in primary care and community-based cluster randomised controlled trials. In most cases, the four methods produced similar results. However, in a few analyses, the quadratic inference function produced different results compared to the generalized linear mixed model, first-order generalized estimating equations, and second-order generalized estimating equations, especially in trials with small to moderate numbers of clusters.
CONCLUSION: This paper demonstrates the analysis of cluster randomised controlled trials with four modelling approaches. The results obtained were similar in most cases; however, for trials with few clusters we recommend that the quadratic inference function be used with caution and that, where possible, a small sample correction be used. The generalisability of our results is limited to studies with similar features to our case studies, for example, studies with a similar-sized ICC. It is important to conduct simulation studies to comprehensively evaluate the performance of the four modelling approaches.
Affiliation(s)
- Bright C Offorha
  - Division of Population Health, School of Medicine & Population Health, University of Sheffield, Sheffield, UK
- Stephen J Walters
  - Division of Population Health, School of Medicine & Population Health, University of Sheffield, Sheffield, UK
- Richard M Jacques
  - Division of Population Health, School of Medicine & Population Health, University of Sheffield, Sheffield, UK

3. Kang C, Zhang D, Schuster J, Kogan J, Nikolajski C, Reynolds CF. Bias-corrected and doubly robust inference for the three-level longitudinal cluster-randomized trials with missing continuous outcomes and small number of clusters: Simulation study and application to a study for adults with serious mental illnesses. Contemp Clin Trials Commun 2023; 35:101194. [PMID: 37588771 PMCID: PMC10425901 DOI: 10.1016/j.conctc.2023.101194] [Received: 09/06/2022] [Revised: 04/21/2023] [Accepted: 07/19/2023] [Indexed: 08/18/2023]
Abstract
Longitudinal cluster-randomized designs have been popular tools for comparative effectiveness research in clinical trials. The methodologies for the three-level hierarchical design with longitudinal outcomes need to be better understood under more pragmatic settings; that is, with a small number of clusters, heterogeneous cluster sizes, and missing outcomes. Generalized estimating equations (GEEs) have been frequently used when the distribution of data and the correlation model are unknown. Standard GEEs lead to bias and an inflated type I error rate due to the small number of available clinics and non-completely random missing data in longitudinal outcomes. We evaluate the performance of inverse probability weighted (IPW) estimating equations, with and without augmentation, for two types of missing data in continuous outcomes and individual-level treatment allocation mechanisms combined with two bias-corrected variance estimators. Our intensive simulation results suggest that the proposed augmented IPW method with bias-corrected variance estimation successfully prevents the inflation of false positive findings and improves efficiency when the number of clinics is small, with moderate to severe missing outcomes. Our findings are expected to aid researchers in choosing appropriate analysis methods for three-level longitudinal cluster-randomized designs. The proposed approaches were applied to analyze data from a longitudinal cluster-randomized clinical trial involving adults with serious mental illnesses.
Affiliation(s)
- Chaeryon Kang
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Di Zhang
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Jane Kogan
  - UPMC Center for High-Value Health Care, Pittsburgh, PA 15219, USA
- Cara Nikolajski
  - UPMC Center for High-Value Health Care, Pittsburgh, PA 15219, USA
- Charles F. Reynolds
  - Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA

4. Chang CR, Song Y, Li F, Wang R. Covariate adjustment in randomized clinical trials with missing covariate and outcome data. Stat Med 2023; 42:3919-3935. [PMID: 37394874 DOI: 10.1002/sim.9840] [Received: 07/10/2022] [Revised: 04/27/2023] [Accepted: 06/15/2023] [Indexed: 07/04/2023]
Abstract
When analyzing data from randomized clinical trials, covariate adjustment can be used to account for chance imbalance in baseline covariates and to increase precision of the treatment effect estimate. A practical barrier to covariate adjustment is the presence of missing data. In this article, in the light of recent theoretical advancement, we first review several covariate adjustment methods with incomplete covariate data. We investigate the implications of the missing data mechanism on estimating the average treatment effect in randomized clinical trials with continuous or binary outcomes. In parallel, we consider settings where the outcome data are fully observed or are missing at random; in the latter setting, we propose a full weighting approach that combines inverse probability weighting for adjusting missing outcomes and overlap weighting for covariate adjustment. We highlight the importance of including the interaction terms between the missingness indicators and covariates as predictors in the models. We conduct comprehensive simulation studies to examine the finite-sample performance of the proposed methods and compare with a range of common alternatives. We find that conducting the proposed adjustment methods generally improves the precision of treatment effect estimates regardless of the imputation methods when the adjusted covariate is associated with the outcome. We apply the methods to the Childhood Adenotonsillectomy Trial to assess the effect of adenotonsillectomy on neurocognitive functioning scores.
Affiliation(s)
- Chia-Rui Chang
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
- Yue Song
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
- Fan Li
  - Department of Statistical Science, Duke University, Durham, North Carolina, USA
- Rui Wang
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA

5. Harrison LJ, Wang R. Sample size calculation for randomized trials via inverse probability of response weighting when outcome data are missing at random. Stat Med 2023; 42:1802-1821. [PMID: 36880120 PMCID: PMC10368173 DOI: 10.1002/sim.9700] [Received: 12/03/2021] [Revised: 02/19/2023] [Accepted: 02/21/2023] [Indexed: 03/08/2023]
Abstract
Randomized trials are an established method to evaluate the causal effects of interventions. Despite concerted efforts to retain all trial participants, some missing outcome data are often inevitable. It is unclear how best to account for missing outcome data in sample size calculations. A standard approach is to inflate the sample size by the inverse of one minus the anticipated dropout probability. However, the performance of this approach in the presence of informative outcome missingness has not been well-studied. We investigate sample size calculation when outcome data are missing at random given the randomized intervention group and fully observed baseline covariates under an inverse probability of response weighted (IPRW) estimating equations approach. Using M-estimation theory, we derive sample size formulas for both individually randomized and cluster randomized trials (CRTs). We illustrate the proposed method by calculating a sample size for a CRT designed to detect a difference in HIV testing strategies under an IPRW approach. We additionally develop an R shiny app to facilitate implementation of the sample size formulas.
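The "standard approach" mentioned in this abstract, inflating the sample size by the inverse of one minus the anticipated dropout probability, can be written out as a small calculation. The following is a minimal sketch of that inflation rule only, not of the IPRW-based formulas the paper derives; the function name `inflate_for_dropout` and the example numbers are illustrative assumptions.

```python
import math


def inflate_for_dropout(n_complete: int, dropout_prob: float) -> int:
    """Inflate a complete-data sample size by 1 / (1 - dropout probability).

    n_complete   : sample size required if no outcomes were missing
    dropout_prob : anticipated probability that an outcome is missing
    """
    if not 0.0 <= dropout_prob < 1.0:
        raise ValueError("dropout_prob must be in [0, 1)")
    # Subtract a tiny tolerance so floating-point noise does not round up
    # a result that is a whole number in exact arithmetic.
    return math.ceil(n_complete / (1.0 - dropout_prob) - 1e-9)


if __name__ == "__main__":
    # Hypothetical numbers: 300 participants needed, 25% anticipated dropout.
    print(inflate_for_dropout(300, 0.25))  # 400
```

This rule implicitly treats the missing outcomes as uninformative; the abstract notes that its performance under informative missingness has not been well studied.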
Affiliation(s)
- Linda J Harrison
  - Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, USA
- Rui Wang
  - Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, USA
  - Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, USA

6. Wang MH, Staples P, Prague M, Goyal R, DeGruttola V, Onnela JP. Leveraging Contact Network Information in Clustered Randomized Studies of Contagion Processes. Observational Studies 2023; 9:157-175. [PMID: 37325081 PMCID: PMC10270696 DOI: 10.1353/obs.2023.0021] [Indexed: 06/17/2023]
Abstract
In a randomized study, leveraging covariates related to the outcome (e.g. disease status) may produce less variable estimates of the effect of exposure. For contagion processes operating on a contact network, transmission can only occur through ties that connect affected and unaffected individuals; the outcome of such a process is known to depend intimately on the structure of the network. In this paper, we investigate the use of contact network features as efficiency covariates in exposure effect estimation. Using augmented generalized estimating equations (GEE), we estimate how gains in efficiency depend on the network structure and spread of the contagious agent or behavior. We apply this approach to simulated randomized trials using a stochastic compartmental contagion model on a collection of model-based contact networks and compare the bias, power, and variance of the estimated exposure effects using an assortment of network covariate adjustment strategies. We also demonstrate the use of network-augmented GEEs on a clustered randomized trial evaluating the effects of wastewater monitoring on COVID-19 cases in residential buildings at the University of California San Diego.
7. Balzer LB, van der Laan M, Ayieko J, Kamya M, Chamie G, Schwab J, Havlir DV, Petersen ML. Two-Stage TMLE to reduce bias and improve efficiency in cluster randomized trials. Biostatistics 2021; 24:502-517. [PMID: 34939083 PMCID: PMC10102904 DOI: 10.1093/biostatistics/kxab043] [Received: 06/29/2021] [Revised: 10/19/2021] [Accepted: 11/15/2021] [Indexed: 11/14/2022]
Abstract
Cluster randomized trials (CRTs) randomly assign an intervention to groups of individuals (e.g., clinics or communities) and measure outcomes on individuals in those groups. While offering many advantages, this experimental design introduces challenges that are only partially addressed by existing analytic approaches. First, outcomes are often missing for some individuals within clusters. Failing to appropriately adjust for differential outcome measurement can result in biased estimates and inference. Second, CRTs often randomize limited numbers of clusters, resulting in chance imbalances on baseline outcome predictors between arms. Failing to adaptively adjust for these imbalances and other predictive covariates can result in efficiency losses. To address these methodological gaps, we propose and evaluate a novel two-stage targeted minimum loss-based estimator to adjust for baseline covariates in a manner that optimizes precision, after controlling for baseline and postbaseline causes of missing outcomes. Finite sample simulations illustrate that our approach can nearly eliminate bias due to differential outcome measurement, while existing CRT estimators yield misleading results and inferences. Application to real data from the SEARCH community randomized trial demonstrates the gains in efficiency afforded through adaptive adjustment for baseline covariates, after controlling for missingness on individual-level outcomes.
Affiliation(s)
- Laura B Balzer
  - Department of Biostatistics & Epidemiology, University of Massachusetts Amherst, 715 North Pleasant St, Amherst, MA, USA
- Mark van der Laan
  - Division of Biostatistics, University of California Berkeley, 2121 Berkeley Way, Berkeley, CA, USA
- James Ayieko
  - Center for Microbiology Research, Kenya Medical Research Institute, P.O. BOX 54840 00200 Off Raila Odinga Way, Nairobi, Kenya
- Moses Kamya
  - Department of Medicine, Makerere University and the Infectious Diseases Research Collaboration, P.O Box 7475, Kampala, Uganda
- Gabriel Chamie
  - Department of Medicine, University of California San Francisco, 995 Potrero Ave, San Francisco, CA, USA
- Joshua Schwab
  - Division of Biostatistics, University of California Berkeley, 2121 Berkeley Way, Berkeley, CA, USA
- Diane V Havlir
  - Department of Medicine, University of California San Francisco, 995 Potrero Ave, San Francisco, CA, USA
- Maya L Petersen
  - Division of Biostatistics, University of California Berkeley, 2121 Berkeley Way, Berkeley, CA, USA

8. Chen T, Tchetgen EJT, Wang R. A stochastic second-order generalized estimating equations approach for estimating association parameters. J Comput Graph Stat 2020; 29:547-561. [PMID: 33041613 PMCID: PMC7540735 DOI: 10.1080/10618600.2019.1710156] [Received: 09/23/2018] [Revised: 11/26/2019] [Accepted: 12/20/2019] [Indexed: 10/25/2022]
Abstract
Design and analysis of cluster randomized trials must take into account the intraclass correlation coefficient (ICC), which quantifies the correlation among outcomes from the same cluster. Second-order generalized estimating equations (GEE2) provide a statistically robust way of estimating this quantity and other association parameters. However, GEE2 becomes computationally infeasible as cluster sizes grow. This paper proposes a stochastic variant for fitting GEE2 which alleviates reliance on parameter starting values and provides substantially faster speeds and higher convergence rates than the widely used deterministic Newton-Raphson method. We also propose new estimators for the ICC which account for informative missing outcome data through the use of GEE2, for which we incorporate a "second-order" inverse probability weighting scheme and "second-order" doubly robust (DR) estimating equations that guard against partial model misspecification. Our proposed methods are evaluated through simulations and applied to data from a cluster randomized trial in Bangladesh evaluating the effect of different marketing interventions on the use of hygienic latrines.
Affiliation(s)
- Tom Chen
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School
- Rui Wang
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health

9. Turner EL, Yao L, Li F, Prague M. Properties and pitfalls of weighting as an alternative to multilevel multiple imputation in cluster randomized trials with missing binary outcomes under covariate-dependent missingness. Stat Methods Med Res 2019; 29:1338-1353. [DOI: 10.1177/0962280219859915] [Indexed: 01/02/2023]
Abstract
The generalized estimating equation (GEE) approach can be used to analyze cluster randomized trial data to obtain population-averaged intervention effects. However, most cluster randomized trials have some missing outcome data and a GEE analysis of available data may be biased when outcome data are not missing completely at random. Although multilevel multiple imputation for GEE (MMI-GEE) has been widely used, alternative approaches such as weighted GEE are less common in practice. Using both simulations and a real data example, we evaluate the performance of inverse probability weighted GEE vs. MMI-GEE for binary outcomes. Simulated data are generated assuming a covariate-dependent missing data pattern across a range of missingness clustering (from none to high), where all covariates are measured at baseline and are fully observed (i.e. a type of missing-at-random mechanism). Two types of weights are estimated and used in the weighted GEE: (1) assuming no clustering of missingness (W-GEE) and (2) accounting for such clustering (CW-GEE). Results show that, even in settings with high missingness clustering, CW-GEE can lead to more bias and lower coverage than W-GEE, whereas W-GEE and MMI-GEE provide comparable results. W-GEE should be considered a viable strategy to account for missing outcomes in cluster randomized trials.
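The idea behind the inverse probability weights in this abstract can be illustrated on a toy data set. The sketch below is a deliberately simplified assumption-laden example, not the authors' W-GEE: it ignores clustering entirely, estimates the probability of being observed as the observed fraction within each level of a single baseline covariate (analogous in spirit to weights that assume no clustering of missingness), and computes a weighted mean rather than solving estimating equations. The function name `ipw_mean` and the data are hypothetical.

```python
from collections import defaultdict


def ipw_mean(records):
    """Inverse-probability-weighted mean of a partially observed binary outcome.

    records: list of (stratum, outcome) pairs, where outcome is None if missing.
    The probability of being observed is estimated per stratum as the
    fraction of records in that stratum with a non-missing outcome.
    """
    totals = defaultdict(int)
    observed = defaultdict(int)
    for stratum, y in records:
        totals[stratum] += 1
        if y is not None:
            observed[stratum] += 1

    num = 0.0
    den = 0.0
    for stratum, y in records:
        if y is None:
            continue
        # Weight = 1 / estimated probability of being observed.
        weight = totals[stratum] / observed[stratum]
        num += weight * y
        den += weight
    return num / den


if __name__ == "__main__":
    # Hypothetical data: stratum 0 is fully observed, stratum 1 is half missing.
    data = (
        [(0, 0), (0, 0), (0, 0), (0, 1)]
        + [(1, 1), (1, 1), (1, 1), (1, 0)]
        + [(1, None)] * 4
    )
    seen = [y for _, y in data if y is not None]
    print("complete-case mean:", sum(seen) / len(seen))
    print("IPW mean:", ipw_mean(data))
```

In this toy example the complete-case mean under-represents the stratum with more missing outcomes, while the weighted mean restores each stratum to its share of the full sample.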
Affiliation(s)
- Elizabeth L Turner
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
  - Duke Global Health Institute, Duke University, Durham, NC, USA
- Lanqiu Yao
  - Department of Population Health, New York University, New York, NY, USA
- Fan Li
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Melanie Prague
  - INRIA SISTM, Inserm U1219 Bordeaux Population Health, Université Bordeaux, ISPED, Bordeaux, France
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA

10. Balzer LB, Zheng W, van der Laan MJ, Petersen ML. A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure. Stat Methods Med Res 2019; 28:1761-1780. [PMID: 29921160 PMCID: PMC6173669 DOI: 10.1177/0962280218774936] [Indexed: 12/18/2022]
Abstract
We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster-level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e. contagion) and influence of one individual's covariates on another's outcome (i.e. covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment.
Affiliation(s)
- Laura B Balzer
  - Department of Biostatistics & Epidemiology, School of Public Health & Health Sciences, University of Massachusetts, Amherst, MA, USA
- Mark J van der Laan
  - Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA
- Maya L Petersen
  - Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA

11. Xu C, Li Z, Xue Y, Zhang L, Wang M. An R package for model fitting, model selection and the simulation for longitudinal data with dropout missingness. COMMUN STAT-SIMUL C 2018; 48:2812-2829. [PMID: 32346220 PMCID: PMC7188076 DOI: 10.1080/03610918.2018.1468457] [Received: 09/21/2017] [Revised: 03/08/2018] [Accepted: 04/15/2018] [Indexed: 01/10/2023]
Abstract
Missing data arise frequently in clinical and epidemiological fields, in particular in longitudinal studies. This paper describes the core features of an R package wgeesel, which implements marginal model fitting (i.e., weighted generalized estimating equations, WGEE; doubly robust GEE) for longitudinal data with dropouts under the assumption of missing at random. More importantly, this package comprehensively provides existing information criteria for WGEE model selection on marginal mean or correlation structures. Also, it can serve as a valuable tool for simulating longitudinal data with missing outcomes. Lastly, a real data example and simulations are presented to illustrate and validate our package.
Affiliation(s)
- Cong Xu
  - Vertex Pharmaceuticals, Boston, Massachusetts, USA
- Zheng Li
  - Department of Public Health Sciences, Division of Biostatistics and Bioinformatics, College of Medicine, Penn State Hershey Medical Center, Hershey, Pennsylvania, USA
- Yuan Xue
  - School of Statistics, University of International Business and Economics, Beijing, China
- Lijun Zhang
  - Department of Biochemistry and Molecular Biology, Institute of Personalized Medicine, Penn State Hershey Medical Center, Hershey, Pennsylvania, USA
- Ming Wang
  - Department of Public Health Sciences, Division of Biostatistics and Bioinformatics, College of Medicine, Penn State Hershey Medical Center, Hershey, Pennsylvania, USA

12. Wang R, De Gruttola V. The use of permutation tests for the analysis of parallel and stepped-wedge cluster-randomized trials. Stat Med 2017; 36:2831-2843. [PMID: 28464567 PMCID: PMC5507602 DOI: 10.1002/sim.7329] [Received: 12/29/2015] [Revised: 04/05/2017] [Accepted: 04/10/2017] [Indexed: 11/07/2022]
Abstract
We investigate the use of permutation tests for the analysis of parallel and stepped-wedge cluster-randomized trials. Permutation tests for parallel designs with exponential family endpoints have been extensively studied. The optimal permutation tests developed for exponential family alternatives require information on intraclass correlation, a quantity not yet defined for time-to-event endpoints. Therefore, it is unclear how efficient permutation tests can be constructed for cluster-randomized trials with such endpoints. We consider a class of test statistics formed by a weighted average of pair-specific treatment effect estimates and offer practical guidance on the choice of weights to improve efficiency. We apply the permutation tests to a cluster-randomized trial evaluating the effect of an intervention to reduce the incidence of hospital-acquired infection. In some settings, outcomes from different clusters may be correlated, and we evaluate the validity and efficiency of permutation tests in such settings. Lastly, we propose a permutation test for stepped-wedge designs, compare its performance with mixed-effect modeling, and illustrate its superiority when sample sizes are small, the underlying distribution is skewed, or there is correlation across clusters.
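The basic mechanics of a permutation test in a parallel cluster-randomized design can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the weighted pair-specific statistic the abstract describes: the test statistic here is the plain difference in mean cluster-level summaries between arms, every re-assignment with the same number of treated clusters is enumerated exactly, and the function name `permutation_p_value` and the data are hypothetical.

```python
from itertools import combinations


def permutation_p_value(cluster_means, treated_idx):
    """Exact two-sided permutation p-value for a parallel cluster-randomized trial.

    cluster_means : one summary value per cluster (e.g. the cluster mean outcome)
    treated_idx   : indices of the clusters actually assigned to intervention
                    (must be a non-empty proper subset of the clusters)
    """
    n = len(cluster_means)
    k = len(treated_idx)

    def arm_difference(treated):
        treated = set(treated)
        arm_t = [cluster_means[i] for i in range(n) if i in treated]
        arm_c = [cluster_means[i] for i in range(n) if i not in treated]
        return sum(arm_t) / len(arm_t) - sum(arm_c) / len(arm_c)

    observed = abs(arm_difference(treated_idx))
    # Enumerate every way of assigning k of the n clusters to intervention.
    assignments = list(combinations(range(n), k))
    # Small tolerance guards against floating-point ties being missed.
    extreme = sum(
        1 for a in assignments if abs(arm_difference(a)) >= observed - 1e-12
    )
    return extreme / len(assignments)


if __name__ == "__main__":
    # Hypothetical trial: 6 clusters, the first 3 randomized to intervention.
    means = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
    print(permutation_p_value(means, (0, 1, 2)))
```

Because clusters, not individuals, are the unit of randomization, the labels are permuted at the cluster level. With many clusters, exact enumeration becomes expensive and a random sample of re-assignments is commonly used instead.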
Affiliation(s)
- Rui Wang
  - Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, 401 Park Drive, Suite 401 East, Boston, MA 02215, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 655 Huntington Avenue, Boston, MA 02115, USA
- Victor De Gruttola
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 655 Huntington Avenue, Boston, MA 02115, USA

13. Turner EL, Prague M, Gallis JA, Li F, Murray DM. Review of Recent Methodological Developments in Group-Randomized Trials: Part 2-Analysis. Am J Public Health 2017; 107:1078-1086. [PMID: 28520480 PMCID: PMC5463203 DOI: 10.2105/ajph.2017.303707] [Accepted: 02/05/2017] [Indexed: 12/13/2022]
Abstract
In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have updated that review with developments in analysis of the past 13 years, with a companion article to focus on developments in design. We discuss developments in the topics of the earlier review (e.g., methods for parallel-arm GRTs, individually randomized group-treatment trials, and missing data) and in new topics, including methods to account for multiple-level clustering and alternative estimation methods (e.g., augmented generalized estimating equations, targeted maximum likelihood, and quadratic inference functions). In addition, we describe developments in analysis of alternative group designs (including stepped-wedge GRTs, network-randomized trials, and pseudocluster randomized trials), which require clustering to be accounted for in their design and analysis.
Affiliation(s)
- Elizabeth L Turner
- Melanie Prague
- John A Gallis
- Fan Li
- David M Murray

Elizabeth L. Turner and John A. Gallis are with the Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, and the Duke Global Health Institute, Duke University. Melanie Prague is with the Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, and Inria, project team SISTM, Bordeaux, France. Fan Li is with the Department of Biostatistics and Bioinformatics, Duke University. David M. Murray is with the Office of Disease Prevention, Division of Program Coordination and Strategic Planning, and the Office of the Director, National Institutes of Health, Rockville, MD.