1
Xue W, Zhang X, Chan KCG, Wong RKW. RKHS-based covariate balancing for survival causal effect estimation. Lifetime Data Analysis 2024; 30:34-58. [PMID: 36821062 DOI: 10.1007/s10985-023-09590-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 01/10/2023] [Indexed: 06/18/2023]
Abstract
Survival causal effect estimation based on right-censored data is of key interest in both survival analysis and causal inference. Propensity score weighting is one of the most popular methods in the literature. However, since it involves the inverse of propensity score estimates, its practical performance may be very unstable, especially when the covariate overlap is limited between treatment and control groups. To address this problem, a covariate balancing method is developed in this paper to estimate the counterfactual survival function. The proposed method is nonparametric and balances covariates in a reproducing kernel Hilbert space (RKHS) via weights that are counterparts of inverse propensity scores. The uniform rate of convergence for the proposed estimator is shown to be the same as that for the classical Kaplan-Meier estimator. The appealing practical performance of the proposed method is demonstrated by a simulation study as well as two real data applications to study the causal effect of smoking on survival time of stroke patients and that of endotoxin on survival time for female patients with lung cancer respectively.
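The instability that the abstract attributes to propensity score weighting can be seen in a minimal sketch of an inverse-propensity-weighted Kaplan-Meier estimator. This illustrates the baseline weighting approach only, not the RKHS balancing method of the paper. The function names and toy data are hypothetical, and the propensity scores are assumed to be given rather than estimated.

```python
from typing import List, Sequence, Tuple


def inverse_propensity_weights(treated: Sequence[int],
                               propensity: Sequence[float],
                               arm: int = 1) -> List[float]:
    """Weights for the survival curve under `arm` (1 = treated, 0 = control).

    Subjects in the other arm get weight 0. A propensity near 0 (arm=1) or
    near 1 (arm=0) yields a very large weight, which is what makes the
    estimator unstable when covariate overlap is limited.
    """
    weights = []
    for a, e in zip(treated, propensity):
        if a != arm:
            weights.append(0.0)
        elif arm == 1:
            weights.append(1.0 / e)
        else:
            weights.append(1.0 / (1.0 - e))
    return weights


def weighted_kaplan_meier(times: Sequence[float],
                          events: Sequence[int],
                          weights: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (time, survival) pairs at each time with a weighted event."""
    curve = []
    surv = 1.0
    for t in sorted(set(times)):
        # Weighted number still at risk just before t.
        at_risk = sum(w for u, w in zip(times, weights) if u >= t)
        # Weighted number of events exactly at t (censored subjects have event 0).
        died = sum(w for u, d, w in zip(times, events, weights) if u == t and d)
        if died > 0:
            surv *= 1.0 - died / at_risk
            curve.append((t, surv))
    return curve


# Toy data (illustrative only): follow-up time, event indicator, treatment, propensity.
times = [1, 2, 3, 4, 5, 6]
events = [1, 0, 1, 1, 0, 1]
treated = [1, 1, 0, 1, 0, 1]
propensity = [0.8, 0.6, 0.5, 0.4, 0.3, 0.02]  # the last subject dominates the weights

w = inverse_propensity_weights(treated, propensity, arm=1)
treated_curve = weighted_kaplan_meier(times, events, w)
```

In the toy data a single treated subject with propensity 0.02 receives weight 50, so the estimated curve is driven largely by that one observation.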
Affiliation(s)
- Wu Xue
  - Meta Platforms Inc., Menlo Park, CA, 94025, USA
- Xiaoke Zhang
  - Department of Statistics, George Washington University, Washington, DC, 20052, USA
- Raymond K W Wong
  - Department of Statistics, Texas A&M University, College Station, TX, 77843, USA
2
Rohani R, Yarnold PR, Scheetz MH, Neely MN, Kang M, Donnelly HK, Dedicatoria K, Nozick SH, Medernach RL, Hauser AR, Ozer EA, Diaz E, Misharin AV, Wunderink RG, Rhodes NJ. Individual meropenem epithelial lining fluid and plasma PK/PD target attainment. Antimicrob Agents Chemother 2023; 67:e0072723. [PMID: 37975660 PMCID: PMC10720524 DOI: 10.1128/aac.00727-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 10/15/2023] [Indexed: 11/19/2023] Open
Abstract
It is unclear whether plasma is a reliable surrogate for target attainment in the epithelial lining fluid (ELF). The objective of this study was to characterize meropenem target attainment in plasma and ELF using prospective samples. The first 24-hour T>MIC was evaluated vs 1xMIC and 4xMIC targets at the patient (i.e., fixed MIC of 2 mg/L) and population [i.e., cumulative fraction of response (CFR) according to EUCAST MIC distributions] levels for both plasma and ELF. Among 65 patients receiving ≥24 hours of treatment, 40% of patients failed to achieve >50% T>4xMIC in plasma and ELF, and 30% of patients who achieved >50% T>4xMIC in plasma had <50% T>4xMIC in ELF. At 1xMIC and 4xMIC targets, 3% and 25% of patients with >95% T>MIC in plasma had <50% T>MIC in ELF, respectively. Those with a CRCL >115 mL/min were less likely to achieve >50% T>4xMIC in ELF (P < 0.025). In the population, CFR for Escherichia coli at 1xMIC and 4xMIC was >97%. For Pseudomonas aeruginosa, CFR was ≥90% in plasma and ranged 80%-85% in ELF at 1xMIC when a loading dose was applied. CFR was reduced in plasma (range: 75%-81%) and ELF (range: 44%-60%) in the absence of a loading dose at 1xMIC. At 4xMIC, CFR for P. aeruginosa was 60%-86% with a loading dose and 18%-62% without a loading dose. We found that plasma overestimated ELF target attainment in up to 30% of meropenem-treated patients, CRCL >115 mL/min decreased target attainment in ELF, and loading doses increased CFR in the population.
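The two levels of evaluation described above can be illustrated with a short sketch: percent time above a multiple of the MIC for one patient's concentration-time profile, and CFR as the probability of target attainment (PTA) at each MIC weighted by the fraction of isolates at that MIC. The concentrations, PTA values, and MIC distribution below are made-up illustrative numbers, not study data or EUCAST values, and the function names are hypothetical. Real analyses typically derive exposures from a pharmacokinetic model rather than from a simple count over sampled points.

```python
from typing import Dict, Sequence


def percent_time_above(concentrations: Sequence[float],
                       mic: float,
                       multiple: float = 1.0) -> float:
    """Percent of equally spaced samples with concentration > multiple * MIC.

    Assumes the samples are evenly spaced over the dosing window of interest
    (for example, the first 24 hours).
    """
    threshold = multiple * mic
    above = sum(1 for c in concentrations if c > threshold)
    return 100.0 * above / len(concentrations)


def cumulative_fraction_of_response(pta_by_mic: Dict[float, float],
                                    isolate_fraction_by_mic: Dict[float, float]) -> float:
    """CFR = sum over MIC values of PTA(MIC) * fraction of isolates at that MIC.

    The isolate fractions are normalized so they do not need to sum to 1.
    """
    total = sum(isolate_fraction_by_mic.values())
    weighted = sum(pta_by_mic[m] * f for m, f in isolate_fraction_by_mic.items())
    return weighted / total


# Hypothetical hourly concentrations (mg/L) for one patient over 8 hours.
plasma = [40.0, 22.0, 12.0, 9.0, 6.0, 3.5, 2.0, 1.0]
t_above_1x = percent_time_above(plasma, mic=2.0, multiple=1.0)
t_above_4x = percent_time_above(plasma, mic=2.0, multiple=4.0)

# Hypothetical PTA per MIC (mg/L) and hypothetical isolate distribution.
pta = {0.5: 1.0, 1.0: 0.95, 2.0: 0.80, 4.0: 0.50}
isolates = {0.5: 0.40, 1.0: 0.30, 2.0: 0.20, 4.0: 0.10}
cfr = cumulative_fraction_of_response(pta, isolates)
```

Running the same two functions on ELF concentrations instead of plasma concentrations gives the site-specific comparison that the abstract reports.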
Affiliation(s)
- Roxane Rohani
  - Discipline of Cellular and Molecular Pharmacology, Chicago Medical School, Rosalind Franklin University of Medicine and Science, North Chicago, Illinois, USA
- Marc H. Scheetz
  - Department of Pharmacy Practice, Midwestern University, Chicago College of Pharmacy, Downers Grove, Illinois, USA
  - Pharmacometrics Center of Excellence, Midwestern University, Downers Grove, Illinois, USA
  - Department of Pharmacy, Northwestern Memorial Hospital, Chicago, Illinois, USA
- Michael N. Neely
  - Laboratory of Applied Pharmacokinetics and Bioinformatics, The Saban Research Institute, Children’s Hospital of Los Angeles, Los Angeles, California, USA
  - Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Mengjia Kang
  - Division of Pulmonary and Critical Care Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Helen K. Donnelly
  - Division of Pulmonary and Critical Care Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Kay Dedicatoria
  - Department of Pharmacy Practice, Midwestern University, Chicago College of Pharmacy, Downers Grove, Illinois, USA
- Sophie H. Nozick
  - Department of Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Rachel L. Medernach
  - Division of Infectious Diseases, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Alan R. Hauser
  - Department of Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
  - Division of Infectious Diseases, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Egon A. Ozer
  - Division of Infectious Diseases, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
  - Center for Pathogen Genomics and Microbial Evolution, Havey Institute for Global Health, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Estefani Diaz
  - Robert H. Lurie Comprehensive Cancer Research Center, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Alexander V. Misharin
  - Division of Pulmonary and Critical Care Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Richard G. Wunderink
  - Division of Pulmonary and Critical Care Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Nathaniel J. Rhodes
  - Department of Pharmacy Practice, Midwestern University, Chicago College of Pharmacy, Downers Grove, Illinois, USA
  - Pharmacometrics Center of Excellence, Midwestern University, Downers Grove, Illinois, USA
  - Department of Pharmacy, Northwestern Memorial Hospital, Chicago, Illinois, USA
3
Zang C, Zhang H, Xu J, Zhang H, Fouladvand S, Havaldar S, Cheng F, Chen K, Chen Y, Glicksberg BS, Chen J, Bian J, Wang F. High-throughput target trial emulation for Alzheimer's disease drug repurposing with real-world data. Nat Commun 2023; 14:8180. [PMID: 38081829 PMCID: PMC10713627 DOI: 10.1038/s41467-023-43929-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2022] [Accepted: 11/24/2023] [Indexed: 12/18/2023] Open
Abstract
Target trial emulation is the process of mimicking target randomized trials using real-world data, where effective confounding control for unbiased treatment effect estimation remains a main challenge. Although various approaches have been proposed for this challenge, a systematic evaluation is still lacking. Here we emulated trials for thousands of medications from two large-scale real-world data warehouses, covering over 10 years of clinical records for over 170 million patients, aiming to identify new indications of approved drugs for Alzheimer's disease. We assessed different propensity score models under the inverse probability of treatment weighting framework and suggested a model selection strategy for improved baseline covariate balancing. We also found that the deep learning-based propensity score model did not necessarily outperform logistic regression-based methods in covariate balancing. Finally, we highlighted five top-ranked drugs (pantoprazole, gabapentin, atorvastatin, fluticasone, and omeprazole) originally intended for other indications with potential benefits for Alzheimer's patients.
Affiliation(s)
- Chengxi Zang
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY, USA
- Hao Zhang
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Jie Xu
  - Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, USA
- Hansi Zhang
  - Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, USA
- Sajjad Fouladvand
  - Institute for Biomedical Informatics (IBI) and Department of Computer Science, University of Kentucky, Lexington, KY, USA
- Shreyas Havaldar
  - Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Feixiong Cheng
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Kun Chen
  - Department of Statistics, University of Connecticut, Storrs, CT, USA
- Yong Chen
  - Department of Biostatistics, Epidemiology and Informatics (DBEI), the Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Benjamin S Glicksberg
  - Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jin Chen
  - Institute for Biomedical Informatics (IBI) and Department of Computer Science, University of Kentucky, Lexington, KY, USA
- Jiang Bian
  - Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, USA
- Fei Wang
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY, USA
4
Zhao Y, Yu Y, Wang H, Li Y, Deng Y, Jiang G, Luo Y. Machine Learning in Causal Inference: Application in Pharmacovigilance. Drug Saf 2022; 45:459-476. [PMID: 35579811 PMCID: PMC9114053 DOI: 10.1007/s40264-022-01155-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 02/09/2022] [Indexed: 01/28/2023]
Abstract
Monitoring adverse drug events or pharmacovigilance has been promoted by the World Health Organization to assure the safety of medicines through a timely and reliable information exchange regarding drug safety issues. We aim to discuss the application of machine learning methods as well as causal inference paradigms in pharmacovigilance. We first reviewed data sources for pharmacovigilance. Then, we examined traditional causal inference paradigms, their applications in pharmacovigilance, and how machine learning methods and causal inference paradigms were integrated to enhance the performance of traditional causal inference paradigms. Finally, we summarized issues with currently mainstream correlation-based machine learning models and how the machine learning community has tried to address these issues by incorporating causal inference paradigms. Our literature search revealed that most existing data sources and tasks for pharmacovigilance were not designed for causal inference. Additionally, pharmacovigilance was lagging in adopting machine learning-causal inference integrated models. We highlight several currently trending directions or gaps to integrate causal inference with machine learning in pharmacovigilance research. Finally, our literature search revealed that the adoption of causal paradigms can mitigate known issues with machine learning models. We foresee that the pharmacovigilance domain can benefit from the progress in the machine learning field.
Affiliation(s)
- Yiqing Zhao
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, 750 N Lake Shore Drive, Room 11-189, Chicago, IL, 60611, USA
- Yue Yu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, 55902, USA
- Hanyin Wang
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, 750 N Lake Shore Drive, Room 11-189, Chicago, IL, 60611, USA
- Yikuan Li
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, 750 N Lake Shore Drive, Room 11-189, Chicago, IL, 60611, USA
- Yu Deng
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, 750 N Lake Shore Drive, Room 11-189, Chicago, IL, 60611, USA
- Guoqian Jiang
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, 55902, USA
- Yuan Luo
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, 750 N Lake Shore Drive, Room 11-189, Chicago, IL, 60611, USA
5
Characterizing Risk Factors for Clostridioides difficile Infection among Hospitalized Patients with Community-Acquired Pneumonia. Antimicrob Agents Chemother 2021; 65:e0041721. [PMID: 33875439 DOI: 10.1128/aac.00417-21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/01/2023] Open
Abstract
Hospitalized patients with community-acquired pneumonia (CAP) are at risk of developing Clostridioides difficile infection (CDI). We developed and tested clinical decision rules for identifying CDI risk in this patient population. The study was a single-center retrospective, case-control analysis of hospitalized adult patients empirically treated for CAP between 1 January 2014 and 3 March 2018. Differences between cases (CDI diagnosed within 180 days following admission) and controls (no test result indicating CDI during the study period) with respect to prehospitalization variables were modeled to generate propensity scores. Postadmission variables were used to predict case status on each postadmission day where (i) ≥1 additional case was identified and (ii) each model stratum contained ≥15 subjects. Models were developed and tested using optimal discriminant analysis and classification tree analysis. Forty-four cases and 181 controls were included. The median time to diagnosis was 50 days postadmission. After weighting, three models were identified (20, 117, and 165 days postadmission). The day 20 model yielded the greatest (weighted [w]) accuracy (weighted area under the receiver operating characteristic curve [wROC area] = 0.826) and the highest chance-corrected accuracy (weighted effect strength for sensitivity [wESS] = 65.3). Having a positive culture (odds, 1:4; P = 0.001), receipt of ceftriaxone plus azithromycin for a defined infection (odds, 3:5; P = 0.006), and continuation of empirical broad-spectrum antibiotics with activity against P. aeruginosa when no pathogen was identified (odds, 1:8; P = 0.013) were associated with CDI on day 20. Three models were identified that accurately predicted CDI in hospitalized patients treated for CAP. Antibiotic use increased the risk of CDI in all models, underscoring the importance of antibiotic stewardship.
6
Ferri-García R, Rueda MDM. Propensity score adjustment using machine learning classification algorithms to control selection bias in online surveys. PLoS One 2020; 15:e0231500. [PMID: 32320429 PMCID: PMC7176094 DOI: 10.1371/journal.pone.0231500] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 11/12/2019] [Accepted: 03/24/2020] [Indexed: 11/18/2022] Open
Abstract
Modern survey methods may be subject to non-observable bias, from various sources. Among online surveys, for example, selection bias is prevalent, due to the sampling mechanism commonly used, whereby participants self-select from a subgroup whose characteristics differ from those of the target population. Several techniques have been proposed to tackle this issue. One such is Propensity Score Adjustment (PSA), which is widely used and has been analysed in various studies. The usual method of estimating the propensity score is logistic regression, which requires a reference probability sample in addition to the online nonprobability sample. The predicted propensities can be used for reweighting using various estimators. However, in the online survey context, there are alternatives that might outperform logistic regression regarding propensity estimation. The aim of the present study is to determine the efficiency of some of these alternatives, involving Machine Learning (ML) classification algorithms. PSA is applied in two simulation scenarios, representing situations commonly found in online surveys, using logistic regression and ML models for propensity estimation. The results obtained show that ML algorithms remove selection bias more effectively than logistic regression when used for PSA, but that their efficacy depends largely on the selection mechanism employed and the dimensionality of the data.
Affiliation(s)
- Ramón Ferri-García
  - Department of Statistics and Operations Research, Faculty of Sciences, University of Granada, Granada, Spain
- María del Mar Rueda
  - Department of Statistics and Operations Research, Faculty of Sciences, University of Granada, Granada, Spain
7
Schneeweiss S. Automated data-adaptive analytics for electronic healthcare data to study causal treatment effects. Clin Epidemiol 2018; 10:771-788. [PMID: 30013400 PMCID: PMC6039060 DOI: 10.2147/clep.s166545] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Decision makers in health care increasingly rely on nonrandomized database analyses to assess the effectiveness, safety, and value of medical products. Health care data scientists use data-adaptive approaches that automatically optimize confounding control to study causal treatment effects. This article summarizes relevant experiences and extensions. METHODS The literature was reviewed on the uses of high-dimensional propensity score (HDPS) and related approaches for health care database analyses, including methodological articles on their performance and improvement. Articles were grouped into applications, comparative performance studies, and statistical simulation experiments. RESULTS The HDPS algorithm has been referenced frequently with a variety of clinical applications and data sources from around the world. The appeal of HDPS for database research rests in 1) its superior performance in situations of unobserved confounding through proxy adjustment, 2) its predictable efficiency in extracting confounding information from a given data source, 3) its ability to automate estimation of causal treatment effects to the extent achievable in a given data source, and 4) its independence of data source and coding system. Extensions of the HDPS approach have focused on improving variable selection when exposure is sparse, using free text information and time-varying confounding adjustment. CONCLUSION Semiautomated and optimized confounding adjustment in health care database analyses has proven successful across a wide range of settings. Machine-learning extensions further automate its use in estimating causal treatment effects across a range of data scenarios.
Affiliation(s)
- Sebastian Schneeweiss
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
8
Linden A, Yarnold PR. Identifying causal mechanisms in health care interventions using classification tree analysis. J Eval Clin Pract 2018; 24:353-361. [PMID: 29105259 DOI: 10.1111/jep.12848] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/04/2017] [Accepted: 10/05/2017] [Indexed: 11/27/2022]
Abstract
RATIONALE, AIMS, AND OBJECTIVES Mediation analysis identifies causal pathways by testing the relationships between the treatment, the outcome, and an intermediate variable that mediates the relationship between the treatment and outcome. This paper introduces classification tree analysis (CTA), a machine-learning procedure, as an alternative to conventional methods for analysing mediation effects. METHOD Using data from the JOBS II study, we compare CTA to structural equation models (SEMs) by assessing their consistency in revealing mediation effects on 2 outcomes: reemployment (a binary variable) and depressive symptoms (a continuous variable). Because study participants were not randomized sequentially to both treatment and mediator, an additional model was generated including baseline covariates to strengthen the validity of some key identifying assumptions required of all mediation analyses. RESULTS Using SEM, no statistically significant treatment or mediated effects were found for either outcome. In contrast, CTA found a significant treatment effect for reemployment (P = .047) and a mediated pathway for individuals in the treatment group (P = .014). No CTA model could be generated for the depressive symptoms outcome. When covariates were added to the model, CTA found numerous interactions, while SEM found no effects. CONCLUSIONS CTA may uncover mediation effects where conventional approaches do not, because CTA does not require any assumptions about the distribution of variables nor of the functional form of the model, and CTA will systematically identify all statistically viable interactions. The versatility of CTA enables the investigator to explore the theorized underlying causal mechanism of an intervention in a much more comprehensive manner than conventional mediation analytic approaches.
Affiliation(s)
- Ariel Linden
  - Linden Consulting Group, LLC, San Francisco, California, USA
9
Linden A, Yarnold PR. Estimating causal effects for survival (time-to-event) outcomes by combining classification tree analysis and propensity score weighting. J Eval Clin Pract 2018; 24:380-387. [PMID: 29230910 DOI: 10.1111/jep.12859] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/08/2017] [Accepted: 11/09/2017] [Indexed: 10/18/2022]
Abstract
RATIONALE, AIMS AND OBJECTIVES A common approach to assessing treatment effects in nonrandomized studies with time-to-event outcomes is to estimate propensity scores and compute weights using logistic regression, test for covariate balance, and then estimate treatment effects using Cox regression. A machine-learning alternative, classification tree analysis (CTA), used to generate propensity scores and to estimate treatment effects in time-to-event data may identify complex relationships between covariates not found using conventional regression-based approaches. METHOD Using empirical data, we identify all statistically valid CTA propensity score models and then use them to compute strata-specific, observation-level propensity score weights that are subsequently applied in outcomes analyses. We compare findings obtained using this framework to the conventional method, by evaluating covariate balance and treatment effect estimates obtained using Cox regression and a weighted CTA outcomes model. RESULTS All models had some imbalanced covariates. Nevertheless, treatment effect estimates were generally consistent across all weighted models. CONCLUSIONS In the study sample, given that all approaches elicited similar results, using CTA increased confidence that bias could not be reduced any further. Because the CTA algorithm identifies all statistically valid propensity score models for a sample, it is most likely to identify a correctly specified propensity score model, and therefore should be used either to confirm results using traditional methods, or to reveal biases that may be missed by traditional methods. Moreover, given that the true treatment effect is never known in observational data, CTA should be considered for estimating outcomes because no statistical assumptions are required.
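The covariate balance step in the conventional workflow is often carried out with a weighted standardized mean difference (SMD). The following is a minimal sketch under that assumption. The abstract does not say which balance statistic the authors used, and the function name and toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def weighted_smd(x: Sequence[float],
                 treated: Sequence[int],
                 weights: Sequence[float]) -> float:
    """Weighted standardized mean difference of covariate x (treated minus control).

    Uses weighted means and weighted (population) variances within each arm,
    pooled as sqrt((var_treated + var_control) / 2). Raises ZeroDivisionError
    if the covariate is constant within both arms.
    """
    def mean_and_var(arm: int) -> Tuple[float, float]:
        pairs = [(xi, wi) for xi, ai, wi in zip(x, treated, weights) if ai == arm]
        total = sum(w for _, w in pairs)
        mean = sum(v * w for v, w in pairs) / total
        var = sum(w * (v - mean) ** 2 for v, w in pairs) / total
        return mean, var

    mean_t, var_t = mean_and_var(1)
    mean_c, var_c = mean_and_var(0)
    pooled_sd = math.sqrt((var_t + var_c) / 2.0)
    return (mean_t - mean_c) / pooled_sd


# Toy covariate (for example, age) with hypothetical propensity score weights.
age = [50.0, 60.0, 70.0, 55.0, 65.0, 75.0]
arm = [1, 1, 1, 0, 0, 0]
unweighted = weighted_smd(age, arm, [1.0] * 6)
weighted = weighted_smd(age, arm, [1.0, 1.0, 2.0, 2.0, 1.0, 1.0])
```

An absolute SMD below about 0.1 is a commonly used rule of thumb for acceptable balance. The same function can be applied to weights from any propensity score model, which makes it a convenient way to compare alternative models on the same sample.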
Affiliation(s)
- Ariel Linden
  - Linden Consulting Group, LLC, San Francisco, California, USA
10
Linden A, Yarnold PR. Minimizing imbalances on patient characteristics between treatment groups in randomized trials using classification tree analysis. J Eval Clin Pract 2017; 23:1309-1315. [PMID: 28675602 DOI: 10.1111/jep.12792] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/05/2017] [Accepted: 06/05/2017] [Indexed: 11/30/2022]
Abstract
RATIONALE, AIMS, AND OBJECTIVES Randomization ensures that treatment groups do not differ systematically in their characteristics, thereby reducing threats to validity that may otherwise explain differences in outcomes. Large observed imbalances in patient characteristics may indicate that selection bias is being introduced into the treatment allocation process. We introduce classification tree analysis (CTA) as a novel algorithmic approach for identifying potential imbalances in characteristics and their interactions when provisionally assigning each new participant to one or the other treatment group. The participant is then permanently assigned to the treatment group that elicits either no or less imbalance than if assigned to the alternate group. METHOD Using data on participant characteristics from a clinical trial, we compare 3 different treatment allocation approaches: permuted block randomization (the original allocation method), minimization, and CTA. Treatment allocation performance is assessed by examining balance of all 17 patient characteristics between study groups for each of the allocation techniques. RESULTS While all 3 treatment allocation techniques achieved excellent balance on main effect variables, classification tree analysis further identified imbalances on interactions and in the distributions of some of the continuous variables. CONCLUSIONS Classification tree analysis offers an algorithmic procedure that may be used with any randomization methodology to identify and then minimize linear, nonlinear, and interactive effects that induce covariate imbalance between groups. Investigators should consider using the CTA approach as a real-time complement to randomization for any clinical trial to safeguard the treatment allocation process against bias.
Affiliation(s)
- Ariel Linden
  - Linden Consulting Group, LLC, Ann Arbor, Michigan, USA
  - Division of General Medicine, Medical School, University of Michigan, Ann Arbor, Michigan, USA
11
Linden A, Yarnold PR. Modeling time-to-event (survival) data using classification tree analysis. J Eval Clin Pract 2017; 23:1299-1308. [PMID: 28670833 DOI: 10.1111/jep.12779] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 05/09/2017] [Accepted: 05/10/2017] [Indexed: 11/27/2022]
Abstract
RATIONALE, AIMS, AND OBJECTIVES Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. METHOD Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. RESULTS The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. CONCLUSIONS Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework.
Affiliation(s)
- Ariel Linden
  - Linden Consulting Group, LLC, Ann Arbor, MI, USA
  - Division of General Medicine, Medical School, University of Michigan, Ann Arbor, MI, USA