1
|
Maghsoudi R, Mirzarezaee M, Sadeghi M, Nadjar-Araabi B. Determining the adjusted initial treatment dose of warfarin anticoagulant medicine using kernel-based support vector regression. Comput Methods Programs Biomed 2022; 214:106589. [PMID: 34963093 DOI: 10.1016/j.cmpb.2021.106589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2021] [Revised: 09/22/2021] [Accepted: 12/14/2021] [Indexed: 06/14/2023]
Abstract
BACKGROUND AND OBJECTIVE Pharmacogenomics, together with the application of artificial intelligence tools to it, is a novel research field in bioinformatics. Pharmacogenomics studies the relationship between genotype and response to medical interventions such as drug therapy. Warfarin is one of the most effective anticoagulant drugs, but determining its initial treatment dose is challenging, and errors in the initial dose can directly result in patient death. METHODS Kernel-based methods are among the most successful techniques for estimating the initial treatment dose. However, all available studies use pre-defined, fixed kernels that may not address the requirements of the problem. The present study defines a new computational kernel extracted from the data set itself. The aim is to exploit all of the data's statistical features to build a dose-determination tool tailored to the data set with a minimal error rate. A kernel-based least squares support vector regression (LS-SVR) estimator was then defined, providing a more appropriate approach to predicting the adjusted dose of warfarin. RESULTS AND CONCLUSION This study uses the International Warfarin Pharmacogenomics Consortium (IWPC) database. The results show that support vector regression with the proposed kernel can successfully estimate the ideal warfarin dose for approximately 68% of patients.
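The abstract names a kernel-based least squares support vector regression (LS-SVR) estimator. As a rough sketch of that estimator family only, the code below fits LS-SVR in its standard dual form, solving a single linear system for the bias b and coefficients alpha. It assumes a Gaussian (RBF) kernel with hypothetical values for the bandwidth `sigma` and regularization `gamma`. This is exactly the kind of pre-defined kernel the authors argue against; their data-derived kernel is not specified in the abstract, so the `kernel` argument is simply where a custom kernel function would be plugged in.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel: a standard pre-defined kernel, NOT the paper's data-derived one."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_lssvr(X, y, gamma=10.0, kernel=rbf_kernel):
    """Fit LS-SVR by solving the dual system
        [ 0   1^T         ] [ b     ]   [ 0 ]
        [ 1   K + I/gamma ] [ alpha ] = [ y ]
    and return a prediction function f(x) = b + sum_i alpha_i K(x, x_i)."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [kernel(X[i], X[j]) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term on the diagonal
        A.append(row)
    solution = solve_linear(A, [0.0] + list(y))
    b, alpha = solution[0], solution[1:]

    def predict(x):
        return b + sum(a * kernel(x, xi) for a, xi in zip(alpha, X))

    return predict
```

With a constant target the exact solution is b equal to that constant and every alpha equal to zero, which gives a quick sanity check; a real dose model would need the IWPC clinical and genetic covariates, which are not reproduced here.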
Collapse
Affiliation(s)
- Rouhollah Maghsoudi
- Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Mitra Mirzarezaee
- Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
- Babak Nadjar-Araabi
- School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Iran
| |
Collapse
|
2
|
Ugarte S, Yarnold P, Ray P, Knopf K, Hoque S, Taylor M, Bennett CL. Maximum Accuracy Machine Learning Statistical Analysis-A Novel Approach. Cancer Treat Res 2022; 184:113-127. [PMID: 36449192 DOI: 10.1007/978-3-031-04402-1_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Logistic regression is a statistical tool of paramount significance in the field of epidemiology [1] and ranks as one of the most frequently published multivariable analyses for designs involving a single binary dependent variable and one or more independent variables in public health [2,3] and medical [4] research.
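For orientation, the conventional model the chapter starts from is fit by maximum likelihood. The sketch below is an illustration, not the chapter's method: it fits a one-predictor logistic regression by Newton-Raphson. The function name, the fixed iteration count, and the toy data are assumptions, and perfectly separable data (for which the maximum likelihood estimate does not exist) are not handled.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # observed information (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} g, with I^{-1} written out for the 2x2 case.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


# Hypothetical, non-separable toy data.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
```

At the maximum likelihood solution of a model with an intercept, the fitted probabilities sum to the number of observed events, which is a convenient convergence check.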
Collapse
Affiliation(s)
- Shannon Ugarte
- SONAR (Southern Network on Adverse Reactions) Program, University of South Carolina College of Pharmacy, Columbia, SC, 29208, USA
- Paul Yarnold
- SONAR (Southern Network on Adverse Reactions) Program, University of South Carolina College of Pharmacy, Columbia, SC, 29208, USA
- Paul Ray
- SONAR (Southern Network on Adverse Reactions) Program, University of South Carolina College of Pharmacy, Columbia, SC, 29208, USA
- Kevin Knopf
- SONAR (Southern Network on Adverse Reactions) Program, University of South Carolina College of Pharmacy, Columbia, SC, 29208, USA
- Shamia Hoque
- SONAR (Southern Network on Adverse Reactions) Program, University of South Carolina College of Pharmacy, Columbia, SC, 29208, USA
- Matthew Taylor
- SONAR (Southern Network on Adverse Reactions) Program, University of South Carolina College of Pharmacy, Columbia, SC, 29208, USA
- Charles L Bennett
- SONAR (Southern Network on Adverse Reactions) Program, University of South Carolina College of Pharmacy, Columbia, SC, 29208, USA
| |
Collapse
|
3
|
Abstract
In research, policy, and practice, continuous variables are often categorized. Statisticians have generally advised against categorization for many reasons, such as loss of information and precision as well as distortion of estimated statistics. Here, a different kind of problem with categorization is considered: the idea that, for a given continuous variable, there is a unique set of cut points that is the objectively correct or best categorization. It is shown that this is unlikely to be the case because categorized variables typically exist in webs of statistical relationships with other variables. The choice of cut points for a categorized variable can influence the values of many statistics relating that variable to others. This essay explores the substantive trade‐offs that can arise between different possible cut points to categorize a continuous variable, making it difficult to say that any particular categorization is objectively best. Limitations of different approaches to selecting cut points are discussed. Contextual trade‐offs may often be an argument against categorization. At the very least, such trade‐offs mean that research inferences, or decisions about policy or practice, that involve categorized variables should be framed and acted upon with flexibility and humility. In practical settings, the choice of cut points for categorizing a continuous variable is likely to entail trade‐offs across multiple statistical relationships between the categorized variable and other variables. These trade‐offs mean that no single categorization is objectively best or correct.
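Here is a toy illustration of the trade-off, using hypothetical data. Suppose the statistic of interest is the difference in means of another variable between the two categories. The cut point that maximizes that difference for one outcome need not do so for a second outcome. Two choices are assumptions of this sketch: the statistic (absolute mean difference) and the candidate cut points (midpoints between adjacent observed values).

```python
def group_difference(x, outcome, cut):
    """Mean of `outcome` for x above the cut minus its mean for x at or below the cut."""
    high = [o for xi, o in zip(x, outcome) if xi > cut]
    low = [o for xi, o in zip(x, outcome) if xi <= cut]
    return sum(high) / len(high) - sum(low) / len(low)


def best_cut(x, outcome):
    """Midpoint cut that maximizes the absolute between-category difference."""
    values = sorted(set(x))
    cuts = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(cuts, key=lambda c: abs(group_difference(x, outcome, c)))


# Hypothetical data: y shifts at the low end of x, z shifts at the high end.
x = [1, 2, 3, 4, 5, 6]
y = [0, 10, 10, 10, 10, 10]
z = [0, 0, 0, 0, 0, 10]
print(best_cut(x, y))  # 1.5
print(best_cut(x, z))  # 5.5
```

No single dichotomization of x is "best" for both relationships at once, which is the essay's point in miniature.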
Collapse
Affiliation(s)
- Evan L Busch
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts; Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts
| |
Collapse
|
4
|
Linden A. Using forecast modelling to evaluate treatment effects in single-group interrupted time series analysis. J Eval Clin Pract 2018; 24:695-700. [PMID: 29749091 DOI: 10.1111/jep.12946] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/20/2018] [Accepted: 04/24/2018] [Indexed: 11/28/2022]
Abstract
RATIONALE, AIMS, AND OBJECTIVES Interrupted time series analysis (ITSA) is an evaluation methodology in which a single treatment unit's outcome is studied serially over time and the intervention is expected to "interrupt" the level and/or trend of that outcome. ITSA is commonly evaluated using methods that may produce biased results if model assumptions are violated. In this paper, treatment effects are instead assessed by using forecasting methods to closely fit the preintervention observations and then forecast the post-intervention trend. A treatment effect may be inferred if the actual post-intervention observations diverge from the forecasts by some specified amount. METHOD The forecasting approach is demonstrated using the effect of California's Proposition 99 on reducing cigarette sales. Three forecast models are fit to the preintervention series: linear regression (REG), Holt-Winters (HW) non-seasonal smoothing, and autoregressive integrated moving average (ARIMA). Forecasts are then generated into the post-intervention period, and the actual observations are compared with them to assess intervention effects. RESULTS The preintervention data were fit best by HW, followed closely by ARIMA; REG fit the data poorly. The actual post-intervention observations were above the HW and ARIMA forecasts, suggesting no intervention effect, but below the REG forecasts, suggesting a treatment effect. This discrepancy raises doubts about any definitive conclusion of a treatment effect. CONCLUSIONS In a single-group ITSA, treatment effects are likely to be biased if the model is misspecified. Therefore, evaluators should consider using forecast models to accurately fit the preintervention data and generate plausible counterfactual forecasts, thereby improving causal inference of treatment effects in single-group ITSA studies.
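Holt-Winters non-seasonal smoothing is Holt's linear (level plus trend) exponential smoothing. The following is a minimal sketch of the forecast-and-compare logic: fit on the preintervention series, forecast the post-intervention periods, and inspect the gaps. Several choices here are assumptions. The smoothing parameters `alpha` and `beta` are fixed hypothetical values, whereas in practice they would be estimated from the preintervention data. The initialization is one simple choice among several. The "specified amount" of divergence that counts as an effect is left to the analyst.

```python
def holt_forecast(pre, alpha=0.8, beta=0.2, horizon=1):
    """Holt's linear (non-seasonal) exponential smoothing: fit level and trend
    to the preintervention series `pre`, then forecast `horizon` periods ahead."""
    if len(pre) < 2:
        raise ValueError("need at least two preintervention observations")
    level, trend = pre[0], pre[1] - pre[0]  # simple initialization (an assumption)
    for y in pre[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


def post_period_gaps(pre, post, **params):
    """Actual minus forecast for each post-intervention period; a negative gap
    means the outcome fell below the counterfactual forecast."""
    forecasts = holt_forecast(pre, horizon=len(post), **params)
    return [actual - f for actual, f in zip(post, forecasts)]


# Hypothetical series with a steady upward trend before the intervention.
pre = [10.0, 12.0, 14.0, 16.0]
post = [18.0, 19.0, 18.0]
gaps = post_period_gaps(pre, post)  # approximately [0.0, -1.0, -4.0]
```

On a perfectly linear preintervention series this initialization reproduces the line exactly, so the forecasts simply extend it; real series would show the smoothing at work.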
Collapse
Affiliation(s)
- Ariel Linden
- Linden Consulting Group, LLC, San Francisco, CA, USA
| |
Collapse
|
5
|
Linden A, Yarnold PR. Using machine learning to evaluate treatment effects in multiple-group interrupted time series analysis. J Eval Clin Pract 2018; 24:740-744. [PMID: 29888469 DOI: 10.1111/jep.12966] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/21/2018] [Accepted: 05/22/2018] [Indexed: 11/29/2022]
Abstract
RATIONALE, AIMS, AND OBJECTIVES Interrupted time series analysis (ITSA) is a popular evaluation methodology in which a single treatment unit's outcome is studied over time, and the intervention is expected to "interrupt" the level and/or trend of the outcome, subsequent to its introduction. The internal validity of this analysis is strengthened considerably if the treated unit is contrasted with a comparable control group. In this paper, we introduce a novel machine learning approach using optimal discriminant analysis (ODA) to evaluate treatment effects in multiple-group ITSA. METHOD We evaluate the effect of California's Proposition 99 (passed in 1988) on reducing cigarette sales by comparing California (CA) to Montana (MT), the best-matching control state not exposed to any smoking reduction initiatives. We contrast results from ODA with those of ITSA regression (ITSAREG), a commonly used approach for evaluating treatment effects in ITSA studies. RESULTS Both approaches found CA and MT to be comparable on their preintervention time series, and both found CA to have statistically lower cigarette sales in the post-intervention period (P < 0.0001). The ODA model achieved a very high effect strength for sensitivity (a measure of classification accuracy) of 91.67%, which remained high (75.00%) after leave-one-out analysis to assess generalizability. CONCLUSIONS The ODA framework achieved results comparable to ITSAREG, bolstering confidence in the intervention effect. In addition, ODA confers several advantages over conventional approaches that may make it a better choice for multiple-group ITSA studies: insensitivity to skewed data, model-free permutation tests to derive P values, identification of the threshold value that best discriminates intervention and control groups, a chance- and maximum-corrected index of classification accuracy, and cross-validation to assess generalizability.
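The core of optimal discriminant analysis on a single attribute is an exhaustive search for the cut point that maximizes classification accuracy. The sketch below is simplified and is not the ODA software the authors used. It omits the permutation test for the P value and the leave-one-out analysis, and the class coding and data are hypothetical. Effect strength for sensitivity (ESS) is computed here as mean per-class sensitivity rescaled so that 0 is chance (50% for two classes) and 100 is perfect classification.

```python
def best_threshold(values, groups):
    """Search every midpoint cut on one attribute, in both directions, for the rule
    that maximizes mean per-class sensitivity. `groups` holds 1 or 0 per observation.
    Returns (ESS, cut, direction)."""
    n1 = sum(1 for g in groups if g == 1)
    n0 = len(groups) - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("both classes must be present")
    xs = sorted(set(values))
    best = None
    for lo, hi in zip(xs, xs[1:]):
        cut = (lo + hi) / 2
        for direction in ("above", "below"):
            # Predict class 1 when the value lies on the chosen side of the cut.
            pred = [v > cut if direction == "above" else v < cut for v in values]
            sens1 = sum(1 for p, g in zip(pred, groups) if p and g == 1) / n1
            sens0 = sum(1 for p, g in zip(pred, groups) if not p and g == 0) / n0
            ess = ((sens1 + sens0) / 2 - 0.5) / 0.5 * 100
            if best is None or ess > best[0]:
                best = (ess, cut, direction)
    return best


# Hypothetical attribute values for two perfectly separated classes.
print(best_threshold([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (100.0, 6.5, 'above')
```

Maximizing mean sensitivity rather than raw accuracy keeps a large class from dominating the choice of cut point.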
Collapse
Affiliation(s)
- Ariel Linden
- Linden Consulting Group, LLC, San Francisco, California, USA
| | | |
Collapse
|
6
|
Linden A. Using group-based trajectory modelling to enhance causal inference in interrupted time series analysis. J Eval Clin Pract 2018; 24:502-507. [PMID: 29658192 DOI: 10.1111/jep.12934] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/22/2018] [Accepted: 03/23/2018] [Indexed: 11/30/2022]
Abstract
RATIONALE, AIMS, AND OBJECTIVES Several enhancements have been proposed for interrupted time series analysis (ITSA) to improve causal inference. Presently, group-based trajectory modelling (GBTM) is introduced as a complement to ITSA. GBTM assumes a certain number of discrete groups in the sample have unique trajectories of the outcome. GBTM is used herein for 2 purposes: (1) to compare outcomes across all trajectory groups via a stand-alone GBTM and (2) to identify comparable non-treated units to serve as controls in the ITSA outcome model. Examples of each are offered. METHOD The effect of California's Proposition 99 (passed in 1988) for reducing cigarette sales is evaluated by comparing California to other states not exposed to smoking reduction initiatives. In the stand-alone GBTM, distinct trajectory groups are identified based on cigarette sales for the entire observation period (1970-2000). In the second approach, a GBTM is generated using only baseline period observations (1970-1988), and treatment effects (difference in post-intervention trends) are then estimated for the treatment unit versus non-treated units in the treated unit's trajectory group. RESULTS In the stand-alone GBTM, 3 distinct trajectory groups were identified: low-decreasing, medium-decreasing, and high-decreasing (California and 26 other states were in the low-decreasing group). When using baseline data for matching, California and 19 non-treated states comprised the low group. California had a significantly larger decrease in post-intervention cigarette sales than these controls (P < 0.01). CONCLUSIONS GBTM enhances ITSA by providing perspective for the outcome trajectory in the treated unit's group versus all others and can identify non-treated units to be used for estimating treatment effects.
Collapse
Affiliation(s)
- Ariel Linden
- Linden Consulting Group, LLC, San Francisco, CA, USA
| |
Collapse
|
7
|
Linden A. Using permutation tests to enhance causal inference in interrupted time series analysis. J Eval Clin Pract 2018; 24:496-501. [PMID: 29460383 DOI: 10.1111/jep.12899] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/31/2018] [Accepted: 02/01/2018] [Indexed: 11/28/2022]
Abstract
RATIONALE, AIMS AND OBJECTIVES Interrupted time series analysis (ITSA) is an evaluation methodology in which a single treatment unit's outcome is studied serially over time and the intervention is expected to "interrupt" the level and/or trend of that outcome. The internal validity is strengthened considerably when the treated unit is contrasted with a comparable control group. In this paper, we introduce a robustness check based on permutation tests to further improve causal inference. METHOD We evaluate the effect of California's Proposition 99 on reducing cigarette sales by iteratively casting each nontreated state into the role of "treated," creating a comparable control group using the ITSAMATCH package in Stata, and then evaluating treatment effects using ITSA regression. If statistically significant "treatment effects" are estimated for pseudotreated states, then any significant changes in the outcome of the actual treatment unit (California) cannot be attributed to the intervention. We perform these analyses setting the cutpoint significance level to P > .40 for identifying balanced matches (the highest threshold for which controls could still be found for California) and use the difference in differences of trends as the treatment effect estimator. RESULTS Only California attained a statistically significant treatment effect, strengthening confidence in the conclusion that Proposition 99 reduced cigarette sales. CONCLUSIONS The proposed permutation testing framework provides an additional robustness check to either support or refute a treatment effect identified for the true treated unit in ITSA. Given its value and ease of implementation, this framework should be considered as a standard robustness test in all multiple-group interrupted time series analyses.
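The placebo logic can be sketched in simplified form. Here each unit's "trend change" is its post-period OLS slope minus its pre-period slope. A unit's placebo effect is its trend change minus the mean trend change of all other units, a difference in differences of trends. This is an assumption made for brevity: the paper instead builds matched controls with ITSAMATCH and tests effects with ITSA regression. The summary statistic is also an illustrative choice of ours: the share of units whose placebo effect is at least as large in magnitude as the treated unit's.

```python
def ols_slope(ys):
    """OLS slope of ys regressed on time indices 0..n-1."""
    n = len(ys)
    t_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    num = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(ys))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den


def trend_change(series, t0):
    """Post-intervention slope minus preintervention slope (t0 = first post period)."""
    return ols_slope(series[t0:]) - ols_slope(series[:t0])


def placebo_effects(panel, t0):
    """Cast each unit as 'treated' in turn: its trend change minus the others' mean."""
    changes = {unit: trend_change(s, t0) for unit, s in panel.items()}
    effects = {}
    for unit in panel:
        others = [c for other, c in changes.items() if other != unit]
        effects[unit] = changes[unit] - sum(others) / len(others)
    return effects


def permutation_p(effects, treated):
    """Share of units whose |effect| is at least as large as the treated unit's."""
    target = abs(effects[treated])
    return sum(1 for e in effects.values() if abs(e) >= target) / len(effects)


# Hypothetical panel: only "CA" bends downward after period 3.
panel = {
    "A": [10, 10, 10, 10, 10, 10],
    "B": [5, 6, 7, 8, 9, 10],
    "C": [8, 8, 8, 8, 8, 8],
    "CA": [10, 10, 10, 8, 6, 4],
}
effects = placebo_effects(panel, t0=3)
```

With only a handful of units the smallest attainable value is 1 divided by the number of units, so this kind of check is more informative with many donor states.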
Collapse
Affiliation(s)
- Ariel Linden
- Linden Consulting Group, LLC, San Francisco, CA, USA
| |
Collapse
|
8
|
Linden A. Combining synthetic controls and interrupted time series analysis to improve causal inference in program evaluation. J Eval Clin Pract 2018; 24:447-453. [PMID: 29356225 DOI: 10.1111/jep.12882] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 12/30/2017] [Accepted: 01/02/2018] [Indexed: 11/27/2022]
Abstract
RATIONALE, AIMS AND OBJECTIVES Interrupted time series analysis (ITSA) is an evaluation methodology in which a single treatment unit's outcome is studied over time and the intervention is expected to "interrupt" the level and/or trend of the outcome. The internal validity is strengthened considerably when the treated unit is contrasted with a comparable control group. In this paper, we introduce a robust evaluation framework that combines the synthetic controls method (SYNTH) to generate a comparable control group and ITSA regression to assess covariate balance and estimate treatment effects. METHODS We evaluate the effect of California's Proposition 99 for reducing cigarette sales, by comparing California to other states not exposed to smoking reduction initiatives. SYNTH is used to reweight nontreated units to make them comparable to the treated unit. These weights are then used in ITSA regression models to assess covariate balance and estimate treatment effects. RESULTS Covariate balance was achieved for all but one covariate. While California experienced a significant decrease in the annual trend of cigarette sales after Proposition 99, there was no statistically significant treatment effect when compared to synthetic controls. CONCLUSIONS The advantage of using this framework over regression alone is that it ensures that a comparable control group is generated. Additionally, it offers a common set of statistical measures familiar to investigators, the capability for assessing covariate balance, and enhancement of the evaluation with a comprehensive set of postestimation measures. Therefore, this robust framework should be considered as a primary approach for evaluating treatment effects in multiple group time series analysis.
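The reweighting step of the synthetic controls method chooses nonnegative donor weights that sum to one, so that the weighted donors track the treated unit before the intervention. This is a minimal sketch under two assumptions. First, the weights are fit to preintervention outcomes only; the full method also matches on covariates using a separate predictor-importance matrix. Second, the optimizer is projected gradient descent, chosen here for simplicity.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum_i w_i = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # the last index satisfying the condition fixes the shift
    return [max(x - theta, 0.0) for x in v]


def synth_weights(treated_pre, donors_pre, steps=20000):
    """Minimize sum_t (treated_t - sum_j w_j * donor_jt)^2 over the simplex."""
    J, T = len(donors_pre), len(treated_pre)
    w = [1.0 / J] * J
    # Conservative step size: 2 * (sum of squared donor values) bounds the gradient's
    # Lipschitz constant, so its reciprocal is a safe step.
    lr = 1.0 / (2.0 * sum(x * x for d in donors_pre for x in d))
    for _ in range(steps):
        fitted = [sum(w[j] * donors_pre[j][t] for j in range(J)) for t in range(T)]
        resid = [f - y for f, y in zip(fitted, treated_pre)]
        grad = [2.0 * sum(r * x for r, x in zip(resid, d)) for d in donors_pre]
        w = project_to_simplex([wj - lr * g for wj, g in zip(w, grad)])
    return w


# Hypothetical preintervention outcomes: the treated unit is an even mix of donors 1 and 2.
donors = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [10.0, 10.0, 10.0]]
weights = synth_weights([2.0, 2.0, 2.0], donors)  # close to [0.5, 0.5, 0.0]
```

In the framework described above, weights like these would then be carried into the ITSA regression models for balance checks and effect estimation.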
Collapse
Affiliation(s)
- Ariel Linden
- Linden Consulting Group, LLC, San Francisco, CA, USA
| |
Collapse
|
9
|
Linden A, Yarnold PR. Identifying causal mechanisms in health care interventions using classification tree analysis. J Eval Clin Pract 2018; 24:353-361. [PMID: 29105259 DOI: 10.1111/jep.12848] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/04/2017] [Accepted: 10/05/2017] [Indexed: 11/27/2022]
Abstract
RATIONALE, AIMS, AND OBJECTIVES Mediation analysis identifies causal pathways by testing the relationships between the treatment, the outcome, and an intermediate variable that mediates the relationship between the treatment and outcome. This paper introduces classification tree analysis (CTA), a machine-learning procedure, as an alternative to conventional methods for analysing mediation effects. METHOD Using data from the JOBS II study, we compare CTA to structural equation models (SEMs) by assessing their consistency in revealing mediation effects on 2 outcomes: reemployment (a binary variable) and depressive symptoms (a continuous variable). Because study participants were not randomized sequentially to both treatment and mediator, an additional model was generated including baseline covariates to strengthen the validity of some key identifying assumptions required of all mediation analyses. RESULTS Using SEM, no statistically significant treatment or mediated effects were found for either outcome. In contrast, CTA found a significant treatment effect for reemployment (P = .047) and a mediated pathway for individuals in the treatment group (P = .014). No CTA model could be generated for the reemployment outcome. When covariates were added to the model, CTA found numerous interactions, while SEM found no effects. CONCLUSIONS CTA may uncover mediation effects where conventional approaches do not, because CTA requires no assumptions about the distribution of variables or the functional form of the model, and CTA systematically identifies all statistically viable interactions. The versatility of CTA enables the investigator to explore the theorized underlying causal mechanism of an intervention far more comprehensively than conventional mediation analytic approaches.
Collapse
Affiliation(s)
- Ariel Linden
- Linden Consulting Group, LLC, San Francisco, California, USA
| | | |
Collapse
|
10
|
Linden A. A matching framework to improve causal inference in interrupted time-series analysis. J Eval Clin Pract 2018; 24:408-415. [PMID: 29266646 DOI: 10.1111/jep.12874] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 11/29/2017] [Accepted: 11/30/2017] [Indexed: 11/30/2022]
Abstract
RATIONALE, AIMS, AND OBJECTIVES Interrupted time-series analysis (ITSA) is a popular evaluation methodology in which a single treatment unit's outcome is studied over time and the intervention is expected to "interrupt" the level and/or trend of the outcome, subsequent to its introduction. When ITSA is implemented without a comparison group, the internal validity may be quite poor. Therefore, adding a comparable control group to serve as the counterfactual is always preferred. This paper introduces a novel matching framework, ITSAMATCH, which creates a comparable control group by matching directly on covariates and then uses these matches in the outcomes model. METHOD We evaluate the effect of California's Proposition 99 (passed in 1988) on reducing cigarette sales by comparing California to other states not exposed to smoking reduction initiatives. We compare ITSAMATCH results to 2 commonly used approaches, synthetic controls (SYNTH) and regression adjustment: SYNTH reweights nontreated units to make them comparable to the treated unit, and regression adjusts for covariates directly. Methods are compared by assessing covariate balance and treatment effects. RESULTS Both ITSAMATCH and SYNTH achieved covariate balance and estimated similar treatment effects. The regression model found no treatment effect and produced inconsistent covariate adjustment. CONCLUSIONS While the matching framework achieved results comparable to SYNTH, it has the advantage of being technically less complicated while producing statistical estimates that are straightforward to interpret. Conversely, regression adjustment may "adjust away" a treatment effect. Given its advantages, ITSAMATCH should be considered as a primary approach for evaluating treatment effects in multiple-group time-series analysis.
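ITSAMATCH feeds the matched controls into an ITSA regression outcomes model. Below is a sketch of a two-group segmented regression under one common parameterization, which is an assumption and not necessarily the exact specification in the paper. T is the time index, Z = 1 for the treated unit, X = 1 in the post-intervention period, and T - t0 is the time since the intervention. Under this parameterization, the coefficient on Z*X*(T - t0) is the between-group difference in post-intervention slope change. OLS is solved through the normal equations, and no standard errors are computed.

```python
def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def itsa_row(t, t0, z):
    """Design row [1, T, Z, Z*T, X, X*(T - t0), Z*X, Z*X*(T - t0)] (hypothetical layout)."""
    x = 1.0 if t >= t0 else 0.0
    since = (t - t0) * x
    return [1.0, t, z, z * t, x, since, z * x, z * since]


def fit_itsa(series_by_group, t0):
    """series_by_group: {0: control outcomes, 1: treated outcomes}, indexed by time 0..n-1.
    Returns OLS coefficients from the normal equations X'X b = X'y."""
    rows, ys = [], []
    for z, series in series_by_group.items():
        for t, y in enumerate(series):
            rows.append(itsa_row(float(t), t0, float(z)))
            ys.append(y)
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear(xtx, xty)
```

The last coefficient plays the role of a "difference in differences of trends" estimator; in a matched design the control series would be the (averaged or stacked) matched controls.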
Collapse
Affiliation(s)
- Ariel Linden
- Linden Consulting Group, LLC, San Francisco, CA, USA
| |
Collapse
|
11
|
Linden A, Yarnold PR. Estimating causal effects for survival (time-to-event) outcomes by combining classification tree analysis and propensity score weighting. J Eval Clin Pract 2018; 24:380-387. [PMID: 29230910 DOI: 10.1111/jep.12859] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/08/2017] [Accepted: 11/09/2017] [Indexed: 10/18/2022]
Abstract
RATIONALE, AIMS AND OBJECTIVES A common approach to assessing treatment effects in nonrandomized studies with time-to-event outcomes is to estimate propensity scores and compute weights using logistic regression, test for covariate balance, and then estimate treatment effects using Cox regression. A machine-learning alternative, classification tree analysis (CTA), used to generate propensity scores and to estimate treatment effects in time-to-event data, may identify complex relationships between covariates not found using conventional regression-based approaches. METHOD Using empirical data, we identify all statistically valid CTA propensity score models and then use them to compute strata-specific, observation-level propensity score weights that are subsequently applied in outcomes analyses. We compare findings obtained using this framework to the conventional method by evaluating covariate balance and treatment effect estimates obtained using Cox regression and a weighted CTA outcomes model. RESULTS All models had some imbalanced covariates. Nevertheless, treatment effect estimates were generally consistent across all weighted models. CONCLUSIONS In the study sample, given that all approaches elicited similar results, using CTA increased confidence that bias could not be reduced any further. Because the CTA algorithm identifies all statistically valid propensity score models for a sample, it is most likely to identify a correctly specified propensity score model, and it should therefore be used either to confirm results obtained with traditional methods or to reveal biases that traditional methods may miss. Moreover, given that the true treatment effect is never known in observational data, CTA should be considered for estimating outcomes because no statistical assumptions are required.
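The step that turns strata into observation-level weights can be sketched as follows. The sketch treats each stratum (for example, a terminal node of a classification tree) as having a propensity score equal to the share of treated observations in it. It then assumes inverse-probability weights targeting the average treatment effect; the abstract does not state the estimand, so that choice and the function name are hypothetical.

```python
from collections import defaultdict


def stratum_iptw_weights(strata, treated):
    """Propensity score = share treated within each stratum; returns ATE-type weights:
    1/ps for treated observations and 1/(1 - ps) for untreated ones."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
    for s, a in zip(strata, treated):
        counts[s][0] += a
        counts[s][1] += 1
    weights = []
    for s, a in zip(strata, treated):
        ps = counts[s][0] / counts[s][1]
        if not 0.0 < ps < 1.0:
            # A stratum containing only treated or only untreated units has no overlap.
            raise ValueError(f"stratum {s!r} lacks overlap (ps = {ps})")
        weights.append(1.0 / ps if a else 1.0 / (1.0 - ps))
    return weights


# Hypothetical strata labels and treatment indicators.
w = stratum_iptw_weights(["A", "A", "A", "A", "B", "B"], [1, 0, 0, 0, 1, 0])
# approximately [4.0, 1.33, 1.33, 1.33, 2.0, 2.0]
```

Within each stratum the treated and untreated weights each sum to the stratum size, so both arms are reweighted toward the full sample; these weights would then enter a weighted Cox or CTA outcomes model.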
Collapse
Affiliation(s)
- Ariel Linden
- Linden Consulting Group, LLC, San Francisco, California, USA
| | | |
Collapse
|
12
|
Linden A, Yarnold PR. Minimizing imbalances on patient characteristics between treatment groups in randomized trials using classification tree analysis. J Eval Clin Pract 2017; 23:1309-1315. [PMID: 28675602 DOI: 10.1111/jep.12792] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/05/2017] [Accepted: 06/05/2017] [Indexed: 11/30/2022]
Abstract
RATIONALE, AIMS, AND OBJECTIVES Randomization ensures that treatment groups do not differ systematically in their characteristics, thereby reducing threats to validity that may otherwise explain differences in outcomes. Large observed imbalances in patient characteristics may indicate that selection bias is being introduced into the treatment allocation process. We introduce classification tree analysis (CTA) as a novel algorithmic approach for identifying potential imbalances in characteristics and their interactions when provisionally assigning each new participant to one or the other treatment group. The participant is then permanently assigned to the treatment group that elicits no imbalance, or less imbalance than assignment to the alternate group would. METHOD Using data on participant characteristics from a clinical trial, we compare 3 different treatment allocation approaches: permuted block randomization (the original allocation method), minimization, and CTA. Treatment allocation performance is assessed by examining the balance of all 17 patient characteristics between study groups for each of the allocation techniques. RESULTS While all 3 treatment allocation techniques achieved excellent balance on main effect variables, classification tree analysis further identified imbalances on interactions and in the distributions of some of the continuous variables. CONCLUSIONS Classification tree analysis offers an algorithmic procedure that may be used with any randomization methodology to identify and then minimize linear, nonlinear, and interactive effects that induce covariate imbalance between groups. Investigators should consider using the CTA approach as a real-time complement to randomization for any clinical trial to safeguard the treatment allocation process against bias.
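Minimization, one of the three allocation approaches compared above, can be sketched in a few lines. This is a simple Pocock-Simon style variant, not the CTA procedure. Each new participant is provisionally placed in each group, the imbalance in group counts is summed over that participant's factor levels, and the group with the smaller total wins. Three choices are assumptions: the range as the imbalance measure, random tie-breaking, and the absence of a random element for non-ties. The function and factor names are hypothetical.

```python
import random


def minimization_assign(new_patient, allocated, groups=("A", "B"), rng=random):
    """new_patient: dict factor -> level. allocated: list of (patient_dict, group).
    Returns the group whose provisional assignment minimizes total imbalance."""

    def imbalance_if(group):
        total = 0
        for factor, level in new_patient.items():
            counts = {g: 0 for g in groups}
            for patient, g in allocated:
                if patient.get(factor) == level:
                    counts[g] += 1
            counts[group] += 1  # provisional assignment of the new participant
            total += max(counts.values()) - min(counts.values())
        return total

    scores = {g: imbalance_if(g) for g in groups}
    best = min(scores.values())
    candidates = [g for g in groups if scores[g] == best]
    return rng.choice(candidates)  # random choice only matters when groups tie


# Hypothetical trial history with two factors.
history = [
    ({"sex": "F", "age": "old"}, "A"),
    ({"sex": "F", "age": "young"}, "A"),
    ({"sex": "M", "age": "old"}, "B"),
]
print(minimization_assign({"sex": "F", "age": "old"}, history))  # B
```

Assigning this participant to A would give a total imbalance of 4 (3 for sex, 1 for age), versus 2 for B, so B is chosen. Like CTA, this inspects main-effect counts; unlike CTA, it cannot see imbalances in interactions or in the shape of continuous distributions.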
Collapse
Affiliation(s)
- Ariel Linden
- Linden Consulting Group, LLC, Ann Arbor, Michigan, USA; Division of General Medicine, Medical School, University of Michigan, Ann Arbor, Michigan, USA
| | | |
Collapse
|
13
|
Linden A, Yarnold PR. Modeling time-to-event (survival) data using classification tree analysis. J Eval Clin Pract 2017; 23:1299-1308. [PMID: 28670833 DOI: 10.1111/jep.12779] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 05/09/2017] [Accepted: 05/10/2017] [Indexed: 11/27/2022]
Abstract
RATIONALE, AIMS, AND OBJECTIVES Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. METHOD Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. RESULTS The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. CONCLUSIONS Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework.
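To make concrete how censored follow-up times enter an estimated survival curve, here is a minimal sketch of the Kaplan-Meier estimator, the standard nonparametric estimator for censored data. It is shown for illustration and is not claimed to be one of the models compared in the paper. Censored observations remain in the risk set at their own time and are removed afterwards. The data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.
    events: 1 = event observed, 0 = censored at that time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather every observation tied at time t (events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # censored subjects leave the risk set after time t
    return curve


# Hypothetical follow-up times (e.g. months) with censoring at 2 and 4.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
# approximately [(1, 0.8), (2, 0.6), (3, 0.3)]
```

A censored subject contributes to the denominator up to its censoring time but never to the count of events, which is what distinguishes survival analysis from simply dropping incomplete follow-up.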
Collapse
Affiliation(s)
- Ariel Linden
- Linden Consulting Group, LLC, Ann Arbor, MI, USA; Division of General Medicine, Medical School, University of Michigan, Ann Arbor, MI, USA
| | | |
Collapse
|
14
|
Kiguradze T, Temps WH, Yarnold PR, Cashy J, Brannigan RE, Nardone B, Micali G, West DP, Belknap SM. Persistent erectile dysfunction in men exposed to the 5α-reductase inhibitors, finasteride, or dutasteride. PeerJ 2017; 5:e3020. [PMID: 28289563 PMCID: PMC5346286 DOI: 10.7717/peerj.3020] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 08/19/2016] [Accepted: 01/23/2017] [Indexed: 11/20/2022] [Open Access]
Abstract
Importance Case reports describe persistent erectile dysfunction (PED) associated with exposure to 5α-reductase inhibitors (5α-RIs). Clinical trial reports and the manufacturers’ full prescribing information (FPI) for finasteride and dutasteride state that risk of sexual adverse effects is not increased by longer duration of 5α-RI exposure and that sexual adverse effects of 5α-RIs resolve in men who discontinue exposure. Objective Our chief objective was to assess whether longer duration of 5α-RI exposure increases risk of PED, independent of age and other known risk factors. Men with shorter 5α-RI exposure served as a comparison control group for those with longer exposure. Design We used a single-group study design and classification tree analysis (CTA) to model PED (lasting ≥90 days after stopping 5α-RI). Covariates included subject attributes, diseases, and drug exposures associated with sexual dysfunction. Setting Our data source was the electronic medical record data repository for Northwestern Medicine. Subjects The analysis cohorts comprised all men exposed to finasteride or dutasteride or combination products containing one of these drugs, and the subgroup of men 16–42 years old and exposed to finasteride ≤1.25 mg/day. Main outcome and measures Our main outcome measure was diagnosis of PED beginning after first 5α-RI exposure, continuing for at least 90 days after stopping 5α-RI, and with contemporaneous treatment with a phosphodiesterase-5 inhibitor (PDE5I). Other outcome measures were erectile dysfunction (ED) and low libido. PED was determined by manual review of medical narratives for all subjects with ED. Risk of an adverse effect was expressed as number needed to harm (NNH). 
Results Among men with 5α-RI exposure, 167 of 11,909 (1.4%) developed PED (persistence median 1,348 days after stopping 5α-RI, interquartile range (IQR) 631.5–2320.5 days); the multivariable model predicting PED had four variables: prostate disease, duration of 5α-RI exposure, age, and nonsteroidal anti-inflammatory drug (NSAID) use. Of 530 men with new ED, 167 (31.5%) had new PED. Men without prostate disease who combined NSAID use with >208.5 days of 5α-RI exposure had 4.8-fold higher risk of PED than men with shorter exposure (NNH 59.8, all p < 0.002). Among men 16–42 years old and exposed to finasteride ≤1.25 mg/day, 34 of 4,284 (0.8%) developed PED (persistence median 1,534 days, IQR 651–2,351 days); the multivariable model predicting PED had one variable: duration of 5α-RI exposure. Of 103 young men with new ED, 34 (33%) had new PED. Young men with >205 days of finasteride exposure had 4.9-fold higher risk of PED (NNH 108.2, p < 0.004) than men with shorter exposure. Conclusion and relevance Risk of PED was higher in men with longer exposure to 5α-RIs. Among young men, longer exposure to finasteride posed a greater risk of PED than all other assessed risk factors.
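The abstract reports harm as a relative risk ("4.8-fold higher risk") and as a number needed to harm (NNH). NNH is the reciprocal of the absolute risk increase between the exposed group and the comparison group. The sketch below shows only that arithmetic. The counts are hypothetical and the function names are illustrative. It does not reproduce the study's figures, which came from its classification tree model.

```python
def risk(events, total):
    """Proportion of a group that experienced the outcome."""
    return events / total


def relative_risk(events_exp, n_exp, events_ref, n_ref):
    """Ratio of outcome risk in the exposed group to risk in the reference group."""
    return risk(events_exp, n_exp) / risk(events_ref, n_ref)


def number_needed_to_harm(events_exp, n_exp, events_ref, n_ref):
    """NNH = 1 / absolute risk increase (exposed risk minus reference risk)."""
    ari = risk(events_exp, n_exp) - risk(events_ref, n_ref)
    if ari <= 0:
        raise ValueError("no risk increase in the exposed group; NNH is undefined")
    return 1.0 / ari


# Hypothetical counts, not taken from the study
print(round(relative_risk(30, 1000, 10, 1000), 2))          # 3.0
print(round(number_needed_to_harm(30, 1000, 10, 1000), 1))  # 50.0
```

With these counts, 3% versus 1% risk gives a threefold relative risk and an absolute increase of 2 percentage points. In other words, about 50 people would need to be exposed for one additional outcome.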
Affiliation(s)
- Tina Kiguradze
- Department of Dermatology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- William H Temps
- Department of Dermatology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- John Cashy
- Department of Urology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA; Department of Medicine, Division of General Internal Medicine and Geriatrics, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Robert E Brannigan
- Department of Urology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Beatrice Nardone
- Department of Dermatology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Giuseppe Micali
- Department of Dermatology, Faculty of Medicine and Surgery, University of Catania, Catania, Italy
- Dennis Paul West
- Department of Dermatology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Steven M Belknap
- Department of Dermatology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA; Department of Medicine, Division of General Internal Medicine and Geriatrics, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
|
15
|
Linden A, Yarnold PR. Using machine learning to identify structural breaks in single-group interrupted time series designs. J Eval Clin Pract 2016; 22:851-855. [PMID: 27091355 DOI: 10.1111/jep.12544] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 03/21/2016] [Accepted: 03/23/2016] [Indexed: 11/28/2022]
Abstract
RATIONALE, AIMS AND OBJECTIVES Single-group interrupted time series analysis (ITSA) is a popular evaluation methodology in which a single unit of observation is being studied, the outcome variable is serially ordered as a time series and the intervention is expected to 'interrupt' the level and/or trend of the time series, subsequent to its introduction. Given that the internal validity of the design rests on the premise that the interruption in the time series is associated with the introduction of the treatment, treatment effects may seem less plausible if a parallel trend already exists in the time series prior to the actual intervention. Thus, sensitivity analyses should focus on detecting structural breaks in the time series before the intervention. METHOD In this paper, we introduce a machine-learning algorithm called optimal discriminant analysis (ODA) as an approach to determine if structural breaks can be identified in years prior to the initiation of the intervention, using data from California's 1988 voter-initiated Proposition 99 to reduce smoking rates. RESULTS The ODA analysis indicates that numerous structural breaks occurred prior to the actual initiation of Proposition 99 in 1989, including perfect structural breaks in 1983 and 1985, thereby casting doubt on the validity of treatment effects estimated for the actual intervention when using a single-group ITSA design. CONCLUSIONS Given the widespread use of ITSA for evaluating observational data and the increasing use of machine-learning techniques in traditional research, we recommend that structural break sensitivity analysis is routinely incorporated in all research using the single-group ITSA design.
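The idea of using ODA to look for structural breaks can be sketched as follows. For each candidate break year, label the earlier points as one class and the later points as the other. Then find the single cut point on the outcome that best separates the two classes, and score it with an ESS-style index (0 = chance, 100 = perfect). This is a simplified, hypothetical illustration, not the authors' ODA software. That software also computes exact permutation p values and supports weights. The function names and the series are invented for illustration.

```python
def mean_sensitivity(values, labels, cut, high_is_class1):
    """Mean per-class accuracy of the rule 'value > cut -> class 1' (or its reverse)."""
    hits = [0, 0]
    counts = [0, 0]
    for v, y in zip(values, labels):
        pred = int(v > cut) if high_is_class1 else int(v <= cut)
        counts[y] += 1
        hits[y] += int(pred == y)
    return (hits[0] / counts[0] + hits[1] / counts[1]) / 2


def optimal_cut(values, labels):
    """Exhaustive search over midpoints between adjacent unique values, both directions."""
    uniq = sorted(set(values))
    best = (0.5, None, None)  # chance-level accuracy if no cut beats it
    for a, b in zip(uniq, uniq[1:]):
        cut = (a + b) / 2
        for high_is_class1 in (True, False):
            acc = mean_sensitivity(values, labels, cut, high_is_class1)
            if acc > best[0]:
                best = (acc, cut, high_is_class1)
    return best


def ess(mean_sens, n_classes=2):
    """Effect strength for sensitivity: 0 = chance, 100 = perfect classification."""
    chance = 1.0 / n_classes
    return 100.0 * (mean_sens - chance) / (1.0 - chance)


def scan_breaks(series):
    """For each candidate break k, label points before k as 0 and the rest as 1,
    then return the ESS of the best single cut on the outcome values."""
    results = {}
    for k in range(1, len(series)):
        labels = [0] * k + [1] * (len(series) - k)
        acc, _, _ = optimal_cut(series, labels)
        results[k] = ess(acc)
    return results


# Hypothetical annual outcome values (not the Proposition 99 data)
series = [30.0, 31.0, 29.5, 30.5, 25.0, 26.0, 24.5, 25.5]
for k, score in scan_breaks(series).items():
    print(f"break before index {k}: ESS = {score:.1f}")
```

Only the break before index 4 reaches ESS = 100 here, a "perfect" break. Note also that any strictly monotone series is perfectly separable at every candidate break under this rule. This shows how a trend that predates the intervention can undermine a single-group ITSA.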
Affiliation(s)
- Ariel Linden
- Linden Consulting Group, LLC, Ann Arbor, MI, USA; Division of General Medicine, Medical School, University of Michigan, Ann Arbor, MI, USA
|
16
|
Linden A, Yarnold PR, Nallamothu BK. Using machine learning to model dose-response relationships. J Eval Clin Pract 2016; 22:856-863. [PMID: 27240883 DOI: 10.1111/jep.12573] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 04/29/2016] [Accepted: 05/03/2016] [Indexed: 11/27/2022]
Abstract
RATIONALE, AIMS AND OBJECTIVES Establishing the relationship between various doses of an exposure and a response variable is integral to many studies in health care. Linear parametric models, widely used for estimating dose-response relationships, have several limitations. This paper employs the optimal discriminant analysis (ODA) machine-learning algorithm to determine the degree to which exposure dose can be distinguished based on the distribution of the response variable. By framing the dose-response relationship as a classification problem, machine learning can provide the same functionality as conventional models, but can additionally make individual-level predictions, which may be helpful in practical applications like establishing responsiveness to prescribed drug regimens. METHOD Using data from a study measuring the responses of blood flow in the forearm to the intra-arterial administration of isoproterenol (separately for 9 black and 13 white men, and pooled), we compare the results estimated from a generalized estimating equations (GEE) model with those estimated using ODA. RESULTS Generalized estimating equations and ODA both identified many statistically significant dose-response relationships, separately by race and for pooled data. Post hoc comparisons between doses indicated ODA (based on exact P values) was consistently more conservative than GEE (based on estimated P values). Compared with ODA, GEE produced twice as many instances of paradoxical confounding (findings from analysis of pooled data that are inconsistent with findings from analyses stratified by race). CONCLUSIONS Given its unique advantages and greater analytic flexibility, maximum-accuracy machine-learning methods like ODA should be considered as the primary analytic approach in dose-response applications.
Affiliation(s)
- Ariel Linden
- Linden Consulting Group, LLC, Ann Arbor, MI, USA; Division of General Medicine, Medical School, University of Michigan, Ann Arbor, MI, USA
- Paul R Yarnold
- Optimal Data Analysis, LLC, Chicago, IL, USA; Southern Network on Adverse Reactions (SONAR), College of Pharmacy, University of South Carolina, Columbia, SC, USA
- Brahmajee K Nallamothu
- Division of Cardiovascular Diseases, Department of Internal Medicine, Medical School, University of Michigan, Ann Arbor, MI, USA
|
17
|
Linden A, Yarnold PR. Combining machine learning and propensity score weighting to estimate causal effects in multivalued treatments. J Eval Clin Pract 2016; 22:871-881. [PMID: 27421786 DOI: 10.1111/jep.12610] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 06/25/2016] [Accepted: 06/27/2016] [Indexed: 12/30/2022]
Abstract
RATIONALE, AIMS AND OBJECTIVES Interventions with multivalued treatments are common in medical and health research; examples include comparing the efficacy of competing interventions and contrasting various doses of a drug. In recent years, there has been growing interest in the development of methods that estimate multivalued treatment effects using observational data. This paper extends a previously described analytic framework for evaluating binary treatments to studies involving multivalued treatments utilizing a machine learning algorithm called optimal discriminant analysis (ODA). METHOD We describe the differences between regression-based treatment effect estimators and effects estimated using the ODA framework. We then present an empirical example using data from an intervention including three study groups to compare corresponding effects. RESULTS The regression-based estimators produced statistically significant mean differences between the two intervention groups, and between one of the treatment groups and controls. In contrast, ODA was unable to discriminate between distributions of any of the three study groups. CONCLUSIONS Optimal discriminant analysis offers an appealing alternative to conventional regression-based models for estimating effects in multivalued treatment studies because of its insensitivity to skewed data and use of accuracy measures applicable to all prognostic analyses. If these analytic approaches produce consistent treatment effect P values, this bolsters confidence in the validity of the results. If the approaches produce conflicting treatment effect P values, as they do in our empirical example, the investigator should consider the ODA-derived estimates to be most robust, given that ODA uses permutation P values that require no distributional assumptions and are thus always valid.
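A permutation P value needs no distributional assumptions because chance is simulated directly. The procedure shuffles the group labels, recomputes the test statistic, and counts how often a shuffled labelling does at least as well as the observed one. The sketch below makes several simplifying assumptions. It handles two groups rather than three. It uses random sampling of permutations, whereas ODA software can enumerate them exactly. Its statistic is the best single-cut mean sensitivity. All names and data are hypothetical.

```python
import random


def best_mean_sensitivity(values, labels):
    """Best mean per-class sensitivity from a single cut point, in either direction."""
    n0, n1 = labels.count(0), labels.count(1)
    uniq = sorted(set(values))
    best = 0.5
    for a, b in zip(uniq, uniq[1:]):
        cut = (a + b) / 2
        # Rule "value > cut -> class 1": per-class sensitivities
        s1 = sum(1 for v, y in zip(values, labels) if y == 1 and v > cut) / n1
        s0 = sum(1 for v, y in zip(values, labels) if y == 0 and v <= cut) / n0
        mean_sens = (s0 + s1) / 2
        # The reversed rule ("value > cut -> class 0") scores 1 - mean_sens
        best = max(best, mean_sens, 1.0 - mean_sens)
    return best


def permutation_p(values, labels, n_perm=2000, seed=0):
    """Monte Carlo permutation p value: shuffle labels, recompute the statistic,
    and count how often chance matches or beats the observed value."""
    rng = random.Random(seed)
    observed = best_mean_sensitivity(values, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_mean_sensitivity(values, shuffled) >= observed - 1e-12:
            at_least_as_good += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0
    return observed, (at_least_as_good + 1) / (n_perm + 1)


# Hypothetical outcome values for two study groups (coded 0 and 1)
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
obs, p = permutation_p(values, labels)
print(f"mean sensitivity = {obs:.2f}, permutation p = {p:.4f}")
```

In this example the two groups are perfectly separated. Only 2 of the 252 equally likely labellings achieve that, so the Monte Carlo p value lands near 0.008.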
Affiliation(s)
- Ariel Linden
- Linden Consulting Group, LLC, Ann Arbor, Michigan, USA; Division of General Medicine, Medical School, University of Michigan, Ann Arbor, Michigan, USA
|
18
|
Linden A, Yarnold PR. Combining machine learning and matching techniques to improve causal inference in program evaluation. J Eval Clin Pract 2016; 22:864-870. [PMID: 27353301 DOI: 10.1111/jep.12592] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 05/27/2016] [Accepted: 05/30/2016] [Indexed: 11/30/2022]
Abstract
RATIONALE, AIMS AND OBJECTIVES Program evaluations often utilize various matching approaches to emulate the randomization process for group assignment in experimental studies. Typically, the matching strategy is implemented, and then covariate balance is assessed before estimating treatment effects. This paper introduces a novel analytic framework utilizing a machine learning algorithm called optimal discriminant analysis (ODA) for assessing covariate balance and estimating treatment effects, once the matching strategy has been implemented. This framework holds several key advantages over the conventional approach: application to any variable metric and number of groups; insensitivity to skewed data or outliers; and use of accuracy measures applicable to all prognostic analyses. Moreover, ODA accepts analytic weights, thereby extending the methodology to any study design where weights are used for covariate adjustment or more precise (differential) outcome measurement. METHOD One-to-one matching on the propensity score was used as the matching strategy. Covariate balance was assessed using standardized difference in means (conventional approach) and measures of classification accuracy (ODA). Treatment effects were estimated using ordinary least squares regression and ODA. RESULTS Using empirical data, ODA produced results highly consistent with those obtained via the conventional methodology for assessing covariate balance and estimating treatment effects. CONCLUSIONS When ODA is combined with matching techniques within a treatment effects framework, the results are consistent with conventional approaches. However, given that it provides additional dimensions and robustness to the analysis versus what can currently be achieved using conventional approaches, ODA offers an appealing alternative.
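The conventional balance check named in the METHOD section is the standardized difference in means. It is the difference between group means divided by a pooled standard deviation. A common form takes the pooled variance as the simple average of the two group variances, and that form is assumed in the sketch below. The covariate values are hypothetical.

```python
from statistics import mean, variance


def standardized_difference(treated, control):
    """Difference in means divided by the pooled SD, where the pooled variance
    is the simple average of the two sample variances."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical covariate values (e.g., age) after one-to-one matching
treated_age = [61, 58, 70, 66, 59, 63]
control_age = [60, 57, 69, 6758, 64] if False else [60, 57, 69, 67, 58, 64]
d = standardized_difference(treated_age, control_age)
print(f"standardized difference = {d:.3f}")
```

Here the result is about 0.07. A commonly used rule of thumb treats absolute values below roughly 0.1 as adequate balance. Because the statistic is scaled by the SD, it can be compared across covariates measured in different units.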
Affiliation(s)
- Ariel Linden
- Linden Consulting Group, LLC, Ann Arbor, MI, USA; Division of General Medicine, Medical School, University of Michigan, Ann Arbor, MI, USA
- Paul R Yarnold
- Optimal Data Analysis, LLC, Chicago, IL, USA; Southern Network on Adverse Reactions (SONAR), College of Pharmacy, University of South Carolina, Columbia, SC, USA
|
19
|
Bryant FB. Enhancing predictive accuracy and reproducibility in clinical evaluation research: Commentary on the special section of the Journal of Evaluation in Clinical Practice. J Eval Clin Pract 2016; 22:829-834. [PMID: 27870286 DOI: 10.1111/jep.12669] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/02/2016] [Accepted: 09/05/2016] [Indexed: 12/19/2022]
Abstract
This paper introduces a special section of the current issue of the Journal of Evaluation in Clinical Practice that includes a set of 6 empirical articles showcasing a versatile, new machine-learning statistical method, known as optimal data (or discriminant) analysis (ODA), specifically designed to produce statistical models that maximize predictive accuracy. As this set of papers clearly illustrates, ODA offers numerous important advantages over traditional statistical methods, advantages that enhance the validity and reproducibility of statistical conclusions in empirical research. This issue of the journal also includes a review of a recently published book that provides a comprehensive introduction to the logic, theory, and application of ODA in empirical research. It is argued that researchers have much to gain by using ODA to analyze their data.
Affiliation(s)
- Fred B Bryant
- Professor, Department of Psychology, Loyola University Chicago, Chicago, Illinois, USA
|