1. Multiobjective tree-based reinforcement learning for estimating tolerant dynamic treatment regimes. Biometrics 2024; 80:ujad017. [PMID: 38364801 PMCID: PMC10871869 DOI: 10.1093/biomtc/ujad017] [Received: 12/30/2022] [Revised: 09/16/2023] [Accepted: 11/17/2023] [Indexed: 02/18/2024]
Abstract
A dynamic treatment regime (DTR) is a sequence of treatment decision rules that dictate individualized treatments based on evolving treatment and covariate history. It provides a vehicle for optimizing a clinical decision support system and fits well into the broader paradigm of personalized medicine. However, many real-world problems involve multiple competing priorities, and decision rules differ when trade-offs are present. Correspondingly, there may be more than one feasible decision that leads to empirically sufficient optimization. In this paper, we propose a concept of "tolerant regime," which provides a set of individualized feasible decision rules under a prespecified tolerance rate. A multiobjective tree-based reinforcement learning (MOT-RL) method is developed to directly estimate the tolerant DTR (tDTR) that optimizes multiple objectives in a multistage multitreatment setting. At each stage, MOT-RL constructs an unsupervised decision tree by modeling the counterfactual mean outcome of each objective via semiparametric regression and maximizing a purity measure constructed by the scalarized augmented inverse probability weighted estimators (SAIPWE). The algorithm is implemented in a backward inductive manner through multiple decision stages, and it estimates the optimal DTR and tDTR depending on the decision-maker's preferences. Multiobjective tree-based reinforcement learning is robust, efficient, easy-to-interpret, and flexible to different settings. We apply MOT-RL to evaluate 2-stage chemotherapy regimes that reduce disease burden and prolong survival for advanced prostate cancer patients using a dataset collected at MD Anderson Cancer Center.
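The idea of a tolerant regime, a set of near-optimal treatments rather than a single one, can be illustrated with a small sketch. It assumes a linear (weighted-sum) scalarization of per-objective counterfactual means that have already been estimated, and it reads the "tolerance rate" as a fraction of the best scalarized value's magnitude. Both choices, and the names `scalarize` and `tolerant_set`, are hypothetical stand-ins, not the SAIPWE-based purity measure that MOT-RL actually uses.

```python
from typing import Dict, List, Sequence


def scalarize(means: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: weighted sum of one treatment's per-objective means."""
    if len(means) != len(weights):
        raise ValueError("need exactly one weight per objective")
    return sum(w * m for w, m in zip(weights, means))


def tolerant_set(treatment_means: Dict[str, Sequence[float]],
                 weights: Sequence[float],
                 tolerance: float) -> List[str]:
    """Treatments whose scalarized value lies within `tolerance` (a fraction of
    the best value's magnitude) of the best one; tolerance=0 keeps only the
    optimum and any exact ties."""
    scores = {arm: scalarize(m, weights) for arm, m in treatment_means.items()}
    best = max(scores.values())
    cutoff = best - tolerance * abs(best)
    return sorted(arm for arm, s in scores.items() if s >= cutoff)
```

With hypothetical estimates `{"A": (0.6, 12.0), "B": (0.55, 13.0), "C": (0.3, 8.0)}` (response rate, months of survival) and weights `(0.5, 0.05)`, a 5% tolerance returns `["A", "B"]`, while a zero tolerance returns only `["B"]`. The weights encode the decision-maker's preferences across objectives.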
2. Nonparametric Bayesian Q-learning for optimization of dynamic treatment regimes in the presence of partial compliance. Stat Methods Med Res 2023; 32:1649-1663. [PMID: 37322885 DOI: 10.1177/09622802231181223] [Indexed: 06/17/2023]
Abstract
Existing methods for estimating dynamic treatment regimes are mostly limited to intention-to-treat analyses, which estimate the effect of randomization to a particular treatment regime without considering the compliance behavior of patients. In this article, we propose a novel nonparametric Bayesian Q-learning approach to construct optimal sequential treatment regimes that adjust for partial compliance. We consider the popular potential compliance framework, where some potential compliances are latent and need to be imputed. The key challenge is learning the joint distribution of the potential compliances, which we accomplish using a Dirichlet process mixture model. Our approach provides two kinds of treatment regimes: (1) conditional regimes that depend on the potential compliance values; and (2) marginal regimes where the potential compliances are marginalized. Extensive simulation studies highlight the usefulness of our method compared to intention-to-treat analyses. We apply our method to the Adaptive Treatment for Alcohol and Cocaine Dependence (ENGAGE) study, where the goal is to construct optimal treatment regimes to engage patients in therapy.
3. Estimating individualized treatment rules in longitudinal studies with covariate-driven observation times. Stat Methods Med Res 2023; 32:868-884. [PMID: 36927216 PMCID: PMC10248307 DOI: 10.1177/09622802231158733] [Indexed: 03/18/2023]
Abstract
The sequential treatment decisions made by physicians to treat chronic diseases are formalized in the statistical literature as dynamic treatment regimes. To date, methods for dynamic treatment regimes have been developed under the assumption that observation times, that is, treatment and outcome monitoring times, are determined by study investigators. That assumption is often not satisfied in electronic health records data in which the outcome, the observation times, and the treatment mechanism are associated with patients' characteristics. The treatment and observation processes can lead to spurious associations between the treatment of interest and the outcome to be optimized under the dynamic treatment regime if not adequately considered in the analysis. We address these associations by incorporating two inverse weights that are functions of a patient's covariates into dynamic weighted ordinary least squares to develop optimal single stage dynamic treatment regimes, known as individualized treatment rules. We show empirically that our methodology yields consistent, multiply robust estimators. In a cohort of new users of antidepressant drugs from the United Kingdom's Clinical Practice Research Datalink, the proposed method is used to develop an optimal treatment rule that chooses between two antidepressants to optimize a utility function related to the change in body mass index.
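A single-stage dynamic weighted ordinary least squares (dWOLS) fit can be sketched as follows. The sketch assumes a binary treatment, one covariate, a known propensity score, linear treatment-free and blip models, and the usual dWOLS balancing weight |A - pi(X)|. The `extra_weights` argument is a hypothetical placeholder for additional inverse weights (such as those for the treatment and observation processes described above); it does not reproduce the paper's specific weights.

```python
import numpy as np


def dwols_single_stage(x, a, y, propensity, extra_weights=None):
    """Estimate blip parameters (psi0, psi1) in the working model
    E[Y | X, A] = beta0 + beta1*X + A*(psi0 + psi1*X)
    by weighted least squares with dWOLS weights |A - pi(X)|.

    extra_weights (hypothetical) multiplies the dWOLS weights, e.g. to add
    inverse weights for an observation process.
    """
    x, a, y, p = (np.asarray(v, dtype=float) for v in (x, a, y, propensity))
    w = np.abs(a - p)
    if extra_weights is not None:
        w = w * np.asarray(extra_weights, dtype=float)
    design = np.column_stack([np.ones_like(x), x, a, a * x])
    sw = np.sqrt(w)
    # Weighted least squares via ordinary least squares on sqrt-weighted rows.
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef[2], coef[3]


def recommend(x, psi0, psi1):
    """Individualized rule: treat (A = 1) when the estimated blip is positive."""
    return (psi0 + psi1 * np.asarray(x, dtype=float) > 0).astype(int)
```

Because only the blip (treatment-interaction) parameters drive the rule, the weights are what protect them against misspecification of the treatment-free part of the model.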
4. Optimal Treatment Regimes: A Review and Empirical Comparison. Int Stat Rev 2023. [DOI: 10.1111/insr.12536] [Indexed: 02/24/2023]
5. Step-adjusted tree-based reinforcement learning for evaluating nested dynamic treatment regimes using test-and-treat observational data. Stat Med 2021; 40:6164-6177. [PMID: 34490942 DOI: 10.1002/sim.9177] [Received: 05/27/2020] [Revised: 07/31/2021] [Accepted: 08/09/2021] [Indexed: 11/08/2022]
Abstract
Dynamic treatment regimes (DTRs) include a sequence of treatment decision rules, in which treatment is adapted over time in response to the changes in an individual's disease progression and health care history. In medical practice, nested test-and-treat strategies are common to improve cost-effectiveness. For example, for patients at risk of prostate cancer, only patients who have high prostate-specific antigen (PSA) need a biopsy, which is costly and invasive, to confirm the diagnosis and help determine the treatment if needed. A decision about treatment happens after the biopsy and is thus nested within the decision of whether to do the test. However, existing statistical methods cannot accommodate this naturally embedded structure, in which the treatment decision sits inside the test decision. Therefore, we developed a new statistical learning method, step-adjusted tree-based reinforcement learning, to evaluate DTRs within such a nested multistage dynamic decision framework using observational data. At each step within each stage, we combined the robust semiparametric estimation via augmented inverse probability weighting with a tree-based reinforcement learning method to deal with the counterfactual optimization. The simulation studies demonstrated robust performance of the proposed methods under different scenarios. We further applied our method to evaluate the necessity of prostate biopsy and identify the optimal test-and-treat regimes for prostate cancer patients using the Johns Hopkins University prostate cancer active surveillance dataset.
6. Adaptive randomization in a two-stage sequential multiple assignment randomized trial. Biostatistics 2021; 23:1182-1199. [PMID: 34052847 DOI: 10.1093/biostatistics/kxab020] [Received: 09/02/2020] [Revised: 04/22/2021] [Accepted: 04/27/2021] [Indexed: 11/13/2022]
Abstract
Sequential multiple assignment randomized trials (SMARTs) are systematic and efficient designs for comparing dynamic treatment regimes (DTRs), where each patient is involved in multiple stages of treatment with the randomization at each stage depending on the patient's previous treatment history and interim outcomes. Generally, patients enrolled in SMARTs are randomized equally to ethically acceptable treatment options regardless of how effective those treatments were during the previous stages, which can have undesirable consequences in practice, such as low recruitment, poor retention, and low treatment adherence. In this article, we propose a response-adaptive SMART (RA-SMART) design where the allocation probabilities are imbalanced in favor of more promising treatments based on the accumulated information on treatment efficacy from previous patients and stages. The operating characteristics of the RA-SMART design relative to the standard SMART design, including the consistency and efficiency of the estimated response rate under each DTR, the power to identify the optimal DTR, and the number of patients treated with the optimal and the worst DTRs, are assessed through extensive simulation studies. Some practical suggestions are discussed in the conclusion.
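To make "allocation probabilities imbalanced in favor of more promising treatments" concrete, here is one simple, hypothetical rule: each arm's allocation probability is proportional to its Beta(1, 1) posterior mean response rate raised to a tuning power. This is only an illustration; the RA-SMART allocation rule in the paper may differ, and in a SMART such a rule would be applied separately within each stage and interim-response stratum.

```python
from typing import Dict, Tuple


def adaptive_allocation(counts: Dict[str, Tuple[int, int]],
                        power: float = 1.0) -> Dict[str, float]:
    """Hypothetical response-adaptive allocation.

    counts maps arm -> (responders, non_responders) observed so far.
    Each arm gets probability proportional to (posterior mean response) ** power
    under an independent Beta(1, 1) prior; power = 0 recovers equal randomization.
    """
    means = {arm: (1 + s) / (2 + s + f) for arm, (s, f) in counts.items()}
    weights = {arm: m ** power for arm, m in means.items()}
    total = sum(weights.values())
    return {arm: w / total for arm, w in weights.items()}
```

Larger values of `power` push allocation more aggressively toward the arm that looks better so far, trading statistical efficiency for more patients on promising treatments.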
7. A Quantitative Paradigm for Decision-Making in Precision Oncology. Trends Cancer 2021; 7:293-300. [PMID: 33637444 DOI: 10.1016/j.trecan.2021.01.006] [Received: 10/15/2020] [Revised: 01/16/2021] [Accepted: 01/20/2021] [Indexed: 11/24/2022]
Abstract
The complexity and variability of cancer progression necessitate a quantitative paradigm for therapeutic decision-making that is dynamic, personalized, and capable of identifying optimal treatment strategies for individual patients under substantial uncertainty. Here, we discuss the core components and challenges of such an approach and highlight the need for comprehensive longitudinal clinical and molecular data integration in its development. We describe the complementary and varied roles of mathematical modeling and machine learning in constructing dynamic optimal cancer treatment strategies and highlight the potential of reinforcement learning approaches in this endeavor.
8.
9. Joint modeling and multiple comparisons with the best of data from a SMART with survival outcomes. Biostatistics 2020; 23:294-313. [PMID: 32659784 PMCID: PMC9770092 DOI: 10.1093/biostatistics/kxaa025] [Received: 11/19/2018] [Revised: 03/19/2020] [Accepted: 03/19/2020] [Indexed: 12/25/2022]
Abstract
A dynamic treatment regimen (DTR) is a sequence of decision rules that can alter treatments or doses based on outcomes from prior treatment. In the case of two lines of treatment, a DTR specifies the first-line treatment, the second-line treatment for responders, and the treatment for non-responders to the first-line treatment. A sequential, multiple assignment, randomized trial (SMART) is one type of trial designed to assess DTRs. The primary goal of our project is to identify the treatments, covariates, and interactions that result in the best overall survival rate. Many previously proposed methods for analyzing SMART data with survival outcomes use inverse probability weighting and provide nonparametric estimates of survival rates, but no other information. Other methods have been proposed to identify and estimate the optimal DTR, but inference issues were seldom addressed. We apply a joint modeling approach to provide unbiased survival estimates as a mechanism to quantify baseline and time-varying covariate effects, treatment effects, and their interactions within regimens. The issue of multiple comparisons at specific time points is addressed using the method of multiple comparisons with the best.
10.
Abstract
Precision medicine seeks to maximize the quality of healthcare by individualizing the healthcare process to the uniquely evolving health status of each patient. This endeavor spans a broad range of scientific areas including drug discovery, genetics/genomics, health communication, and causal inference all in support of evidence-based, i.e., data-driven, decision making. Precision medicine is formalized as a treatment regime which comprises a sequence of decision rules, one per decision point, which map up-to-date patient information to a recommended action. The potential actions could be the selection of which drug to use, the selection of dose, timing of administration, specific diet or exercise recommendation, or other aspects of treatment or care. Statistics research in precision medicine is broadly focused on methodological development for estimation of and inference for treatment regimes which maximize some cumulative clinical outcome. In this review, we provide an overview of this vibrant area of research and present important and emerging challenges.
11.
Abstract
Medical therapy often consists of multiple stages, with a treatment chosen by the physician at each stage based on the patient's history of treatments and clinical outcomes. These decisions can be formalized as a dynamic treatment regime. This paper describes a new approach for optimizing dynamic treatment regimes that bridges the gap between Bayesian inference and existing approaches, like Q-learning. The proposed approach fits a series of Bayesian regression models, one for each stage, in reverse sequential order. Each model uses as a response variable the remaining payoff assuming optimal actions are taken at subsequent stages, and as covariates the current history and relevant actions at that stage. The key difficulty is that the optimal decision rules at subsequent stages are unknown, and, even if these decision rules were known, the relevant response variables may be counterfactual. However, posterior distributions can be derived from the previously fitted regression models for the optimal decision rules and the counterfactual response variables under a particular set of rules. The proposed approach averages over these posterior distributions when fitting each regression model. An efficient sampling algorithm for estimation is presented, along with simulation studies that compare the proposed approach with Q-learning.
12.
Abstract
Dynamic treatment regimes (DTRs) are sequences of treatment decision rules, in which treatment may be adapted over time in response to the changing course of an individual. Motivated by the substance use disorder (SUD) study, we propose a tree-based reinforcement learning (T-RL) method to directly estimate optimal DTRs in a multi-stage multi-treatment setting. At each stage, T-RL builds an unsupervised decision tree that directly handles the problem of optimization with multiple treatment comparisons, through a purity measure constructed with augmented inverse probability weighted estimators. For the multiple stages, the algorithm is implemented recursively using backward induction. By combining semiparametric regression with flexible tree-based learning, T-RL is robust, efficient and easy to interpret for the identification of optimal DTRs, as shown in the simulation studies. With the proposed method, we identify dynamic SUD treatment regimes for adolescents.
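The augmented inverse probability weighted (AIPW) estimator that enters the T-RL purity measure can be sketched for one node at one stage. The propensity and outcome models are passed in as plain callables assumed to be already fitted, and `best_treatment` simply picks the arm with the largest AIPW estimate in the node. The tree-growing search and the exact purity formula of T-RL are not reproduced here.

```python
def aipw_mean(y, a, x, treatment, propensity, outcome_model):
    """AIPW estimate of the counterfactual mean E[Y(treatment)].

    propensity(x, t)    -> estimated P(A = t | X = x)
    outcome_model(x, t) -> estimated E[Y | X = x, A = t]
    Both are assumed to be fitted beforehand and supplied by the caller.
    """
    total = 0.0
    for yi, ai, xi in zip(y, a, x):
        mu = outcome_model(xi, treatment)
        received = 1.0 if ai == treatment else 0.0
        # Inverse-probability-weighted residual plus the regression prediction.
        total += received * (yi - mu) / propensity(xi, treatment) + mu
    return total / len(y)


def best_treatment(y, a, x, treatments, propensity, outcome_model):
    """Treatment with the largest AIPW-estimated counterfactual mean in this node."""
    return max(treatments,
               key=lambda t: aipw_mean(y, a, x, t, propensity, outcome_model))
```

The estimator remains consistent if either the propensity model or the outcome model is correct, which is the source of the robustness the abstract refers to.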
13. A Bayesian analysis of small n sequential multiple assignment randomized trials (snSMARTs). Stat Med 2018; 37:3723-3732. [DOI: 10.1002/sim.7900] [Received: 10/17/2017] [Revised: 03/23/2018] [Accepted: 06/08/2018] [Indexed: 11/10/2022]
14. Sequential, Multiple Assignment, Randomized Trial Designs in Immuno-oncology Research. Clin Cancer Res 2017; 24:730-736. [PMID: 28835379 DOI: 10.1158/1078-0432.ccr-17-1355] [Received: 05/10/2017] [Revised: 07/03/2017] [Accepted: 08/17/2017] [Indexed: 01/13/2023]
Abstract
Clinical trials investigating immune checkpoint inhibitors have led to the approval of anti-CTLA-4 (cytotoxic T-lymphocyte antigen-4), anti-PD-1 (programmed death-1), and anti-PD-L1 (PD-ligand 1) drugs by the FDA for numerous tumor types. In the treatment of metastatic melanoma, combinations of checkpoint inhibitors are more effective than single-agent inhibitors, but combination immunotherapy is associated with increased frequency and severity of toxicity. There are questions about the use of combination immunotherapy or single-agent anti-PD-1 as initial therapy and the number of doses of either approach required to sustain a response. In this article, we describe a novel use of sequential, multiple assignment, randomized trial (SMART) design to evaluate immune checkpoint inhibitors to find treatment regimens that adapt within an individual based on intermediate response and lead to the longest overall survival. We provide a hypothetical example SMART design for BRAF wild-type metastatic melanoma as a framework for investigating immunotherapy treatment regimens. We compare implementing a SMART design to implementing multiple traditional randomized clinical trials. We illustrate the benefits of a SMART over traditional trial designs and acknowledge the complexity of a SMART. SMART designs may be an optimal way to find treatment strategies that yield durable response, longer survival, and lower toxicity. Clin Cancer Res; 24(4); 730-6. ©2017 AACR.
15. Robust treatment comparison based on utilities of semi-competing risks in non-small-cell lung cancer. J Am Stat Assoc 2017; 112:11-23. [PMID: 28943681 PMCID: PMC5607962 DOI: 10.1080/01621459.2016.1176926] [Received: 06/01/2015] [Revised: 01/01/2016] [Indexed: 12/25/2022]
Abstract
A design is presented for a randomized clinical trial comparing two second-line treatments, chemotherapy versus chemotherapy plus reirradiation, for treatment of recurrent non-small-cell lung cancer. The central research question is whether the potential efficacy benefit that adding reirradiation to chemotherapy may provide justifies its potential for increasing the risk of toxicity. The design uses two co-primary outcomes: time to disease progression or death, and time to severe toxicity. Because patients may be given an active third-line treatment at disease progression that confounds second-line treatment effects on toxicity and survival following disease progression, for the purpose of this comparative study follow-up ends at disease progression or death. In contrast, follow-up for disease progression or death continues after severe toxicity, so these are semi-competing risks. A conditionally conjugate Bayesian model that is robust to misspecification is formulated using piecewise exponential distributions. A numerical utility function is elicited from the physicians that characterizes desirabilities of the possible co-primary outcome realizations. A comparative test based on posterior mean utilities is proposed. A simulation study is presented to evaluate test performance for a variety of treatment differences, and a sensitivity assessment to the elicited utility function is performed. General guidelines are given for constructing a design in similar settings, and a computer program for simulation and trial conduct is provided.
16. Semiparametric Single-Index Model for Estimating Optimal Individualized Treatment Strategy. Electron J Stat 2017; 11:364-384. [PMID: 28959371 PMCID: PMC5612500 DOI: 10.1214/17-ejs1226] [Indexed: 11/19/2022]
Abstract
Unlike the standard treatment discovery framework, which seeks a single treatment for a homogeneous group of patients, personalized medicine involves finding therapies that are tailored to each individual in a heterogeneous group. In this paper, we propose a new semiparametric additive single-index model for estimating an individualized treatment strategy. The model assumes a flexible and nonparametric link function for the interaction between treatment and predictive covariates. We estimate the rule via monotone B-splines and establish the asymptotic properties of the estimators. Both simulations and a real data application demonstrate that the proposed method performs competitively.
17. Bayesian Nonparametric Estimation for Dynamic Treatment Regimes with Sequential Transition Times. J Am Stat Assoc 2016; 111:921-935. [PMID: 28018015 PMCID: PMC5175473 DOI: 10.1080/01621459.2015.1086353] [Received: 05/01/2014] [Revised: 06/01/2015] [Indexed: 10/23/2022]
Abstract
We analyze a dataset arising from a clinical trial involving multi-stage chemotherapy regimes for acute leukemia. The trial design was a 2 × 2 factorial for frontline therapies only. Motivated by the idea that subsequent salvage treatments affect survival time, we model therapy as a dynamic treatment regime (DTR), that is, an alternating sequence of adaptive treatments or other actions and transition times between disease states. These sequences may vary substantially between patients, depending on how the regime plays out. To evaluate the regimes, mean overall survival time is expressed as a weighted average of the means of all possible sums of successive transition times. We assume a Bayesian nonparametric survival regression model for each transition time, with a dependent Dirichlet process prior and Gaussian process base measure (DDP-GP). Posterior simulation is implemented by Markov chain Monte Carlo (MCMC) sampling. We provide general guidelines for constructing a prior using empirical Bayes methods. The proposed approach is compared with inverse probability of treatment weighting, including a doubly robust augmented version of this approach, for both single-stage and multi-stage regimes with treatment assignment depending on baseline covariates. The simulations show that the proposed nonparametric Bayesian approach can substantially improve inference compared to existing methods. An R program for implementing the DDP-GP-based Bayesian nonparametric analysis is freely available at https://www.ma.utexas.edu/users/yxu/.
18. SMART Thinking: a Review of Recent Developments in Sequential Multiple Assignment Randomized Trials. Curr Epidemiol Rep 2016. [DOI: 10.1007/s40471-016-0079-3] [Indexed: 10/21/2022]
19. Adaptive Interventions in Child and Adolescent Mental Health. J Clin Child Adolesc Psychol 2016; 45:383-95. [PMID: 27310565 DOI: 10.1080/15374416.2016.1152555] [Indexed: 12/19/2022]
Abstract
The treatment or prevention of child and adolescent mental health (CAMH) disorders often requires an individualized, sequential approach to intervention, whereby treatments (or prevention efforts) are adapted over time based on the youth's evolving status (e.g., early response, adherence). Adaptive interventions are intended to provide a replicable guide for the provision of individualized sequences of interventions in actual clinical practice. Recently, there has been great interest in the development of adaptive interventions by investigators working in CAMH. The development of such replicable, real-world, individualized sequences of decision rules to guide the treatment or prevention of CAMH disorders represents an important "next step" in interventions research. The primary purpose of this special issue is to showcase some recent work on the science of adaptive interventions in CAMH. In this overview article, we review why individualized sequences of interventions are needed in CAMH, provide an introduction to adaptive interventions, briefly describe each of the articles included in this special issue, and describe some exciting areas of ongoing and future research. A hopeful outcome of this special issue is that it encourages other researchers in CAMH to pursue creative and significant research on adaptive interventions.
20. Adaptive contrast weighted learning for multi-stage multi-treatment decision-making. Biometrics 2016; 73:145-155. [DOI: 10.1111/biom.12539] [Received: 11/01/2015] [Revised: 03/01/2016] [Accepted: 04/01/2016] [Indexed: 11/28/2022]
21. Design of sequentially randomized trials for testing adaptive treatment strategies. Stat Med 2016; 35:840-58. [PMID: 26412033 PMCID: PMC5150988 DOI: 10.1002/sim.6747] [Received: 03/08/2014] [Accepted: 08/31/2015] [Indexed: 11/07/2022]
Abstract
An adaptive treatment strategy (ATS) is an outcome-guided algorithm that allows personalized treatment of complex diseases based on patients' disease status and treatment history. Conditions such as AIDS, depression, and cancer usually require several stages of treatment because of the chronic, multifactorial nature of illness progression and management. Sequential multiple assignment randomized (SMAR) designs permit simultaneous inference about multiple ATSs, where patients are sequentially randomized to treatments at different stages depending upon response status. The purpose of this article is to develop a sample size formula to ensure adequate power for comparing two or more ATSs. Based on a Wald-type statistic for comparing multiple ATSs with a continuous endpoint, we develop a sample size formula and test it through simulation studies. We show via simulation that the proposed sample size formula maintains the nominal power. Although the proposed formula does not apply to designs with time-to-event endpoints, it should be useful to practitioners designing SMAR trials to compare adaptive treatment strategies.
22.
Abstract
Dynamic treatment regimes (DTRs) are sequential decision rules for individual patients that can adapt over time to an evolving illness. The goal is to accommodate heterogeneity among patients and find the DTR which will produce the best long term outcome if implemented. We introduce two new statistical learning methods for estimating the optimal DTR, termed backward outcome weighted learning (BOWL), and simultaneous outcome weighted learning (SOWL). These approaches convert individualized treatment selection into either a sequential or a simultaneous classification problem, and can thus be applied by modifying existing machine learning techniques. The proposed methods are based on directly maximizing over all DTRs a nonparametric estimator of the expected long-term outcome; this is fundamentally different from regression-based methods, such as Q-learning, which attempt such maximization indirectly and rely heavily on the correctness of postulated regression models. We prove that the resulting rules are consistent and provide finite-sample bounds for the errors of the estimated rules. Simulation results suggest the proposed methods produce superior DTRs compared with Q-learning especially in small samples. We illustrate the methods using data from a clinical trial for smoking cessation.
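The reduction of treatment selection to weighted classification can be seen in a deliberately simplified single-stage sketch. It assumes nonnegative outcomes, a single covariate, and rules of the form "treat iff x > c" (or "treat iff x <= c"), and it searches thresholds exhaustively under 0-1 loss. Outcome weighted learning proper replaces the 0-1 loss with a convex surrogate such as the hinge loss, and BOWL applies the idea backward across stages; neither refinement is shown here.

```python
def ipw_value(a, y, prop, decisions):
    """IPW value of a rule: mean of Y * 1{A = d(X)} / P(A | X)."""
    total = sum(yi / pi for ai, yi, pi, di in zip(a, y, prop, decisions) if ai == di)
    return total / len(y)


def owl_threshold_rule(x, a, y, prop):
    """Pick the threshold rule with the largest IPW value.

    Maximizing this value is equivalent to minimizing the misclassification
    of the received treatment A, with each subject weighted by Y / P(A | X).
    prop[i] is P(A_i | X_i), the probability of the treatment actually received.
    Returns (cut point, treat_above, estimated value).
    """
    levels = sorted(set(x))
    cuts = [levels[0] - 1.0] + levels  # extra cut yields "treat all" / "treat none"
    best = None
    for c in cuts:
        for treat_above in (True, False):
            decisions = [int(xi > c) if treat_above else int(xi <= c) for xi in x]
            value = ipw_value(a, y, prop, decisions)
            if best is None or value > best[2]:
                best = (c, treat_above, value)
    return best
```

Subjects with large outcomes carry large weights, so the learned rule tries hardest to agree with the treatments those subjects actually received.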
23. Bayesian Dose-Finding in Two Treatment Cycles Based on the Joint Utility of Efficacy and Toxicity. J Am Stat Assoc 2015; 110:711-722. [PMID: 26366026 PMCID: PMC4562700 DOI: 10.1080/01621459.2014.926815] [Indexed: 10/25/2022]
Abstract
A phase I/II clinical trial design is proposed for adaptively and dynamically optimizing each patient's dose in each of two cycles of therapy based on the joint binary efficacy and toxicity outcomes in each cycle. A dose-outcome model is assumed that includes a Bayesian hierarchical latent variable structure to induce association among the outcomes and also facilitate posterior computation. Doses are chosen in each cycle based on posteriors of a model-based objective function, similar to a reinforcement learning or Q-learning function, defined in terms of numerical utilities of the joint outcomes in each cycle. For each patient, the procedure outputs a sequence of two actions, one for each cycle, with each action being the decision to either treat the patient at a chosen dose or not to treat. The cycle 2 action depends on the individual patient's cycle 1 dose and outcomes. In addition, decisions are based on posterior inference using other patients' data, and therefore the proposed method is adaptive both within and between patients. A simulation study of the method is presented, including comparison to two-cycle extensions of the conventional 3+3 algorithm, continual reassessment method, and a Bayesian model-based design, and evaluation of robustness.
24. On Sparse Representation for Optimal Individualized Treatment Selection with Penalized Outcome Weighted Learning. Stat (Int Stat Inst) 2015; 4:59-68. [PMID: 25883393 DOI: 10.1002/sta4.78] [Indexed: 11/11/2022]
Abstract
As a new strategy for treatment which takes individual heterogeneity into consideration, personalized medicine is of growing interest. Discovering individualized treatment rules (ITRs) for patients who have heterogeneous responses to treatment is one of the important areas in developing personalized medicine. As more and more information per individual is being collected in clinical studies and not all of the information is relevant for treatment discovery, variable selection becomes increasingly important in discovering individualized treatment rules. In this article, we develop a variable selection method based on penalized outcome weighted learning, in which finding the optimal treatment rule is cast as a classification problem where each subject is weighted in proportion to his or her clinical outcome. We show that the resulting estimator of the treatment rule is consistent and establish variable selection consistency and the asymptotic distribution of the estimators. The performance of the proposed approach is demonstrated via simulation studies and an analysis of chronic depression data.
25
Abstract
A dynamic treatment regimen incorporates both accrued information and long-term effects of treatment from specially designed clinical trials. As these trials become more and more popular in conjunction with longitudinal data from clinical studies, the development of statistical inference for optimal dynamic treatment regimens is a high priority. In this paper, we propose a new machine learning framework called penalized Q-learning, under which valid statistical inference is established. We also propose a new statistical procedure, individual selection, and corresponding methods for incorporating individual selection within penalized Q-learning. Extensive numerical studies are presented which compare the proposed methods with existing methods, under a variety of scenarios, and demonstrate that the proposed approach is both inferentially and computationally superior. It is illustrated with a depression clinical trial study.
26
Abstract
In clinical practice, physicians make a series of treatment decisions over the course of a patient's disease based on his/her baseline and evolving characteristics. A dynamic treatment regime is a set of sequential decision rules that operationalizes this process. Each rule corresponds to a decision point and dictates the next treatment action based on the accrued information. Using existing data, a key goal is estimating the optimal regime, which, if followed by the patient population, would yield the most favorable outcome on average. Q- and A-learning are two main approaches for this purpose. We provide a detailed account of these methods, study their performance, and illustrate them using data from a depression study.
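Q-learning's backward recursion can be sketched for two stages with discrete variables: fit the stage 2 Q-function, form the pseudo-outcome as the maximum of Q2 over stage 2 actions, then fit the stage 1 Q-function to that pseudo-outcome. This is a minimal sketch, assuming binary covariates and treatments and saturated cell-mean models in place of the regression models one would normally posit; the data are hypothetical and A-learning is not shown.

```python
from collections import defaultdict
from statistics import mean


def cell_means(pairs):
    """Mean outcome per key: a saturated (nonparametric) model for discrete data."""
    groups = defaultdict(list)
    for key, y in pairs:
        groups[key].append(y)
    return {key: mean(ys) for key, ys in groups.items()}


def argmax_over_actions(q, prefix, actions):
    """Best observed action and its Q-value for a given history prefix."""
    candidates = {a: q[prefix + (a,)] for a in actions if prefix + (a,) in q}
    best = max(candidates, key=candidates.get)
    return best, candidates[best]


def q_learning_two_stage(data, actions=(0, 1)):
    """data rows: (x1, a1, x2, a2, y). Returns (stage 1 rule, stage 2 rule) as dicts."""
    # Stage 2: Q2(x1, a1, x2, a2) = E[Y | history, a2].
    q2 = cell_means(((x1, a1, x2, a2), y) for x1, a1, x2, a2, y in data)
    rule2, pseudo = {}, []
    for x1, a1, x2, _, _ in data:
        h2 = (x1, a1, x2)
        rule2[h2], v = argmax_over_actions(q2, h2, actions)
        # Pseudo-outcome: predicted outcome if stage 2 were played optimally.
        pseudo.append(((x1, a1), v))
    # Stage 1: Q1(x1, a1) = E[max over a2 of Q2 | x1, a1].
    q1 = cell_means(pseudo)
    rule1 = {x1: argmax_over_actions(q1, (x1,), actions)[0] for x1, _ in q1}
    return rule1, rule2


TWO_STAGE_DATA = [  # hypothetical (x1, a1, x2, a2, y)
    (0, 0, 0, 0, 2), (0, 0, 0, 1, 4),
    (0, 1, 1, 0, 5), (0, 1, 1, 1, 3),
    (1, 0, 1, 0, 6), (1, 0, 1, 1, 2),
    (1, 1, 0, 0, 1), (1, 1, 0, 1, 3),
]
rule1, rule2 = q_learning_two_stage(TWO_STAGE_DATA)
print(rule1)
print(rule2)
```

The key design point is that stage 1 is evaluated against the pseudo-outcome rather than the observed outcome, so the stage 1 rule accounts for the patient subsequently receiving the best stage 2 treatment.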
27
Sequential multiple assignment randomized trial (SMART) with adaptive randomization for quality improvement in depression treatment program. Biometrics 2014; 71:450-9. [PMID: 25354029 DOI: 10.1111/biom.12258] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 04/01/2014] [Revised: 08/01/2014] [Accepted: 09/01/2014] [Indexed: 11/28/2022]
Abstract
An implementation study is an important tool for deploying state-of-the-art treatments from clinical efficacy studies into a treatment program, with the dual goals of learning about the effectiveness of the treatments and improving the quality of care for patients enrolled in the program. In this article, we deal with the design of a treatment program of dynamic treatment regimens (DTRs) for patients with depression post-acute coronary syndrome. We introduce a novel adaptive randomization scheme for a sequential multiple assignment randomized trial of DTRs. Our approach adapts the randomization probabilities to favor treatment sequences having comparatively superior Q-functions, as used in Q-learning. The proposed approach addresses three main concerns of an implementation study: it allows incorporation of historical data or opinions, it includes randomization for learning purposes, and it aims to improve care via adaptation throughout the program. We demonstrate how to apply our method to design a depression treatment program using data from a previous study. By simulation, we illustrate that the inputs from historical data are important for the program performance measured by the expected outcomes of the enrollees, but also show that the adaptive randomization scheme is able to compensate for poorly specified historical inputs by improving patient outcomes within a reasonable horizon. The simulation results also confirm that the proposed design allows efficient learning of the treatments by alleviating the curse of dimensionality.
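One way to turn Q-function estimates into randomization probabilities that favor better-looking treatment sequences while still randomizing every sequence is a softmax tilt mixed with a uniform floor. This is a hypothetical scheme chosen for illustration, not the randomization rule proposed in the article; `temperature`, `floor`, the sequence labels, and the Q-values are all assumptions.

```python
import math


def adaptive_randomization_probs(q_values, temperature=1.0, floor=0.1):
    """Hypothetical scheme: softmax of estimated Q-values, mixed with a uniform
    floor so that every treatment sequence keeps probability >= floor."""
    k = len(q_values)
    if not 0.0 <= floor * k <= 1.0:
        raise ValueError("floor * number of sequences must lie in [0, 1]")
    top = max(q_values.values())  # subtract the max for numerical stability
    weights = {arm: math.exp((q - top) / temperature) for arm, q in q_values.items()}
    total = sum(weights.values())
    return {arm: floor + (1.0 - floor * k) * w / total for arm, w in weights.items()}


# Hypothetical Q-value estimates for three treatment sequences.
probs = adaptive_randomization_probs({"A then B": 1.0, "A then C": 2.0, "D then B": 0.5})
print(probs)
```

Smaller `temperature` values concentrate allocation on the current leader, while the floor preserves some randomization for learning, so that poorly specified historical inputs can still be corrected as outcomes accrue.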
28
Abstract
BACKGROUND Cancer affects millions of people worldwide each year. Patients require sequences of treatment based on their response to previous treatments to combat cancer and fight metastases. Physicians provide treatment based on clinical characteristics that change over time. Guidelines for these individualized sequences of treatments are known as dynamic treatment regimens (DTRs), in which the initial treatment and subsequent modifications depend on the response to previous treatments, disease progression, and other patient characteristics or behaviors. To provide evidence-based DTRs, the Sequential Multiple Assignment Randomized Trial (SMART) design has emerged over the past few decades. PURPOSE To examine and learn from past SMARTs investigating cancer treatment options, to discuss potential limitations preventing the widespread use of SMARTs in cancer research, and to describe courses of action to increase the implementation of SMARTs and collaboration between statisticians and clinicians. CONCLUSION There have been SMARTs investigating treatment questions in areas of cancer, but their novelty and perceived complexity have limited their use. By building bridges between statisticians and clinicians, clarifying research objectives, and furthering methods work, there should be an increase in SMARTs addressing relevant cancer treatment questions. Within any area of cancer, SMARTs can develop DTRs that guide treatment decisions over the disease history and improve patient outcomes.
29
Personalizing medicine: a review of adaptive treatment strategies. Pharmacoepidemiol Drug Saf 2014; 23:580-5. [DOI: 10.1002/pds.3606] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 11/29/2013] [Revised: 02/04/2014] [Accepted: 02/04/2014] [Indexed: 11/10/2022]
30
Recent development on statistical methods for personalized medicine discovery. Front Med 2013; 7:102-10. [PMID: 23377890 DOI: 10.1007/s11684-013-0245-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 08/23/2012] [Accepted: 12/06/2012] [Indexed: 01/01/2023]
Abstract
It is well documented that patients can show significantly heterogeneous responses to treatments, so the best treatment strategies may require adaptation over individuals and time. Recently, a number of new statistical methods have been developed to tackle the important problem of estimating personalized treatment rules using single-stage or multiple-stage clinical data. In this paper, we provide an overview of these methods and list a number of challenges.
31
Reply from Authors re: Camillo Porta. How to Identify Active Novel Agents in Rare Cancers and then Make Them Available: A Need for a Paradigm Shift. Eur Urol 2012;62:1020–1. Eur Urol 2012. [DOI: 10.1016/j.eururo.2012.08.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
32
Rejoinder to comments on Evaluation of Viable Dynamic Treatment Regimes in a Sequentially Randomized Trial of Advanced Prostate Cancer. J Am Stat Assoc 2012; 107:518-520. [PMID: 24489418 DOI: 10.1080/01621459.2012.665198] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
33
A Bayesian-frequentist two-stage single-arm phase II clinical trial design. Stat Med 2012; 31:2055-67. [DOI: 10.1002/sim.5330] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 11/13/2010] [Accepted: 01/10/2012] [Indexed: 11/08/2022]
34
Up-front versus sequential randomizations for inference on adaptive treatment strategies. Stat Med 2012; 31:812-30. [DOI: 10.1002/sim.4473] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 09/15/2010] [Accepted: 11/01/2011] [Indexed: 11/06/2022]
35
Abstract
We develop methodology for a multistage decision problem with a flexible number of stages in which the rewards are survival times that are subject to censoring. We present a novel Q-learning algorithm that is adjusted for censored data and allows a flexible number of stages. We provide finite-sample bounds on the generalization error of the policy learned by the algorithm, and show that when the optimal Q-function belongs to the approximation space, the expected survival time for policies obtained by the algorithm converges to that of the optimal policy. We simulate a multistage clinical trial with a flexible number of stages and apply the proposed censored-Q-learning algorithm to find individualized treatment regimens. The methodology presented in this paper has implications for the design of personalized medicine trials in cancer and in other life-threatening diseases.
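Censored rewards are commonly handled by inverse probability of censoring weighting (IPCW): each observed failure time is weighted by one over the estimated probability of remaining uncensored up to that time, and censored subjects get weight zero, so that a weighted least-squares fit of the Q-function targets uncensored survival. This is a minimal sketch, assuming independent censoring, a Kaplan-Meier estimate of the censoring distribution, and no ties between failure and censoring times; it is not claimed to match the paper's exact estimator, and the data are hypothetical.

```python
def censoring_survival_before(times, events):
    """Kaplan-Meier estimate of G(t-) = P(C >= t) for the censoring time C.
    events[i] = 1 for an observed failure, 0 for a censored time.
    Assumes no ties between failure and censoring times."""
    censor_times = sorted({t for t, d in zip(times, events) if d == 0})

    def g_minus(t):
        g = 1.0
        for s in censor_times:
            if s >= t:
                break  # only censoring strictly before t counts for G(t-)
            at_risk = sum(1 for u in times if u >= s)
            n_censored = sum(1 for u, d in zip(times, events) if u == s and d == 0)
            g *= 1.0 - n_censored / at_risk
        return g

    return g_minus


def ipcw_weights(times, events):
    """delta_i / G(T_i-): zero for censored subjects, and larger than 1 for
    failures observed after some censoring has occurred."""
    g_minus = censoring_survival_before(times, events)
    return [1.0 / g_minus(t) if d else 0.0 for t, d in zip(times, events)]


# Hypothetical survival rewards (e.g. months) and event indicators.
times = [2.0, 3.0, 4.0, 5.0, 8.0]
events = [1, 0, 1, 1, 0]
print(ipcw_weights(times, events))
```

In a censored Q-learning fit, weights like these would multiply each subject's squared residual, so that failures observed late (after others were censored) stand in for the censored subjects they resemble.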
36
Abstract
Interventions often involve a sequence of decisions. For example, clinicians frequently adapt the intervention to an individual's outcomes. Altering the intensity and type of intervention over time is crucial for many reasons, such as to obtain improvement if the individual is not responding or to reduce costs and burden when intensive treatment is no longer necessary. Adaptive interventions utilize individual variables (severity, preferences) to adapt the intervention and then dynamically utilize individual outcomes (response to treatment, adherence) to readapt the intervention. The Sequential Multiple Assignment Randomized Trial (SMART) provides high-quality data that can be used to construct adaptive interventions. We review the SMART and highlight its advantages in constructing and revising adaptive interventions as compared to alternative experimental designs. Selected examples of SMART studies are described and compared. A data analysis method is provided and illustrated using data from the Extending Treatment Effectiveness of Naltrexone SMART study.
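In a SMART where only non-responders are re-randomized, each participant is consistent with one or two embedded adaptive interventions, and inverse probability weights (here 2 for responders and 4 for non-responders, from two 1:1 randomizations) correct for the design. This is a minimal sketch of a weighted-mean estimate of each embedded intervention's mean outcome, assuming that standard two-stage design; the data rows are hypothetical, and the data analysis method in the article may differ.

```python
def embedded_regime_value(data, a1, a2, p1=0.5, p2=0.5):
    """Weighted mean outcome of the embedded adaptive intervention
    'start with a1; if no response, switch to a2'.
    Rows: (A1, R, A2, Y); responders (R = 1) are not re-randomized, so A2 is None."""
    num = den = 0.0
    for A1, R, A2, Y in data:
        if A1 != a1 or (R == 0 and A2 != a2):
            continue  # not consistent with this adaptive intervention
        w = 1.0 / p1 if R == 1 else 1.0 / (p1 * p2)  # 2 for responders, 4 for non-responders
        num += w * Y
        den += w
    return num / den


# Hypothetical SMART data: first-stage option, response, second-stage option, outcome.
SMART_DATA = [
    (1, 1, None, 8), (1, 0, 1, 5), (1, 0, -1, 3),
    (-1, 1, None, 7), (-1, 0, 1, 2), (-1, 0, -1, 6),
]
for a1 in (1, -1):
    for a2 in (1, -1):
        print((a1, a2), embedded_regime_value(SMART_DATA, a1, a2))
```

Responders count toward both interventions that share their first-stage option, which is why the weights, rather than simple subgroup means, are needed to compare the embedded interventions fairly.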
37
Abstract
Typical regimens for advanced metastatic stage IIIB/IV non-small cell lung cancer (NSCLC) consist of multiple lines of treatment. We present an adaptive reinforcement learning approach to discover optimal individualized treatment regimens from a specially designed clinical trial (a "clinical reinforcement trial") of an experimental treatment for patients with advanced NSCLC who have not been treated previously with systemic therapy. In addition to the complexity of the problem of selecting optimal compounds for first- and second-line treatments based on prognostic factors, another primary goal is to determine the optimal time to initiate second-line therapy, either immediately or delayed after induction therapy, yielding the longest overall survival time. A reinforcement learning method called Q-learning is utilized, which involves learning an optimal regimen from patient data generated from the clinical reinforcement trial. Approximating the Q-function with time-indexed parameters can be achieved by using a modification of support vector regression that can utilize censored data. Within this framework, a simulation study shows that the procedure can extract optimal regimens for two lines of treatment directly from clinical data without prior knowledge of the treatment effect mechanism. In addition, we demonstrate that the design reliably selects the best initial time for second-line therapy while taking into account the heterogeneity of NSCLC across patients.
38
A comparison of two worlds: How does Bayes hold up to the status quo for the analysis of clinical trials? Contemp Clin Trials 2011; 32:561-8. [PMID: 21453792 PMCID: PMC4477745 DOI: 10.1016/j.cct.2011.03.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 12/20/2010] [Revised: 03/08/2011] [Accepted: 03/15/2011] [Indexed: 11/22/2022]
Abstract
BACKGROUND There is a paucity of literature comparing Bayesian analytic techniques with traditional approaches for analyzing clinical trials using real trial data. METHODS We compared Bayesian and frequentist group sequential methods using data from two published clinical trials. We chose two widely accepted frequentist rules, O'Brien-Fleming and Lan-DeMets, and conjugate Bayesian priors. Using the nonparametric bootstrap, we estimated a sampling distribution of stopping times for each method. Because current practice dictates the preservation of an experiment-wise false positive rate (Type I error), we approximated these error rates for our Bayesian and frequentist analyses with the posterior probability of detecting an effect in a simulated null sample. Thus for the data-generated distribution represented by these trials, we were able to compare the relative performance of these techniques. RESULTS No final outcomes differed from those of the original trials. However, the timing of trial termination differed substantially by method and varied by trial. For one trial, group sequential designs of either type dictated early stopping of the study. In the other, stopping times were dependent upon the choice of spending function and prior distribution. CONCLUSIONS Results indicate that trialists ought to consider Bayesian methods in addition to traditional approaches for analysis of clinical trials. Though findings from this small sample did not demonstrate either method to consistently outperform the other, they did suggest the need to replicate these comparisons using data from varied clinical trials in order to determine the conditions under which the different methods would be most efficient.
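The O'Brien-Fleming and Lan-DeMets approaches mentioned here are usually implemented through alpha-spending functions, which state how much of the overall two-sided type I error has been used by information fraction t. The sketch below evaluates the Lan-DeMets O'Brien-Fleming-type function 2 - 2Φ(z_{α/2}/√t) and, for contrast, the Pocock-type function α log(1 + (e - 1)t). Converting spent alpha into stopping boundaries requires recursive numerical integration that is not shown, and α = 0.05 and the look times are assumptions.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal


def obrien_fleming_spending(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending: 2 - 2*Phi(z_{alpha/2} / sqrt(t))."""
    if t <= 0:
        return 0.0
    z = Z.inv_cdf(1 - alpha / 2)
    return 2 - 2 * Z.cdf(z / math.sqrt(t))


def pocock_spending(t, alpha=0.05):
    """Lan-DeMets Pocock-type spending: alpha * log(1 + (e - 1) * t)."""
    return alpha * math.log(1 + (math.e - 1) * t)


looks = [0.25, 0.5, 0.75, 1.0]  # assumed information fractions at interim looks
for t in looks:
    print(f"t={t:.2f}  OBF={obrien_fleming_spending(t):.5f}  Pocock={pocock_spending(t):.5f}")
```

Both functions spend the full α at t = 1, but the O'Brien-Fleming-type function spends very little early, which is one reason stopping times can differ so much by spending function, as the results above report.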
39
Dynamic treatment regimes for managing chronic health conditions: a statistical perspective. Am J Public Health 2010; 101:40-5. [PMID: 21088260 DOI: 10.2105/ajph.2010.198937] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/04/2022]
Abstract
Dynamic treatment regimes are an emerging and important methodological area in health research, particularly in the management of chronic health conditions. This paradigm encompasses the ideological shift in research from the acute care model to the chronic care model. It allows individualization of treatment (type, dosage, timing) at each stage of intervention. Constructing evidence-based dynamic treatment regimes requires implementation of cutting-edge design and analysis tools. Here I briefly discuss some of these modern tools, namely the sequential multiple assignment randomized trial (SMART) design and a regression-based analysis approach called Q-learning.
40
Abstract
A dynamic treatment regime is a set of decision rules, one per stage, each taking a patient's treatment and covariate history as input, and outputting a recommended treatment. In the estimation of the optimal dynamic treatment regime from longitudinal data, the treatment effect parameters at any stage prior to the last can be non-regular under certain distributions of the data. This results in biased estimates and invalid confidence intervals for the treatment effect parameters. In this article, we discuss both the problem of non-regularity, and available estimation methods. We provide an extensive simulation study to compare the estimators in terms of their ability to lead to valid confidence intervals under a variety of non-regular scenarios. Analysis of a data set from a smoking cessation trial is provided as an illustration.
41
Bladder Cancer: Optimal Application of Preclinical Models to Suitable Translational Questions. Sci Transl Med 2010; 2:22ps11. [DOI: 10.1126/scitranslmed.3000215] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022]
42
Abstract
We develop reinforcement learning trials for discovering individualized treatment regimens for life-threatening diseases such as cancer. A temporal-difference learning method called Q-learning is utilized that involves learning an optimal policy from a single training set of finite longitudinal patient trajectories. Approximating the Q-function with time-indexed parameters can be achieved by using support vector regression or extremely randomized trees. Within this framework, we demonstrate that the procedure can extract optimal strategies directly from clinical data without relying on the identification of any accurate mathematical models, unlike approaches based on adaptive design. We show that reinforcement learning has tremendous potential in clinical research because it can select actions that improve outcomes by taking into account delayed effects even when the relationship between actions and outcomes is not fully known. To support our claims, the methodology's practical utility is illustrated in a simulation analysis. In the immediate future, we will apply this general strategy to studying and identifying new treatments for advanced metastatic stage IIIB/IV non-small cell lung cancer, which usually includes multiple lines of chemotherapy treatment. Moreover, the proposed methodology has significant potential for developing personalized treatment strategies in other cancers, in cystic fibrosis, and in other life-threatening diseases.
43
Inference for two-stage adaptive treatment strategies using mixture distributions. J R Stat Soc Ser C Appl Stat 2010. [DOI: 10.1111/j.1467-9876.2009.00679.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/01/2022]
44
45
A review of phase 2-3 clinical trial designs. LIFETIME DATA ANALYSIS 2008; 14:37-53. [PMID: 17763973 DOI: 10.1007/s10985-007-9049-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 05/03/2007] [Accepted: 07/18/2007] [Indexed: 05/17/2023]
Abstract
This article reviews phase 2-3 clinical trial designs, including their genesis and the potential role of such designs in treatment evaluation. The paper begins with a discussion of the many scientific flaws in the conventional phase 2 --> phase 3 treatment evaluation process that motivate phase 2-3 designs. This is followed by descriptions of some particular phase 2-3 designs that have been proposed, including two-stage designs to evaluate one experimental treatment, a design that accommodates both frontline and salvage therapy in oncology, two-stage select-and-test designs that evaluate several experimental treatments, dose-ranging designs, and a seamless phase 2-3 design based on both early response-toxicity outcomes and later event times. A general conclusion is that, in many circumstances, a properly designed phase 2-3 trial utilizes resources much more efficiently and provides much more reliable inferences than conventional methods.