1
|
Dissecting the restricted mean time in favor of treatment. J Biopharm Stat 2024; 34:111-126. [PMID: 37224223 PMCID: PMC10667568 DOI: 10.1080/10543406.2023.2210658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Accepted: 05/01/2023] [Indexed: 05/26/2023]
Abstract
The restricted mean time in favor (RMT-IF) summarizes the treatment effect on a hierarchical composite endpoint with mortality at the top. Its crude decomposition into "stage-wise effects," i.e., the net average time gained by the treatment prior to each component event, does not reveal the patient state in which the extra time is spent. To obtain this information, we break each stage-wise effect into subcomponents according to the specific state to which the reference condition is improved. After re-expressing the subcomponents as functionals of the marginal survival functions of outcome events, we estimate them conveniently by plugging in the Kaplan-Meier estimators. Their robust variance matrices allow us to construct joint tests on the decomposed units, which are particularly powerful against component-wise differential treatment effects. By reanalyzing a cancer trial and a cardiovascular trial, we acquire new insights into the quality and composition of the extra survival times, as well as the extra time with fewer hospitalizations, gained by the treatment in question. The proposed methods are implemented in the rmt package freely available on the Comprehensive R Archive Network (CRAN).
|
2
|
Sample size calculation for multi-arm parallel design with restricted mean survival time. Stat Methods Med Res 2024; 33:130-147. [PMID: 38093411 DOI: 10.1177/09622802231219852] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/13/2024]
Abstract
With the recent advances in oncology treatment, restricted mean survival time (RMST) is increasingly being used to replace the routine approach based on hazard ratios in randomized controlled trials for time-to-event outcomes. While RMST has been widely applied in single-arm and two-arm designs, challenges still exist in comparing RMST in multi-arm trials with three or more groups. In particular, it is unclear in the literature how to compare more than one intervention simultaneously or perform multiple testing based on RMST, and sample size determination is a major obstacle to its adoption in practice. In this paper, we propose a novel method of designing multi-arm clinical trials with a right-censored survival endpoint based on RMST that can be applied in both phase II and phase III settings using a global χ² test as well as a modeling-based multiple comparison procedure. The framework provides a closed-form sample size formula built upon a multi-arm global test and a sample size determination procedure based on multiple comparisons in the phase II dose-finding study. The proposed method enjoys strong robustness and flexibility as it requires less a priori set-up than conventional work, and obtains a smaller sample size while achieving the target power. In the assessment of sample size, we also incorporate practical considerations, including the presence of non-proportional hazards and staggered patient entry. We evaluate the validity of our method through simulation studies under various scenarios. Finally, we demonstrate the accuracy and stability of our method by implementing it in the design of two real clinical trial examples.
|
3
|
Study design for restricted mean time analysis of recurrent events and death. Biometrics 2023; 79:3701-3714. [PMID: 37612246 PMCID: PMC10841174 DOI: 10.1111/biom.13923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Accepted: 08/10/2023] [Indexed: 08/25/2023]
Abstract
The restricted mean time in favor (RMT-IF) of treatment has just been added to the analytic toolbox for composite endpoints of recurrent events and death. To help practitioners design new trials based on this method, we develop tools to calculate the sample size and power. Specifically, we formulate the outcomes as a multistate Markov process with a sequence of transient states for recurrent events and an absorbing state for death. The transition intensities, in this case the instantaneous risks of another nonfatal event or death, are assumed to be time-homogeneous but nonetheless allowed to depend on the number of past events. Using the properties of Coxian distributions, we derive the RMT-IF effect size under the alternative hypothesis as a function of the treatment-to-control intensity ratios along with the baseline intensities, the latter of which can be easily estimated from historical data. We also reduce the variance of the nonparametric RMT-IF estimator to calculable terms under a standard set-up for censoring. Simulation studies show that the resulting formulas provide accurate approximation to the sample size and power in realistic settings. For illustration, a past cardiovascular trial with recurrent-hospitalization and mortality outcomes is analyzed to generate the parameters needed to design a future trial. The procedures are incorporated into the rmt package along with the original methodology on the Comprehensive R Archive Network (CRAN).
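The multistate formulation described in this abstract can be illustrated with a small numerical sketch. It is not the rmt package's implementation and does not compute the RMT-IF effect size; it only builds a time-homogeneous intensity matrix with transient states for the number of past nonfatal events plus an absorbing death state, and evaluates state occupancy probabilities. The function names, the truncation of the event count at a maximum (the last transient state lumps "K or more" events), and the example rates are all assumptions made for illustration.

```python
import numpy as np

def matrix_exponential(a, terms=30):
    """Approximate exp(a) by scaling and squaring with a truncated Taylor series."""
    a = np.asarray(a, dtype=float)
    norm = np.linalg.norm(a, ord=np.inf)
    # Halve the matrix often enough that its norm is at most 0.5 before the series.
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = a / (2 ** squarings)
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def occupancy_probabilities(event_rates, death_rates, t):
    """State occupancy probabilities at time t, starting with no past events.

    event_rates[k]: intensity of another nonfatal event after k past events (k = 0..K-1).
    death_rates[k]: intensity of death after k past events (k = 0..K).
    States 0..K count past events (state K is an assumed cap); state K+1 is death.
    """
    k_max = len(event_rates)
    q = np.zeros((k_max + 2, k_max + 2))
    for k in range(k_max + 1):
        if k < k_max:
            q[k, k + 1] = event_rates[k]
        q[k, k_max + 1] = death_rates[k]
        q[k, k] = -q[k].sum()  # rows of an intensity matrix sum to zero
    return matrix_exponential(q * t)[0]

# Hypothetical intensities: the event risk rises after the first event,
# while the death risk is the same in every transient state.
probs = occupancy_probabilities(event_rates=[0.5, 0.7], death_rates=[0.1, 0.1, 0.1], t=2.0)
print(probs)
```

Because the hypothetical death intensity here does not depend on the event history, the death probability reduces to one minus an exponential survival function, which gives a quick sanity check on the matrix exponential.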
|
4
|
Omnibus test for restricted mean survival time based on influence function. Stat Methods Med Res 2023; 32:1082-1099. [PMID: 37015346 PMCID: PMC10331519 DOI: 10.1177/09622802231158735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/06/2023]
Abstract
The restricted mean survival time (RMST), which evaluates the expected survival time up to a pre-specified time point τ, has been widely used to summarize the survival distribution due to its robustness and straightforward interpretation. In comparative studies with time-to-event data, the RMST-based test has been utilized as an alternative to the classic log-rank test because the power of the log-rank test deteriorates when the proportional hazards assumption is violated. To overcome the challenge of selecting an appropriate time point τ, we develop an RMST-based omnibus Wald test to detect the survival difference between two groups throughout the study follow-up period. Treating a vector of RMSTs at multiple quantile-based time points as a statistical functional, we construct a Wald χ² test statistic and derive its asymptotic distribution using the influence function. We further propose a new procedure based on the influence function to estimate the asymptotic covariance matrix in contrast to the usual bootstrap method. Simulations under different scenarios validate the size of our RMST-based omnibus test and demonstrate its advantage over the existing tests in power, especially when the true survival functions cross within the study follow-up period. For illustration, the proposed test is applied to two real datasets, which demonstrate its power and applicability in various situations.
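As a minimal sketch of the quantity this abstract starts from, the RMST up to τ can be estimated as the area under the Kaplan-Meier step function on [0, τ]. The code below shows only that building block, not the omnibus Wald test or the influence-function covariance estimator; the function names and the toy data are assumptions for illustration.

```python
import numpy as np

def kaplan_meier(times, events):
    """Distinct event times and the Kaplan-Meier survival estimate just after each."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)            # subjects still under observation at t
        deaths = np.sum((times == t) & events)  # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    event_times, surv = kaplan_meier(times, events)
    keep = event_times < tau
    grid = np.concatenate(([0.0], event_times[keep], [tau]))
    heights = np.concatenate(([1.0], surv[keep]))
    return float(np.sum(np.diff(grid) * heights))

# Toy data: follow-up times with event indicators (1 = event, 0 = censored).
toy_times = [1, 2, 3, 4]
rmst_uncensored = rmst(toy_times, [1, 1, 1, 1], tau=4.0)
rmst_censored = rmst(toy_times, [1, 0, 1, 1], tau=3.5)
print(rmst_uncensored, rmst_censored)
```

Without censoring, the estimate coincides with the sample mean of min(T, τ), which is why the first toy call returns the average of the four follow-up times.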
|
5
|
On assessing survival benefit of immunotherapy using long-term restricted mean survival time. Stat Med 2023; 42:1139-1155. [PMID: 36653933 DOI: 10.1002/sim.9662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Revised: 11/09/2022] [Accepted: 01/05/2023] [Indexed: 01/20/2023]
Abstract
The pattern of the difference between two survival curves we often observe in randomized clinical trials for evaluating immunotherapy is not proportional hazards; the treatment effect typically appears several months after the initiation of the treatment (ie, delayed difference pattern). The commonly used logrank test and hazard ratio estimation approach will be suboptimal concerning testing and estimation for those trials. The long-term restricted mean survival time (LT-RMST) approach is a promising alternative for detecting the treatment effect that potentially appears later in the study. A challenge in employing the LT-RMST approach is that it must specify a lower end of the time window in addition to a truncation time point that the RMST requires. There are several investigations and suggestions regarding the choice of the truncation time point for the RMST. However, little has been investigated to address the choice of the lower end of the time window. In this paper, we propose a flexible LT-RMST-based test/estimation approach that does not require users to specify a lower end of the time window. Numerical studies demonstrated that the potential power loss by adopting this flexibility was minimal, compared to the standard LT-RMST approach using a prespecified lower end of the time window. The proposed method is flexible and can offer higher power than the RMST-based approach when the delayed treatment effect is expected. Also, it provides a robust estimate of the magnitude of the treatment effect and its confidence interval that corresponds to the test result.
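To make the role of the lower end of the time window concrete, the sketch below assumes that the windowed quantity is the area under the survival curve between a lower end η and the truncation time τ, and evaluates it for a hypothetical delayed-effect scenario. The piecewise-exponential curves, the rates, and the function names are illustrative assumptions; the flexible procedure proposed in the paper, which avoids prespecifying η, is not reproduced here.

```python
import numpy as np

def area(surv_fn, lower, upper, n=20001):
    """Trapezoidal area under a survival function between two time points."""
    grid = np.linspace(lower, upper, n)
    y = surv_fn(grid)
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(grid)))

def control_survival(t, rate=0.10):
    """Exponential survival with a constant monthly hazard (assumed value)."""
    return np.exp(-rate * np.asarray(t, dtype=float))

def treated_survival(t, rate=0.10, delay=6.0, hazard_ratio=0.5):
    """Same hazard as control until `delay`, then the hazard is multiplied by `hazard_ratio`."""
    t = np.asarray(t, dtype=float)
    early = np.minimum(t, delay)
    late = np.clip(t - delay, 0.0, None)
    return np.exp(-rate * early - rate * hazard_ratio * late)

eta, tau = 6.0, 24.0
rmst_diff = area(treated_survival, 0.0, tau) - area(control_survival, 0.0, tau)
window_diff = area(treated_survival, eta, tau) - area(control_survival, eta, tau)
control_window = area(control_survival, eta, tau)
print(rmst_diff, window_diff, control_window)
```

In this scenario the curves coincide before the delay, so the whole RMST difference is accumulated inside the window when η is set to the delay; a badly chosen η would either discard part of the difference or include a stretch with no separation.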
|
6
|
Designing superiority trials with window mean survival time as a primary endpoint. Stat Med 2023. [DOI: 10.1002/sim.9738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 02/09/2023] [Accepted: 03/25/2023] [Indexed: 04/04/2023]
|
7
|
Ratio and difference of average hazard with survival weight: New measures to quantify survival benefit of new therapy. Stat Med 2023; 42:936-952. [PMID: 36604833 DOI: 10.1002/sim.9651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2021] [Revised: 09/02/2022] [Accepted: 12/22/2022] [Indexed: 01/07/2023]
Abstract
The hazard ratio (HR) has been the most popular measure to quantify the magnitude of treatment effect on time-to-event outcomes in clinical research. However, the traditional Cox's HR approach has several drawbacks. One major issue is that there is no clear interpretation when the proportional hazards (PH) assumption does not hold, because the estimated HR is affected by study-specific censoring time distribution in non-PH cases. Another major issue is that the lack of a group-specific absolute hazard value in each group obscures the clinical significance of the magnitude of the treatment effect. Given these, we propose average hazard with survival weight (AH-SW) as a summary metric of event time distribution and will use difference in AH-SW (DAH-SW) or ratio of AH-SW (RAH-SW) to quantify the treatment effect magnitude. The AH-SW is interpreted as a person-time incidence rate that does not depend on random censoring. It is defined as the ratio of cumulative incidence probability and restricted mean survival time (RMST), which can be estimated non-parametrically. Numerical studies demonstrate that DAH-SW and RAH-SW offer almost identical power to Cox's HR-based tests under PH scenarios and can be more powerful for delayed-difference patterns often seen in immunotherapy trials. Like median and RMST differences, the proposed approach is a good model-free alternative to the HR-based approach for evaluating the treatment effect magnitude. Such a model-free measure will increase the likelihood that results from clinical studies are correctly interpreted and generalized to future populations.
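The definition given in this abstract, cumulative incidence probability divided by RMST, can be checked numerically. The sketch below evaluates it for known survival functions rather than estimating it from data; the function name, the truncation time, and the exponential rates are assumptions for illustration, and the survival function is assumed to equal 1 at time 0.

```python
import numpy as np

def average_hazard(surv_fn, tau, n=20001):
    """(1 - S(tau)) divided by the area under S on [0, tau], via the trapezoidal rule."""
    grid = np.linspace(0.0, tau, n)
    s = surv_fn(grid)
    rmst = np.sum((s[1:] + s[:-1]) / 2.0 * np.diff(grid))
    return float((1.0 - s[-1]) / rmst)

# Hypothetical exponential arms with yearly hazards 0.20 (control) and 0.15 (treated).
ah_control = average_hazard(lambda t: np.exp(-0.20 * t), tau=5.0)
ah_treated = average_hazard(lambda t: np.exp(-0.15 * t), tau=5.0)
ah_ratio = ah_treated / ah_control       # ratio-type contrast
ah_difference = ah_treated - ah_control  # difference-type contrast
print(ah_control, ah_treated, ah_ratio, ah_difference)
```

For an exponential distribution the ratio collapses to the constant hazard itself, so each arm returns its own rate and the ratio of the two matches the hazard ratio; under non-proportional hazards the two summaries would generally differ.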
|
8
|
A comparative study to alternatives to the log-rank test. Contemp Clin Trials 2023; 128:107165. [PMID: 36972865 DOI: 10.1016/j.cct.2023.107165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 03/17/2023] [Accepted: 03/20/2023] [Indexed: 03/29/2023]
Abstract
BACKGROUND Studies to compare the survival of two or more groups using time-to-event data are of high importance in medical research. The gold standard is the log-rank test, which is optimal under proportional hazards. As the latter is no simple regularity assumption, we are interested in evaluating the power of various statistical tests under different settings including proportional and non-proportional hazards with a special emphasis on crossing hazards. This challenge has been going on for many years now and multiple methods have already been investigated in extensive simulation studies. However, in recent years new omnibus tests and methods based on the restricted mean survival time appeared that have been strongly recommended in biometric literature. METHODS Thus, to give updated recommendations, we perform a vast simulation study to compare tests that showed high power in previous studies with these more recent approaches. We thereby analyze various simulation settings with varying survival and censoring distributions, unequal censoring between groups, small sample sizes and unbalanced group sizes. RESULTS Overall, omnibus tests are more robust in terms of power against deviations from the proportional hazards assumption. CONCLUSION We recommend considering the more robust omnibus approaches for group comparison in case of uncertainty about the underlying survival time distributions.
|
9
|
Power and Sample Size Calculations for the Restricted Mean Time Analysis of Prioritized Composite Endpoints. Stat Biopharm Res 2022; 15:540-548. [PMID: 37663164 PMCID: PMC10473860 DOI: 10.1080/19466315.2022.2110936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Revised: 04/27/2022] [Accepted: 07/25/2022] [Indexed: 10/15/2022]
Abstract
As a new way of reporting treatment effect, the restricted mean time in favor (RMT-IF) of treatment measures the net average time the treated have had a less serious outcome than the untreated over a specified time window. With multiple outcomes of differing severity, this offers a more interpretable and data-efficient alternative to the prototypical restricted mean (event-free) survival time. To facilitate its adoption in actual trials, we develop simple approaches to power and sample size calculations and implement them in user-friendly R programs. In doing so we model the bivariate outcomes of death and a nonfatal event using a Gumbel-Hougaard copula with component-wise proportional hazards structures, under which the RMT-IF estimand is derived in closed form. In a standard set-up for censoring, the variance of the nonparametric effect-size estimator is simplified and computed via a hybrid of numerical and Monte Carlo integrations, allowing us to compute the power and sample size as functions of component-wise hazard ratios. Simulation studies show that these formulas provide accurate approximations in realistic settings. To illustrate our methods, we consider designing a new trial to evaluate treatment effect on the composite outcomes of death and cancer relapse in lymph node-positive breast cancer patients, with baseline parameters calculated from a previous study.
|
10
|
Conversion of non-inferiority margin from hazard ratio to restricted mean survival time difference using data from multiple historical trials. Stat Methods Med Res 2022; 31:1819-1844. [PMID: 35642291 DOI: 10.1177/09622802221102621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
The restricted mean survival time measure has attracted considerable interest for designing and analyzing oncology trials with time-to-event endpoints due to its intuitive clinical interpretation and potentially high statistical power. In the non-inferiority trial literature, restricted mean survival time has been used as an alternative measure for reanalyzing a completed trial that was originally designed and analyzed based on a traditional proportional hazards model. However, the reanalysis procedure requires a conversion from the non-inferiority margin measured in hazard ratio to a non-inferiority margin measured by restricted mean survival time difference. An existing conversion method assumes a Weibull distribution for the population survival time of the historical active control group under the proportional hazards assumption using data from a single trial. In this article, we develop a methodology for non-inferiority margin conversion when data from multiple historical active control studies are available, and introduce a Kaplan-Meier estimator-based method for the non-inferiority margin conversion to relax the parametric assumption. We report extensive simulation studies to examine the performances of the proposed methods under the Weibull data generative models and a piecewise-exponential data generative model that mimic the tumor recurrence and survival characteristics of advanced colon cancer. This work is motivated by the need to convert the non-inferiority margin, using historical patient-level data from a large colon cancer clinical database, in order to reanalyze an international collaborative non-inferiority study that evaluates 6-month versus 3-month duration of adjuvant chemotherapy in stage III colon cancer patients.
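The general idea behind a Weibull-based margin conversion can be sketched as follows: if the control arm follows a Weibull survival curve and the hazard-ratio margin applies under proportional hazards, the experimental arm at the margin has the control survival function raised to the power of that hazard ratio, and the RMST margin is the difference between the two areas up to τ. This is a sketch of that idea only, not the existing single-trial method or the multi-trial and Kaplan-Meier-based methods developed in the paper; the parameter values and function names are hypothetical.

```python
import numpy as np

def weibull_rmst(shape, scale, tau, hazard_ratio=1.0, n=20001):
    """RMST up to tau for survival exp(-hazard_ratio * (t / scale) ** shape), trapezoidal rule."""
    grid = np.linspace(0.0, tau, n)
    surv = np.exp(-hazard_ratio * (grid / scale) ** shape)
    return float(np.sum((surv[1:] + surv[:-1]) / 2.0 * np.diff(grid)))

def rmst_margin(shape, scale, tau, hr_margin):
    """RMST lost, relative to control, by an arm sitting exactly at the hazard-ratio margin."""
    return weibull_rmst(shape, scale, tau) - weibull_rmst(shape, scale, tau, hr_margin)

# Hypothetical control-arm Weibull parameters (time in years) and a hazard-ratio margin of 1.25.
margin_weibull = rmst_margin(shape=1.2, scale=10.0, tau=5.0, hr_margin=1.25)
# With shape = 1 the curve is exponential, which allows a closed-form check.
margin_exponential = rmst_margin(shape=1.0, scale=10.0, tau=5.0, hr_margin=1.25)
print(margin_weibull, margin_exponential)
```

The converted margin depends on τ and on the control-arm parameters, which is one reason the choice of historical data matters; the trapezoidal rule used here is adequate for shape values of at least 1 and would need a finer grid near zero for smaller shapes.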
|
11
|
Impact of COVID-19 pandemic on oncology clinical trial design, data collection and analysis. Contemp Clin Trials 2022; 116:106736. [PMID: 35331946 PMCID: PMC8935956 DOI: 10.1016/j.cct.2022.106736] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/26/2021] [Revised: 03/14/2022] [Accepted: 03/17/2022] [Indexed: 01/26/2023]
Abstract
BACKGROUND To identify and assess via simulation the impact of the COVID-19 pandemic on oncology trials and discuss potential mitigation strategies for study design, data collection, endpoints and analyses. METHODS We simulated clinical trials to evaluate the COVID-19 impact on overall survival (OS) and progression-free survival (PFS). We evaluated survival in single-region trials with different proportions of impacted patients across treatment arms, and in multi-region randomized trials with different proportions of impacted patients across regions. We also assessed the impact on PFS when the missingness of disease assessment and censoring rules vary. Impact on the trial success and robustness of statistical inference was summarized. RESULTS Without regional impact, the impact on OS analysis is minimal if proportions of impacted patients are similar across arms; however, if a larger proportion of treatment arm patients are impacted, trials may suffer substantial power loss and underestimate treatment effect size. For multi-region trials, if more treatment arm patients are enrolled from more severely impacted regions, trials also have poorer performance. For PFS analysis, the intent-to-treat rule performs well even when the treatment arm patients are more likely to miss disease assessments, while the consecutive-missing censoring rule may lead to poorer performance. CONCLUSION COVID-19 affects oncology trials. Simulations would be highly informative to the Data Monitoring Committee in understanding the impact and making appropriate recommendations, upon which the sponsor could start planning potential remedies. We also recommend a decision tree for choosing the appropriate methods for PFS evaluation in the presence of missing disease assessments due to COVID-19.
|
12
|
Which test for crossing survival curves? A user’s guideline. BMC Med Res Methodol 2022; 22:34. [PMID: 35094686 PMCID: PMC8802494 DOI: 10.1186/s12874-022-01520-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 06/24/2021] [Accepted: 01/18/2022] [Indexed: 12/27/2022] Open
Abstract
Background The exchange of knowledge between statisticians developing new methodology and clinicians, reviewers or authors applying them is fundamental. This is specifically true for clinical trials with time-to-event endpoints. Thereby, one of the most commonly arising questions is that of equal survival distributions in a two-armed trial. The log-rank test is still the gold standard for answering this question. However, in case of non-proportional hazards, its power can become poor and multiple extensions have been developed to overcome this issue. We aim to facilitate the choice of a test for the detection of survival differences in the case of crossing hazards. Methods We restricted the review to the most recent two-armed clinical oncology trials with crossing survival curves. Each data set was reconstructed using a state-of-the-art reconstruction algorithm. To ensure reproduction quality, only publications with published number at risk at multiple time points, sufficient printing quality and a non-informative censoring pattern were included. This article depicts the p-values of the log-rank and Peto-Peto test as references and compares them with nine different tests developed for detection of survival differences in the presence of non-proportional or crossing hazards. Results We reviewed 1400 recent phase III clinical oncology trials and selected fifteen studies that met our eligibility criteria for data reconstruction. After including three further individual patient data sets, significant differences in survival were found for nine out of eighteen studies using the investigated tests. An important point that reviewers should pay attention to is that 28% of the studies with published survival curves did not report the number at risk. This makes reconstruction and plausibility checks almost impossible.
Conclusions The evaluation shows that inference methods constructed to detect differences in survival in presence of non-proportional hazards are beneficial and help to provide guidance in choosing a sensible alternative to the standard log-rank test. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-022-01520-0.
|
13
|
Extensions of empirical likelihood and chi-squared-based tests for ordered alternatives. J Appl Stat 2022; 49:24-43. [DOI: 10.1080/02664763.2020.1796944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
14
|
Abstract
When designing a comparative oncology trial for an overall or progression-free survival endpoint, investigators often quantify the treatment effect using a difference in median survival times. However, rather than directly designing the study to estimate this difference, it is almost always converted to a hazard ratio (HR) to determine the study size. At the analysis stage, the hazard ratio is utilized for formal analysis, yet because it may be difficult to interpret clinically, especially when the proportional hazards assumption is not met, the observed medians are also reported descriptively. The hazard ratio and median difference contrast different aspects of the survival curves. Whereas the hazard ratio places greater emphasis on late-occurring separation, the median difference focuses locally on the centers of the distributions and cannot capture either short- or long-term differences. Having 2 sets of summaries (a hazard ratio and the medians) may lead to incoherent conclusions regarding the treatment effect. For instance, the hazard ratio may suggest a treatment difference whereas the medians do not, or vice versa. In this commentary, we illustrate these commonly encountered issues using examples from recent oncology trials. We present a coherent alternative strategy that, unlike relying on the hazard ratio, does not require modeling assumptions and always results in clinically interpretable summaries of the treatment effect.
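The conversion from a median difference to a hazard ratio for sizing a study, mentioned in this abstract, is often done under an exponential model. The sketch below assumes exponential survival in both arms (so the hazard ratio equals the ratio of medians), 1:1 allocation, a two-sided test, and Schoenfeld's approximation for the required number of events; these are common textbook assumptions chosen for illustration, not the strategy the commentary recommends, and the medians used are hypothetical.

```python
import math
from statistics import NormalDist

def hazard_ratio_from_medians(median_control, median_treated):
    """Under exponential survival, hazard = ln(2) / median, so HR = median_control / median_treated."""
    return median_control / median_treated

def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Schoenfeld's approximation for a two-sided log-rank test with 1:1 allocation."""
    z = NormalDist().inv_cdf
    return 4.0 * (z(1 - alpha / 2) + z(power)) ** 2 / math.log(hazard_ratio) ** 2

# Hypothetical design: median survival of 12 months on control versus 16 months on treatment.
hr = hazard_ratio_from_medians(12.0, 16.0)
events = math.ceil(required_events(hr))
print(hr, events)
```

The calculation never uses the 4-month median difference directly, only the implied hazard ratio, which illustrates how the design quantity and the clinically quoted quantity can drift apart when hazards are not proportional.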
|
15
|
Restricted survival benefit with right-censored data. Biom J 2021; 64:696-713. [PMID: 34970772 DOI: 10.1002/bimj.202000392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2020] [Revised: 10/09/2021] [Accepted: 10/24/2021] [Indexed: 11/11/2022]
Abstract
The hazard ratio is widely used to quantify treatment effects. However, it may be difficult to interpret for patients and practitioners, especially when the hazard ratio is not constant over time. Alternative measures of the treatment effects have been proposed such as the difference of the restricted mean survival times, the difference in survival proportions at some fixed follow-up time, or the net chance of a longer survival. In this paper, we propose the restricted survival benefit (RSB), a quantity that can incorporate multiple useful measurements of treatment effects. Hence, it provides a framework for a comprehensive assessment of the treatment effects. We provide estimation and inference procedures for the RSB that accommodate censored survival outcomes, using methods of the inverse-probability-censoring-weighted U-statistic and the jackknife empirical likelihood. We conduct extensive simulation studies to examine the numerical performance of the proposed method, and we analyze data from a randomized Phase III clinical trial (SWOG S0777) using the proposed method.
|
16
|
Complex survival trial design by the product integration method. Stat Med 2021; 41:798-814. [PMID: 34908180 DOI: 10.1002/sim.9256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2021] [Revised: 09/29/2021] [Accepted: 10/23/2021] [Indexed: 11/09/2022]
Abstract
Nonproportional hazards (NPHs) are often observed in survival trials such as the immunotherapy cancer trials. Under NPH, the classical log-rank test can be inefficient, and the estimated hazards ratio from the Cox model is difficult to interpret. The weighted log-rank test, and the tests for comparing the restricted mean survival time or the milestone survival become increasingly popular in handling NPH. The sample size calculation for these tests may require high-dimensional numerical integration. We present a sample size determination method for survival trials via product integration on the basis of a continuous-time multistate Markov model. The main challenge of the method lies in the design of the multistate model under a complex NPH pattern, and this is illustrated for NPH induced by delayed effect with individual heterogeneity in the lag duration, cure fractions, and treatment switching due to disease progression or noncompliance. Numerical examples are presented to demonstrate the accuracy of the proposed method. We obtain the following findings. The powers of the tests for milestone survival and RMST depend on both the trial duration and milestone timepoint, and may not increase as the milestone timepoint increases. If the milestone timepoint is appropriately chosen, the RMST test can be more powerful than the conventional log-rank test in the presence of diminishing treatment effect or in the proportional hazards cure model. In general, the RMST test yields lower power than a proper Fleming-Harrington weighted log-rank test.
|
17
|
Choosing clinically interpretable summary measures and robust analytic procedures for quantifying the treatment difference in comparative clinical studies. Stat Med 2021; 40:6235-6242. [PMID: 34783094 PMCID: PMC8687139 DOI: 10.1002/sim.8971] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/19/2020] [Revised: 02/20/2021] [Accepted: 03/16/2021] [Indexed: 12/11/2022]
|
18
|
Bayesian multivariate network meta-analysis model for the difference in restricted mean survival times. Stat Med 2021; 41:595-611. [PMID: 34883534 DOI: 10.1002/sim.9276] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/06/2021] [Revised: 10/15/2021] [Accepted: 10/23/2021] [Indexed: 11/08/2022]
Abstract
Network meta-analysis (NMA) is essential for clinical decision-making. NMA enables inference for all pair-wise comparisons between interventions available for the same indication, by using both direct evidence and indirect evidence. In randomized trials with time-to-event outcome data, such as lung cancer data, conventional NMA methods rely on the hazard ratio and the proportional hazards assumption, and ignore the varying follow-up durations across trials. We introduce a novel multivariate NMA model for the difference in restricted mean survival times (RMST). Our model synthesizes all the available evidence from multiple time points simultaneously and borrows information across time points through within-study covariance and between-study covariance for the differences in RMST. We propose an estimator of the within-study covariance and we then assume it to be known. We estimate the model under the Bayesian framework. We evaluated our model by conducting a simulation study. Our multiple-time-point model yields lower mean squared error over the conventional single-time-point model at all time points, especially when the availability of evidence decreases. We illustrated the model on a network of randomized trials of second-line treatments of advanced non-small-cell lung cancer. Our multiple-time-point model yielded increased precision and detected evidence of benefit at earlier time points as compared to the single-time-point model. Our model has the advantage of providing clinically interpretable measures of treatment effects.
|
19
|
Treatment effect measures for culture conversion endpoints in phase IIb tuberculosis treatment trials. Clin Infect Dis 2021; 73:2131-2139. [PMID: 34254635 DOI: 10.1093/cid/ciab576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2021] [Indexed: 11/12/2022] Open
Abstract
Phase IIb trials of tuberculosis therapy rely on early biomarkers of treatment effect. Despite limited predictive ability for clinical outcomes, culture conversion, the event in which an individual previously culture positive for Mycobacterium tuberculosis yields a negative culture after initiating treatment, is a commonly used endpoint. Lack of consensus on how to define the outcome and corresponding measure of treatment effect complicates interpretation and limits between-trial comparisons. We review common analytic approaches to measuring treatment effect and introduce difference in restricted mean survival times as an alternative to identify faster times to culture conversion and express magnitude of effect on the time scale. Findings from the PanACEA MAMSTB trial are reanalyzed as an illustrative example. In a systematic review we demonstrate variability in analytic approaches, sampling strategies, and outcome definitions in phase IIb tuberculosis trials. Harmonization would allow for larger meta-analyses, and may help expedite advancement of new TB therapeutics.
|
20
|
Some new confidence intervals for Kaplan-Meier based estimators from one and two sample survival data. Stat Med 2021; 40:4961-4976. [PMID: 34131948 DOI: 10.1002/sim.9105] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/09/2020] [Revised: 04/08/2021] [Accepted: 05/19/2021] [Indexed: 11/09/2022]
Abstract
The restricted mean survival time (RMST) has been popularly used to assess the treatment effect in survival trials. Greenwood's formula is often used to estimate the variance of RMST, and the resulting Wald confidence interval (CI) tends to be liberal in small and moderate samples. We propose the empirical likelihood ratio, score-type, and loglog transformed CIs for RMST in a single sample. The method of variance estimates recovery technique is used to derive the CIs for the difference and ratio parameters in the two sample inference. A variance estimate, which assumes equal survival curves, but possibly different censoring rates in the two groups, is proposed for comparing two groups. The new variance estimate shows excellent performance in testing for superiority, and also works well for a noninferiority test with a small margin, and for the interval estimation when the two survival curves are close. We use similar techniques to construct CIs for comparing two milestone survival probabilities. Numerical examples are used to assess these interval estimation methods.
|
21
|
Interpreting the Clinical Utility of Early Interdisciplinary Supportive Care for Untreated Metastatic Esophageal Cancer. J Clin Oncol 2021; 39:2518. [PMID: 33961499 DOI: 10.1200/jco.21.00122] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
|
22
|
Reduction in number to treat versus number needed to treat. BMC Med Res Methodol 2021; 21:48. [PMID: 33750292 PMCID: PMC7945324 DOI: 10.1186/s12874-021-01246-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2020] [Accepted: 03/02/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND We propose a new measure of treatment effect based on the expected reduction in the number of patients to treat (RNT) which is defined as the difference of the reciprocals of clinical measures of interest between two arms. Compared with the conventional number needed to treat (NNT), RNT shows superiority with both binary and time-to-event endpoints in randomized controlled trials (RCTs). METHODS Five real RCTs, two with binary endpoints and three with survival endpoints, are used to illustrate the concept of RNT and compare the performances between RNT and NNT. For survival endpoints, we propose two versions of RNT: one is based on the survival rate and the other is based on the restricted mean survival time (RMST). Hypothetical scenarios are also constructed to explore the advantages and disadvantages of RNT and NNT. RESULTS Because there is no baseline for computation of NNT, it fails to differentiate treatment effect in the absolute scale. In contrast, RNT conveys more information than NNT due to its reversed order of differencing and inverting. For survival endpoints, two versions of RNT calculated as the difference of the reciprocals of survival rates and RMSTs are complementary to each other. The RMST-based RNT can capture the entire follow-up profile and thus is clinically more intuitive and meaningful, as it inherits the time-to-event characteristics for survival endpoints instead of using truncated binary endpoints at a specific time point. CONCLUSIONS The RNT can serve as an alternative measure for quantifying treatment effect in RCTs, which complements NNT to help patients and clinicians better understand the magnitude of treatment benefit.
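The contrast between "differencing then inverting" (NNT) and "inverting then differencing" (RNT) can be sketched for a binary endpoint as below. This is only an illustration, not the authors' code: the function names, the use of response probabilities as the clinical measure, and the control-minus-treatment sign convention for RNT are all assumptions.

```python
def nnt(p_control, p_treatment):
    """Number needed to treat: take the difference first, then invert."""
    return 1.0 / (p_treatment - p_control)


def rnt(p_control, p_treatment):
    """Reduction in number to treat: invert each arm first, then take the difference.

    Assumption: p_* are response (success) probabilities and the difference is
    control minus treatment; the abstract does not fix the sign convention.
    """
    return 1.0 / p_control - 1.0 / p_treatment


# Two hypothetical trials with the same absolute risk difference (0.1)
# but different baseline response rates.
for p0, p1 in [(0.1, 0.2), (0.5, 0.6)]:
    print(f"p0={p0}, p1={p1}: NNT={nnt(p0, p1):.2f}, RNT={rnt(p0, p1):.2f}")
```

In this made-up example both trials give an NNT of about 10, while the RNT is about 5 in the low-baseline trial and about 0.33 in the high-baseline trial, which is one way to read the abstract's remark that NNT has no baseline.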
|
23
|
Are restricted mean survival time methods especially useful for noninferiority trials? Clin Trials 2021; 18:188-196. [PMID: 33626896 DOI: 10.1177/1740774520976576] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/21/2022]
Abstract
BACKGROUND Restricted mean survival time methods compare the areas under the Kaplan-Meier curves up to a time τ for the control and experimental treatments. Extraordinary claims have been made about the benefits (in terms of dramatically smaller required sample sizes) when using restricted mean survival time methods as compared to proportional hazards methods for analyzing noninferiority trials, even when the true survival distributions satisfy proportional hazards. METHODS Through some limited simulations and asymptotic power calculations, the authors compare the operating characteristics of restricted mean survival time and proportional hazards methods for analyzing both noninferiority and superiority trials under proportional hazards to understand what relative power benefits there are when using restricted mean survival time methods for noninferiority testing. RESULTS In the setting of low-event rates, very large targeted noninferiority margins, and limited follow-up past τ, restricted mean survival time methods have more power than proportional hazards methods. For superiority testing, proportional hazards methods have more power. This is not a small-sample phenomenon but requires a low-event rate and a large noninferiority margin. CONCLUSION Although there are special settings where restricted mean survival time methods have a power advantage over proportional hazards methods for testing noninferiority, the larger issue in these settings is defining appropriate noninferiority margins. We find the restricted mean survival time methods lacking in these regards.
|
24
|
Multivariate meta-analysis model for the difference in restricted mean survival times. Biostatistics 2021; 22:82-96. [PMID: 31175828 PMCID: PMC7846118 DOI: 10.1093/biostatistics/kxz018] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/22/2018] [Revised: 04/26/2019] [Accepted: 04/28/2019] [Indexed: 01/01/2023]
Abstract
In randomized controlled trials (RCTs) with time-to-event outcomes, the difference in restricted mean survival times (RMSTD) offers an absolute measure of the treatment effect on the time scale. Computation of the RMSTD relies on the choice of a time horizon, τ. In a meta-analysis, varying follow-up durations may lead to the exclusion of RCTs with follow-up shorter than τ. We introduce an individual patient data multivariate meta-analysis model for RMSTD estimated at multiple time horizons. We derived the within-trial covariance for the RMSTD enabling the synthesis of all data by borrowing strength from multiple time points. In a simulation study covering 60 scenarios, we compared the statistical performance of the proposed method to that of two univariate meta-analysis models, based on available data at each time point and based on predictions from flexible parametric models. Our multivariate model yields smaller mean squared error than univariate methods at all time points. We illustrate the method with a meta-analysis of five RCTs comparing transcatheter aortic valve replacement (TAVR) with surgical replacement in patients with aortic stenosis. Over 12, 24, and 36 months of follow-up, those treated by TAVR live 0.28 [95% confidence interval (CI) 0.01 to 0.56], 0.46 (95% CI -0.08 to 1.01), and 0.79 (95% CI -0.43 to 2.02) months longer on average compared to those treated by surgery, respectively.
|
25
|
Survival analysis using a 5-step stratified testing and amalgamation routine (5-STAR) in randomized clinical trials. Stat Med 2020; 39:4724-4744. [PMID: 32954531 DOI: 10.1002/sim.8750] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/13/2020] [Revised: 06/25/2020] [Accepted: 08/24/2020] [Indexed: 11/12/2022]
Abstract
Randomized clinical trials are often designed to assess whether a test treatment prolongs survival relative to a control treatment. Increased patient heterogeneity, while desirable for generalizability of results, can weaken the ability of common statistical approaches to detect treatment differences, potentially hampering the regulatory approval of safe and efficacious therapies. A novel solution to this problem is proposed. A list of baseline covariates that have the potential to be prognostic for survival under either treatment is pre-specified in the analysis plan. At the analysis stage, using all observed survival times but blinded to patient-level treatment assignment, "noise" covariates are removed with elastic net Cox regression. The shortened covariate list is used by a conditional inference tree algorithm to segment the heterogeneous trial population into subpopulations of prognostically homogeneous patients (risk strata). After patient-level treatment unblinding, a treatment comparison is done within each formed risk stratum and stratum-level results are combined for overall statistical inference. The impressive power-boosting performance of our proposed 5-step stratified testing and amalgamation routine (5-STAR), relative to that of the logrank test and other common approaches that do not leverage inherently structured patient heterogeneity, is illustrated using a hypothetical and two real datasets along with simulation results. Furthermore, the importance of reporting stratum-level comparative treatment effects (time ratios from accelerated failure time model fits in conjunction with model averaging and, as needed, hazard ratios from Cox proportional hazard model fits) is highlighted as a potential enabler of personalized medicine. An R package is available at https://github.com/rmarceauwest/fiveSTAR.
|
26
|
Non-proportional hazards in immuno-oncology: Is an old perspective needed? Pharm Stat 2020; 20:512-527. [PMID: 33350587 DOI: 10.1002/pst.2091] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/09/2020] [Revised: 09/24/2020] [Accepted: 12/08/2020] [Indexed: 11/11/2022]
Abstract
A fundamental concept in two-arm non-parametric survival analysis is the comparison of observed versus expected numbers of events on one of the treatment arms (the choice of which arm is arbitrary), where the expectation is taken assuming that the true survival curves in the two arms are identical. This concept is at the heart of the counting-process theory that provides a rigorous basis for methods such as the log-rank test. It is natural, therefore, to maintain this perspective when extending the log-rank test to deal with non-proportional hazards, for example, by considering a weighted sum of the "observed - expected" terms, where larger weights are given to time periods where the hazard ratio is expected to favor the experimental treatment. In doing so, however, one may stumble across some rather subtle issues, related to difficulties in the interpretation of hazard ratios, that may lead to strange conclusions. An alternative approach is to view non-parametric survival comparisons as permutation tests. With this perspective, one can easily improve on the efficiency of the log-rank test, while thoroughly controlling the false positive rate. In particular, for the field of immuno-oncology, where researchers often anticipate a delayed treatment effect, sample sizes could be substantially reduced without loss of power.
|
27
|
Restricted mean survival time as a function of restriction time. Biometrics 2020; 78:192-201. [PMID: 33616953 DOI: 10.1111/biom.13414] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/17/2019] [Revised: 08/18/2020] [Accepted: 11/25/2020] [Indexed: 11/26/2022]
Abstract
Restricted mean survival time (RMST) is a clinically interpretable and meaningful survival metric that has gained popularity in recent years. Several methods are available for regression modeling of RMST, most based on pseudo-observations or what is essentially an inverse-weighted complete-case analysis. No existing RMST regression method allows for the covariate effects to be expressed as functions over time. This is a considerable limitation, in light of the many hazard regression methods that do accommodate such effects. To address this void in the literature, we propose RMST methods that permit estimating time-varying effects. In particular, we propose an inference framework for directly modeling RMST as a continuous function of L. Large-sample properties are derived. Simulation studies are performed to evaluate the performance of the methods in finite sample sizes. The proposed framework is applied to kidney transplant data obtained from the Scientific Registry of Transplant Recipients.
|
28
|
Restricted mean survival time for interval-censored data. Stat Med 2020; 39:3879-3895. [PMID: 32767503 DOI: 10.1002/sim.8699] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/21/2019] [Revised: 06/22/2020] [Accepted: 06/28/2020] [Indexed: 11/10/2022]
Abstract
Restricted mean survival time (RMST) evaluates the mean event-free survival time up to a prespecified time point. It has been used as an alternative measure of treatment effect owing to its model-free structure and clinically meaningful interpretation of treatment benefit for right-censored data. In clinical trials, another type of censoring called interval censoring may occur if subjects are examined at several discrete time points and the survival time falls into an interval rather than being exactly observed. The missingness of exact observations under interval-censored cases makes the nonparametric measure of treatment effect more challenging. Employing the linear smoothing technique to overcome the ambiguity, we propose a new model-free measure for the interval-censored RMST. As an alternative to the commonly used log-rank test, we further construct a hypothesis testing procedure to assess the survival difference between two groups. Simulation studies show that the bias of our proposed interval-censored RMST estimator is negligible and the testing procedure delivers promising performance in detecting between-group difference with regard to size and power under various configurations of survival curves. The proposed method is illustrated by reanalyzing two real datasets containing interval-censored observations.
|
29
|
Statistical Considerations for Sequential Analysis of the Restricted Mean Survival Time for Randomized Clinical Trials. Stat Biopharm Res 2020; 13:210-218. [PMID: 33927801 DOI: 10.1080/19466315.2020.1816491] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
Abstract
In this paper, we illustrate the method of designing a group-sequential randomized clinical trial based on the difference in restricted mean survival time (RMST). The procedure is based on theoretical formulations of Murray and Tsiatis (1999). We also present a numerical example in designing a cardiology surgical trial. Various practical considerations are discussed. R code is provided in the Supplementary Materials. We conclude that the group-sequential design for RMST is a viable option in practice. A simulation study is performed to compare the proposed method to the Max-Combo and conventional log-rank tests. The simulation result shows that when there is a delayed treatment benefit and the proportional hazards assumption is untrue, the sequential design based on the RMST can be more efficient than that based on the log-rank test but less efficient than that based on the Max-Combo test. Compared with the Max-Combo test, the RMST-based study design yields a coherent estimand, statistical inference, and result interpretation.
|
30
|
Empirical power comparison of statistical tests in contemporary phase III randomized controlled trials with time-to-event outcomes in oncology. Clin Trials 2020; 17:597-606. [PMID: 32933339 DOI: 10.1177/1740774520940256] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
BACKGROUND More than 95% of recent cancer randomized controlled trials used the log-rank test to detect a treatment difference making it the predominant tool for comparing two survival functions. As with other tests, the log-rank test has both advantages and disadvantages. One advantage is that it offers the highest power against proportional hazards differences, which may be a major reason why alternative methods have rarely been employed in practice. The performance of statistical tests has traditionally been investigated both theoretically and numerically for several patterns of difference between two survival functions. However, to the best of our knowledge, there has been no attempt to compare the performance of various statistical tests using empirical data from past oncology randomized controlled trials. So, it is unknown whether the log-rank test offers a meaningful power advantage over alternative testing methods in contemporary cancer randomized controlled trials. Focusing on recently reported phase III cancer randomized controlled trials, we assessed whether the log-rank test gave meaningfully greater power when compared with five alternative testing methods: generalized Wilcoxon, test based on maximum of test statistics from multiple weighted log-rank tests, difference in t-year event rate, and difference in restricted mean survival time with fixed and adaptive τ. METHODS Using manuscripts from cancer randomized controlled trials recently published in high-tier clinical journals, we reconstructed patient-level data for overall survival (69 trials) and progression-free survival (54 trials). For each trial endpoint, we estimated the empirical power of each test. Empirical power was measured as the proportion of trials for which a test would have identified a significant result (p value < .05). 
RESULTS For overall survival, t-year event rate offered the lowest (30.4%) empirical power and restricted mean survival time with fixed τ offered the highest (43.5%). The empirical power of the other types of tests was almost identical (36.2%-37.7%). For progression-free survival, the tests we investigated offered numerically equivalent empirical power (55.6%-61.1%). No single test consistently outperformed any other test. CONCLUSION The empirical power assessment with the past cancer randomized controlled trials provided new insights on the performance of statistical tests. Although the log-rank test has been used in almost all trials, our study suggests that the log-rank test is not the only option from an empirical power perspective. Near universal use of the log-rank test is not supported by a meaningful difference in empirical power. Clinical trial investigators could consider alternative methods, beyond the log-rank test, for their primary analysis when designing a cancer randomized controlled trial. Factors other than power (e.g. interpretability of the estimated treatment effect) should garner greater consideration when selecting statistical tests for cancer randomized controlled trials.
|
31
|
Abstract
In a comparative longitudinal clinical study, the timing and occurrence of multiple clinical events of interest are typically recorded during the follow-up period. These clinical events are often indicative of disease burden over the study period and provide overall evidence of benefit/risk of one treatment relative to another. While these clinical events are usually used to form a composite endpoint, only the first occurrence of the composite endpoint event is considered in primary efficacy analysis. This type of analysis is commonly performed but it may not be ideal. Most of the existing methods for analyzing multiple event-time data were developed relying on certain model assumptions. However, the assumptions may greatly affect the inferences for treatment effect. In this paper, we propose a simple, non-parametric estimator of conditional mean survival time for multiple events to quantify treatment effect which has clinically meaningful interpretation. We use simulation studies to evaluate the performance of the new method. Further, we apply this method to analyze the data from a cardiovascular clinical trial as an illustration.
|
32
|
Efficient analysis of time-to-event endpoints when the event involves a continuous variable crossing a threshold. J Stat Plan Inference 2020; 208:119-129. [PMID: 32884165 PMCID: PMC7097971 DOI: 10.1016/j.jspi.2020.02.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/24/2019] [Revised: 02/15/2020] [Accepted: 02/15/2020] [Indexed: 01/29/2023]
Abstract
In many trials, the duration between patient enrolment and an event occurring is used as the efficacy endpoint. Common endpoints of this type include the time until relapse, progression to the next stage of a disease, or time until remission. The criteria of an event may be defined by multiple components, one or more of which may be a continuous measurement being above or below a threshold. Typical analyses consider all components as binary variables and record the first time at which the patient has an event. This is analysed through constructing and testing survival functions using Kaplan-Meier, parametric models or Cox models. This approach ignores information contained in the continuous components. We propose a method that makes use of this information to improve the precision of analyses using these types of endpoints. We use joint modelling of the continuous and binary components to construct survival curves. We show how to compute confidence intervals for quantities of interest, such as the median or mean event time. We assess the properties of the proposed method using simulations and data from a phase II cancer trial and an observational study in renal disease.
|
33
|
Statistical Issues and Lessons Learned From COVID-19 Clinical Trials With Lopinavir-Ritonavir and Remdesivir. JMIR Public Health Surveill 2020; 6:e19538. [PMID: 32589146 PMCID: PMC7357691 DOI: 10.2196/19538] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/22/2020] [Revised: 06/18/2020] [Accepted: 06/25/2020] [Indexed: 12/25/2022]
Abstract
Background Recently, three randomized clinical trials on coronavirus disease (COVID-19) treatments were completed: one for lopinavir-ritonavir and two for remdesivir. One trial reported that remdesivir was superior to placebo in shortening the time to recovery, while the other two showed no benefit of the treatment under investigation. Objective The aim of this paper is to, from a statistical perspective, identify several key issues in the design and analysis of three COVID-19 trials and reanalyze the data from the cumulative incidence curves in the three trials using more appropriate statistical methods. Methods The lopinavir-ritonavir trial enrolled 39 additional patients due to insignificant results after the sample size reached the planned number, which led to inflation of the type I error rate. The remdesivir trial of Wang et al failed to reach the planned sample size due to a lack of eligible patients, and the bootstrap method was used to predict the quantity of clinical interest conditionally and unconditionally if the trial had continued to reach the originally planned sample size. Moreover, we used a terminal (or cure) rate model and a model-free metric known as the restricted mean survival time or the restricted mean time to improvement (RMTI) to analyze the reconstructed data. The remdesivir trial of Beigel et al reported the median recovery time of the remdesivir and placebo groups, and the rate ratio for recovery, while both quantities depend on a particular time point representing local information. We use the restricted mean time to recovery (RMTR) as a global and robust measure for efficacy. Results For the lopinavir-ritonavir trial, with the increase of sample size from 160 to 199, the type I error rate was inflated from 0.05 to 0.071. The difference of RMTIs between the two groups evaluated at day 28 was –1.67 days (95% CI –3.62 to 0.28; P=.09) in favor of lopinavir-ritonavir but not statistically significant. 
For the remdesivir trial of Wang et al, the difference of RMTIs at day 28 was –0.89 days (95% CI –2.84 to 1.06; P=.37). The planned sample size was 453, yet only 236 patients were enrolled. The conditional prediction shows that the hazard ratio estimates would reach statistical significance if the target sample size had been maintained. For the remdesivir trial of Beigel et al, the difference of RMTRs between the remdesivir and placebo groups at day 30 was –2.7 days (95% CI –4.0 to –1.2; P<.001), confirming the superiority of remdesivir. The difference in the recovery time at the 25th percentile (95% CI –3 to 0; P=.65) was insignificant, while the differences became more statistically significant at larger percentiles. Conclusions Based on the statistical issues and lessons learned from the recent three clinical trials on COVID-19 treatments, we suggest more appropriate approaches for the design and analysis of ongoing and future COVID-19 trials.
|
34
|
Abstract
BACKGROUND In randomized clinical trials with censored time-to-event outcomes, the logrank test is known to have substantial statistical power under the proportional hazards assumption and is widely adopted as a tool to compare two survival distributions. However, the proportional hazards assumption is impossible to validate in practice until the data are unblinded. Yet the statistical analysis plan of a randomized clinical trial, and in particular its primary analysis method, must be pre-specified before any unblinded information may be reviewed. PURPOSE The purpose of this article is to guide applied biostatisticians in the prespecification of a desired primary analysis method when a treatment effect with nonproportional hazards is anticipated. While articles proposing alternate statistical tests are aplenty, to the best of our knowledge, there is no article available that attempts to simplify the choice and prespecification of a primary statistical test under specific expected patterns of nonproportional hazards. We provide such guidance by reviewing various tests proposed as more powerful alternatives to the standard logrank test under nonproportional hazards and simultaneously comparing their performance under a wide variety of nonproportional hazards scenarios to elucidate their advantages and disadvantages. METHOD In order to select the most preferable test for detecting specific differences between survival distributions of interest while controlling false positive rates, we review and assess the performance of weighted and adaptively weighted logrank tests, weighted and adaptively weighted Kaplan-Meier tests and versatile tests under various patterns of nonproportional hazards treatment effects through simulation. CONCLUSION We validate some of the claimed properties of the proposed extensions and identify tests that may be more preferable under a specific expected pattern of nonproportional hazards when such knowledge is available. 
We show that versatile tests, while achieving robustness to departures from proportional hazards, may lose interpretation of directionality (superiority or inferiority) and can only be seen to test departures from equality. Detailed summary and discussion of the performance of each test in terms of type I error rate and power are provided to formulate specific guidance about their applicability and use.
|
35
|
On permutation tests for comparing restricted mean survival time with small sample from randomized trials. Stat Med 2020; 39:2655-2670. [PMID: 32432805 DOI: 10.1002/sim.8565] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/20/2019] [Revised: 02/25/2020] [Accepted: 04/12/2020] [Indexed: 12/15/2022]
Abstract
Between-group comparison based on the restricted mean survival time (RMST) is getting attention as an alternative to the conventional logrank/hazard ratio approach for time-to-event outcomes in randomized controlled trials (RCTs). The validity of the commonly used nonparametric inference procedure for RMST has been well supported by large sample theories. However, we sometimes encounter cases with a small sample size in practice, where we cannot rely on the large sample properties. Generally, the permutation approach can be useful to handle these situations in RCTs. However, a numerical issue arises when implementing permutation tests for the difference or ratio of RMST from two groups. In this article, we discuss the numerical issue and consider six permutation methods for comparing survival time distributions between two groups using RMST in the RCT setting. We conducted extensive numerical studies and assessed type I error rates of these methods. Our numerical studies demonstrated that the inflation of the type I error rate of the asymptotic methods is not negligible when sample size is small, and that all of the six permutation methods are workable solutions. Although some permutation methods became a little conservative, no remarkable inflation of the type I error rates was observed. We recommend using permutation tests instead of the asymptotic tests, especially when the sample size is less than 50 per arm.
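A generic two-sided permutation test for the RMST difference can be sketched as follows. It is not one of the six methods evaluated in the article: the function names are hypothetical, and the sketch sidesteps the numerical issue the abstract mentions by simply holding the Kaplan-Meier curve flat after the last observation, which is an assumption made here for illustration.

```python
import random
from itertools import groupby


def km_rmst(times, events, tau):
    """Area under the Kaplan-Meier curve on [0, tau] (event=1, censored=0)."""
    data = sorted(zip(times, events))
    n_at_risk, surv, last_t, area = len(data), 1.0, 0.0, 0.0
    for t, grp in groupby(data, key=lambda pair: pair[0]):
        grp = list(grp)
        if t > tau:
            break
        area += surv * (t - last_t)          # rectangle under the current step
        deaths = sum(e for _, e in grp)
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
        n_at_risk -= len(grp)
        last_t = t
    # Assumption: the curve is held flat from the last observation up to tau.
    return area + surv * (tau - last_t)


def rmst_permutation_test(times, events, groups, tau, n_perm=2000, seed=0):
    """Return (observed RMST difference, two-sided permutation p-value).

    groups holds 0/1 arm labels; the difference is arm 1 minus arm 0.
    """
    def diff(labels):
        arm = {0: ([], []), 1: ([], [])}
        for t, e, g in zip(times, events, labels):
            arm[g][0].append(t)
            arm[g][1].append(e)
        return km_rmst(*arm[1], tau) - km_rmst(*arm[0], tau)

    rng = random.Random(seed)
    observed = diff(groups)
    labels = list(groups)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)                  # reassign arms at random
        if abs(diff(labels)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)
```

Shuffling the arm labels keeps the arm sizes fixed, which mirrors the randomization in an RCT; how a permuted arm whose follow-up ends before τ should be handled is the part this sketch deliberately simplifies.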
|
36
|
Novel Risk Modeling Approach of Atrial Fibrillation With Restricted Mean Survival Times: Application in the Framingham Heart Study Community-Based Cohort. Circ Cardiovasc Qual Outcomes 2020; 13:e005918. [PMID: 32228064 PMCID: PMC7176529 DOI: 10.1161/circoutcomes.119.005918] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 11/16/2022]
Abstract
BACKGROUND Risk prediction models for atrial fibrillation (AF) do not give information about when AF might develop. Restricted mean survival time (RMST) quantifies risk into the time domain. Our objective was to use RMST to re-express individualized AF risk predictions. METHODS AND RESULTS We included AF-free participants from the Framingham Heart Study community-based cohorts. We predicted new-onset AF over 10-year follow-up according to baseline covariates: age, height, weight, systolic blood pressure, diastolic blood pressure, current smoking, antihypertensive treatment, diabetes mellitus, prevalent heart failure, and prevalent myocardial infarction. First, we fitted a Cox regression model and estimated the 10-year predicted risk of AF. Second, we fitted an RMST model and estimated the predicted mean time free of AF and alive over a time horizon of 10 years. We included 7586 AF-free participants contributing to 11 088 examinations (mean age 61±11 years, 44% were men). During 10-year follow-up, 822 participants developed AF. The Cox and RMST models were in agreement regarding the direction, strength, and statistical significance of associations for all covariates. Low (<5%), intermediate (5%-15%), and high (>15%) 10-year predicted risk of AF corresponded to predicted mean time alive and free of AF of 9.9, 9.6, and 8.8 years, respectively. A 60-year-old woman with a body mass index of 25 kg/m2, no use of hypertension treatment and no history of heart failure had a predicted mean time alive and free of AF of 9.9 years, whereas a 70-year-old man with a body mass index of 30 kg/m2, use of hypertension treatment, and with prevalent heart failure had a predicted mean time alive and free of AF of 7.9 years. CONCLUSIONS The RMST can be used to develop risk prediction models to express results in a time scale. RMST may offer a complementary risk communication tool for AF in clinical practice.
|
37
|
On the empirical choice of the time window for restricted mean survival time. Biometrics 2020; 76:1157-1166. [PMID: 32061098 DOI: 10.1111/biom.13237] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 09/16/2018] [Revised: 01/24/2020] [Accepted: 01/27/2020] [Indexed: 11/27/2022]
Abstract
The t-year mean survival or restricted mean survival time (RMST) has been used as an appealing summary of the survival distribution within a time window [0, t]. RMST is the patient's life expectancy until time t and can be estimated nonparametrically by the area under the Kaplan-Meier curve up to t. In a comparative study, the difference or ratio of two RMSTs has been utilized to quantify the between-group difference as a clinically interpretable alternative summary to the hazard ratio. The choice of the time window [0, t] may be prespecified at the design stage of the study based on clinical considerations. On the other hand, after the survival data have been collected, the choice of time point t could be data-dependent. The standard inferential procedures for the corresponding RMST, which is also data-dependent, ignore this subtle yet important issue. In this paper, we clarify how to make inference about a random "parameter." Moreover, we demonstrate that under a rather mild condition on the censoring distribution, one can make inference about the RMST up to t, where t is less than or even equal to the largest follow-up time (either observed or censored) in the study. This finding reduces the subjectivity of the choice of t empirically. The proposal is illustrated with the survival data from a primary biliary cirrhosis study, and its finite sample properties are investigated via an extensive simulation study.
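As an illustration of the estimator described in this abstract (the area under the Kaplan-Meier curve up to t), the following is a minimal sketch in plain Python. It is not the authors' code: the function names `kaplan_meier` and `rmst`, and the argument name `tau` for the truncation time, are hypothetical, and variance estimation and the data-dependent choice of t discussed in the article are not covered.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return the Kaplan-Meier step function as (event_time, survival) pairs.

    `events[i]` is 1 if the event was observed at `times[i]`, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, surv))
        # Events and censored subjects both leave the risk set after time t.
        n_at_risk -= removed
    return steps


def rmst(times: Sequence[float], events: Sequence[int], tau: float) -> float:
    """Area under the Kaplan-Meier curve on [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # rectangle under the current step
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)  # last rectangle, up to tau
    return area
```

With no censoring the result equals the sample mean of min(T, tau), which gives a quick sanity check: for event times 1, 2, 3, 4 and tau = 4, `rmst` returns 2.5.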
|
38
|
Restricted mean survival time as a summary measure of time-to-event outcome. Pharm Stat 2020; 19:436-453. [PMID: 32072769 DOI: 10.1002/pst.2004] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/26/2019] [Revised: 01/19/2020] [Accepted: 01/21/2020] [Indexed: 01/13/2023]
Abstract
Many clinical research studies evaluate a time-to-event outcome, illustrate survival functions, and conventionally report estimated hazard ratios to express the magnitude of the treatment effect when comparing between groups. However, it may not be straightforward to interpret the hazard ratio clinically and statistically when the proportional hazards assumption is invalid. In some recent papers published in clinical journals, the use of restricted mean survival time (RMST) or τ-year mean survival time is discussed as one of the alternative summary measures for the time-to-event outcome. The RMST is defined as the expected value of time to event limited to a specific time point corresponding to the area under the survival curve up to the specific time point. This article summarizes the necessary information to conduct statistical analysis using the RMST, including the definition and statistical properties of the RMST, adjusted analysis methods, sample size calculation, information fraction for the RMST difference, and clinical and statistical meaning and interpretation. Additionally, we discuss how to set the specific time point to define the RMST from two main points of view. We also provide developed SAS codes to determine the sample size required to detect an expected RMST difference with appropriate power and reconstruct individual survival data to estimate an RMST reference value from a reported survival curve.
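The definition stated in this abstract can be written out as follows. This is a standard identity rather than a formula quoted from the article; the symbols T (time to event), S (its survival function), and μ(τ) (the RMST at truncation time τ) are notation assumed here.

```latex
\mu(\tau) \;=\; \mathbb{E}\bigl[\min(T,\tau)\bigr]
        \;=\; \int_0^{\infty} \Pr\bigl(\min(T,\tau) > t\bigr)\,dt
        \;=\; \int_0^{\tau} S(t)\,dt,
\qquad S(t) = \Pr(T > t).
```

The middle step uses the tail formula for the mean of a nonnegative random variable; the last step holds because the probability that min(T, τ) exceeds t equals S(t) for t < τ and is zero for t ≥ τ.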
|
39
|
Designing clinical trials with (restricted) mean survival time endpoint: Practical considerations. Clin Trials 2020; 17:285-294. [PMID: 32063031 DOI: 10.1177/1740774520905563] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 11/16/2022]
Abstract
BACKGROUND/AIMS The difference in mean survival time, which quantifies the treatment effect in terms most meaningful to patients and retains its interpretability regardless of the shape of the survival distribution or the proportionality of the treatment effect, is an alternative endpoint that could be used more often as the primary endpoint to design clinical trials. The underuse of this endpoint is due to investigators' lack of familiarity with the test comparing the mean survival times and the lack of tools to facilitate trial design with this endpoint. The aim of this article is to provide investigators with insights and software to design trials with restricted mean survival time as the primary endpoint. METHODS A closed-form formula for the asymptotic power of the test of restricted mean survival time difference is presented. The effects of design parameters on power were evaluated for the mean survival time test and log-rank test. An R package which calculates the power or the sample size for user-specified parameter values and provides power plots for each design parameter is provided. The R package also calculates the probability that the restricted mean survival time is estimable for user-defined trial designs. RESULTS Under proportional hazards and late differences in survival, the power of the mean survival time test can approach that of the log-rank test if the restriction time is late. Under early differences, the power of the restricted mean survival time test is higher than that of the log-rank test. Duration of accrual and follow-up have little influence on the power of the restricted mean survival time test. The choice of restriction time, on the other hand, has a large impact on power. Because the power depends on the interplay among the design factors, plotting the relationship between each design parameter and power allows the users to select the designs most appropriate for their trial. When modification is necessary to ensure the difference in restricted mean survival time is estimable, the three available modifications all perform adequately in the scenarios studied. CONCLUSION The restricted mean survival time is a survival endpoint that is meaningful to investigators and to patients and at the same time requires less restrictive assumptions. The biggest challenge with this endpoint is selection of the restriction time. We recommend selecting a restriction time that is clinically relevant to the disease and the clinical setting of the trial of interest. The practical considerations and the R package provided in this work are readily available tools that researchers can use to design trials with restricted mean survival time as the primary endpoint.
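The abstract above mentions a closed-form asymptotic power formula but does not state it, so the sketch below does not reproduce the article's formula or its R package. It shows only the generic normal approximation for a two-sided Wald test of an RMST difference, assuming the true difference `delta` and the standard error `se` of its estimator are supplied by the user; the function name `rmst_test_power` is hypothetical.

```python
from statistics import NormalDist


def rmst_test_power(delta: float, se: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided Wald test of H0: RMST difference = 0.

    Assumes the estimated difference is approximately normal with mean
    `delta` and standard error `se` (both supplied by the user).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(delta) / se
    # Probability that the standardized estimate falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)
```

Under this approximation the power equals `alpha` when `delta` is zero, and reaches roughly 0.80 at the two-sided 5% level when `delta / se` is about 2.80. In a real design `se` itself depends on the restriction time, accrual, follow-up, and censoring, which is where the design factors discussed in the abstract enter.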
|
40
|
|
41
|
On null hypotheses in survival analysis. Biometrics 2019; 75:1276-1287. [DOI: 10.1111/biom.13102] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/17/2018] [Accepted: 06/12/2019] [Indexed: 11/29/2022]
|
42
|
An Alternative Approach for the Analysis of Time-to-Event and Survival Outcomes in Pulmonary Medicine. Am J Respir Crit Care Med 2019; 198:684-687. [PMID: 29701996 DOI: 10.1164/rccm.201801-0189le] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/16/2022]
|
43
|
Adding a new analytical procedure with clinical interpretation in the tool box of survival analysis. Ann Oncol 2019; 29:1092-1094. [PMID: 29617717 DOI: 10.1093/annonc/mdy109] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/12/2022]
|
44
|
Modestly weighted logrank tests. Stat Med 2019; 38:3782-3790. [DOI: 10.1002/sim.8186] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 07/26/2018] [Revised: 02/04/2019] [Accepted: 04/10/2019] [Indexed: 11/09/2022]
|
45
|
Difference in Restricted Mean Survival Time: Small Sample Distribution and Asymptotic Relative Efficiency. Stat Biopharm Res 2019. [DOI: 10.1080/19466315.2018.1527249] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 10/27/2022]
|
46
|
Risk of death, relapse or progression, and loss of life expectancy at different progression-free survival milestones in primary central nervous system lymphoma. Leuk Lymphoma 2019; 60:2516-2523. [DOI: 10.1080/10428194.2019.1594219] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/31/2022]
|
47
|
Abstract
BACKGROUND The logrank test and the Cox proportional hazards model are routinely applied in the design and analysis of randomised controlled trials (RCTs) with time-to-event outcomes. Usually, sample size and power calculations assume proportional hazards (PH) of the treatment effect, i.e. the hazard ratio is constant over the entire follow-up period. If the PH assumption fails, the power of the logrank/Cox test may be reduced, sometimes severely. It is, therefore, important to understand how serious this can become in real trials, and for a proven, alternative test to be available to increase the robustness of the primary test. METHODS We performed a systematic search to identify relevant articles in four leading medical journals that publish results of phase 3 clinical trials. Altogether, 50 articles satisfied our inclusion criteria. We digitised published Kaplan-Meier curves and created approximations to the original times to event or censoring at the individual patient level. Using the reconstructed data, we tested for non-PH in all 50 trials. We compared the results from the logrank/Cox test with those from the combined test recently proposed by Royston and Parmar. RESULTS The PH assumption was checked and reported only in 28% of the studies. Evidence of non-PH at the 0.10 level was detected in 31% of comparisons. The Cox test of the treatment effect was significant at the 0.05 level in 49% of comparisons, and the combined test in 55%. In four of five trials with discordant results, the interpretation would have changed had the combined test been used. The degree of non-PH and the dominance of the p value for the combined test were strongly associated. Graphical investigation suggested that non-PH was mostly due to a treatment effect manifesting in an early follow-up and disappearing later. CONCLUSIONS The evidence for non-PH is checked (and, hence, identified) in only a small minority of RCTs, but non-PH may be present in a substantial fraction of such trials. In our reanalysis of the reconstructed data from 50 trials, the combined test outperformed the Cox test overall. The combined test is a promising approach to making trial design and analysis more robust.
|
48
|
Summarizing and communicating on survival data according to the audience: a tutorial on different measures illustrated with population-based cancer registry data. Clin Epidemiol 2019; 11:53-65. [PMID: 30655705 PMCID: PMC6322561 DOI: 10.2147/clep.s173523] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 12/28/2022]
Abstract
Survival data analysis results are usually communicated through the overall survival probability. Alternative measures provide additional insights and may help in communicating the results to a wider audience. We describe these alternative measures in two data settings, the overall survival setting and the relative survival setting, the latter corresponding to the particular competing risk setting in which the cause of death is unavailable or unreliable. In the overall survival setting, we describe the overall survival probability, the conditional survival probability and the restricted mean survival time (restricted to a prespecified time window). In the relative survival setting, we describe the net survival probability, the conditional net survival probability, the restricted mean net survival time, the crude probability of death due to each cause and the number of life years lost due to each cause over a prespecified time window. These measures describe survival data either on a probability scale or on a timescale. The clinical or population health purpose of each measure is detailed, and their advantages and drawbacks are discussed. We then illustrate their use analyzing England population-based registry data of men 15-80 years old diagnosed with colon cancer in 2001-2003, aiming to describe the deprivation disparities in survival. We believe that both the provision of a detailed example of the interpretation of each measure and the software implementation will help in generalizing their use.
|
49
|
Limitations of hazard ratios in clinical trials. Eur Heart J 2018; 40:1378-1383. [DOI: 10.1093/eurheartj/ehy770] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 07/04/2018] [Revised: 08/27/2018] [Accepted: 10/26/2018] [Indexed: 01/21/2023]
|
50
|
How Do the Accrual Pattern and Follow-Up Duration Affect the Hazard Ratio Estimate When the Proportional Hazards Assumption Is Violated? Oncologist 2018; 24:867-871. [PMID: 30201741 DOI: 10.1634/theoncologist.2018-0141] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/06/2018] [Accepted: 06/25/2018] [Indexed: 11/17/2022]
Abstract
In randomized clinical trials, the magnitude of the treatment effect is often reported using the hazard ratio (HR) even when the proportional hazards (PH) assumption is not met. Conducting numerical studies, this commentary illustrates how/why the HR estimate via the standard Cox's procedure is difficult to interpret even as an “average” treatment effect for non‐PH cases.
|