1
Smith A, Lambert PC, Rutherford MJ. Generating high-fidelity synthetic time-to-event datasets to improve data transparency and accessibility. BMC Med Res Methodol 2022; 22:176. [PMID: 35739465 PMCID: PMC9229142 DOI: 10.1186/s12874-022-01654-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2021] [Accepted: 06/06/2022] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: A lack of data and statistical code published alongside journal articles is a significant barrier to open scientific discourse and to the reproducibility of research. Information governance restrictions inhibit the active dissemination of individual-level data to accompany published manuscripts. Realistic, high-fidelity synthetic time-to-event data can help accelerate methodological developments in survival analysis and beyond by enabling researchers to access and test published methods using data similar to those on which they were developed.
METHODS: We present methods to accurately emulate the covariate patterns and survival times found in real-world datasets using synthetic data techniques, without compromising patient privacy. We model the joint covariate distribution of the original data using covariate-specific sequential conditional regression models, then fit a complex flexible parametric survival model from which to generate survival times conditional on individual covariate patterns. We recreate the administrative censoring mechanism using the last observed follow-up date information from the initial dataset. Metrics for evaluating the accuracy of the synthetic data, and the non-identifiability of individuals from the original dataset, are presented.
RESULTS: We successfully create a synthetic version of an example colon cancer dataset consisting of 9064 patients which aims to show good similarity to both covariate distributions and survival times from the original data, without containing any exact information from the original data, therefore allowing it to be published openly alongside research.
CONCLUSIONS: We evaluate the effectiveness of the methods for constructing synthetic data, and provide evidence that there is minimal risk that a given patient from the original data could be identified from their individual unique patient information. Synthetic datasets using this methodology could be made available alongside published research without breaching data privacy protocols, and allow for data and code to be made available alongside methodological or applied manuscripts to greatly improve the transparency and accessibility of medical research.
Affiliation(s)
- Aiden Smith
  - Department of Health Sciences, Centre for Medicine, University of Leicester, University Road, Leicester, LE1 7RH, UK
- Paul C Lambert
  - Department of Health Sciences, Centre for Medicine, University of Leicester, University Road, Leicester, LE1 7RH, UK
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Mark J Rutherford
  - Department of Health Sciences, Centre for Medicine, University of Leicester, University Road, Leicester, LE1 7RH, UK
2
Su W, He B, Zhang YD, Yin G. C-index regression for recurrent event data. Contemp Clin Trials 2022; 118:106787. [PMID: 35568377 DOI: 10.1016/j.cct.2022.106787] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/02/2022] [Revised: 05/02/2022] [Accepted: 05/03/2022] [Indexed: 11/17/2022]
Abstract
Recurrent event data analysis plays an important role in many fields, e.g., medicine, social science, and economics. Because existing approaches under the proportional rates or mean model perform poorly when the underlying model is misspecified, we propose a novel model-free approach by introducing a lower bound on the concordance index (C-Index). We develop an estimation method by deriving a continuous lower bound on the C-Index based on the log-sigmoid function, and also provide a variable selection procedure for high-dimensional settings. Under both low- and high-dimensional settings, simulation results show that the proposed methods outperform the gamma frailty recurrent event model when the proportional mean assumption is violated. Moreover, an application to the hospital readmission dataset shows results in line with previous studies, and a higher C-Index value further supports the adequacy of the model.
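To make the idea of a continuous lower bound concrete, the sketch below uses the standard inequality 1(x > 0) >= 1 + log(sigmoid(x)) / log(2) to bound a pairwise concordance index from below. It is only an illustration under simplifying assumptions: outcomes are treated as fully observed numbers (for example event counts), censoring and the recurrent-event structure are ignored, and the function names are hypothetical. It is not the estimator proposed in the paper.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def comparable_pairs(outcomes):
    """Index pairs (i, j) in which subject i has the strictly larger outcome."""
    n = len(outcomes)
    pairs = [(i, j) for i in range(n) for j in range(n) if outcomes[i] > outcomes[j]]
    if not pairs:
        raise ValueError("no comparable pairs: all outcomes are equal")
    return pairs


def c_index(outcomes, scores):
    """Fraction of comparable pairs that the scores order correctly."""
    pairs = comparable_pairs(outcomes)
    hits = sum(1 for i, j in pairs if scores[i] > scores[j])
    return hits / len(pairs)


def c_index_lower_bound(outcomes, scores):
    """Smooth surrogate: each indicator is replaced by 1 + log2(sigmoid(diff)).

    The surrogate never exceeds the indicator, so the average never exceeds
    the C-index, and it is differentiable in the scores.
    """
    pairs = comparable_pairs(outcomes)
    total = sum(
        1.0 + log_sigmoid(scores[i] - scores[j]) / math.log(2.0) for i, j in pairs
    )
    return total / len(pairs)
```

Because the surrogate is differentiable in the scores, it can be maximised with gradient-based optimisers, whereas the C-index itself is piecewise constant. Note that the bound can be negative when many pairs are badly misordered.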
Affiliation(s)
- Wen Su
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong
- Baihua He
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong
- Yan Dora Zhang
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong
- Guosheng Yin
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong
3
Zhang F, Huang X, Fan C. Prediction accuracy measures for time-to-event models with left-truncated and right-censored data. J Stat Comput Simul 2021. [DOI: 10.1080/00949655.2021.1908285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Feipeng Zhang
  - School of Economics and Finance, Xi'an Jiaotong University, Xi'an, People's Republic of China
- Xiaoyan Huang
  - School of Mathematics and Statistics, Hunan Normal University, Changsha, People's Republic of China
- Caiyun Fan
  - School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai, People's Republic of China
4
Shi B, Wei P, Huang X. Functional principal component based landmark analysis for the effects of longitudinal cholesterol profiles on the risk of coronary heart disease. Stat Med 2020; 40:650-667. [PMID: 33155338 DOI: 10.1002/sim.8794] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/17/2019] [Revised: 07/04/2020] [Accepted: 10/10/2020] [Indexed: 12/19/2022]
Abstract
Patterns of change in patients' longitudinal biomarkers are crucial factors in their disease progression. In this research, we apply functional principal component analysis techniques to extract these patterns and use them as predictors in landmark models for dynamic prediction. The time-varying effects of risk factors along a sequence of landmark times are smoothed by a supermodel to borrow information from neighboring time intervals. This results in more stable estimation and a clearer demonstration of the time-varying effects. Compared with the traditional landmark analysis, simulation studies show that our proposed approach results in lower prediction error rates and higher area under the receiver operating characteristic curve (AUC) values, which indicate a better ability to discriminate between subjects with different risk levels. We apply our method to data from the Framingham Heart Study, using longitudinal total cholesterol (TC) levels to predict future coronary heart disease (CHD) risk profiles. Our approach not only obtains the overall trend of biomarker-related risk profiles, but also reveals different risk patterns that are not available from traditional landmark analyses. Our results show that high cholesterol levels at younger ages are more harmful than those at older ages. This demonstrates the importance of analyzing the age-dependent effects of TC on CHD risk.
Affiliation(s)
- Bin Shi
  - Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Peng Wei
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Xuelin Huang
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
5
Association of genetic and behavioral characteristics with the onset of diabetes. BMC Public Health 2019; 19:1297. [PMID: 31615468 PMCID: PMC6794810 DOI: 10.1186/s12889-019-7618-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/03/2018] [Accepted: 09/13/2019] [Indexed: 01/15/2023] [Open Access]
Abstract
Background: Prior work has established sociodemographic, lifestyle, and behavioral risk factors for diabetes but the contribution of these factors to the onset of diabetes remains unclear when accounting for genetic propensity for diabetes. We examined the contribution of a diabetes polygenic score (PGS) to the onset of diabetes in the context of modifiable known risk factors for diabetes.
Methods: Our sample consisted of 15,190 respondents in the United States-based Health and Retirement Study, a longitudinal study with up to 22 years of follow-up. We performed multivariate Cox regression models stratified by race (non-Hispanic white and non-Hispanic black) with time-varying covariates.
Results: We observed 4217 (27.76%) cases of incident diabetes over the survey period. The diabetes PGS was statistically significantly associated with diabetes onset for both non-Hispanic whites (hazard ratio [HR] = 1.38, 95% confidence interval [CI] = 1.30, 1.46) and non-Hispanic blacks (HR = 1.22, 95% CI = 1.06, 1.40) after adjusting for a range of known risk factors for diabetes, highlighting the critical role genetic endowment might play. Nevertheless, genetics do not downplay the role that modifiable characteristics could still play in diabetes management; even with the inclusion of the diabetes PGS, several behavioral and lifestyle characteristics remained significant for both race groups.
Conclusions: The effects of genetic and lifestyle characteristics should be taken into consideration for both future studies and diabetes management.
6
Li G, Wang X. Prediction Accuracy Measures for a Nonlinear Model and for Right-Censored Time-to-Event Data. J Am Stat Assoc 2019; 114:1815-1825. [PMID: 32863480 DOI: 10.1080/01621459.2018.1515079] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/10/2023]
Abstract
This article develops a pair of new prediction summary measures for a nonlinear prediction function with right-censored time-to-event data. The first measure, defined as the proportion of explained variance by a linearly corrected prediction function, quantifies the potential predictive power of the nonlinear prediction function. The second measure, defined as the proportion of explained prediction error by its corrected prediction function, gauges the closeness of the prediction function to its corrected version and serves as a supplementary measure to indicate (by a value less than 1) whether the correction is needed to fulfill its potential predictive power and quantify how much prediction error reduction can be realized with the correction. The two measures together provide a complete summary of the predictive accuracy of the nonlinear prediction function. We motivate these measures by first establishing a variance decomposition and a prediction error decomposition at the population level and then deriving uncensored and censored sample versions of these decompositions. We note that for the least squares prediction function under the linear model with no censoring, the first measure reduces to the classical coefficient of determination and the second measure degenerates to 1. We show that the sample measures are consistent estimators of their population counterparts and conduct extensive simulations to investigate their finite sample properties. A real data illustration is provided using the PBC data. Supplementary materials for this article are available online. An R package PAmeasures has been developed and made available via the CRAN R library.
Affiliation(s)
- Gang Li
  - Departments of Biostatistics and Biomathematics, University of California, Los Angeles, CA
- Xiaoyan Wang
  - Division of General Internal Medicine and Health Services Research, University of California, Los Angeles, CA
7
Maringe C, Pohar Perme M, Stare J, Rachet B. Explained variation of excess hazard models. Stat Med 2018; 37:2284-2300. [PMID: 29633343 PMCID: PMC6001643 DOI: 10.1002/sim.7645] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/21/2017] [Revised: 01/30/2018] [Accepted: 01/31/2018] [Indexed: 12/15/2022]
Abstract
The availability of long-standing collections of detailed cancer patient information makes multivariable modelling of the cancer-specific hazard of death appealing. We propose to report the variation in survival explained by each variable that constitutes these models. We adapted the ranks explained (RE) measure to the relative survival data setting, ie, when competing risks of death are accounted for through life tables from the general population. RE is calculated at each event time. We introduce weights for each death reflecting its probability to be a cancer death. RE varies between -1 and +1 and can be reported at given times in the follow-up and as a time-varying measure from diagnosis onward. We present an application for patients diagnosed with colon or lung cancer in England. The RE measure shows reasonable properties and is comparable in both relative and cause-specific settings. One year after diagnosis, RE for the most complex excess hazard models reaches 0.56, 95% CI: 0.54 to 0.58 (0.58, 95% CI: 0.56 to 0.60) and 0.69, 95% CI: 0.68 to 0.70 (0.67, 95% CI: 0.66 to 0.69) for lung and colon cancer men (women), respectively. Stage at diagnosis accounts for 12.4% (10.8%) of the overall variation in survival among lung cancer patients whereas it carries 61.8% (53.5%) of the survival variation in colon cancer patients. Variables other than performance status for lung cancer (10%) contribute very little to the overall explained variation. The proportion of the variation in survival explained by key prognostic factors is crucial information for understanding the mechanisms underpinning cancer survival. The time-varying RE provides insights into patterns of influence for strong predictors.
Affiliation(s)
- Camille Maringe
  - Cancer Survival Group, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Maja Pohar Perme
  - Department of Biostatistics and Medical Informatics, University of Ljubljana, Vrazov trg 2, SI-1000 Ljubljana, Slovenia
- Janez Stare
  - Department of Biostatistics and Medical Informatics, University of Ljubljana, Vrazov trg 2, SI-1000 Ljubljana, Slovenia
- Bernard Rachet
  - Cancer Survival Group, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
8
Fournier MC, Dantan E, Blanche P. An R²-curve for evaluating the accuracy of dynamic predictions. Stat Med 2017; 37:1125-1133. [DOI: 10.1002/sim.7571] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/17/2016] [Revised: 11/04/2017] [Accepted: 11/06/2017] [Indexed: 11/08/2022]
Affiliation(s)
- Marie-Cécile Fournier
  - INSERM UMR 1246-SPHERE; Nantes University, Tours University; Nantes France
  - ITUN Institut de Transplantation Urologie Néphrologie INSERM UMR 1064; Nantes France
- Etienne Dantan
  - INSERM UMR 1246-SPHERE; Nantes University, Tours University; Nantes France
- Paul Blanche
  - LMBA; Université de Bretagne Sud; Vannes Brittany France
9
van Klaveren D, Gönen M, Steyerberg EW, Vergouwe Y. A new concordance measure for risk prediction models in external validation settings. Stat Med 2016; 35:4136-52. [PMID: 27251001 DOI: 10.1002/sim.6997] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 10/30/2015] [Revised: 04/29/2016] [Accepted: 04/29/2016] [Indexed: 11/11/2022]
Abstract
Concordance measures are frequently used for assessing the discriminative ability of risk prediction models. The interpretation of estimated concordance at external validation is difficult if the case-mix differs from the model development setting. We aimed to develop a concordance measure that provides insight into the influence of case-mix heterogeneity and is robust to censoring of time-to-event data. We first derived a model-based concordance (mbc) measure that allows for quantification of the influence of case-mix heterogeneity on discriminative ability of proportional hazards and logistic regression models. This mbc can also be calculated including a regression slope that calibrates the predictions at external validation (c-mbc), hence assessing the influence of overall regression coefficient validity on discriminative ability. We derived variance formulas for both mbc and c-mbc. We compared the mbc and the c-mbc with commonly used concordance measures in a simulation study and in two external validation settings. The mbc was asymptotically equivalent to a previously proposed resampling-based case-mix corrected c-index. The c-mbc remained stable at the true value with increasing proportions of censoring, while Harrell's c-index and to a lesser extent Uno's concordance measure increased unfavorably. Variance estimates of mbc and c-mbc were well in agreement with the simulated empirical variances. We conclude that the mbc is an attractive closed-form measure that allows for a straightforward quantification of the expected change in a model's discriminative ability due to case-mix heterogeneity. The c-mbc also reflects regression coefficient validity and is a censoring-robust alternative for the c-index when the proportional hazards assumption holds. Copyright © 2016 John Wiley & Sons, Ltd.
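For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal sketch of Harrell's c-index for right-censored data. It is not the mbc or c-mbc measure from the paper. The function name and the handling of tied event times (such pairs are simply skipped) are assumptions made for illustration.

```python
def harrell_c_index(times, events, risks):
    """Harrell's c-index for right-censored data (illustrative sketch).

    times  : observed follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    risks  : predicted risk scores (higher means earlier event expected)

    A pair is usable only when the subject with the shorter time had an
    observed event. Pairs with tied times are skipped in this sketch.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event has the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

Because usable pairs must be anchored by an observed event, the set of pairs that enters the estimate depends on the censoring pattern, which is one intuition for why the abstract reports that the estimate shifts as the proportion of censoring grows.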
Affiliation(s)
- David van Klaveren
  - Department of Public Health, Erasmus University Medical Center, Rotterdam, The Netherlands
- Mithat Gönen
  - Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, U.S.A
- Ewout W Steyerberg
  - Department of Public Health, Erasmus University Medical Center, Rotterdam, The Netherlands
- Yvonne Vergouwe
  - Department of Public Health, Erasmus University Medical Center, Rotterdam, The Netherlands
10
Alotaibi R, Fiaccone R, Henderson R, Stare J. Explained variation for recurrent event data. Biom J 2015; 57:571-91. [PMID: 25899247 DOI: 10.1002/bimj.201300143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2013] [Revised: 02/06/2015] [Accepted: 02/14/2015] [Indexed: 11/07/2022]
Abstract
Although there are many suggested measures of explained variation for single-event survival data, there has been little attention to explained variation for recurrent event data. We describe an existing rank-based measure and we investigate a new statistic based on observed and expected event count processes. Both methods can be used for all models. Adjustments for missing data are proposed and demonstrated through simulation to be effective. We compare the population values of the two statistics and illustrate their use in comparing an array of non-nested models for data on recurrent episodes of infant diarrhoea.
Affiliation(s)
- Refah Alotaibi
  - Princess Norah Bint Abdulrahman University, Riyadh 11635, Saudi Arabia
- Rosemeire Fiaccone
  - Statistics Department, Federal University of Bahia, Salvador, Bahia 40170-110, Brazil
- Robin Henderson
  - School of Mathematics & Statistics, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Janez Stare
  - Institute for Biostatistics and Medical Informatics, Faculty of Medicine, University of Ljubljana, Ljubljana 1000, Slovenia
11
Elze MC, Ciocarlie O, Heinze A, Kloess S, Gardlowski T, Esser R, Klingebiel T, Bader P, Huenecke S, Serban M, Köhl U, Hutton JL. Dendritic cell reconstitution is associated with relapse-free survival and acute GVHD severity in children after allogeneic stem cell transplantation. Bone Marrow Transplant 2014; 50:266-73. [PMID: 25387093 DOI: 10.1038/bmt.2014.257] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 06/02/2014] [Revised: 09/25/2014] [Accepted: 09/25/2014] [Indexed: 12/19/2022]
Abstract
DCs are potent APCs and key regulators of innate and adaptive immunity. After allo-SCT, their reconstitution in the peripheral blood (PB) to levels similar to those in healthy individuals tends to be slow. We investigate the age- and sex-dependent immune reconstitution of myeloid (mDC) and plasmacytoid DC (pDC) in the PB of 45 children with leukaemia or myelodysplastic syndrome (aged 1-17 years, median 10) after allo-SCT with regard to relapse, acute GVHD (aGVHD) and relapse-free survival. Low pDC/μL PB up to day 60 post SCT are associated with higher incidence of moderate or severe aGVHD (P=0.035), whereas high pDC/μL PB up to day 60 are associated with higher risk of relapse (P<0.001). The time-trend of DCs/μL PB for days 0-200 is a significant predictor of relapse-free survival for both mDCs (P<0.001) and pDCs (P=0.020). Jointly modelling DC reconstitution and complications improves on these simple criteria. Compared with BM, PBSC transplants tend to show slower mDC/pDC reconstitution (P=0.001, 0.031, respectively), but have no direct effect on relapse-free survival. These results suggest an important role for both mDCs and pDCs in the reconstituting immune system. The inclusion of mDCs and pDCs may improve existing models for complication prediction following allo-SCT.
Affiliation(s)
- M C Elze
  - Department of Statistics, University of Warwick, Coventry, UK
- O Ciocarlie
  - Institute of Cellular Therapeutics, Integrated Research and Treatment Center Transplantation, Hannover Medical School, Hannover, Germany
  - Paediatrics Department, Victor Babes University of Medicine and Pharmacy, Timisoara, Romania
- A Heinze
  - Pediatrics Department, University Hospital Frankfurt, Goethe University, Frankfurt am Main, Germany
- S Kloess
  - Institute of Cellular Therapeutics, Integrated Research and Treatment Center Transplantation, Hannover Medical School, Hannover, Germany
- T Gardlowski
  - Institute of Cellular Therapeutics, Integrated Research and Treatment Center Transplantation, Hannover Medical School, Hannover, Germany
- R Esser
  - Institute of Cellular Therapeutics, Integrated Research and Treatment Center Transplantation, Hannover Medical School, Hannover, Germany
- T Klingebiel
  - Pediatrics Department, University Hospital Frankfurt, Goethe University, Frankfurt am Main, Germany
- P Bader
  - Pediatrics Department, University Hospital Frankfurt, Goethe University, Frankfurt am Main, Germany
- S Huenecke
  - Pediatrics Department, University Hospital Frankfurt, Goethe University, Frankfurt am Main, Germany
- M Serban
  - Paediatrics Department, Victor Babes University of Medicine and Pharmacy, Timisoara, Romania
- U Köhl
  - Institute of Cellular Therapeutics, Integrated Research and Treatment Center Transplantation, Hannover Medical School, Hannover, Germany
- J L Hutton
  - Department of Statistics, University of Warwick, Coventry, UK
12
Mauguen A, Collette S, Pignon JP, Rondeau V. Concordance measures in shared frailty models: application to clustered data in cancer prognosis. Stat Med 2013; 32:4803-20. [PMID: 23729305 DOI: 10.1002/sim.5852] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 10/31/2012] [Accepted: 04/24/2013] [Indexed: 11/07/2022]
Abstract
Frailty models are gaining interest in prognostic studies, especially because of the spread of multicenter studies. However, little research has been performed to extend prognostic tools, including discrimination measures, to frailty models. As previously done for Harrell's c-index, we extended two different discrimination measures (the model-based concordance probability estimation of Gönen and Heller and Uno's nonparametric c-index) to take into account cluster membership. We calculate measures at three levels: between-group, where only patients with different frailties are compared; within-group, where only patients sharing the same frailty are compared; and overall. We performed simulations to study the impact of group size and the number of groups on these measures. Results showed that the two measures can be extended to frailty models while remaining independent of the censoring distribution, provided that the group size is sufficient. We apply the extended measures to two real datasets, a meta-analysis and a large multicenter trial.
Affiliation(s)
- Audrey Mauguen
  - Univ. Bordeaux ISPED, Centre INSERM U897-Epidémiologie-Biostatistique, F-33000 Bordeaux, France; INSERM, ISPED, Centre INSERM U897-Epidémiologie-Biostatistique, F-33000 Bordeaux, France
13
Lin Y, Chappell R, Gönen M. A systematic selection method for the development of cancer staging systems. Stat Methods Med Res 2013; 25:1438-51. [PMID: 23698866 DOI: 10.1177/0962280213486853] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
The tumor-node-metastasis (TNM) staging system has been the anchor of cancer diagnosis, treatment, and prognosis for many years. For meaningful clinical use, an orderly, progressive condensation of the T and N categories into an overall staging system needs to be defined, usually with respect to a time-to-event outcome. This can be considered as a cutpoint selection problem for a censored response partitioned with respect to two ordered categorical covariates and their interaction. The aim is to select the best grouping of the TN categories. A novel bootstrap cutpoint/model selection method is proposed for this task by maximizing bootstrap estimates of the chosen statistical criteria. The criteria are based on prognostic ability including a landmark measure of the explained variation, the area under the receiver operating characteristic (ROC) curve, and a concordance probability generalized from Harrell's c-index. We illustrate the utility of our method by applying it to the staging of colorectal cancer.
Affiliation(s)
- Yunzhi Lin
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA
- Richard Chappell
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biostatistics & Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Mithat Gönen
  - Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
14
Vrieling A, Buck K, Heinz J, Obi N, Benner A, Flesch-Janys D, Chang-Claude J. Pre-diagnostic alcohol consumption and postmenopausal breast cancer survival: a prospective patient cohort study. Breast Cancer Res Treat 2012; 136:195-207. [DOI: 10.1007/s10549-012-2230-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 07/03/2012] [Accepted: 08/21/2012] [Indexed: 11/25/2022]
15
Choodari-Oskooei B, Royston P, Parmar MKB. A simulation study of predictive ability measures in a survival model II: explained randomness and predictive accuracy. Stat Med 2012; 31:2644-59. [PMID: 22764064 DOI: 10.1002/sim.5460] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 10/13/2010] [Accepted: 03/13/2012] [Indexed: 11/10/2022]
Abstract
Several $R^2$-type measures have been proposed to evaluate the predictive ability of a survival model. In Part I, we classified the measures into four categories and studied the measures in the explained variation category. In this paper, we study the remaining measures in a similar fashion, discussing their strengths and shortcomings. Simulation studies are used to examine the performance of the measures with respect to the criteria we set out in Part I. Our simulation studies showed that among the measures studied in this paper, the measures proposed by Kent and O'Quigley, $\rho_W^2$ (and its approximation $\rho_{W,A}^2$), and by Schemper and Kaider, $R_{SK}^2$, perform better with respect to our criteria. However, our investigations showed that $\rho_W^2$ is adversely affected by the distribution of the covariate and the presence of influential observations. The results show that the other measures perform poorly, primarily because they are affected either by the degree of censoring or the follow-up period.
Affiliation(s)
- B Choodari-Oskooei
  - London Hub for Trials Methodology Research, MRC Clinical Trials Unit, Aviation House, London, WC2B 6NH, UK
16
Pfeilstöcker M, Tüchler H, Schönmetzler A, Nösslinger T, Pittermann E. Time changes in predictive power of established and recently proposed clinical, cytogenetical and comorbidity scores for Myelodysplastic Syndromes. Leuk Res 2011; 36:132-9. [PMID: 21967831 DOI: 10.1016/j.leukres.2011.09.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 03/23/2011] [Revised: 09/09/2011] [Accepted: 09/12/2011] [Indexed: 11/29/2022]
Abstract
BACKGROUND: Recent improvements in the treatment of Myelodysplastic Syndromes have fostered further interest in the development of prognostic scores. Prognostic indices such as the IPSS were developed and later validated assuming their predictive values to be unchanged over time. A systematic analysis of the possible variability of predictive power over time in different scores is still lacking and was the aim of this study.
DESIGN AND METHODS: For 243 primary MDS patients from a single institution treated with supportive care, 19 established or modified scoring systems based on different prognostic factors (clinical, cytogenetical and/or comorbidity) were analysed for their variability over time by statistical methods that quantify time variations in the risk relations (specifically the risk ratios of Cox models) between prognostic subgroups.
RESULTS: Established scores based mainly on clinical parameters showed strong to moderate loss of predictive power over time whereas cytogenetic scores maintained their predictive power. Scores including comorbidity data showed gain of predictive power over time.
CONCLUSIONS: The development and comparison of prognostic systems have to take into account their stability versus the possibility or need for re-evaluation. Possibly not only re-evaluation after time is of importance, but also different weighting of items constituting scores.