1
Berkowitz M, Altman RM, Loughin TM. Random forests for survival data: which methods work best and under what conditions? Int J Biostat 2024; 0:ijb-2023-0056. [PMID: 38656274 DOI: 10.1515/ijb-2023-0056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 02/26/2024] [Indexed: 04/26/2024]
Abstract
Few systematic comparisons of methods for constructing survival trees and forests exist in the literature. Importantly, when the goal is to predict a survival time or estimate a survival function, the optimal choice of method is unclear. We use an extensive simulation study to systematically investigate various factors that influence survival forest performance: forest construction method, censoring, sample size, distribution of the response, structure of the linear predictor, and presence of correlated or noisy covariates. In particular, we study 11 methods that have recently been proposed in the literature and identify 6 top performers. We find that all the factors that we investigate have a significant impact on the methods' relative accuracy of point predictions of survival times and survival function estimates. We use our results to make recommendations for which methods to use in a given context and offer explanations for the observed differences in relative performance.
Affiliation(s)
- Matthew Berkowitz
- Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada
- Thomas M Loughin
- Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada
2
Liao CM, Su CT, Huang HC, Lin CM. Improved Survival Analyses Based on Characterized Time-Dependent Covariates to Predict Individual Chronic Kidney Disease Progression. Biomedicines 2023; 11:1664. [PMID: 37371759 DOI: 10.3390/biomedicines11061664] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/10/2023] [Revised: 06/01/2023] [Accepted: 06/05/2023] [Indexed: 06/29/2023] Open
Abstract
Kidney diseases can cause severe morbidity, mortality, and health burden. Determining the risk factors associated with kidney damage and deterioration has become a priority for the prevention and treatment of kidney disease. This study followed 497 patients with stage 3-5 chronic kidney disease (CKD) who were treated at the ward of Taipei Veterans General Hospital from January 2006 to 2019 in Taiwan. The patients underwent 3-year-long follow-up sessions for clinical measurements, which occurred every 3 months. Three time-dependent survival models, namely the Cox proportional hazard model (Cox PHM), random survival forest (RSF), and an artificial neural network (ANN), were used to process patient demographics and laboratory data for predicting progression to renal failure, and important features for optimal prediction were evaluated. The individual prediction of CKD progression was validated using the Kaplan-Meier estimation method, based on patients' true outcomes during and beyond the study period. The results showed that the average concordance indexes for the cross-validation of the Cox PHM, ANN, and RSF models were 0.71, 0.72, and 0.89, respectively. RSF had the best predictive performances for CKD patients within the 3 years of follow-up sessions, with a sensitivity of 0.79 and specificity of 0.88. Creatinine, age, estimated glomerular filtration rate, and urine protein to creatinine ratio were useful factors for predicting the progression of CKD patients in the RSF model. These results may be helpful for instantaneous risk prediction at each follow-up session for CKD patients.
Affiliation(s)
- Chen-Mao Liao
- Department of Applied Statistics and Information Science, Ming Chuan University, Taoyuan 333, Taiwan
- Chuan-Tsung Su
- Department of Healthcare Information and Management, Ming Chuan University, Taoyuan 333, Taiwan
- Hao-Che Huang
- Department of Applied Statistics and Information Science, Ming Chuan University, Taoyuan 333, Taiwan
- Chih-Ming Lin
- Department of Healthcare Information and Management, Ming Chuan University, Taoyuan 333, Taiwan
3
Abstract
Tree-based models are increasingly popular due to their ability to identify complex relationships that are beyond the scope of parametric models. Survival tree methods adapt these models to allow for the analysis of censored outcomes, which often appear in medical data. We present a new Optimal Survival Trees (OST) algorithm that leverages mixed-integer optimization (MIO) and local search techniques to generate globally optimized survival tree models. We demonstrate that the OST algorithm improves on the accuracy of existing survival tree methods, particularly in large datasets.
4
Bertrand F, Maumy-Bertrand M. Fitting and Cross-Validating Cox Models to Censored Big Data With Missing Values Using Extensions of Partial Least Squares Regression Models. Front Big Data 2021; 4:684794. [PMID: 34790895 PMCID: PMC8591675 DOI: 10.3389/fdata.2021.684794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2021] [Accepted: 10/07/2021] [Indexed: 11/22/2022] Open
Abstract
Fitting Cox models in a big data context (on a massive scale in terms of volume, intensity, and complexity exceeding the capacity of usual analytic tools) is often challenging. If some data are missing, it is even more difficult. We proposed algorithms that were able to fit Cox models in high dimensional settings using extensions of partial least squares regression to the Cox models. Some of them were able to cope with missing data. We were recently able to extend our most recent algorithms to big data, thus allowing Cox models to be fitted to big data with missing values. When cross-validating standard or extended Cox models, the commonly used criterion is the cross-validated partial loglikelihood using a naive or a van Houwelingen scheme, which makes efficient use of the death times of the left out data in relation to the death times of all the data. Quite astonishingly, we will show, using a strong simulation study involving three different data simulation algorithms, that these two cross-validation methods fail with the extensions, either straightforward or more involved ones, of partial least squares regression to the Cox model. This is quite an interesting result for at least two reasons. Firstly, several nice features of PLS based models (including regularization, interpretability of the components, missing data support, data visualization thanks to biplots of individuals and variables, and even parsimony or group parsimony for sparse partial least squares or sparse group SPLS based models) account for a common use of these extensions by statisticians who usually select their hyperparameters using cross-validation. Secondly, they are almost always featured in benchmarking studies to assess the performance of a new estimation technique used in a high dimensional or big data context and often show poor statistical properties. We carried out a vast simulation study to evaluate more than a dozen potential cross-validation criteria, either AUC or prediction error based. Several of them lead to the selection of a reasonable number of components. Using these newly found cross-validation criteria to fit extensions of partial least squares regression to the Cox model, we performed a benchmark reanalysis that showed enhanced performances of these techniques. In addition, we proposed sparse group extensions of our algorithms and defined a new robust measure based on the Schmid score and the R coefficient of determination for least absolute deviation: the integrated R Schmid Score weighted. The R package used in this article is available on the CRAN, http://cran.r-project.org/web/packages/plsRcox/index.html. The R package bigPLS will soon be available on the CRAN and, until then, is available on Github https://github.com/fbertran/bigPLS.
Affiliation(s)
- Frédéric Bertrand
- LIST3N, Université de Technologie de Troyes, Troyes, France
- IRMA, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, Strasbourg, France
- Myriam Maumy-Bertrand
- LIST3N, Université de Technologie de Troyes, Troyes, France
- IRMA, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, Strasbourg, France
5
Emura T, Hsu WC, Chou WC. A survival tree based on stabilized score tests for high-dimensional covariates. J Appl Stat 2021; 50:264-290. [PMID: 36698545 PMCID: PMC9870022 DOI: 10.1080/02664763.2021.1990224] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/28/2023]
Abstract
A survival tree can classify subjects into different survival prognostic groups. However, when the data contain high-dimensional covariates, the two popular classification trees exhibit fatal drawbacks. The logrank tree is unstable and tends to have false nodes; for the conditional inference tree, the adjusted P-value is difficult to interpret for high-dimensional tests. Motivated by these problems, we propose a new survival tree based on the stabilized score tests. We propose a novel matrix-based algorithm to test a number of nodes simultaneously via stabilized score tests. We propose a recursive partitioning algorithm to construct a survival tree and develop our original R package uni.survival.tree (https://cran.r-project.org/package=uni.survival.tree) for implementation. Simulations are performed to demonstrate the superiority of the proposed method over the existing methods. The lung cancer data analysis demonstrates the usefulness of the proposed method.
Affiliation(s)
- Takeshi Emura
- Biostatistics Center, Kurume University, 67 Asahi-machi, Kurume, Japan
- Wei-Chern Hsu
- Graduate Institute of Statistics, National Central University, Taoyuan, Taiwan
- Wen-Chi Chou
- Department of Hematology and Oncology, Chang Gung Memorial Hospital and College of Medicine, Chang Gung University, Taoyuan, Taiwan
6
Tzeng S, Zhu J, Weisman AJ, Bradshaw TJ, Jeraj R. Spatial process decomposition for quantitative imaging biomarkers using multiple images of varying shapes. Stat Med 2020; 40:1243-1261. [PMID: 33336451 DOI: 10.1002/sim.8838] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/07/2019] [Revised: 11/11/2020] [Accepted: 11/14/2020] [Indexed: 11/11/2022]
Abstract
Quantitative imaging biomarkers (QIB) are extracted from medical images in radiomics for a variety of purposes including noninvasive disease detection, cancer monitoring, and precision medicine. The existing methods for QIB extraction tend to be ad hoc and not reproducible. In this article, a general and flexible statistical approach is proposed for handling up to three-dimensional medical images and reasonably capturing features with respect to specific spatial patterns. In particular, a model-based spatial process decomposition is developed where the random weights are unique to individual patients for component functions common across patients. Model fitting and selection are based on maximum likelihood, while feature extractions are via optimal prediction of the underlying true image. Simulation studies are conducted to investigate the properties of the proposed methodology. For illustration, a cancer image data set is analyzed and QIBs are extracted in association with a clinical endpoint.
Affiliation(s)
- ShengLi Tzeng
- Department of Applied Mathematics, National Sun Yat-sen University, Kaohsiung City, Taiwan
- Jun Zhu
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Amy J Weisman
- Department of Medical Physics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Tyler J Bradshaw
- Department of Medical Physics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Robert Jeraj
- Department of Medical Physics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Department of Human Oncology, University of Wisconsin-Madison, Madison, Wisconsin, USA
7
Roshanaei G, Safari M, Faradmal J, Abbasi M, Khazaei S. Factors affecting the survival of patients with colorectal cancer using random survival forest. J Gastrointest Cancer 2020; 53:64-71. [PMID: 33174117 DOI: 10.1007/s12029-020-00544-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 10/28/2020] [Indexed: 11/26/2022]
Abstract
PURPOSE Colorectal cancer is one of the most common cancers and the leading cause of cancer death in Iran. This study aimed to develop and validate a random survival forest (RSF) to identify important risk factors for mortality in colorectal cancer patients based on their demographic and clinical variables. METHODS In this retrospective cohort study, the records of 317 patients with colorectal cancer who were referred to Imam Khomeini Clinic of Hamadan from 2002 to 2017 were examined. Patient survival was calculated from the time of diagnosis to death. The RSF model was used to identify factors affecting patient survival, and its results were compared with those of the Cox model. The data were analyzed using R software (version 3.6.1) and survival packages. RESULTS One-, 2-, 3-, 4-, 5-, and 10-year survival rates of included patients were 81.4%, 63%, 57%, 52%, 45%, and 34%, respectively, and the median survival was 53 months. A total of 150 patients died during this period. The four most important predictors of survival were metastasis to other organs, WBC count, disease stage, and number of lymphomas involved. The RSF method predicted survival better than the conventional Cox proportional hazard model. CONCLUSION We found that metastasis to other organs, WBC count, disease stage, and number of lymphomas involved were the four most important predictors of low survival for colorectal cancer patients.
Affiliation(s)
- Ghodratollah Roshanaei
- Department of Biostatistics, School of Public Health, Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
- Malihe Safari
- Department of Biostatistics, School of Public Health, Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
- Javad Faradmal
- Department of Biostatistics, School of Public Health, Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
- Mohammad Abbasi
- Department of Internal Medicine, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
- Salman Khazaei
- Research Center for Health Sciences, Hamadan University of Medical Sciences, Hamadan, Iran.
8
Tollenaar N, van der Heijden PGM. Optimizing predictive performance of criminal recidivism models using registration data with binary and survival outcomes. PLoS One 2019; 14:e0213245. [PMID: 30849094 PMCID: PMC6407787 DOI: 10.1371/journal.pone.0213245] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/21/2018] [Accepted: 02/19/2019] [Indexed: 11/19/2022] Open
Abstract
In a recidivism prediction context, there is no consensus on which modeling strategy should be followed for obtaining an optimal prediction model. In previous papers, a range of statistical and machine learning techniques were benchmarked on recidivism data with a binary outcome. However, two important tree ensemble methods, namely gradient boosting and random forests, were not extensively evaluated. In this paper, we further explore the modeling potential of these techniques in the binary outcome criminal prediction context. Additionally, we explore the predictive potential of classical statistical and machine learning methods for censored time-to-event data. A range of manually specified statistical and (semi-)automatic machine learning models is fitted on Dutch recidivism data, both for the binary outcome case and the censored outcome case. To enhance generalizability of results, the same models are applied to two historical American data sets, the North Carolina prison data. For all datasets, (semi-)automatic modeling in the binary case seems to provide no improvement over an appropriately manually specified traditional statistical model. There is, however, evidence of slightly improved performance of gradient boosting in survival data. Results on the reconviction data from two sources suggest that both statistical and machine learning should be tried out for obtaining an optimal model. Even if a flexible black-box model does not improve upon the predictions of a manually specified model, it can serve as a test of whether important interactions are missing or other misspecifications of the model are present, and can thus provide more security in the modeling process.
Affiliation(s)
- Nikolaj Tollenaar
- Research and Documentation Centre (WODC), Ministry of Justice and Security, The Hague, Zuid-Holland, the Netherlands
- Peter G. M. van der Heijden
- Department of Social Sciences, Utrecht University, Utrecht, Utrecht, the Netherlands
- Department of Social Sciences, University of Southampton, Hampshire, United Kingdom
9
Development and validation of a multivariate predictive model for rheumatoid arthritis mortality using a machine learning approach. Sci Rep 2017; 7:10189. [PMID: 28860558 PMCID: PMC5579234 DOI: 10.1038/s41598-017-10558-w] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 02/28/2017] [Accepted: 08/11/2017] [Indexed: 12/15/2022] Open
Abstract
We developed and independently validated a rheumatoid arthritis (RA) mortality prediction model using the machine learning method Random Survival Forests (RSF). Two independent cohorts from Madrid (Spain) were used: the Hospital Clínico San Carlos RA Cohort (HCSC-RAC; training; 1,461 patients), and the Hospital Universitario de La Princesa Early Arthritis Register Longitudinal study (PEARL; validation; 280 patients). Demographic and clinical variables collected during the first two years after disease diagnosis were used. 148 and 21 patients from HCSC-RAC and PEARL died during a median follow-up time of 4.3 and 5.0 years, respectively. Age at diagnosis, median erythrocyte sedimentation rate, and number of hospital admissions showed the highest predictive capacity. Prediction errors in the training and validation cohorts were 0.187 and 0.233, respectively. A survival tree identified five mortality risk groups using the predicted ensemble mortality. After 1 and 7 years of follow-up, time-dependent specificity and sensitivity in the validation cohort were 0.79–0.80 and 0.43–0.48, respectively, using the cut-off value dividing the two lower risk categories. Calibration curves showed overestimation of the mortality risk in the validation cohort. In conclusion, we were able to develop a clinical prediction model for RA mortality using RSF, providing evidence for further work on external validation.
10
Kretowska M. Piecewise-linear criterion functions in oblique survival tree induction. Artif Intell Med 2017; 75:32-39. [PMID: 28363454 DOI: 10.1016/j.artmed.2016.12.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/15/2016] [Revised: 11/07/2016] [Accepted: 12/28/2016] [Indexed: 11/17/2022]
Abstract
OBJECTIVE Recursive partitioning is a common, assumption-free method of survival data analysis. It focuses mainly on univariate trees, which use splits based on a single variable in each internal node. In this paper, I provide an extension of an oblique survival tree induction technique, in which axis-parallel splits are replaced by hyperplanes, dividing the feature space into areas with a homogeneous survival experience. METHOD AND MATERIALS The proposed tree induction algorithm consists of two steps. The first covers the induction of a large tree with internal nodes represented by hyperplanes, whose positions are calculated by the minimization of a piecewise-linear criterion function, the dipolar criterion. The other phase uses a split-complexity algorithm to prune unnecessary tree branches and a 10-fold cross-validation technique to choose the best tree. The terminal nodes of the final tree are characterised by Kaplan-Meier survival functions. A synthetic data set was used to test the performance, while seven real data sets were exploited to validate the proposed method. RESULTS The evaluation of the method was focused on two features: predictive ability and tree size. These were compared with two univariate tree models: the conditional inference tree and recursive partitioning for survival trees, respectively. The comparison of the predictive ability, expressed as an integrated Brier score, showed no statistically significant differences (p=0.486) among the three methods. Similar results were obtained for the tree size (p=0.11), which was calculated as a median value over 20 runs of a 10-fold cross-validation. CONCLUSIONS The predictive ability of trees generated using piecewise-linear criterion functions is comparable to that of univariate tree-based models. Although a similar conclusion may be drawn from the analysis of the tree size, in the majority of the studied cases, the number of nodes of the dipolar tree is one of the smallest among all the methods.
Affiliation(s)
- Malgorzata Kretowska
- Faculty of Computer Science, Bialystok University of Technology, Wiejska 45a, 15-351 Bialystok, Poland.
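The abstract in entry 10 notes that the terminal nodes of the final tree are characterised by Kaplan-Meier survival functions. As a rough illustration only, the product-limit estimate for the subjects falling in one terminal node could be computed as sketched below. The function name `kaplan_meier` and the sample data are hypothetical and are not taken from the cited paper.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate: returns [(t, S(t))] at each distinct event time.

    times  : observed follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every subject leaves the risk set at its observed time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve


# Hypothetical subjects in a single terminal node (illustrative values only)
node_times = [2, 3, 3, 5, 8]
node_events = [1, 1, 0, 1, 0]
curve = kaplan_meier(node_times, node_events)
```

Censored subjects contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is why the estimate only changes at observed event times.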
11
Abstract
We compare splitting methods for constructing survival trees that are used as a model of survival time based on covariates. A number of splitting criteria on the classification and regression tree (CART) have been proposed by various authors, and we compare nine criteria through simulations. Comparative studies have been restricted to criteria that suppose the survival model for each terminal node in the final tree as a non-parametric model. As the main results, the criteria using the exponential log-likelihood loss, log-rank test statistics, the deviance residual under the proportional hazard model, or square error of martingale residual are recommended when it appears that the data have constant hazard with the passage of time. On the other hand, when the data are thought to have decreasing hazard with the passage of time, the criterion using the two-sample test statistic or square error of deviance residual would be optimal. Moreover, when the data are thought to have increasing hazard with the passage of time, the criterion using the exponential log-likelihood loss, or the impurity that combines observed times and the proportion of censored observations, would be the best. We also present the results of an actual medical research study to show the utility of survival trees.
12
Application of Random Forest Survival Models to Increase Generalizability of Decision Trees: A Case Study in Acute Myocardial Infarction. Computational and Mathematical Methods in Medicine 2015; 2015:576413. [PMID: 26858773 PMCID: PMC4698527 DOI: 10.1155/2015/576413] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 09/12/2015] [Revised: 11/23/2015] [Accepted: 11/24/2015] [Indexed: 11/17/2022]
Abstract
Background. Tree models provide an easily interpretable prognostic tool but unstable results. Two approaches to enhance the generalizability of the results are pruning and random survival forest (RSF). The aim of this study is to assess the generalizability of saturated tree (ST), pruned tree (PT), and RSF. Methods. Data from 607 patients were randomly divided into training and test sets using 10-fold cross-validation. All three models were fitted to the training sets. ST was constructed using the Log-Rank test to search for optimal cutoffs. PT was selected by plotting error rate versus minimum sample size in terminal nodes. In construction of RSF, 1000 bootstrap samples were drawn from the training set. C-index and integrated Brier score (IBS) statistic were used to compare models. Results. ST provides the most overoptimized statistics. Mean difference between C-index in training and test set was 0.237. Corresponding figures in PT and RSF were 0.054 and 0.007. In terms of IBS, the difference was 0.136 in ST, 0.021 in PT, and 0.0003 in RSF. Conclusion. Pruning of tree and assessment of its performance on a test set partially improve the generalizability of decision trees. RSF provides results that are highly generalizable.
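The C-index used in entry 12 to compare training and test performance can be illustrated with a minimal sketch of Harrell's concordance index. This is an assumption about the general form of the statistic, not the implementation used in the cited study; the function name `harrell_c_index` and the data are hypothetical. A higher risk score is assumed to mean shorter expected survival.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs ordered correctly by the risk score.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Hypothetical data (illustrative values only)
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
c_index = harrell_c_index(times, events, risk)
```

Computing this statistic once on the training set and once on the held-out set, then taking the difference, gives the kind of optimism gap the abstract reports for each model.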
13
Shimokawa A, Kawasaki Y, Miyaoka E. A comparative study on splitting criteria of a survival tree based on the Cox proportional model. J Biopharm Stat 2015; 26:386-401. [PMID: 26043356 DOI: 10.1080/10543406.2015.1052485] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]
Abstract
We consider situations in which the effect of covariates on the hazard differs across subgroups of patients. To handle this situation, we can consider a hybrid of the Cox model and a tree-structured model. Through simulation studies, we compared several splitting criteria for constructing this hybrid model. As a result, the criterion using the degree of improvement in the negative maximum partial log-likelihood obtained by splitting showed good performance in many situations. We also present the results obtained by applying this tree model in an actual medical research study to show its utility.
Affiliation(s)
- Asanao Shimokawa
- Graduate School of Science, Tokyo University of Science, Tokyo, Japan
- Yohei Kawasaki
- Department of Mathematics, Tokyo University of Science, Tokyo, Japan
- Etsuo Miyaoka
- Department of Mathematics, Tokyo University of Science, Tokyo, Japan
14
Schwartz CE, Ahmed S, Sawatzky R, Sajobi T, Mayo N, Finkelstein J, Lix L, Verdam MGE, Oort FJ, Sprangers MAG. Guidelines for secondary analysis in search of response shift. Qual Life Res 2013; 22:2663-73. [DOI: 10.1007/s11136-013-0402-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Accepted: 03/22/2013] [Indexed: 01/31/2023]
15
Response shift in patients with multiple sclerosis: an application of three statistical techniques. Qual Life Res 2011; 20:1561-72. [PMID: 22081216 DOI: 10.1007/s11136-011-0056-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Accepted: 10/25/2011] [Indexed: 10/15/2022]
Abstract
OBJECTIVE With the evolution of theory and methods for detecting recalibration, reprioritization, and reconceptualization response shifts, the time has come to evaluate and compare the current statistical detection techniques. This manuscript presents an overview of a cross-method validation done on the same patient sample. METHODS Three statistical techniques were used: Structural Equation Modeling, Latent Trajectory Analysis, and Recursive Partitioning and Regression Tree modeling. The study sample (n = 3,008) was drawn from the North American Research Committee on Multiple Sclerosis (NARCOMS) Registry to represent patients soon after diagnosis, classified as having either a self-reported relapsing, progressive, or stable disease trajectory. Patient-reported outcomes included the disease-specific Performance Scales and the Patient-Derived Disease Steps, and the generic SF-12v2 measure. RESULTS Small response shift effect sizes were detected by all of the methods. Recalibration response shift was detected by Structural Equation Modeling, Recursive Partitioning Regression Tree demonstrated patterns consistent with all three types of response shift, and Latent Trajectory Analysis, although unable to distinguish types of response shift, did detect response shift in less than 1% of the sample. CONCLUSION The methods and their findings were discussed for operationalization, interpretability, assumptions, ability to use all data points from the study sample, limitations, and strengths. Directions for future research are discussed.
16
Li Y, Schwartz CE. Data mining for response shift patterns in multiple sclerosis patients using recursive partitioning tree analysis. Qual Life Res 2011; 20:1543-53. [DOI: 10.1007/s11136-011-0004-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Accepted: 08/29/2011] [Indexed: 11/25/2022]
17
Prilutsky D, Rogachev B, Marks RS, Lobel L, Last M. Classification of infectious diseases based on chemiluminescent signatures of phagocytes in whole blood. Artif Intell Med 2011; 52:153-63. [DOI: 10.1016/j.artmed.2011.04.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 09/20/2009] [Revised: 04/11/2011] [Accepted: 04/18/2011] [Indexed: 12/21/2022]
18
19
Kim SH, Lee JH, Choi J, Kwon KA, Lee S, Oh SY, Kwon HC, Han JY, Kim HJ. Improvement of the WHO classification-based prognostic scoring system (WPSS) by including age for Korean patients with the myelodysplastic syndrome. Leuk Res 2010; 34:1589-95. [PMID: 20633929 DOI: 10.1016/j.leukres.2010.03.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2010] [Revised: 02/28/2010] [Accepted: 03/01/2010] [Indexed: 11/29/2022]
Abstract
BACKGROUND The aim of this study was to improve the predictive power of the WHO classification-based prognostic scoring system (WPSS) by including age in patients with the myelodysplastic syndrome (MDS). PATIENTS AND METHODS 136 Korean patients with de novo MDS between 1995 and 2008 were evaluated retrospectively. All patients were reclassified according to WHO criteria. 114 patients were included in the final analysis. An individualized age-adapted scoring system was developed to improve the accuracy of prognosis of the WPSS. RESULTS The WPSS was significantly associated with the prediction of survival and the leukemia-free survival. While the risk of a patient with the WPSS was best represented by the values 0 (very low), +1 (low), +2 (intermediate), +3 (high), and +4 (very high), these values were found to vary between -1.0 and 4.2 in the same patients when age was included as a factor. The WPSS may vary according to age, <55 or ≥55 years. The estimated difference in median survival was more prominent in the lower risk groups of the WPSS than in the higher-risk groups. CONCLUSION In addition to the WPSS, age was found to significantly influence the prognosis of patients with MDS and provided a more individualized prognosis for the patients with MDS.
Affiliation(s)
- Sung-Hyun Kim
- Department of Internal Medicine, Dong-A University College of Medicine, Seo-gu, Busan, South Korea
|
20
|
Li Y, Rapkin B. Classification and regression tree uncovered hierarchy of psychosocial determinants underlying quality-of-life response shift in HIV/AIDS. J Clin Epidemiol 2010; 62:1138-47. [PMID: 19595576 DOI: 10.1016/j.jclinepi.2009.03.021] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.9] [Received: 10/23/2008] [Revised: 03/26/2009] [Accepted: 03/31/2009] [Indexed: 11/25/2022]
Abstract
OBJECTIVES: Rapkin and Schwartz define response shift as otherwise unexplained, discrepant change in health-related quality of life (HRQOL) that is associated with change in cognitive appraisal. In this article, we demonstrate how a recursive partitioning (rpart) regression tree analytic approach may be used to explore cognitive changes to gain additional insight into response-shift phenomena. STUDY DESIGN AND SETTING: Data are from the "Choices in Care Study," an evaluation of HIV+ Medicaid recipients' experiences and outcomes in care (N=394). Cognitive assessment was based on the QOL appraisal battery. HRQOL was measured by the SF-36 Health Survey, version 2 (SF-36v2). RESULTS: We used rpart to examine 6-month change in SF-36v2 mental composite score as a function of changes in appraisal, after controlling for patient characteristics, health changes, and intervening events. Rpart identified nine distinct patterns of cognitive change, including three associated with negative discrepancies, four with positive discrepancies, and two with no discrepancies. CONCLUSION: Rpart classification provides a nuanced treatment of response shift. This methodology has implications for evaluating programs, guiding decisions, and targeting care.
Affiliation(s)
- Yuelin Li
- Department of Psychiatry and Behavioral Sciences, Memorial Sloan-Kettering Cancer Center, 641 Lexington Avenue, New York, NY 10022, USA.
|
21
|
van Wieringen WN, Kun D, Hampel R, Boulesteix AL. Survival prediction using gene expression data: A review and comparison. Comput Stat Data Anal 2009. [DOI: 10.1016/j.csda.2008.05.021] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Indexed: 10/22/2022]
|
22
|
Bou-hamad I, Larocque D, Ben-Ameur H, Mâsse LC, Vitaro F, Tremblay RE. Discrete-time survival trees. Can J Stat 2009. [DOI: 10.1002/cjs.10007] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
|
23
|
Beck AW, Murphy EH, Hocking JA, Timaran CH, Arko FR, Clagett GP. Aortic reconstruction with femoral-popliteal vein: Graft stenosis incidence, risk and reintervention. J Vasc Surg 2008; 47:36-43; discussion 44. [DOI: 10.1016/j.jvs.2007.08.035] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Received: 06/04/2007] [Revised: 08/17/2007] [Accepted: 08/19/2007] [Indexed: 11/17/2022]
|
24
|
Dobler L, Marek O, Rolf E, Andreas G, Antje M, Hubertus KF, Andreas WG. Rapid evaluation of human biomonitoring data using pattern recognition systems. J Toxicol Environ Health A 2008; 71:816-826. [PMID: 18569580 DOI: 10.1080/15287390801985778] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/26/2023]
Abstract
Assessing human biomonitoring data often necessitates dealing with fragmentary prior knowledge and a complex set of variables. A procedure for explorative data analysis via decision-tree analysis was undertaken to obtain high-level descriptive summary information on human exposure on a timely basis. This study is based on a subset of monitoring data of the Environmental Specimen Bank for Human Tissues within the German Environmental Specimen Bank (total n = 2401; 42/58% males/females; 34/66% born in East/West Germany). Three well-known xenobiotic organochlorines (XOCs) [sum of polychlorinated biphenyls (PCBs) 138 + 153 + 180, pentachlorophenol (PCP), and hexachlorobenzene (HCB)] were used as target variables. Meta-data regarding the samples and individuals were collected via a self-reported questionnaire and used as potential predictor variables. Prior to decision-tree analysis, XOC levels were adjusted (trend, lipids, creatinine, total protein) via stepwise linear regression. Adjusted XOC levels were subsequently utilized to identify relevant predictors of human XOC exposure using Exhaustive CHAID as a common decision-tree algorithm. Although overall tree model quality is generally poor, consistent and plausible predictors for human exposure were identified. Besides time trend and clinical parameters, the predominant predictors for HCB and PCB exposure were birthplace, gender, age, body mass index (BMI), and consumption of milk/dairy products or animal fats. For PCP, predominant predictors were sampling site, gender, and consumption of animal fats. Summing results of decision-tree models and regression models, explained variances for metric scaled XOC are: PCB (34.2%) > HCB (30.3%) > PCP (17.2%). Explorative analysis of human biomonitoring data based on simple decision-tree analysis provides valuable information for planning further investigations and statistical data for analyses to support prediction, consequences, and regulation of XOC.
Affiliation(s)
- Lorenz Dobler
- Environmental Specimen Bank for Human Tissues, Westphalian Wilhelms University Muenster, Muenster, Germany.
|
25
|
Radespiel-Tröger M, Hothorn T, Pfahlberg AB, Gefeller O. Re: "Applying recursive partitioning to a prospective study of factors associated with adherence to mammography screening guidelines". Am J Epidemiol 2006; 164:400-1; author reply 401-2. [PMID: 16809428 DOI: 10.1093/aje/kwj235] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022] Open
|
26
|
van Dijk MR, Steyerberg EW, Stenning SP, Habbema JDF. Identifying subgroups among poor prognosis patients with nonseminomatous germ cell cancer by tree modelling: a validation study. Ann Oncol 2004; 15:1400-5. [PMID: 15319246 DOI: 10.1093/annonc/mdh350] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND: In order to target intensive treatment strategies for poor prognosis patients with non-seminomatous germ cell cancer, those with the poorest prognosis should be identified. These patients might profit most from more intensive treatment strategies. For this purpose, a regression tree was previously developed on 332 patients. We aimed to evaluate the performance and structure of this tree. PATIENTS AND METHODS: The previously developed tree was applied to 456 patients with a poor prognosis as defined by the International Germ Cell Cancer Collaborative Group (IGCCCG). Next, we developed a new tree to evaluate whether a similar structure to the previous tree was found. We assessed the internal validity of the new tree, and compared the 2-year survival estimates of each subgroup together with the discriminative ability for both the previously developed and the new tree. Discriminative ability was measured by a concordance (c) statistic, which varies between 0.5 (no discrimination) and 1.0 (perfect discrimination). RESULTS: The 2-year survival estimates in the IGCCCG data ranged from 33% to 63%. The ordering of the subgroups was different and discriminative ability was lower than originally found (c = 0.56 in the IGCCCG data versus 0.63 originally). The new tree differed considerably from the original tree, and identified poor prognosis subgroups with 2-year survival estimates from 38% to 73%. Internal validation showed similar discriminative ability for the new tree and the original tree (c = 0.59 versus 0.56). CONCLUSIONS: The previously developed tree showed poor validity with respect to discriminative ability and the stability of its structure. The performance of the new tree was also unsatisfactory. Given the low proportion of patients categorised as poor prognosis, it seems that the potential to identify further subgroups with the currently available patient characteristics is limited.
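The concordance (c) statistic mentioned in this abstract can be illustrated with a short sketch. This is not the authors' code: it assumes a Harrell-style pairwise definition for right-censored data, assumes each patient's risk score is taken from the tree subgroup they fall into (for example, 1 minus the subgroup's 2-year survival estimate), and uses a hypothetical function name, `concordance_index`. As a simplification, pairs with tied observed times are skipped.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell-style c statistic for right-censored survival data (illustrative sketch).

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- higher value means higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the patient with the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is usable only if the shorter time is an observed event.
        # Simplifying assumption: pairs with tied times are skipped.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # same subgroup: counts as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: risk scores come from three tree subgroups.
times = [5, 8, 12, 20, 24, 24]
events = [1, 1, 0, 1, 0, 0]
risk = [0.67, 0.67, 0.50, 0.50, 0.37, 0.37]
print(round(concordance_index(times, events, risk), 2))
```

Because a tree assigns the same estimate to every patient in a subgroup, tied risk scores are common; counting them as 0.5 is what makes a model with no discrimination score 0.5 rather than 0.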
Affiliation(s)
- M R van Dijk
- Department of Public Health, Erasmus MC, University Medical Center Rotterdam, The Netherlands.
|