1
Neural Networks for Survival Prediction in Medicine Using Prognostic Factors: A Review and Critical Appraisal. Comput Math Methods Med 2022; 2022:1176060. [PMID: 36238497 PMCID: PMC9553343 DOI: 10.1155/2022/1176060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Revised: 08/26/2022] [Accepted: 09/13/2022] [Indexed: 11/17/2022]
Abstract
Survival analysis deals with the expected duration of time until one or more events of interest occur. Time to the event of interest may be unobserved, a phenomenon commonly known as right censoring, which renders the analysis of these data challenging. Over the years, machine learning algorithms have been developed and adapted to right-censored data. Neural networks have been repeatedly employed to build clinical prediction models in healthcare with a focus on cancer and cardiology. We present the first ever attempt at a large-scale review of survival neural networks (SNNs) with prognostic factors for clinical prediction in medicine. This work provides a comprehensive understanding of the literature (24 studies from 1990 to August 2021, global search in PubMed). Relevant manuscripts are classified as methodological/technical (novel methodology or new theoretical model; 13 studies) or applications (11 studies). We investigate how researchers have used neural networks to fit survival data for prediction. There are two methodological trends: either time is added as part of the input features and a single output node is specified, or multiple output nodes are defined, one for each time interval. A critical appraisal of model aspects that should be designed and reported more carefully is performed. We identify key characteristics of prediction models (i.e., number of patients/predictors, evaluation measures, calibration), and compare the predictive performance of ANNs with that of the Cox proportional hazards model. The median sample size is 920 patients, and the median number of predictors is 7. Major findings include poor reporting (e.g., regarding missing data, hyperparameters) as well as inaccurate model development/validation. Calibration is neglected in more than half of the studies. Cox models are not developed to their full potential and claims for the performance of SNNs are exaggerated. Light is shed on the current state of the art of SNNs in medicine with prognostic factors. Recommendations are made for the reporting of clinical prediction models. Limitations are discussed, and future directions are proposed for researchers who seek to develop existing methodology.
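The first of the two trends described above (time as an input feature, single output node) usually relies on expanding each patient into one row per time interval at risk. The following is a minimal sketch of that expansion, not the procedure of any reviewed study. The function name `to_person_period`, the interval cut points, and the convention that a censored patient contributes a row to every interval they entered are all assumptions made for illustration.

```python
from typing import List, Tuple


def to_person_period(
    time: float,
    event: int,
    covariates: List[float],
    cut_points: List[float],
) -> List[Tuple[List[float], int]]:
    """Expand one patient into one (features, target) row per interval at risk.

    Features are the covariates plus the interval index (time as an input).
    The target is 1 only in the interval where the event occurs, else 0.
    Assumption: a censored patient contributes a row (target 0) to every
    interval they entered, including the one in which they were censored.
    """
    rows = []
    start = 0.0
    for k, end in enumerate(cut_points):
        if time <= start:
            break  # patient is no longer at risk in this interval
        event_here = int(event == 1 and time <= end)
        rows.append((covariates + [float(k)], event_here))
        if event_here:
            break  # follow-up stops at the event
        start = end
    return rows


# Hypothetical patient: event at 2.5 years, one covariate, yearly intervals.
for features, target in to_person_period(2.5, 1, [0.3], [1.0, 2.0, 3.0, 4.0]):
    print(features, target)
```

A single-output network trained on these rows estimates the conditional probability of the event in an interval given survival up to its start. The second trend would instead keep one row per patient and use one output node per interval.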
2
A Simulation Study to Compare the Predictive Performance of Survival Neural Networks with Cox Models for Clinical Trial Data. Comput Math Methods Med 2021; 2021:2160322. [PMID: 34880930 PMCID: PMC8646180 DOI: 10.1155/2021/2160322] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/17/2021] [Accepted: 11/10/2021] [Indexed: 12/23/2022]
Abstract
Background: Studies focusing on prediction models are widespread in medicine. There is a trend in applying machine learning (ML) by medical researchers and clinicians. Over the years, multiple ML algorithms have been adapted to censored data. However, the choice of methodology should be motivated by the real-life data and their complexity. Here, the predictive performance of ML techniques is compared with statistical models in a simple clinical setting (small/moderate sample size and small number of predictors) with Monte-Carlo simulations.
Methods: Synthetic data (250 or 1000 patients) were generated that closely resembled 5 prognostic factors preselected based on a European Osteosarcoma Intergroup study (MRC BO06/EORTC 80931). Comparison was performed between 2 partial logistic artificial neural networks (PLANNs) and Cox models for 20, 40, 61, and 80% censoring. Survival times were generated from a log-normal distribution. Models were contrasted in terms of the C-index, Brier score at 0-5 years, integrated Brier score (IBS) at 5 years, and miscalibration at 2 and 5 years (usually neglected). The endpoint of interest was overall survival.
Results: The original and extended PLANNs were tuned based on the IBS at 5 years and the C-index, achieving a slightly better performance with the IBS. Comparison with Cox models showed that PLANNs can reach similar predictive performance on simulated data for most scenarios with respect to the C-index, Brier score, or IBS. However, Cox models were frequently less miscalibrated. Performance was robust in scenario data where censored patients were removed before 2 years or curtailing at 5 years was performed (on training data).
Conclusion: Survival neural networks reached a comparable predictive performance with Cox models but were generally less well calibrated. All in all, researchers should be aware of burdensome aspects of ML techniques such as data preprocessing, tuning of hyperparameters, and computational intensity that render them disadvantageous against conventional regression models in a simple clinical setting.
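The C-index used above as a discrimination measure can be illustrated with a short sketch. This is a plain implementation of Harrell's concordance index for right-censored data, not the code used in the study. The function name `c_index` and the toy data are hypothetical, and pairs with tied event times are simply skipped, which is a simplifying assumption.

```python
from typing import Sequence


def c_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C-index: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before the follow-up time of patient j. The pair is concordant
    when the model gave patient i the higher risk; ties in risk count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up): higher risk scores go with shorter survival times.
print(c_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.7, 0.4, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The C-index says nothing about calibration, which is why the abstract reports miscalibration separately.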
3
Kantidakis G, Putter H, Lancia C, Boer JD, Braat AE, Fiocco M. Survival prediction models since liver transplantation - comparisons between Cox models and machine learning techniques. BMC Med Res Methodol 2020; 20:277. [PMID: 33198650 PMCID: PMC7667810 DOI: 10.1186/s12874-020-01153-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 04/11/2020] [Accepted: 10/26/2020] [Indexed: 01/29/2023]
Abstract
BACKGROUND: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. Nowadays, there is a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML relates to unsuitable performance measures and a lack of interpretability, which is important for clinicians.
METHODS: In this paper, ML techniques such as random forests and neural networks are applied to a large dataset of 62,294 patients from the United States with 97 predictors, selected on clinical/statistical grounds from more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is placed on the advantages and pitfalls of each method and on the interpretability of the ML techniques.
RESULTS: Well-established predictive measures are employed from the survival field (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. The clinical endpoint is overall graft-survival, defined as the time between transplantation and the date of graft-failure or death. The random survival forest shows slightly better predictive performance than Cox models based on the C-index. Neural networks show better performance than both Cox models and random survival forest based on the Integrated Brier Score at 10 years.
CONCLUSION: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. Of the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities most accurately, being as calibrated as the Cox model with all variables.
TRIAL REGISTRATION: Retrospective data were provided by the Scientific Registry of Transplant Recipients under Data Use Agreement number 9477 for analysis of risk factors after liver transplantation.
Affiliation(s)
- Georgios Kantidakis
- Mathematical Institute (MI) Leiden University, Niels Bohrweg 1, Leiden, 2333 CA, the Netherlands; Department of Biomedical Data Sciences, Section Medical Statistics, Leiden University Medical Center (LUMC), Albinusdreef 2, Leiden, 2333 ZA, the Netherlands; Department of Statistics, European Organisation for Research and Treatment of Cancer (EORTC) Headquarters, Ave E. Mounier 83/11, Brussels, 1200, Belgium
- Hein Putter
- Department of Biomedical Data Sciences, Section Medical Statistics, Leiden University Medical Center (LUMC), Albinusdreef 2, Leiden, 2333 ZA, the Netherlands
- Carlo Lancia
- Mathematical Institute (MI) Leiden University, Niels Bohrweg 1, Leiden, 2333 CA, the Netherlands
- Jacob de Boer
- Department of Surgery, Leiden University Medical Center (LUMC), Albinusdreef 2, Leiden, 2333 ZA, the Netherlands
- Andries E Braat
- Department of Surgery, Leiden University Medical Center (LUMC), Albinusdreef 2, Leiden, 2333 ZA, the Netherlands
- Marta Fiocco
- Mathematical Institute (MI) Leiden University, Niels Bohrweg 1, Leiden, 2333 CA, the Netherlands; Department of Biomedical Data Sciences, Section Medical Statistics, Leiden University Medical Center (LUMC), Albinusdreef 2, Leiden, 2333 ZA, the Netherlands; Trial and Data Center, Princess Máxima Center for pediatric oncology (PMC), Heidelberglaan 25, Utrecht, 3584 CS, the Netherlands
4
Artificial Neural Network and Cox Regression Models for Predicting Mortality after Hip Fracture Surgery: A Population-Based Comparison. Medicina (Kaunas) 2020; 56:243. [PMID: 32438724 PMCID: PMC7279348 DOI: 10.3390/medicina56050243] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/25/2020] [Revised: 05/13/2020] [Accepted: 05/13/2020] [Indexed: 01/31/2023]
Abstract
This study aimed to validate the accuracy of an artificial neural network (ANN) model for predicting mortality after hip fracture surgery during the study period, and to compare performance indices between the ANN model and a Cox regression model. A total of 10,534 hip fracture surgery patients during 1996–2010 were recruited in the study. Three datasets were used: a training dataset (n = 7374) was used for model development, a testing dataset (n = 1580) was used for internal validation, and a validation dataset (n = 1580) was used for external validation. Global sensitivity analysis was also performed to evaluate the relative importance of input predictors in the ANN model. Mortality after hip fracture surgery was significantly associated with referral system, age, gender, urbanization of residence area, socioeconomic status, Charlson comorbidity index (CCI) score, intracapsular fracture, hospital volume, and surgeon volume (p < 0.05). For predicting mortality after hip fracture surgery, the ANN model had higher prediction accuracy and overall performance indices compared to the Cox model. Global sensitivity analysis of the ANN model showed that the referral to lower-level medical institutions was the most important variable affecting mortality, followed by surgeon volume, hospital volume, and CCI score. Compared with the Cox regression model, the ANN model was more accurate in predicting postoperative mortality after a hip fracture. The forecasting predictors associated with postoperative mortality identified in this study can also be used to educate candidates for hip fracture surgery with respect to the course of recovery and health outcomes.
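The three dataset sizes reported above (7374, 1580, and 1580 of 10,534 patients) correspond to roughly 70%, 15%, and 15%. The sketch below reproduces those sizes with a seeded random split. The abstract does not say how patients were allocated, so the random shuffle, the function name `split_indices`, and the seed are assumptions for illustration only.

```python
import random
from typing import List, Tuple


def split_indices(
    n: int,
    train_frac: float = 0.70,
    test_frac: float = 0.15,
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle patient indices and cut them into train/test/validation parts.

    The validation part receives whatever remains after the first two cuts,
    so the three parts always cover all n patients exactly once.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_train = round(train_frac * n)
    n_test = round(test_frac * n)
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    valid = idx[n_train + n_test:]
    return train, test, valid


train, test, valid = split_indices(10534)
print(len(train), len(test), len(valid))
```

Note that a random split of one cohort only supports internal validation in the usual sense; the labels "internal" and "external" above are the study's own.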
5
Multicenter External Validation of the Liverpool Uveal Melanoma Prognosticator Online: An OOG Collaborative Study. Cancers (Basel) 2020; 12:477. [PMID: 32085617 PMCID: PMC7072188 DOI: 10.3390/cancers12020477] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 12/07/2019] [Revised: 01/22/2020] [Accepted: 02/13/2020] [Indexed: 12/19/2022]
Abstract
Uveal melanoma (UM) is fatal in ~50% of patients as a result of disseminated disease. This study aims to externally validate the Liverpool Uveal Melanoma Prognosticator Online V3 (LUMPO3) to determine its reliability in predicting survival after treatment for choroidal melanoma when utilizing external data from other ocular oncology centers. Anonymized data of 1836 UM patients from seven international ocular oncology centers were analyzed with LUMPO3 to predict the 10-year survival for each patient in each external dataset. The analysts were masked to the patient outcomes. Model predictions were sent to an independent statistician to evaluate LUMPO3’s performance using discrimination and calibration methods. LUMPO3’s ability to discriminate between UM patients who died of metastatic UM and those who were still alive was fair-to-good, with C-statistics ranging from 0.64 to 0.85 at year 1. The pooled estimate for all external centers was 0.72 (95% confidence interval: 0.68 to 0.75). Agreement between observed and predicted survival probabilities was generally good given differences in case mix and survival rates between different centers. Despite the differences between the international cohorts of patients with primary UM, LUMPO3 is a valuable tool for predicting all-cause mortality in this disease when using data from external centers.
6
Bannister CA, Halcox JP, Currie CJ, Preece A, Spasić I. A genetic programming approach to development of clinical prediction models: A case study in symptomatic cardiovascular disease. PLoS One 2018; 13:e0202685. [PMID: 30180175 PMCID: PMC6122798 DOI: 10.1371/journal.pone.0202685] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/25/2018] [Accepted: 08/06/2018] [Indexed: 12/22/2022]
Abstract
Background: Genetic programming (GP) is an evolutionary computing methodology capable of identifying complex, non-linear patterns in large data sets. Despite the potential advantages of GP over more typical, frequentist statistical methods, its applications to survival analyses are rare, at best. The aim of this study was to determine the utility of GP for the automatic development of clinical prediction models.
Methods: We compared GP against the commonly used Cox regression technique in terms of the development and performance of a cardiovascular risk score using data from the SMART study, a prospective cohort study of patients with symptomatic cardiovascular disease. The composite endpoint was cardiovascular death, non-fatal stroke, and myocardial infarction. A total of 3,873 patients aged 19–82 years were enrolled in the study between 1996 and 2006. The cohort was split 70:30 into derivation and validation sets. The derivation set was used for development of both GP and Cox regression models. These models were then used to predict the discrete hazards at t = 1, 3, and 5 years. The predictive ability of both models was evaluated in terms of their risk discrimination and calibration using the validation set.
Results: The discrimination of both models was comparable. At time points t = 1, 3, and 5 years, the C-index was 0.59, 0.69, and 0.64 for the GP model and 0.66, 0.70, and 0.70 for the Cox regression model. At the same time points, the calibration of both models, which was assessed using calibration plots and the generalization of the Hosmer-Lemeshow test statistic, was also comparable, but with the Cox model being better calibrated to the validation data.
Conclusion: Using empirical data, we demonstrated that a prediction model developed automatically by GP has predictive ability comparable to that of manually tuned Cox regression. The GP model was more complex, but it was developed in a fully automated way and comprised fewer covariates. Furthermore, it did not require the expertise normally needed for its derivation, thereby alleviating the knowledge elicitation bottleneck. Overall, GP demonstrated considerable potential as a method for the automated development of clinical prediction models for diagnostic and prognostic purposes.
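The abstract above refers to predicted discrete hazards at fixed time points. As a rough illustration of how interval hazards relate to survival probabilities, the sketch below applies the standard product rule S(t_k) = (1 - h_1)(1 - h_2)...(1 - h_k). It assumes each hazard is the conditional probability of the event within one of a series of consecutive intervals, given survival to the start of that interval. The function name `survival_from_hazards` and the hazard values are made up and are not taken from the study.

```python
from typing import List, Sequence


def survival_from_hazards(hazards: Sequence[float]) -> List[float]:
    """Turn consecutive discrete hazards into cumulative survival probabilities.

    hazards[k] is assumed to be P(event in interval k | survived to its start).
    The returned list holds the survival probability at the end of each interval.
    """
    survival = []
    s = 1.0
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must lie in [0, 1]")
        s *= 1.0 - h  # surviving interval k requires surviving all earlier ones
        survival.append(s)
    return survival


# Hypothetical hazards for three consecutive intervals.
print([round(s, 2) for s in survival_from_hazards([0.1, 0.2, 0.5])])
```

Calibration at a given time point then amounts to comparing such predicted probabilities with observed event frequencies in groups of patients.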
Affiliation(s)
- Christian A. Bannister
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Cochrane Institute of Primary Care & Public Health, School of Medicine, Cardiff University, Cardiff, United Kingdom
- Julian P. Halcox
- Department of Cardiology, Medical School, Swansea University, Swansea, United Kingdom
- Craig J. Currie
- Cochrane Institute of Primary Care & Public Health, School of Medicine, Cardiff University, Cardiff, United Kingdom
- Alun Preece
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Irena Spasić
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
7
Lisboa P, Vellido A, Tagliaferri R, Napolitano F, Ceccarelli M, Martin-Guerrero J, Biganzoli E. Data Mining in Cancer Research [Application Notes]. IEEE Comput Intell Mag 2010. [DOI: 10.1109/mci.2009.935311] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Indexed: 11/07/2022]
8
Lisboa PJG, Etchells TA, Jarman IH, Arsene CTC, Aung MSH, Eleuteri A, Taktak AFG, Ambrogi F, Boracchi P, Biganzoli E. Partial logistic artificial neural network for competing risks regularized with automatic relevance determination. IEEE Trans Neural Netw 2009; 20:1403-16. [PMID: 19628458 DOI: 10.1109/tnn.2009.2023654] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Indexed: 11/08/2022]
Abstract
Time-to-event analysis is important in a wide range of applications from clinical prognosis to risk modeling for credit scoring and insurance. In risk modeling, it is sometimes required to make a simultaneous assessment of the hazard arising from two or more mutually exclusive factors. This paper applies a Bayesian regularization, with the standard approximation of the evidence, to an existing neural network model for competing risks (PLANNCR) in order to implement automatic relevance determination (PLANNCR-ARD). The theoretical framework for the model is described and its application is illustrated with reference to local and distal recurrence of breast cancer, using the data set of Veronesi (1995).
Affiliation(s)
- Paulo J G Lisboa
- School of Computing and Mathematical Sciences, Liverpool John Moores University, Liverpool L33AF, UK.
9
Taktak AF, Eleuteri A, Lake SP, Fisher AC. A web-based tool for the assessment of discrimination and calibration properties of prognostic models. Comput Biol Med 2008; 38:785-91. [DOI: 10.1016/j.compbiomed.2008.04.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 12/22/2006] [Revised: 03/03/2008] [Accepted: 04/14/2008] [Indexed: 11/27/2022]
10
Artificial neural networks estimating survival probability after treatment of choroidal melanoma. Ophthalmology 2008; 115:1598-607. [PMID: 18342942 DOI: 10.1016/j.ophtha.2008.01.032] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Received: 07/31/2007] [Revised: 11/28/2007] [Accepted: 01/30/2008] [Indexed: 11/27/2022]
Abstract
PURPOSE: To describe neural networks predicting survival from choroidal melanoma (i.e., any uveal melanoma involving choroid) and to demonstrate the value of entering age, sex, clinical stage, cytogenetic type, and histologic grade into the predictive model.
DESIGN: Nonrandomized case series.
PARTICIPANTS: Patients resident in mainland Britain treated by the first author for choroidal melanoma between 1984 and 2006.
METHODS: A conditional hazard estimating neural network (CHENN) was trained according to the Bayesian formalism with a training set of 1780 patients and evaluated with a test set of another 874 patients. CHENN-generated survival curves were compared with those obtained with Kaplan-Meier analyses. A second model was created with information on chromosome 3 loss, using training and test sets of 211 and 140 patients, respectively.
MAIN OUTCOME MEASURES: Comparison of CHENN survival curves with Kaplan-Meier analyses. Representative results showing all-cause survival and inferred melanoma-specific mortality, according to age, sex, clinical stage, cytogenetic type, and histologic grade.
RESULTS: The predictive model plotted a survival curve with 95% credibility intervals for patients with melanoma according to relevant risk factors: age, sex, largest basal tumor diameter, ciliary body involvement, extraocular extension, tumor cell type, closed loops, mitotic rate, and chromosome 3 loss (i.e., monosomy 3). A survival curve for the age-matched general population of the same sex allowed estimation of the melanoma-related mortality. All-cause survival curves generated by the CHENN matched those produced with Kaplan-Meier analysis (Kolmogorov-Smirnov, P<0.05). In older patients, however, the estimated melanoma-related mortality was lower with the CHENN, which accounted for competing risks, unlike Kaplan-Meier analysis. Largest basal tumor diameter was most predictive of mortality in tumors showing histologic and cytogenetic features of high-grade malignancy. Ciliary body involvement and extraocular extension lost significance when cytogenetic and histologic data were included in the model. Patients with a monosomy 3 melanoma of a particular size were predicted to have shorter survival if their tumor showed epithelioid cells and closed loops.
CONCLUSIONS: Estimation of survival prognosis in patients with choroidal melanoma requires multivariate assessment of age, sex, clinical tumor stage, cytogenetic melanoma type, and histologic grade of malignancy.