1
Otieno JA, Häggström J, Darehed D, Eriksson M. Developing machine learning models to predict multi-class functional outcomes and death three months after stroke in Sweden. PLoS One 2024; 19:e0303287. [PMID: 38739586 PMCID: PMC11090298 DOI: 10.1371/journal.pone.0303287] [Received: 05/02/2023] [Accepted: 04/23/2024] [Indexed: 05/16/2024]
Abstract
Globally, stroke is the third-leading cause of mortality and disability combined, and one of the costliest diseases in society. More accurate predictions of stroke outcomes can guide healthcare organizations in allocating appropriate resources to improve care and reduce both the economic and social burden of the disease. We aim to develop and evaluate the performance and explainability of three supervised machine learning models and the traditional multinomial logistic regression (mLR) in predicting functional dependence and death three months after stroke, using routinely-collected data. This prognostic study included adult patients, registered in the Swedish Stroke Registry (Riksstroke) from 2015 to 2020. Riksstroke contains information on stroke care and outcomes among patients treated in hospitals in Sweden. Prognostic factors (features) included demographic characteristics, pre-stroke functional status, cardiovascular risk factors, medications, acute care, stroke type, and severity. The outcome was measured using the modified Rankin Scale at three months after stroke (a scale of 0-2 indicates independent, 3-5 dependent, and 6 dead). Outcome prediction models included support vector machines, artificial neural networks (ANN), eXtreme Gradient Boosting (XGBoost), and mLR. The models were trained and evaluated on 75% and 25% of the dataset, respectively. Model predictions were explained using SHAP values. The study included 102,135 patients (85.8% ischemic stroke, 53.3% male, mean age 75.8 years, and median NIHSS of 3). All models demonstrated similar overall accuracy (69%-70%). The ANN and XGBoost models performed significantly better than the mLR in classifying dependence with F1-scores of 0.603 (95% CI; 0.594-0.611) and 0.577 (95% CI; 0.568-0.586), versus 0.544 (95% CI; 0.545-0.563) for the mLR model. The factors that contributed most to the predictions were expectedly similar in the models, based on clinical knowledge. 
Our ANN and XGBoost models showed a modest improvement in prediction performance and explainability compared to mLR using routinely-collected data. Their improved ability to predict functional dependence may be of particular importance for the planning and organization of acute stroke care and rehabilitation.
Affiliation(s)
- Jenny Häggström
  - Department of Statistics, USBE, Umeå University, Umeå, Sweden
- David Darehed
  - Department of Public Health and Clinical Medicine, Sunderby Research Unit, Umeå University, Umeå, Sweden
- Marie Eriksson
  - Department of Statistics, USBE, Umeå University, Umeå, Sweden
2
Axford D, Sohel F, Abedi V, Zhu Y, Zand R, Barkoudah E, Krupica T, Iheasirim K, Sharma UM, Dugani SB, Takahashi PY, Bhagra S, Murad MH, Saposnik G, Yousufuddin M. Development and internal validation of machine learning-based models and external validation of existing risk scores for outcome prediction in patients with ischaemic stroke. European Heart Journal. Digital Health 2024; 5:109-122. [PMID: 38505491 PMCID: PMC10944684 DOI: 10.1093/ehjdh/ztad073] [Received: 06/15/2023] [Revised: 10/14/2023] [Accepted: 10/30/2023] [Indexed: 03/21/2024]
Abstract
Aims: We developed new machine learning (ML) models and externally validated existing statistical models [ischaemic stroke predictive risk score (iScore) and totalled health risks in vascular events (THRIVE) scores] for predicting the composite of recurrent stroke or all-cause mortality at 90 days and at 3 years after hospitalization for first acute ischaemic stroke (AIS).
Methods and results: In adults hospitalized with AIS from January 2005 to November 2016, with follow-up until November 2019, we developed three ML models [random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBOOST)] and externally validated the iScore and THRIVE scores for predicting the composite outcomes after AIS hospitalization, using data from 721 patients and 90 potential predictor variables. At 90 days and 3 years, 11 and 34% of patients, respectively, reached the composite outcome. For the 90-day prediction, the area under the receiver operating characteristic curve (AUC) was 0.779 for RF, 0.771 for SVM, 0.772 for XGBOOST, 0.720 for iScore, and 0.664 for THRIVE. For 3-year prediction, the AUC was 0.743 for RF, 0.777 for SVM, 0.773 for XGBOOST, 0.710 for iScore, and 0.675 for THRIVE.
Conclusion: The study provided three ML-based predictive models that achieved good discrimination and clinical usefulness in outcome prediction after AIS and broadened the application of the iScore and THRIVE scoring system for long-term outcome prediction. Our findings warrant comparative analyses of ML and existing statistical method-based risk prediction tools for outcome prediction after AIS in new data sets.
Affiliation(s)
- Daniel Axford
  - Department of Information Technology, Mathematics and Statistics, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, Australia
- Ferdous Sohel
  - Department of Information Technology, Mathematics and Statistics, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, Australia
- Vida Abedi
  - Department of Public Health Science, Penn State College of Medicine, Hershey, PA, USA
- Ye Zhu
  - Robert D. and Patricia E. Kern Centre for the Science of Healthcare Delivery, Mayo Clinic, Rochester, MN, USA
- Ramin Zand
  - Neuroscience Institute, Geisinger Health System, 100 North Academy Ave, Danville, PA 17822, USA
  - Neuroscience Institute, The Pennsylvania State University, Hershey, PA 17033, USA
- Ebrahim Barkoudah
  - Internal Medicine/Hospital Medicine, Brigham and Women’s Hospital, Harvard University, Boston, MA, USA
- Troy Krupica
  - Internal Medicine/Hospital Medicine, West Virginia University, Morgantown, WV, USA
- Kingsley Iheasirim
  - Internal Medicine/Hospital Internal Medicine, Mayo Clinic Health System, Mankato, MN, USA
- Umesh M Sharma
  - Hospital Internal Medicine, Mayo Clinic, Phoenix, AZ, USA
- Sagar B Dugani
  - Hospital Internal Medicine, Mayo Clinic, Rochester, MN, USA
- Sumit Bhagra
  - Endocrinology, Diabetes and Metabolism, Mayo Clinic Health System, Austin, MN, USA
- Mohammad H Murad
  - Division of Public Health, Infectious Diseases, and Occupational Medicine, Mayo Clinic, Rochester, MN, USA
- Gustavo Saposnik
  - Stroke Outcomes and Decision Neuroscience Research Unit, Division of Neurology, Department of Medicine and Li Ka Shing Knowledge Institute, St. Michael’s Hospital, University of Toronto, Toronto, Ontario, Canada
- Mohammed Yousufuddin
  - Hospital Internal Medicine, Mayo Clinic Health System, 1000 1st Drive NW, Austin, MN 55912, USA
3
Wang W, Otieno JA, Eriksson M, Wolfe CD, Curcin V, Bray BD. Developing and externally validating a machine learning risk prediction model for 30-day mortality after stroke using national stroke registers in the UK and Sweden. BMJ Open 2023; 13:e069811. [PMID: 37968001 PMCID: PMC10660948 DOI: 10.1136/bmjopen-2022-069811] [Received: 11/10/2022] [Accepted: 07/27/2023] [Indexed: 11/17/2023]
Abstract
OBJECTIVES: We aimed to develop and externally validate a generalisable risk prediction model for 30-day stroke mortality suitable for supporting quality improvement analytics in stroke care using large nationwide stroke registers in the UK and Sweden.
DESIGN: Registry-based cohort study.
SETTING: Stroke registries including the Sentinel Stroke National Audit Programme (SSNAP) in England, Wales and Northern Ireland (2013-2019) and the national Swedish stroke register (Riksstroke 2015-2020).
PARTICIPANTS AND METHODS: Data from SSNAP were used for developing and temporally validating the model, and data from Riksstroke were used for external validation. Models were developed with the variables available in both registries using logistic regression (LR), LR with elastic net and interaction terms and eXtreme Gradient Boosting (XGBoost). Performances were evaluated with discrimination, calibration and decision curves.
OUTCOME MEASURES: The primary outcome was all-cause 30-day in-hospital mortality after stroke.
RESULTS: In total, 488 497 patients who had a stroke with 12.4% 30-day in-hospital mortality were used for developing and temporally validating the model in the UK. A total of 128 360 patients who had a stroke with 10.8% 30-day in-hospital mortality and 13.1% all mortality were used for external validation in Sweden. In the SSNAP temporal validation set, the final XGBoost model achieved the highest area under the receiver operating characteristic curve (AUC) (0.852 (95% CI 0.848 to 0.855)) and was well calibrated. The performances on the external validation in Riksstroke were as good and achieved AUC at 0.861 (95% CI 0.858 to 0.865) for in-hospital mortality. For Riksstroke, the models slightly overestimated the risk for in-hospital mortality, while they were better calibrated at the risk for all mortality.
CONCLUSION: The risk prediction model was accurate and externally validated using high quality registry data. This is potentially suitable to be deployed as part of quality improvement analytics in stroke care to enable the fair comparison of stroke mortality outcomes across hospitals and health systems across countries.
Affiliation(s)
- Wenjuan Wang
  - Department of Population Health Sciences, King's College London, London, UK
- Charles D Wolfe
  - Department of Population Health Sciences, King's College London, London, UK
- Vasa Curcin
  - Department of Population Health Sciences, King's College London, London, UK
- Benjamin D Bray
  - Department of Population Health Sciences, King's College London, London, UK
4
Wang W, Rudd AG, Wang Y, Curcin V, Wolfe CD, Peek N, Bray B. Correction: Risk prediction of 30-day mortality after stroke using machine learning: a nationwide registry-based cohort study. BMC Neurol 2022; 22:319. [PMID: 36008776 PMCID: PMC9404635 DOI: 10.1186/s12883-022-02840-w] [Indexed: 11/29/2022]
Affiliation(s)
- Wenjuan Wang
  - School of Population Health & Environmental Sciences, Faculty of Life Science and Medicine, King's College London, London, UK
- Anthony G Rudd
  - School of Population Health & Environmental Sciences, Faculty of Life Science and Medicine, King's College London, London, UK
- Yanzhong Wang
  - School of Population Health & Environmental Sciences, Faculty of Life Science and Medicine, King's College London, London, UK
  - NIHR Biomedical Research Centre, Guy's and St Thomas' NHS Foundation Trust and King's College London, London, UK
  - NIHR Applied Research Collaboration (ARC) South London, London, UK
- Vasa Curcin
  - School of Population Health & Environmental Sciences, Faculty of Life Science and Medicine, King's College London, London, UK
  - NIHR Biomedical Research Centre, Guy's and St Thomas' NHS Foundation Trust and King's College London, London, UK
  - NIHR Applied Research Collaboration (ARC) South London, London, UK
- Charles D Wolfe
  - School of Population Health & Environmental Sciences, Faculty of Life Science and Medicine, King's College London, London, UK
  - NIHR Biomedical Research Centre, Guy's and St Thomas' NHS Foundation Trust and King's College London, London, UK
  - NIHR Applied Research Collaboration (ARC) South London, London, UK
- Niels Peek
  - Division of Informatics, Imaging and Data Science, School of Health Sciences, University of Manchester, Manchester, UK
  - NIHR Manchester Biomedical Research Centre, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
- Benjamin Bray
  - School of Population Health & Environmental Sciences, Faculty of Life Science and Medicine, King's College London, London, UK