1. Maisonnave M, Rajabi E, Taghavi M, VanBerkel P. Explainable machine learning to identify risk factors for unplanned hospital readmissions in Nova Scotian hospitals. Comput Biol Med 2025;190:110024. [PMID: 40147186] [DOI: 10.1016/j.compbiomed.2025.110024]
Abstract
OBJECTIVE A report from the Canadian Institute for Health Information found that unplanned hospital readmissions (UHR) are common, costly, and potentially avoidable, estimating a $1.8 billion cost to the Canadian healthcare system for inpatient readmissions within 30 days of discharge over the 11-month study period. The first step towards addressing this costly problem is enabling early detection of patients at risk by identifying UHR risk factors. METHODOLOGY We applied machine learning and explainability tools to examine risk factors for UHR within 30 days of discharge, using data from Nova Scotian (Canada) healthcare institutions (2015-2022). To the best of our knowledge, our research constitutes the most comprehensive study of UHR risk factors for the province. RESULTS We found that predicting UHR solely from healthcare data has limitations, as discharge information often falls short of accurately predicting readmission occurrences. Despite this inherent limitation, integrating explainability tools offers insights into the underlying factors contributing to readmission risk, giving medical personnel information they can use to improve patient care and outcomes. As part of this work, we identify and report risk factors for UHR and build a guideline to support medical personnel's decisions about targeted post-discharge follow-ups. We found that conditions such as heart failure and chronic obstructive pulmonary disease (COPD) are associated with a higher likelihood of readmission, while patients admitted for procedures related to childbirth have a lower probability of readmission. We also studied the impact of admission type, patient characteristics, and patient-stay characteristics on UHR; for example, new and elective admission patients are less likely to be readmitted, while patients who received a transfusion are more likely to be readmitted. CONCLUSIONS We validated the risk factors and the guideline using real-world data. Our results suggest that our proposal correctly identifies risk factors and produces valuable guidelines for medical personnel. The guideline evaluation suggests we can screen half the patients while capturing more than 72% of readmission episodes. Our study contributes insights into the challenge of identifying risk factors for UHR while providing a practical guideline for healthcare professionals, particularly within Nova Scotia.
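The guideline result quoted above (screen half the patients, capture over 72% of readmission episodes) is a sensitivity-at-a-fixed-screening-rate calculation. The sketch below illustrates only that evaluation idea; the function name and data are invented, and the paper's actual guideline is rule-based, not this code:

```python
import numpy as np

def capture_rate_at_screening_fraction(risk_scores, readmitted, fraction=0.5):
    """Fraction of readmission episodes captured when screening the
    top `fraction` of patients ranked by predicted risk."""
    n_screened = int(len(risk_scores) * fraction)
    ranked = np.argsort(risk_scores)[::-1]   # highest predicted risk first
    screened = ranked[:n_screened]
    return readmitted[screened].sum() / readmitted.sum()

# Illustrative synthetic data: scores loosely correlated with outcomes.
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.1, size=10_000)
scores = y * rng.uniform(0.3, 1.0, 10_000) + (1 - y) * rng.uniform(0.0, 0.8, 10_000)
print(f"Captured readmissions: {capture_rate_at_screening_fraction(scores, y):.1%}")
```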
Affiliation(s)
- Mariano Maisonnave
- Management Science Department, Shannon School of Business, Cape Breton University, 1250 Grand Lake Rd, Sydney, B1M 1A2, NS, Canada.
- Enayat Rajabi
- Management Science Department, Shannon School of Business, Cape Breton University, 1250 Grand Lake Rd, Sydney, B1M 1A2, NS, Canada.
- Majid Taghavi
- Sobey School of Business, Saint Mary's University, 903 Robie St, Halifax, B3H 3C2, NS, Canada.
- Peter VanBerkel
- Department of Industrial Engineering, Dalhousie University, 5269 Morris St, Halifax, B3J 1B6, NS, Canada.
2. Corbin CK, Maclay R, Acharya A, Mony S, Punnathanam S, Thapa R, Kotecha N, Shah NH, Chen JH. DEPLOYR: a technical framework for deploying custom real-time machine learning models into the electronic medical record. J Am Med Inform Assoc 2023;30:1532-1542. [PMID: 37369008] [PMCID: PMC10436147] [DOI: 10.1093/jamia/ocad114]
Abstract
OBJECTIVE Healthcare institutions are establishing frameworks to govern and promote the implementation of accurate, actionable, and reliable machine learning models that integrate with clinical workflows. Such governance frameworks require an accompanying technical framework to deploy models in a resource-efficient, safe, and high-quality manner. Here we present DEPLOYR, a technical framework for enabling real-time deployment and monitoring of researcher-created models in a widely used electronic medical record system. MATERIALS AND METHODS We discuss core functionality and design decisions, including mechanisms to trigger inference based on actions within electronic medical record software, modules that collect real-time data to make inferences, mechanisms that close the loop by displaying inferences back to end users within their workflow, monitoring modules that track the performance of deployed models over time, silent deployment capabilities, and mechanisms to prospectively evaluate a deployed model's impact. RESULTS We demonstrate the use of DEPLOYR by silently deploying and prospectively evaluating 12 machine learning models trained on electronic medical record data that predict laboratory diagnostic results, triggered by clinician button-clicks in Stanford Health Care's electronic medical record. DISCUSSION Our study highlights the need for and feasibility of such silent deployment, because prospectively measured performance varies from retrospective estimates. When possible, we recommend using prospectively estimated performance measures during silent trials to make final go decisions for model deployment. CONCLUSION Machine learning applications in healthcare are extensively researched, but successful translations to the bedside are rare. By describing DEPLOYR, we aim to inform machine learning deployment best practices and help bridge the model implementation gap.
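The silent-deployment pattern described here, running real-time inference on workflow triggers while suppressing display and logging predictions for later prospective evaluation, can be sketched generically. This is a hypothetical illustration of the pattern, not DEPLOYR's actual API; all class and field names are invented:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable

@dataclass
class SilentDeployment:
    """Run a model on real triggers but log results instead of displaying them."""
    model: Callable[[dict], float]
    silent: bool = True                      # flip to False after the silent trial
    log: list = field(default_factory=list)

    def on_trigger(self, features: dict):
        score = self.model(features)
        self.log.append({"ts": datetime.utcnow(), "features": features,
                         "score": score, "displayed": not self.silent})
        return None if self.silent else score  # nothing surfaced while silent

    def prospective_auroc(self, outcomes: list) -> float:
        # Pair logged scores with adjudicated outcomes once labels mature.
        from sklearn.metrics import roc_auc_score
        return roc_auc_score(outcomes, [r["score"] for r in self.log])
```

A go decision would then compare `prospective_auroc` from the silent trial against the retrospective estimate before any inference is shown to clinicians.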
Affiliation(s)
- Conor K Corbin
- Department of Biomedical Data Science, Stanford, California, USA
- Rob Maclay
- Stanford Children’s Health, Palo Alto, California, USA
- Rahul Thapa
- Stanford Health Care, Palo Alto, California, USA
- Nigam H Shah
- Center for Biomedical Informatics Research, Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
- Jonathan H Chen
- Center for Biomedical Informatics Research, Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
3. Luo AL, Ravi A, Arvisais-Anhalt S, Muniyappa AN, Liu X, Wang S. Development and internal validation of an interpretable machine learning model to predict readmissions in a United States healthcare system. Informatics 2023. [DOI: 10.3390/informatics10020033]
Abstract
(1) Background: One in four hospital readmissions is potentially preventable. Machine learning (ML) models have been developed to predict hospital readmissions and risk-stratify patients, but thus far they have been limited in clinical applicability, timeliness, and generalizability. (2) Methods: Using deidentified clinical data from the University of California, San Francisco (UCSF) between January 2016 and November 2021, we developed and compared four supervised ML models (logistic regression, random forest, gradient boosting, and XGBoost) to predict 30-day readmissions for adults admitted to a UCSF hospital. (3) Results: Of 147,358 inpatient encounters, 20,747 (13.9%) patients were readmitted within 30 days of discharge. The final model selected was XGBoost, which had an area under the receiver operating characteristic curve of 0.783 and an area under the precision-recall curve of 0.434. The most important features by Shapley Additive Explanations (SHAP) were days since last admission, discharge department, and inpatient length of stay. (4) Conclusions: We developed and internally validated a supervised ML model to predict 30-day readmissions in a US-based healthcare system. This model has several advantages, including state-of-the-art performance metrics, the use of clinical data, the use of features available within 24 h of discharge, and generalizability to multiple disease states.
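The modeling recipe in this abstract, gradient-boosted trees evaluated by AUROC/AUPRC and interpreted with SHAP, follows a standard pattern. Below is a minimal sketch using the public xgboost and shap packages, with invented data in place of the UCSF cohort:

```python
import numpy as np
import xgboost as xgb
import shap
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical stand-ins for features such as days since last admission
# or length of stay; real inputs would come from the EHR.
rng = np.random.default_rng(42)
X = rng.normal(size=(5_000, 8))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0] + 0.5 * X[:, 1])))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = xgb.XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_tr, y_tr)

probs = model.predict_proba(X_te)[:, 1]
print("AUROC:", roc_auc_score(y_te, probs))
print("AUPRC:", average_precision_score(y_te, probs))

# Global feature importance via SHAP, as in the paper's interpretation step.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
print("Mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0))
```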
4. APLUS: a Python library for usefulness simulations of machine learning models in healthcare. J Biomed Inform 2023;139:104319. [PMID: 36791900] [DOI: 10.1016/j.jbi.2023.104319]
Abstract
Despite the creation of thousands of machine learning (ML) models, the promise of improving patient care with ML remains largely unrealized. Adoption into clinical practice is lagging, in large part due to disconnects between how ML practitioners evaluate models and what is required for their successful integration into care delivery. Models are just one component of care delivery workflows, whose constraints determine clinicians' ability to act on model outputs. However, methods to evaluate the usefulness of models in the context of their corresponding workflows are currently limited. To bridge this gap, we developed APLUS, a reusable framework for quantitatively assessing via simulation the utility gained from integrating a model into a clinical workflow. We describe the APLUS simulation engine and workflow specification language, and apply it to evaluate a novel ML-based screening pathway for detecting peripheral artery disease at Stanford Health Care.
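The point that a model's utility depends on workflow constraints can be made concrete with a toy capacity-constrained simulation. The sketch below is not APLUS's API or workflow specification language; it is an invented illustration of simulating utility when a care team can act on only a fixed number of alerts per day:

```python
import numpy as np

def simulate_workflow_utility(risk_scores, outcomes, daily_capacity=10,
                              benefit_tp=1.0, cost_intervention=0.2, days=100):
    """Toy utility simulation: each day, the team acts on at most
    `daily_capacity` alerts; utility accrues only for true positives."""
    rng = np.random.default_rng(0)
    utility = 0.0
    for _ in range(days):
        idx = rng.choice(len(risk_scores), size=50, replace=False)  # today's patients
        order = idx[np.argsort(risk_scores[idx])[::-1]]             # triage by risk
        acted = order[:daily_capacity]                              # capacity limit
        utility += benefit_tp * outcomes[acted].sum() - cost_intervention * len(acted)
    return utility

rng = np.random.default_rng(1)
y = rng.binomial(1, 0.15, 2_000)
scores = np.clip(y * 0.6 + rng.uniform(0, 0.6, 2_000), 0, 1)
print("Simulated utility:", simulate_workflow_utility(scores, y))
```

The same model can score very differently under different `daily_capacity` values, which is the workflow-dependence the paper emphasizes.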
5. Huang SC, Chaudhari AS, Langlotz CP, Shah N, Yeung S, Lungren MP. Developing medical imaging AI for emerging infectious diseases. Nat Commun 2022;13:7060. [PMID: 36400764] [PMCID: PMC9672573] [DOI: 10.1038/s41467-022-34234-4]
Abstract
Very few of the machine learning models developed for COVID-19 proved fit for deployment in real-world settings. In this Comment, Huang et al. discuss the main steps required to develop clinically useful models in the context of an emerging infectious disease.
Affiliation(s)
- Shih-Cheng Huang
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
- Center for Artificial Intelligence in Medicine & Imaging, Stanford University, Stanford, CA, USA.
- Akshay S Chaudhari
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Center for Artificial Intelligence in Medicine & Imaging, Stanford University, Stanford, CA, USA
- Department of Radiology, Stanford University, Stanford, CA, USA
- Curtis P Langlotz
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Center for Artificial Intelligence in Medicine & Imaging, Stanford University, Stanford, CA, USA
- Department of Radiology, Stanford University, Stanford, CA, USA
- Nigam Shah
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Serena Yeung
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Center for Artificial Intelligence in Medicine & Imaging, Stanford University, Stanford, CA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Electrical Engineering, Stanford University, Stanford, CA, USA
- Clinical Excellence Research Center, Stanford University School of Medicine, Stanford, CA, USA
- Matthew P Lungren
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Center for Artificial Intelligence in Medicine & Imaging, Stanford University, Stanford, CA, USA
- Department of Radiology, Stanford University, Stanford, CA, USA
6. Campagner A, Sternini F, Cabitza F. Decisions are not all equal: introducing a utility metric based on case-wise raters' perceptions. Comput Methods Programs Biomed 2022;221:106930. [PMID: 35690505] [DOI: 10.1016/j.cmpb.2022.106930]
Abstract
BACKGROUND AND OBJECTIVE Evaluation of AI-based decision support systems (AI-DSS) is of critical importance in practical applications; nonetheless, common evaluation metrics fail to properly consider relevant contextual information. In this article we discuss a novel utility metric, the weighted Utility (wU), for the evaluation of AI-DSS, which is based on raters' perceptions of their annotation hesitation and of the relevance of the training cases. METHODS We discuss the relationship between the proposed metric and previous proposals, and we describe its application to both model evaluation and optimization through three realistic case studies. RESULTS We show that our metric generalizes the well-known Net Benefit, as well as other common error-based and utility-based metrics. Through the empirical studies, we show that our metric can provide a more flexible tool for the evaluation of AI models. We also show that, compared to other optimization metrics, model optimization based on the wU can provide significantly better performance (AUC 0.862 vs 0.895, p-value <0.05), especially on cases judged to be more complex by the human annotators (AUC 0.85 vs 0.92, p-value <0.05). CONCLUSIONS We make the case for having utility as a primary concern in the evaluation and optimization of machine learning models in critical domains such as medicine, and for the importance of a human-centred approach that assesses the potential impact of AI models on human decision making, drawing also on information collected during the ground-truthing process.
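The abstract does not give the wU formula, so the sketch below only illustrates the general idea of a case-wise, rater-informed utility: weighting each case by rater-judged relevance and discounting by annotation hesitation. The weighting scheme here is an assumption for illustration, not the paper's definition:

```python
import numpy as np

def weighted_utility(y_true, y_pred, relevance, hesitation):
    """Illustrative case-weighted utility: correct predictions earn more
    credit on relevant cases, and cases the raters were confident about
    (low hesitation) weigh more. A sketch of the general idea only."""
    correct = (y_true == y_pred).astype(float)
    weights = relevance * (1.0 - hesitation)   # assumed weighting scheme
    return float((correct * weights).sum() / weights.sum())

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
pred = np.where(rng.random(200) < 0.8, y, 1 - y)   # ~80% accurate model
rel = rng.uniform(0.5, 1.0, 200)                   # rater-judged case relevance
hes = rng.uniform(0.0, 0.5, 200)                   # rater annotation hesitation
print(f"wU-style score: {weighted_utility(y, pred, rel, hes):.3f}")
```

Under such a metric, two models with equal accuracy can score differently if their errors concentrate on relevant, confidently labelled cases, which is the behaviour the authors argue plain error rates miss.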
Affiliation(s)
- Andrea Campagner
- Dipartimento di Informatica, Sistemistica e Comunicazione, Università di Milano-Bicocca, Milano, Italy.
- Federico Sternini
- Polito(BIO)Med Lab, Politecnico di Torino, Torino, Italy; USE-ME-D srl, I3P Politecnico di Torino, Torino, Italy
- Federico Cabitza
- Dipartimento di Informatica, Sistemistica e Comunicazione, Università di Milano-Bicocca, Milano, Italy; IRCCS Istituto Ortopedico Galeazzi, Milan, Italy
7. Guo LL, Pfohl SR, Fries J, Johnson AEW, Posada J, Aftandilian C, Shah N, Sung L. Evaluation of domain generalization and adaptation on improving model robustness to temporal dataset shift in clinical medicine. Sci Rep 2022;12:2726. [PMID: 35177653] [PMCID: PMC8854561] [DOI: 10.1038/s41598-022-06484-1]
Abstract
Temporal dataset shift associated with changes in healthcare over time is a barrier to deploying machine learning-based clinical decision support systems. Algorithms that learn robust models by estimating invariant properties across time periods for domain generalization (DG) and unsupervised domain adaptation (UDA) might be suitable to proactively mitigate dataset shift. The objective was to characterize the impact of temporal dataset shift on clinical prediction models and to benchmark DG and UDA algorithms on improving model robustness. In this cohort study, intensive care unit patients from the MIMIC-IV database were categorized by year group (2008-2010, 2011-2013, 2014-2016 and 2017-2019). Tasks were predicting mortality, long length of stay, sepsis and invasive ventilation. Feedforward neural networks were used as prediction models. The baseline experiment trained models using empirical risk minimization (ERM) on 2008-2010 (ERM[08-10]) and evaluated them on subsequent year groups. The DG experiment trained models using algorithms that estimated invariant properties across 2008-2016 and evaluated them on 2017-2019. The UDA experiment leveraged unlabelled samples from 2017-2019 for unsupervised distribution matching. DG and UDA models were compared to ERM[08-16] models trained on 2008-2016. Main performance measures were the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve, and absolute calibration error. Threshold-based metrics, including false positives and false negatives, were used to assess the clinical impact of temporal dataset shift and its mitigation strategies. In the baseline experiments, dataset shift was most evident for sepsis prediction (maximum AUROC drop, 0.090; 95% confidence interval (CI), 0.080-0.101). In a scenario of 100 consecutively admitted patients, ERM[08-10] applied to 2017-2019 was associated with one additional false negative among 11 patients with sepsis, compared to the model applied to 2008-2010. When compared with ERM[08-16], the DG and UDA experiments failed to produce more robust models (range of AUROC differences, -0.003 to 0.050). In conclusion, DG and UDA failed to produce more robust models compared to ERM in the setting of temporal dataset shift. Alternative approaches are required to preserve model performance over time in clinical medicine.
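The baseline ERM protocol described here, train on the earliest year group and measure AUROC degradation on later ones, can be sketched with synthetic drifting data. Everything below (the cohort generator, the drift mechanism, the model choice) is a hypothetical stand-in; MIMIC-IV access and the paper's actual architectures are not reproduced:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

# Hypothetical loader standing in for MIMIC-IV year-group cohorts; the
# feature-outcome relationship drifts as the year group advances.
def make_cohort(year_group_shift, n=4_000, seed=0):
    rng = np.random.default_rng(seed + year_group_shift)
    X = rng.normal(size=(n, 10))
    logits = X[:, 0] - year_group_shift * 0.3 * X[:, 1]
    y = rng.binomial(1, 1 / (1 + np.exp(-logits)))
    return X, y

groups = {"2008-2010": 0, "2011-2013": 1, "2014-2016": 2, "2017-2019": 3}
X_train, y_train = make_cohort(groups["2008-2010"])
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
model.fit(X_train, y_train)  # ERM[08-10]-style baseline

for name, shift in groups.items():
    X_eval, y_eval = make_cohort(shift, seed=100)
    auroc = roc_auc_score(y_eval, model.predict_proba(X_eval)[:, 1])
    print(f"{name}: AUROC = {auroc:.3f}")  # expect degradation on later groups
```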
Affiliation(s)
- Lin Lawrence Guo
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
- Stephen R Pfohl
- Biomedical Informatics Research, Stanford University, Palo Alto, USA
- Jason Fries
- Biomedical Informatics Research, Stanford University, Palo Alto, USA
- Alistair E W Johnson
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada
- Jose Posada
- Biomedical Informatics Research, Stanford University, Palo Alto, USA
- Nigam Shah
- Biomedical Informatics Research, Stanford University, Palo Alto, USA
- Lillian Sung
- Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada.
- Division of Haematology/Oncology, The Hospital for Sick Children, 555 University Avenue, Toronto, ON, M5G1X8, Canada.