1. Wearable sensors in patient acuity assessment in critical care. Front Neurol 2024; 15:1386728. PMID: 38784909; PMCID: PMC11112699; DOI: 10.3389/fneur.2024.1386728.
Abstract
Acuity assessments are vital for timely interventions and fair resource allocation in critical care settings. Conventional acuity scoring systems depend heavily on subjective patient assessments, leaving room for implicit bias and error. These assessments are often manual, time-consuming, intermittent, and challenging to interpret accurately. The risk of bias and error is likely most pronounced in time-constrained, high-stakes environments such as critical care settings. Furthermore, such scores do not incorporate other information, such as patients' mobility level, which can indicate recovery or deterioration in the intensive care unit (ICU), especially at a granular level. We hypothesized that wearable sensor data could assist in assessing patient acuity at a granular level, especially in conjunction with clinical data from electronic health records (EHR). In this prospective study, we evaluated the impact of integrating mobility data collected from wrist-worn accelerometers with clinical data obtained from the EHR for estimating acuity. Accelerometry data were collected from 87 patients wearing accelerometers on their wrists in an academic hospital setting. The data were evaluated using five deep neural network models: VGG, ResNet, MobileNet, SqueezeNet, and a custom Transformer network. When predicting acuity state, these models outperformed a rule-based clinical score (Sequential Organ Failure Assessment, SOFA) used as a baseline, particularly in precision, sensitivity, and F1 score; for ground truth, patients were labeled unstable if they required life-supporting therapies and stable otherwise. The results demonstrate that integrating accelerometer data with demographics and clinical variables improves predictive performance compared with traditional scoring systems in healthcare. Deep learning models consistently outperformed the SOFA score baseline across various scenarios, showing notable improvements in metrics such as the area under the receiver operating characteristic (ROC) curve (AUC), precision, sensitivity, specificity, and F1 score. The most comprehensive scenario, leveraging accelerometer, demographic, and clinical data, achieved the highest AUC of 0.73, compared with 0.53 for the SOFA score baseline, with significant improvements in precision (0.80 vs. 0.23), specificity (0.79 vs. 0.73), and F1 score (0.77 vs. 0.66). This study demonstrates a novel approach that goes beyond the simplistic differentiation between stable and unstable conditions. By incorporating mobility and comprehensive patient information, we distinguish between these states in critically ill patients and capture essential nuances in physiology and functional status. Unlike rudimentary definitions, such as equating low blood pressure with instability, our methodology delves deeper, offering a more holistic understanding and potentially valuable insights for acuity assessment.
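The wearable-plus-EHR fusion described above can be sketched in a few lines. The magnitude summaries and feature names below are illustrative assumptions for exposition, not the inputs the study's VGG/ResNet/Transformer models actually consumed:

```python
import math

def activity_features(samples):
    """Summarize raw wrist accelerometry (list of (ax, ay, az) in g)
    into simple movement features: mean and peak vector magnitude."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    return {"mean_vm": sum(mags) / len(mags), "peak_vm": max(mags)}

def fuse(accel_feats, ehr_feats):
    """Concatenate wearable-derived and EHR-derived features into one
    model-input vector (the ordering is an arbitrary choice here)."""
    return list(accel_feats.values()) + list(ehr_feats.values())
```

In practice the fused vector would feed a downstream classifier predicting the stable/unstable label defined in the abstract.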

2. The dilemma of consent for AI in healthcare. Surgery 2024; 175:1456-1457. PMID: 38413305; DOI: 10.1016/j.surg.2024.01.019.

3. Identifying acute illness phenotypes via deep temporal interpolation and clustering network on physiologic signatures. Sci Rep 2024; 14:8442. PMID: 38600110; PMCID: PMC11006654; DOI: 10.1038/s41598-024-59047-x.
Abstract
Clustering analysis of early vital signs may reveal unique patient phenotypes with distinct pathophysiological signatures and clinical outcomes, supporting early clinical decision-making. Phenotyping using early vital signs has proven challenging, as vital signs are typically sampled sporadically. We proposed a novel deep temporal interpolation and clustering network to simultaneously extract latent representations from irregularly sampled vital signs and derive phenotypes. Four distinct clusters were identified. Phenotype A (18%) had the greatest prevalence of comorbid disease, with increased prevalence of prolonged respiratory insufficiency, acute kidney injury, sepsis, and long-term (3-year) mortality. Phenotypes B (33%) and C (31%) had a diffuse pattern of mild organ dysfunction. Phenotype B's favorable short-term clinical outcomes were tempered by the second-highest rate of long-term mortality. Phenotype C had favorable clinical outcomes. Phenotype D (17%) exhibited early and persistent hypotension, a high incidence of early surgery, and substantial biomarker evidence of inflammation. Despite early and severe illness, phenotype D had the second-lowest long-term mortality. A comparison with Sequential Organ Failure Assessment scores showed that the clustering results were not simply a recapitulation of previous acuity assessments. This tool may impact triage decisions and has significant implications for clinical decision support under time constraints and uncertainty.
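The two-step idea above can be sketched with a deliberate simplification: linearly interpolate irregularly sampled vitals onto a regular grid, then cluster the fixed-length vectors. The paper learns latent representations with a deep network; the plain interpolation and minimal k-means here are stand-ins for that pipeline, not its implementation:

```python
import random

def interpolate(times, values, grid):
    """Linearly interpolate irregular (time, value) samples onto a grid;
    values outside the observed range are clamped to the endpoints."""
    out = []
    for t in grid:
        if t <= times[0]:
            out.append(values[0]); continue
        if t >= times[-1]:
            out.append(values[-1]); continue
        for i in range(1, len(times)):
            if times[i] >= t:  # bracketing pair found
                w = (t - times[i - 1]) / (times[i] - times[i - 1])
                out.append(values[i - 1] * (1 - w) + values[i] * w)
                break
    return out

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's k-means on equal-length vectors; returns labels."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centers[c])))
        for c in range(k):
            members = [p for i, p in enumerate(points) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(col) for col in zip(*members)]
    return labels
```

Each patient's interpolated vital-sign vector would be one point; the resulting cluster labels play the role of the phenotypes A-D.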

4. Use of artificial intelligence in critical care: opportunities and obstacles. Crit Care 2024; 28:113. PMID: 38589940; PMCID: PMC11000355; DOI: 10.1186/s13054-024-04860-z.
Abstract
BACKGROUND Perhaps nowhere else in the healthcare system are the challenges of creating useful models with direct, time-critical clinical applications more relevant, and the obstacles to achieving those goals more massive, than in the intensive care unit. Machine learning-based artificial intelligence (AI) techniques to define states and predict future events are commonplace in modern life, yet their penetration into acute care medicine has been slow, stuttering, and uneven. Major obstacles to the widespread, effective application of AI approaches to the real-time care of critically ill patients exist and need to be addressed. MAIN BODY Clinical decision support systems (CDSSs) in acute and critical care environments should support clinicians at the bedside, not replace them. As discussed in this review, the reasons are many: the immaturity of AI-based systems with respect to situational awareness; the fundamental bias in many large databases, which do not reflect the target population of patients being treated, making fairness an important issue to address; and technical barriers to timely access to valid data and to its display in a fashion useful for clinical workflow. The inherent "black-box" nature of many predictive algorithms and CDSSs makes trustworthiness and acceptance by the medical community difficult. Logistically, collating and curating, in real time, the multidimensional data streams from various sources needed to inform the algorithms, and displaying the resulting decision support in a format that adapts to individual patient responses and signatures, represent the efferent limb of these systems and are often ignored during initial validation efforts. Similarly, legal and commercial barriers to access to many existing clinical databases limit studies addressing the fairness and generalizability of predictive models and management tools. CONCLUSIONS AI-based CDSSs are evolving and are here to stay. It is our obligation to be good shepherds of their use and further development.

5. Gender differences in autonomy and performance assessments in a national cohort of vascular surgery trainees. J Vasc Surg 2024; S0741-5214(24)00498-1. PMID: 38493897; DOI: 10.1016/j.jvs.2024.03.019.
Abstract
OBJECTIVE Gender disparities in surgical training and assessment are described in the general surgery literature, but assessment disparities have not been explored in vascular surgery. We sought to investigate gender disparities in operative assessment in a national cohort of vascular surgery integrated residents (VIRs) and fellows (VSFs). METHODS Operative performance and autonomy ratings from the Society for Improving Medical Professional Learning (SIMPL) application database were collected for all vascular surgery participating institutions from 2018 to 2023. Logistic generalized linear mixed models were fit to examine the association of faculty and trainee gender with faculty and self-assessment of autonomy and performance. Data were adjusted for post-graduate year and case complexity. Random effects were included to account for clustering effects due to participant, program, and procedure. RESULTS One hundred three trainees (n = 63 VIRs; n = 40 VSFs; 63.1% men) and 99 faculty (73.7% men) from 17 institutions (n = 12 VIR and n = 13 VSF programs) contributed 4951 total assessments (44.4% by faculty, 55.6% by trainees) across 235 unique procedures. Faculty and trainee gender were not associated with faculty ratings of trainee performance (faculty gender: odds ratio [OR], 0.78; 95% confidence interval [CI], 0.27-2.29; trainee gender: OR, 1.80; 95% CI, 0.76-0.43) or autonomy (faculty gender: OR, 0.99; 95% CI, 0.41-2.39; trainee gender: OR, 1.23; 95% CI, 0.62-2.45). All trainees self-assessed at lower performance and autonomy ratings than faculty assessments indicated. However, women trainees rated themselves significantly lower than men for both autonomy (OR, 0.57; 95% CI, 0.43-0.74) and performance (OR, 0.40; 95% CI, 0.30-0.54). CONCLUSIONS Although gender was not associated with differences in faculty assessment of performance or autonomy among vascular surgery trainees, women trainees perceived themselves as performing with lower competency and less autonomy than their male colleagues. These findings suggest value in exploring gender differences in real-time feedback delivered to and received by trainees, and in targeted interventions to align trainee self-perception with actual operative performance and autonomy to optimize surgical skill acquisition.

6. The dawn of multimodal artificial intelligence in nephrology. Nat Rev Nephrol 2024; 20:79-80. PMID: 38097775; DOI: 10.1038/s41581-023-00799-6.

7. Digital health and acute kidney injury: consensus report of the 27th Acute Disease Quality Initiative workgroup. Nat Rev Nephrol 2023; 19:807-818. PMID: 37580570; DOI: 10.1038/s41581-023-00744-7.
Abstract
Acute kidney injury (AKI), which is a common complication of acute illnesses, affects the health of individuals in community, acute care and post-acute care settings. Although the recognition, prevention and management of AKI have advanced over the past decades, its incidence and related morbidity, mortality and health care burden remain overwhelming. The rapid growth of digital technologies has provided a new platform to improve patient care, and reports show demonstrable benefits in care processes and, in some instances, in patient outcomes. However, despite great progress, the potential benefits of using digital technology to manage AKI have not yet been fully explored or implemented in clinical practice. Digital health studies in AKI have shown variable evidence of benefits, and the digital divide means that access to digital technologies is not equitable. Upstream research and development costs, limited stakeholder participation and acceptance, and poor scalability of digital health solutions have hindered their widespread implementation and use. Here, we provide recommendations from the Acute Disease Quality Initiative consensus meeting, which involved experts in adult and paediatric nephrology, critical care, pharmacy and data science, at which the use of digital health for risk prediction, prevention, identification and management of AKI and its consequences was discussed.

8. Utilization of the percutaneous left ventricular support as bridge to heart transplantation across the United States: In-depth UNOS database analysis. J Heart Lung Transplant 2023; 42:1597-1607. PMID: 37307906; DOI: 10.1016/j.healun.2023.06.002.
Abstract
BACKGROUND Intra-aortic balloon pump (IABP) and Impella device utilization as a bridge to heart transplantation (HTx) have risen exponentially. We aimed to explore the influence of device selection on HTx outcomes, considering regional practice variation. METHODS A retrospective longitudinal study was performed on a United Network for Organ Sharing (UNOS) registry dataset. We included adult patients listed for HTx between October 2018 and April 2022 as status 2, as justified by requiring IABP or Impella support. The primary end-point was successful bridging to HTx as status 2. RESULTS Of 32,806 HTx during the study period, 4178 met inclusion criteria (Impella n = 650, IABP n = 3528). Waitlist mortality increased from a nadir of 16 (in 2019) to a peak of 36 (in 2022) per thousand status 2 listed patients. Annual Impella use increased from 8% in 2019 to 19% in 2021. Compared with IABP patients, Impella patients demonstrated higher medical acuity and a lower rate of successful transplantation as status 2 (92.1% vs 88.9%, p < 0.001). The IABP:Impella utilization ratio varied widely between regions, ranging from 1.77 to 21.31, with high Impella use in Southern and Western states. This difference was not explained by medical acuity, regional transplant volume, or waitlist time, and did not correlate with waitlist mortality. CONCLUSIONS The shift toward Impella rather than IABP did not improve waitlist outcomes. Our results suggest that clinical practice patterns beyond mere device selection determine successful bridging to HTx. There is a critical need for objective evidence to guide temporary mechanical circulatory support (tMCS) utilization and for a paradigm shift in the UNOS allocation system to achieve equitable HTx practice across the United States.

9. Dynamic Delirium Prediction in the Intensive Care Unit using Machine Learning on Electronic Health Records. IEEE-EMBS International Conference on Biomedical and Health Informatics 2023. PMID: 38585187; PMCID: PMC10998264; DOI: 10.1109/bhi58575.2023.10313445.
Abstract
Delirium is a syndrome of acute brain failure that is prevalent among older adults in the Intensive Care Unit (ICU). Incident delirium can significantly worsen prognosis and increase mortality, necessitating its rapid and continual assessment in the ICU. Currently, the common approach to delirium assessment is manual and sporadic, so there exists a critical need for a robust and automated system for predicting delirium in the ICU. In this work, we develop a machine learning (ML) system for real-time prediction of delirium using Electronic Health Record (EHR) data. Unlike prior approaches, which provide one delirium prediction label per entire ICU stay, our approach provides predictions every 12 hours. We use the latest 12 hours of ICU data, along with patient demographic and medical history data, to predict delirium risk in the next 12-hour window. This enables delirium risk prediction as soon as 12 hours after ICU admission. We trained and tested four ML classification algorithms on longitudinal EHR data pertaining to 16,327 ICU stays of 13,395 patients, covering a total of 56,297 12-hour windows in the ICU, to predict the dynamic incidence of delirium. The best-performing algorithm was Categorical Boosting, which achieved an area under the receiver operating characteristic curve (AUROC) of 0.87 (95% confidence interval [CI]: 0.86-0.87). Deployment of this ML system in ICUs could enable early identification of delirium, reducing its deleterious impact on long-term adverse outcomes such as ICU cost, length of stay, and mortality.
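A minimal sketch of the sliding 12-hour setup described above (assumed structure for illustration, not the authors' code): each training example pairs features from one 12-hour window with the delirium label of the next window:

```python
def make_windows(events, labels, stay_hours, width=12):
    """Build (features, target) pairs per 12-hour ICU window.

    events: list of (hour_since_admission, value) measurements.
    labels: dict mapping window index -> 0/1 delirium in that window.
    The last window has no successor, so it yields no example."""
    n_windows = stay_hours // width
    examples = []
    for w in range(n_windows - 1):
        lo, hi = w * width, (w + 1) * width
        window_vals = [v for t, v in events if lo <= t < hi]
        feats = {
            "mean": sum(window_vals) / len(window_vals) if window_vals else None,
            "count": len(window_vals),
        }
        examples.append((feats, labels.get(w + 1, 0)))  # predict NEXT window
    return examples
```

Static demographics and history would be appended to each window's features before fitting a classifier such as CatBoost.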

10. A deep learning-based dynamic model for predicting acute kidney injury risk severity in postoperative patients. Surgery 2023; 174:709-714. PMID: 37316372; PMCID: PMC10683578; DOI: 10.1016/j.surg.2023.05.003.
Abstract
BACKGROUND Acute kidney injury is a common postoperative complication affecting between 10% and 30% of surgical patients. Acute kidney injury is associated with increased resource usage and chronic kidney disease development, with more severe acute kidney injury indicating more aggressive deterioration in clinical outcomes and mortality. METHODS We considered 42,906 surgical patients (n = 51,806 admissions) admitted to University of Florida Health between 2014 and 2021. Acute kidney injury stages were determined using the Kidney Disease: Improving Global Outcomes serum creatinine criteria. We developed a recurrent neural network-based model to continuously predict acute kidney injury risk and stage in the following 24 hours and compared it with logistic regression, random forest, and multi-layer perceptron models. We used medications, laboratory and vital measurements, and features derived from the past year of records as inputs. We analyzed the proposed model with integrated gradients for enhanced explainability. RESULTS Postoperative acute kidney injury at any stage developed in 20% (10,664) of the cohort. The recurrent neural network model was more accurate in predicting nearly all categories of next-day acute kidney injury stage (including the no acute kidney injury group). The areas under the receiver operating characteristic curve with 95% confidence intervals for the recurrent neural network vs logistic regression models were: no acute kidney injury, 0.98 (0.98-0.98) vs 0.93 (0.93-0.93); stage 1, 0.95 (0.95-0.95) vs 0.81 (0.80-0.82); stage 2/3, 0.99 (0.99-0.99) vs 0.96 (0.96-0.97); and stage 3 with renal replacement therapy, 1.0 (1.0-1.0) vs 1.0 (1.0-1.0). CONCLUSION The proposed model demonstrates that temporal processing of patient information can lead to more granular and dynamic modeling of acute kidney injury status, resulting in more continuous and accurate acute kidney injury prediction. We showcase the integrated gradients framework's utility as a mechanism for enhancing model explainability, potentially facilitating clinical trust for future implementation.
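The staging referenced above follows the KDIGO serum-creatinine criteria; a simplified helper is sketched below (urine-output criteria and the acute-rise qualifier on the absolute 4.0 mg/dl threshold are omitted for brevity, so this is an approximation of the guideline, not a clinical tool):

```python
def kdigo_stage(scr, baseline, rise_48h=0.0, on_rrt=False):
    """Approximate KDIGO AKI stage from serum creatinine (mg/dl).

    scr: current serum creatinine; baseline: reference creatinine;
    rise_48h: absolute increase within the past 48 hours;
    on_rrt: renal replacement therapy initiated."""
    if on_rrt or scr >= 3.0 * baseline or scr >= 4.0:
        return 3  # >=3x baseline, SCr >= 4.0, or RRT
    if scr >= 2.0 * baseline:
        return 2  # 2.0-2.9x baseline
    if scr >= 1.5 * baseline or rise_48h >= 0.3:
        return 1  # 1.5-1.9x baseline or >=0.3 mg/dl rise in 48 h
    return 0      # no AKI by creatinine criteria
```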

11. Digital Health Transformers and Opportunities for Artificial Intelligence-Enabled Nephrology. Clin J Am Soc Nephrol 2023; 18:527-529. PMID: 36750442; PMCID: PMC10103323; DOI: 10.2215/cjn.0000000000000085.

12. Determinants of Successful Bridging to Heart Transplantation on Temporary Percutaneous Left Ventricular Support - An Insight Using Artificial Intelligence. J Heart Lung Transplant 2023. DOI: 10.1016/j.healun.2023.02.764.

13. A simulation curriculum for laparoscopic common bile duct exploration, balloon sphincterotomy, and endobiliary stenting: Associations with resident performance and autonomy in the operating room. Surgery 2023; 173:950-956. PMID: 36517292; DOI: 10.1016/j.surg.2022.11.007.
Abstract
BACKGROUND Laparoscopic common bile duct exploration is safe and effective for managing choledocholithiasis, but laparoscopic common bile duct exploration is rarely performed, which threatens surgical trainee proficiency. This study tests the hypothesis that prior operative or simulation experience with laparoscopic common bile duct exploration is associated with greater resident operative performance and autonomy without adversely affecting patient outcomes. METHODS This longitudinal cohort study included 33 consecutive patients undergoing laparoscopic common bile duct exploration in cases involving postgraduate years 3, 4, and 5 general surgery residents at a single institution during the implementation of a laparoscopic common bile duct exploration simulation curriculum. For each of the 33 cases, resident performance and autonomy were rated by residents and attendings, the resident's prior operative and simulation experience were recorded, and patient outcomes were ascertained from electronic health records for comparison among 3 cohorts: prior operative experience, prior simulation experience, and no prior experience. RESULTS Operative approach was similar among cohorts. Overall morbidity was 6.1% and similar across cohorts. The operative performance scores were higher in prior experience cohorts according to both residents (3.0 [2.8-3.0] vs 2.0 [2.0-3.0]; P = .01) and attendings (3.0 [3.0-4.0]; P < .001). The autonomy scores were higher in prior experience cohorts according to both residents (2.0 [2.0-3.0] vs 2.0 [2.0-2.0]; P = .005) and attendings (2.5 [2.0-3.0] vs 2.0 [1.0-2.0]; P = .001). Prior simulation and prior operative experience had similar associations with performance and autonomy. CONCLUSION Simulation experience with laparoscopic common bile duct exploration was associated with greater resident operative performance and autonomy, with effects that mimic prior operative experience. This illustrates the potential for simulation-based training to improve resident operative performance and autonomy for laparoscopic common bile duct exploration.

14. Reinforcement Learning for Clinical Applications. Clin J Am Soc Nephrol 2023; 18:521-523. PMID: 36750034; PMCID: PMC10103233; DOI: 10.2215/cjn.0000000000000084.

15. Computable Phenotypes to Characterize Changing Patient Brain Dysfunction in the Intensive Care Unit. arXiv 2023; 2303.05504. PMID: 36945691; PMCID: PMC10029051.
Abstract
In the United States, more than 5 million patients are admitted annually to ICUs, with ICU mortality of 10%-29% and costs over $82 billion. Acute brain dysfunction, including delirium, is often underdiagnosed or undervalued. This study's objective was to develop automated computable phenotypes for acute brain dysfunction states and to describe transitions among brain dysfunction states to illustrate the clinical trajectories of ICU patients. We created two single-center, longitudinal EHR datasets for 48,817 adult patients admitted to an ICU at University of Florida Health (UFH) Gainesville (GNV) and Jacksonville (JAX). We developed algorithms to quantify acute brain dysfunction status (coma, delirium, normal, or death) at 12-hour intervals of each ICU admission and to identify acute brain dysfunction phenotypes using the continuous acute brain dysfunction status and a k-means clustering approach. There were 49,770 admissions for 37,835 patients in the UFH GNV dataset and 18,472 admissions for 10,982 patients in the UFH JAX dataset. In total, 18% of patients had coma as their worst brain dysfunction status; every 12 hours, around 4%-7% would transition to delirium, 22%-25% would recover, 3%-4% would expire, and 67%-68% would remain comatose in the ICU. Additionally, 7% of patients had delirium as their worst brain dysfunction status; around 6%-7% would transition to coma, 40%-42% would return to a no-delirium state, 1% would expire, and 51%-52% would remain delirious in the ICU. There were three phenotypes: persistent coma/delirium, persistently normal, and transition from coma/delirium to normal, the last occurring almost exclusively in the first 48 hours after ICU admission. We developed phenotyping scoring algorithms that determined acute brain dysfunction status every 12 hours while patients were admitted to the ICU. This approach may be useful in developing prognostic and decision-support tools to aid patients and clinicians in decision-making on resource use and escalation of care.
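The transition percentages quoted above come from tabulating state changes across consecutive 12-hour intervals. A hedged sketch of that tabulation (illustrative only; the paper's phenotyping additionally applies k-means to the state sequences):

```python
from collections import Counter, defaultdict

def transition_proportions(stays):
    """Count 12-hour state transitions across ICU stays.

    stays: list of state sequences, e.g. ['coma', 'coma', 'delirium'].
    Returns {state: {next_state: proportion of transitions out of state}}."""
    counts = defaultdict(Counter)
    for seq in stays:
        for a, b in zip(seq, seq[1:]):  # consecutive 12-hour pairs
            counts[a][b] += 1
    return {s: {t: c / sum(nxt.values()) for t, c in nxt.items()}
            for s, nxt in counts.items()}
```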

16. Resident Operative Autonomy and Attending Verbal Feedback Differ by Resident and Attending Gender. Ann Surg Open 2023; 4:e256. PMID: 37600892; PMCID: PMC10431433; DOI: 10.1097/as9.0000000000000256.
Abstract
Objectives This study tests the null hypotheses that overall sentiment and gendered words in verbal feedback and resident operative autonomy relative to performance are similar for female and male residents. Background Female and male surgical residents may experience training differently, affecting the quality of learning and graduated autonomy. Methods A longitudinal, observational study using a Society for Improving Medical Professional Learning collaborative dataset describing resident and attending evaluations of resident operative performance and autonomy and recordings of verbal feedback from attendings from surgical procedures performed at 54 US general surgery residency training programs from 2016 to 2021. Overall sentiment, adjectives, and gendered words in verbal feedback were quantified by natural language processing. Resident operative autonomy and performance, as evaluated by attendings, were reported on 5-point ordinal scales. Performance-adjusted autonomy was calculated as autonomy minus performance. Results The final dataset included objective assessments and dictated feedback for 2683 surgical procedures. Sentiment scores were higher for female residents (95 [interquartile range (IQR), 4-100] vs 86 [IQR 2-100]; P < 0.001). Gendered words were present in a greater proportion of dictations for female residents (29% vs 25%; P = 0.04) due to male attendings disproportionately using male-associated words in feedback for female residents (28% vs 23%; P = 0.01). Overall, attendings reported that male residents received greater performance-adjusted autonomy compared with female residents (P < 0.001). Conclusions Sentiment and gendered words in verbal feedback and performance-adjusted operative autonomy differed for female and male general surgery residents. These findings suggest a need to ensure that trainees are given appropriate and equitable operative autonomy and feedback.
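The gendered-word screen described above can be sketched as a simple lexicon match; the word lists below are illustrative placeholders, not the validated lexicons the study's natural language processing actually used:

```python
# Hypothetical mini-lexicons for illustration only.
MALE_WORDS = {"assertive", "confident", "decisive"}
FEMALE_WORDS = {"caring", "helpful", "pleasant"}

def gendered_word_hits(transcript):
    """Return which male-/female-associated lexicon words appear in a
    feedback transcript (case-insensitive, punctuation stripped)."""
    tokens = {w.strip(".,!?").lower() for w in transcript.split()}
    return {"male": sorted(tokens & MALE_WORDS),
            "female": sorted(tokens & FEMALE_WORDS)}
```

A dictation would count toward the "contains gendered words" proportion if either list matches.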

17. Spatially Aware Transformer Networks for Contextual Prediction of Diabetic Nephropathy Progression from Whole Slide Images. medRxiv 2023; 2023.02.20.23286044. PMID: 36865174; PMCID: PMC9980230; DOI: 10.1101/2023.02.20.23286044.
Abstract
Diabetic nephropathy (DN) in the context of type 2 diabetes is the leading cause of end-stage renal disease (ESRD) in the United States. DN is graded based on glomerular morphology and has a spatially heterogeneous presentation in kidney biopsies that complicates pathologists' predictions of disease progression. Artificial intelligence and deep learning methods for pathology have shown promise for quantitative pathological evaluation and clinical trajectory estimation, but they often fail to capture the large-scale spatial anatomy and relationships found in whole slide images (WSIs). In this study, we present a transformer-based, multi-stage ESRD prediction framework built upon nonlinear dimensionality reduction, relative Euclidean pixel distance embeddings between every pair of observable glomeruli, and a corresponding spatial self-attention mechanism for a robust contextual representation. We developed a deep transformer network for encoding WSIs and predicting future ESRD using a dataset of 56 kidney biopsy WSIs from DN patients at Seoul National University Hospital. Using a leave-one-out cross-validation scheme, our modified transformer framework outperformed RNN, XGBoost, and logistic regression baseline models, and achieved an area under the receiver operating characteristic curve (AUC) of 0.97 (95% CI: 0.90-1.00) for predicting two-year ESRD, compared with an AUC of 0.86 (95% CI: 0.66-0.99) without our relative distance embedding and an AUC of 0.76 (95% CI: 0.59-0.92) without a denoising autoencoder module. While the variability and generalizability induced by smaller sample sizes are challenging, our distance-based embedding approach and overfitting mitigation techniques yielded results that suggest opportunities for future spatially aware WSI research using limited pathology datasets.
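The relative-distance embedding above starts from something very simple: the Euclidean pixel distance between every pair of glomerulus centroids on the slide. A minimal sketch of that first step (the downstream self-attention that consumes this matrix is the paper's contribution and is not reproduced here):

```python
import math

def pairwise_distances(centroids):
    """centroids: list of (x, y) pixel coordinates of glomeruli.
    Returns an n x n matrix of Euclidean distances; entry [i][j] is the
    pixel distance between glomerulus i and glomerulus j."""
    n = len(centroids)
    return [[math.dist(centroids[i], centroids[j]) for j in range(n)]
            for i in range(n)]
```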

18. Building an automated, machine learning-enabled platform for predicting post-operative complications. Physiol Meas 2023; 44:024001. PMID: 36657179; PMCID: PMC9910093; DOI: 10.1088/1361-6579/acb4db.
Abstract
Objective. In 2019, the University of Florida College of Medicine launched the MySurgeryRisk algorithm to predict eight major post-operative complications using automatically extracted data from the electronic health record. Approach. This project was developed in parallel with our Intelligent Critical Care Center and represents a culmination of efforts to build an efficient and accurate model for data processing and predictive analytics. Main Results and Significance. This paper discusses how our model was constructed and improved upon. We highlight the consolidation of the database, processing of fixed and time-series physiologic measurements, development and training of predictive models, and expansion of those models into different aspects of patient assessment and treatment. We end by discussing future directions of the model.
|
19
|
Surgical resident experience with common bile duct exploration and assessment of performance and autonomy with formative feedback. World J Emerg Surg 2023; 18:13. PMID: 36747289; PMCID: PMC9901129; DOI: 10.1186/s13017-023-00480-0. Open access.
Abstract
BACKGROUND Common bile duct exploration (CBDE) is safe and effective for managing choledocholithiasis, but most US general surgeons have limited experience with CBDE and are uncomfortable performing this procedure in practice. Surgical trainee exposure to CBDE is limited, and their learning curve for achieving autonomous, practice-ready performance has not been previously described. This study tests the hypothesis that receipt of one or more prior CBDE operative performance assessments, combined with formative feedback, is associated with greater resident operative performance and autonomy. METHODS Resident and attending assessments of resident operative performance and autonomy were obtained for 189 laparoscopic or open CBDEs performed at 28 institutions. Performance and autonomy were graded along validated ordinal scales. Cases in which the resident had one or more prior CBDE case evaluations (n = 48) were compared with cases in which the resident had no prior evaluations (n = 141). RESULTS Compared with cases in which the resident had no prior CBDE case evaluations, cases with a prior evaluation had greater proportions of practice-ready or exceptional performance ratings according to both residents (27% vs. 11%, p = .009) and attendings (58% vs. 19%, p < .001) and had greater proportions of passive help or supervision only autonomy ratings according to both residents (17% vs. 4%, p = .009) and attendings (69% vs. 32%, p < .01). CONCLUSIONS Residents with at least one prior CBDE evaluation and formative feedback demonstrated better operative performance and received greater autonomy than residents without prior evaluations, underscoring the propensity of feedback to help residents achieve autonomous, practice-ready performance for rare operations.
|
20
|
Overtriage, Undertriage, and Value of Care after Major Surgery: An Automated, Explainable Deep Learning-Enabled Classification System. J Am Coll Surg 2023; 236:279-291. PMID: 36648256; PMCID: PMC9993068; DOI: 10.1097/xcs.0000000000000471.
Abstract
BACKGROUND In single-institution studies, overtriaging low-risk postoperative patients to ICUs has been associated with a low value of care; undertriaging high-risk postoperative patients to general wards has been associated with increased mortality and morbidity. This study tested the reproducibility of an automated postoperative triage classification system for generating an actionable, explainable decision support system. STUDY DESIGN This longitudinal cohort study included adults undergoing inpatient surgery at two university hospitals. Triage classifications were generated by an explainable deep learning model using preoperative and intraoperative electronic health record features. Nearest neighbor algorithms identified risk-matched controls. Primary outcomes were mortality, morbidity, and value of care (inverted risk-adjusted mortality/total direct costs). RESULTS Among 4,669 ICU admissions, 237 (5.1%) were overtriaged. Compared with 1,021 control ward admissions, overtriaged admissions had similar outcomes but higher costs ($15.9K [interquartile range $9.8K to $22.3K] vs $10.7K [$7.0K to $17.6K], p < 0.001) and lower value of care (0.2 [0.1 to 0.3] vs 1.5 [0.9 to 2.2], p < 0.001). Among 8,594 ward admissions, 1,029 (12.0%) were undertriaged. Compared with 2,498 control ICU admissions, undertriaged admissions had longer hospital length-of-stays (6.4 [3.4 to 12.4] vs 5.4 [2.6 to 10.4] days, p < 0.001); greater incidence of hospital mortality (1.7% vs 0.7%, p = 0.03), cardiac arrest (1.4% vs 0.5%, p = 0.04), and persistent acute kidney injury without renal recovery (5.2% vs 2.8%, p = 0.002); similar costs ($21.8K [$13.3K to $34.9K] vs $21.9K [$13.1K to $36.3K]); and lower value of care (0.8 [0.5 to 1.3] vs 1.2 [0.7 to 2.0], p < 0.001). CONCLUSIONS Overtriage was associated with low value of care; undertriage was associated with both low value of care and increased mortality and morbidity.
The proposed framework for generating automated postoperative triage classifications is reproducible.
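The nearest-neighbor matching used to identify risk-matched controls can be sketched as follows. This is a minimal illustration with hypothetical, toy risk features; the study's actual matching covariates and distance metric are not specified here.

```python
import math

def nearest_neighbor_controls(cases, controls, k=1):
    """For each case, return the k closest controls by Euclidean
    distance over (hypothetical) scaled risk-feature vectors."""
    matched = []
    for c in cases:
        ranked = sorted(controls, key=lambda x: math.dist(c, x))
        matched.append(ranked[:k])
    return matched

# toy feature vectors: [scaled age, comorbidity score]
cases = [[0.5, 0.2]]
controls = [[0.9, 0.9], [0.45, 0.25], [0.1, 0.8]]
matched = nearest_neighbor_controls(cases, controls)
print(matched)  # the closest control for each case
```

In practice, matching is usually done on risk scores or propensity-like summaries rather than raw features, and controls are sampled without replacement.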
|
21
|
Spatially Aware Transformer Networks for Contextual Prediction of Diabetic Nephropathy Progression from Whole Slide Images. Proc SPIE Int Soc Opt Eng 2023; 12471:124710K. PMID: 37818350; PMCID: PMC10563813; DOI: 10.1117/12.2655266.
Abstract
Diabetic nephropathy (DN) in the context of type 2 diabetes is the leading cause of end-stage renal disease (ESRD) in the United States. DN is graded based on glomerular morphology and has a spatially heterogeneous presentation in kidney biopsies that complicates pathologists' predictions of disease progression. Artificial intelligence and deep learning methods for pathology have shown promise for quantitative pathological evaluation and clinical trajectory estimation, but they often fail to capture large-scale spatial anatomy and relationships found in whole slide images (WSIs). In this study, we present a transformer-based, multi-stage ESRD prediction framework built upon nonlinear dimensionality reduction, relative Euclidean pixel distance embeddings between every pair of observable glomeruli, and a corresponding spatial self-attention mechanism for a robust contextual representation. We developed a deep transformer network for encoding WSIs and predicting future ESRD using a dataset of 56 kidney biopsy WSIs from DN patients at Seoul National University Hospital. Using a leave-one-out cross-validation scheme, our modified transformer framework outperformed RNNs, XGBoost, and logistic regression baseline models, and resulted in an area under the receiver operating characteristic curve (AUC) of 0.97 (95% CI: 0.90-1.00) for predicting two-year ESRD, compared with an AUC of 0.86 (95% CI: 0.66-0.99) without our relative distance embedding, and an AUC of 0.76 (95% CI: 0.59-0.92) without a denoising autoencoder module. While the variability and generalizability induced by smaller sample sizes are challenging, our distance-based embedding approach and overfitting mitigation techniques yielded results that suggest opportunities for future spatially aware WSI research using limited pathology datasets.
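As an illustration of the core idea, biasing self-attention weights by pairwise Euclidean distance between glomeruli might look like the sketch below. The paper's exact embedding and attention formulation are not reproduced here; the additive-bias form and the `scale` parameter are assumptions for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax over one attention row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def spatial_attention(scores, coords, scale=1.0):
    """Bias raw attention scores by negative pairwise Euclidean pixel
    distance, so nearby glomeruli attend to each other more strongly."""
    n = len(coords)
    biased = []
    for i in range(n):
        row = []
        for j in range(n):
            row.append(scores[i][j] - scale * math.dist(coords[i], coords[j]))
        biased.append(softmax(row))
    return biased

# three glomeruli: two close together, one far away; uniform raw scores
coords = [(0, 0), (0, 1), (10, 10)]
scores = [[0.0] * 3 for _ in range(3)]
attn = spatial_attention(scores, coords)
# glomerulus 0 attends more to nearby glomerulus 1 than to distant glomerulus 2
```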
|
22
|
Abstract
OBJECTIVE We test the hypothesis that for low-acuity surgical patients, postoperative intensive care unit (ICU) admission is associated with lower value of care compared with ward admission. BACKGROUND Overtriaging low-acuity patients to ICU consumes valuable resources and may not confer better patient outcomes. Associations among postoperative overtriage, patient outcomes, costs, and value of care have not been previously reported. METHODS In this longitudinal cohort study, postoperative ICU admissions were classified as overtriaged or appropriately triaged according to machine learning-based patient acuity assessments and requirements for immediate postoperative mechanical ventilation or vasopressor support. The nearest neighbors algorithm identified risk-matched control ward admissions. The primary outcome was value of care, calculated as inverse observed-to-expected mortality ratios divided by total costs. RESULTS Acuity assessments had an area under the receiver operating characteristic curve of 0.92 in generating predictions for triage classifications. Of 8592 postoperative ICU admissions, 423 (4.9%) were overtriaged. These were matched with 2155 control ward admissions with similar comorbidities, incidence of emergent surgery, immediate postoperative vital signs, and do not resuscitate order placement and rescindment patterns. Compared with controls, overtriaged admissions did not have a lower incidence of any measured complications. Total costs for admission were $16.4K for overtriage and $15.9K for controls (P = 0.03). Value of care was lower for overtriaged admissions [2.9 (2.0-4.0)] compared with controls [24.2 (14.1-34.5), P < 0.001]. CONCLUSIONS Low-acuity postoperative patients who were overtriaged to ICUs had increased total costs, no improvements in outcomes, and received low-value care.
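The value-of-care definition quoted above (inverse observed-to-expected mortality ratio divided by total costs) can be written as a small function. The numbers below are toy inputs; the study's risk-adjustment model for expected mortality is not shown, and the units/scaling of its reported values may differ.

```python
def value_of_care(observed_deaths, expected_deaths, total_cost_k):
    """Inverse observed-to-expected (O/E) mortality ratio divided by
    total direct costs (here in $K), per the definition in the abstract."""
    oe_ratio = observed_deaths / expected_deaths
    return (1.0 / oe_ratio) / total_cost_k

# toy example: half the expected deaths occurred, at $16.4K total cost
voc = value_of_care(observed_deaths=5, expected_deaths=10, total_cost_k=16.4)
print(voc)
```

Under this definition, fewer deaths than expected (O/E < 1) or lower costs both raise the value of care, which is why overtriaged admissions with similar outcomes but higher costs score lower.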
|
23
|
Dynamic predictions of postoperative complications from explainable, uncertainty-aware, and multi-task deep neural networks. Sci Rep 2023; 13:1224. PMID: 36681755; PMCID: PMC9867692; DOI: 10.1038/s41598-023-27418-5. Open access.
Abstract
Accurate prediction of postoperative complications can inform shared decisions regarding prognosis, preoperative risk-reduction, and postoperative resource use. We hypothesized that multi-task deep learning models would outperform conventional machine learning models in predicting postoperative complications, and that integrating high-resolution intraoperative physiological time series would result in more granular and personalized health representations that would improve prognostication compared to preoperative predictions. In a longitudinal cohort study of 56,242 patients undergoing 67,481 inpatient surgical procedures at a university medical center, we compared deep learning models with random forests and XGBoost for predicting nine common postoperative complications using preoperative, intraoperative, and perioperative patient data. Our study indicated several significant results across experimental settings that suggest the utility of deep learning for capturing more precise representations of patient health for augmented surgical decision support. Multi-task learning improved efficiency by reducing computational resources without compromising predictive performance. Integrated gradients interpretability mechanisms identified potentially modifiable risk factors for each complication. Monte Carlo dropout methods provided a quantitative measure of prediction uncertainty that has the potential to enhance clinical trust. Multi-task learning, interpretability mechanisms, and uncertainty metrics demonstrated potential to facilitate effective clinical implementation.
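Monte Carlo dropout, the uncertainty mechanism mentioned above, keeps dropout active at inference and treats the spread of repeated stochastic predictions as an uncertainty estimate. Below is a stdlib-only sketch on a toy linear "network"; all weights, inputs, and the dropout rate are hypothetical.

```python
import random
import statistics

def mc_dropout_predict(weights, x, p_drop=0.5, n_samples=100, seed=0):
    """Run many stochastic forward passes with dropout enabled and
    report (mean prediction, std as uncertainty). Toy linear model."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_samples):
        total = 0.0
        for w, xi in zip(weights, x):
            if rng.random() >= p_drop:            # unit kept this pass
                total += (w * xi) / (1 - p_drop)  # inverted-dropout scaling
        preds.append(total)
    return statistics.mean(preds), statistics.stdev(preds)

mean, std = mc_dropout_predict([0.3, -0.1, 0.8], [1.0, 2.0, 0.5])
print(f"prediction {mean:.3f} +/- {std:.3f}")
```

A wide standard deviation flags inputs the model is unsure about, which is the quantity the authors propose surfacing to clinicians.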
|
24
|
Artificial intelligence guidance of advanced heart failure therapies: A systematic scoping review. Front Cardiovasc Med 2023; 10:1127716. PMID: 36910520; PMCID: PMC9999024; DOI: 10.3389/fcvm.2023.1127716. Open access.
Abstract
Introduction Artificial intelligence can recognize complex patterns in large datasets. It is a promising technology to advance heart failure practice, as many decisions rely on expert opinions in the absence of high-quality data-driven evidence. Methods We searched Embase, Web of Science, and PubMed databases for articles containing "artificial intelligence," "machine learning," or "deep learning" and any of the phrases "heart transplantation," "ventricular assist device," or "cardiogenic shock" from inception until August 2022. We only included original research addressing post heart transplantation (HTx) or mechanical circulatory support (MCS) clinical care. Review and data extraction were performed in accordance with PRISMA-ScR guidelines. Results Of 584 unique publications detected, 31 met the inclusion criteria. The majority focused on outcome prediction post HTx (n = 13) and post durable MCS (n = 7), as well as post HTx and MCS management (n = 7, n = 3, respectively). One study addressed temporary mechanical circulatory support. Most studies advocated for rapid integration of AI into clinical practice, acknowledging potential improvements in management guidance and reliability of outcomes prediction. There was a notable paucity of external data validation and integration of multiple data modalities. Conclusion Our review showed mounting innovation in AI application in management of MCS and HTx, with the largest body of evidence showing improved mortality outcome prediction.
|
25
|
Evaluation of federated learning variations for COVID-19 diagnosis using chest radiographs from 42 US and European hospitals. J Am Med Inform Assoc 2022; 30:54-63. PMID: 36214629; PMCID: PMC9619688; DOI: 10.1093/jamia/ocac188. Open access.
Abstract
OBJECTIVE Federated learning (FL) allows multiple distributed data holders to collaboratively learn a shared model without data sharing. However, individual health system data are heterogeneous. "Personalized" FL variations have been developed to counter data heterogeneity, but few have been evaluated using real-world healthcare data. The purpose of this study is to investigate the performance of a single-site versus a 3-client federated model using a previously described Coronavirus Disease 19 (COVID-19) diagnostic model. Additionally, to investigate the effect of system heterogeneity, we evaluate the performance of 4 FL variations. MATERIALS AND METHODS We leverage an FL healthcare collaborative including data from 5 international healthcare systems (US and Europe) encompassing 42 hospitals. We implemented a COVID-19 computer vision diagnosis system using the Federated Averaging (FedAvg) algorithm implemented on Clara Train SDK 4.0. To study the effect of data heterogeneity, training data were pooled from 3 systems locally and federation was simulated. We compared a centralized/pooled model versus FedAvg and 3 personalized FL variations (FedProx, FedBN, and FedAMP). RESULTS We observed comparable model performance with respect to internal validation (local model: AUROC 0.94 vs FedAvg: 0.95, P = .5) and improved model generalizability with the FedAvg model (P < .05). When investigating the effects of model heterogeneity, we observed poor performance with FedAvg on internal validation as compared to personalized FL algorithms. FedAvg did have improved generalizability compared to personalized FL algorithms. On average, FedBN had the best rank performance on internal and external validation. CONCLUSION FedAvg can significantly improve the generalization of the model compared to other personalization FL algorithms; however, at the cost of poor internal validity. Personalized FL may offer an opportunity to develop both internally and externally validated algorithms.
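The FedAvg aggregation step at the heart of this comparison can be sketched in a few lines: each client's parameters are weighted by its local sample count and averaged into the global model. This shows a single round over a single parameter vector; real federations (e.g. via Clara Train) aggregate full model state across many rounds.

```python
def fed_avg(client_weights, client_sizes):
    """Federated Averaging: sample-size-weighted mean of client
    parameter vectors (one round, one flat parameter list per client)."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    global_w = [0.0] * n_params
    for w, n in zip(client_weights, client_sizes):
        for i in range(n_params):
            global_w[i] += w[i] * (n / total)
    return global_w

# three hospitals with different local cohort sizes
global_model = fed_avg([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [100, 100, 200])
print(global_model)  # → [0.75, 0.75]
```

Personalized variants such as FedBN differ mainly in which parameters are aggregated (e.g. keeping batch-norm statistics local), not in this weighted-average core.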
|
26
|
Multi-dimensional patient acuity estimation with longitudinal EHR tokenization and flexible transformer networks. Front Digit Health 2022; 4:1029191. PMID: 36440460; PMCID: PMC9682245; DOI: 10.3389/fdgth.2022.1029191. Open access.
Abstract
Transformer model architectures have revolutionized the natural language processing (NLP) domain and continue to produce state-of-the-art results in text-based applications. Prior to the emergence of transformers, traditional NLP models such as recurrent and convolutional neural networks demonstrated promising utility for patient-level predictions and health forecasting from longitudinal datasets. However, to our knowledge only a few studies have explored transformers for predicting clinical outcomes from electronic health record (EHR) data, and in our estimation, none have adequately derived a health-specific tokenization scheme to fully capture the heterogeneity of EHR systems. In this study, we propose a dynamic method for tokenizing both discrete and continuous patient data, and present a transformer-based classifier utilizing a joint embedding space for integrating disparate temporal patient measurements. We demonstrate the feasibility of our clinical AI framework through multi-task ICU patient acuity estimation, where we simultaneously predict six mortality and readmission outcomes. Our longitudinal EHR tokenization and transformer modeling approaches resulted in more accurate predictions compared with baseline machine learning models, suggesting opportunities for future multimodal data integrations and algorithmic support tools using clinical transformer networks.
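One simple way to turn a continuous measurement into a discrete token is bin-indexing against fixed cut points, sketched below. This is a hypothetical scheme for illustration only; the paper's dynamic tokenizer is more sophisticated, and the cut points here are invented.

```python
def tokenize_measurement(name, value, bins):
    """Map a continuous measurement to a discrete token by counting how
    many cut points the value exceeds. Discrete fields could pass
    through as name_value tokens in the same vocabulary."""
    idx = sum(1 for b in bins if value > b)
    return f"{name}_bin{idx}"

hr_bins = [60, 80, 100, 120]  # toy cut points for heart rate (bpm)
token = tokenize_measurement("hr", 95, hr_bins)
print(token)  # → 'hr_bin2'
```

Once every vital sign, lab, and demographic field maps into one shared token vocabulary, a standard transformer embedding layer can consume the mixed sequence directly.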
|
27
|
Federated learning for preserving data privacy in collaborative healthcare research. Digit Health 2022; 8:20552076221134455. PMID: 36325438; PMCID: PMC9619858; DOI: 10.1177/20552076221134455. Open access.
Abstract
Generalizability, external validity, and reproducibility are high priorities for artificial intelligence applications in healthcare. Traditional approaches to addressing these elements involve sharing patient data between institutions or practice settings, which can compromise data privacy (individuals' right to prevent the sharing and disclosure of information about themselves) and data security (simultaneously preserving confidentiality, accuracy, fidelity, and availability of data). This article describes insights from real-world implementation of federated learning techniques that offer opportunities to maintain both data privacy and availability via collaborative machine learning that shares knowledge, not data. Local models are trained separately on local data. As they train, they send local model updates (e.g. coefficients or gradients) for consolidation into a global model. In some use cases, global models outperform local models on new, previously unseen local datasets, suggesting that collaborative learning from a greater number of examples, including a greater number of rare cases, may improve predictive performance. Even when sharing model updates rather than data, privacy leakage can occur when adversaries perform property or membership inference attacks which can be used to ascertain information about the training set. Emerging techniques mitigate risk from adversarial attacks, allowing investigators to maintain both data privacy and availability in collaborative healthcare research. When data heterogeneity between participating centers is high, personalized algorithms may offer greater generalizability by improving performance on data from centers with proportionately smaller training sample sizes. Properly applied, federated learning has the potential to optimize the reproducibility and performance of collaborative learning while preserving data security and privacy.
|
28
|
Phenotype clustering in health care: A narrative review for clinicians. Front Artif Intell 2022; 5:842306. PMID: 36034597; PMCID: PMC9411746; DOI: 10.3389/frai.2022.842306. Open access.
Abstract
Human pathophysiology is occasionally too complex for unaided hypothetical-deductive reasoning and the isolated application of additive or linear statistical methods. Clustering algorithms use input data patterns and distributions to form groups of similar patients or diseases that share distinct properties. Although clinicians frequently perform tasks that may be enhanced by clustering, few receive formal training and clinician-centered literature in clustering is sparse. To add value to clinical care and research, optimal clustering practices require a thorough understanding of how to process and optimize data, select features, weigh strengths and weaknesses of different clustering methods, select the optimal clustering method, and apply clustering methods to solve problems. These concepts and our suggestions for implementing them are described in this narrative review of published literature. All clustering methods share the weakness of finding potential clusters even when natural clusters do not exist, underscoring the importance of applying data-driven techniques as well as clinical and statistical expertise to clustering analyses. When applied properly, patient and disease phenotype clustering can reveal obscured associations that can help clinicians understand disease pathophysiology, predict treatment response, and identify patients for clinical trial enrollment.
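Lloyd's k-means, the workhorse behind many of the clustering analyses this review discusses, can be written with the standard library alone. Note the review's own caveat: the algorithm returns k clusters even when no natural clusters exist, so validation is essential. The toy 2-D points below are invented.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over tuples of coordinates (stdlib only)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # random initial centers
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:                     # assignment step
            j = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[j].append(p)
        centers = [                          # update step (keep empty-center)
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, groups = kmeans(pts, 2)
print(centers)
```

Production analyses would add feature scaling, multiple restarts, and cluster-number selection (e.g. silhouette or gap statistics), as the review recommends.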
|
29
|
Potentials and Challenges of Pervasive Sensing in the Intensive Care Unit. Front Digit Health 2022; 4:773387. PMID: 35656333; PMCID: PMC9152012; DOI: 10.3389/fdgth.2022.773387. Open access.
Abstract
Patients in critical care settings often require continuous and multifaceted monitoring. However, current clinical monitoring practices fail to capture important functional and behavioral indices such as mobility or agitation. Recent advances in non-invasive sensing technology, high throughput computing, and deep learning techniques are expected to transform the existing patient monitoring paradigm by enabling and streamlining granular and continuous monitoring of these crucial critical care measures. In this review, we highlight current approaches to pervasive sensing in critical care and identify limitations, future challenges, and opportunities in this emerging field.
|
30
|
Performance of a Machine Learning Algorithm Using Electronic Health Record Data to Predict Postoperative Complications and Report on a Mobile Platform. JAMA Netw Open 2022; 5:e2211973. PMID: 35576007; PMCID: PMC9112066; DOI: 10.1001/jamanetworkopen.2022.11973. Open access.
Abstract
IMPORTANCE Predicting postoperative complications has the potential to inform shared decisions regarding the appropriateness of surgical procedures, targeted risk-reduction strategies, and postoperative resource use. Realizing these advantages requires that accurate real-time predictions be integrated with clinical and digital workflows; artificial intelligence predictive analytic platforms using automated electronic health record (EHR) data inputs offer an intriguing possibility for achieving this, but there is a lack of high-level evidence from prospective studies supporting their use. OBJECTIVE To examine whether the MySurgeryRisk artificial intelligence system has stable predictive performance between development and prospective validation phases and whether it is feasible to provide automated outputs directly to surgeons' mobile devices. DESIGN, SETTING, AND PARTICIPANTS In this prognostic study, the platform used automated EHR data inputs and machine learning algorithms to predict postoperative complications and provide predictions to surgeons, previously through a web portal and currently through a mobile device application. All patients 18 years or older who were admitted for any type of inpatient surgical procedure (74 417 total procedures involving 58 236 patients) between June 1, 2014, and September 20, 2020, were included. Models were developed using retrospective data from 52 117 inpatient surgical procedures performed between June 1, 2014, and November 27, 2018. Validation was performed using data from 22 300 inpatient surgical procedures collected prospectively from November 28, 2018, to September 20, 2020. MAIN OUTCOMES AND MEASURES Algorithms for generalized additive models and random forest models were developed and validated using real-time EHR data. Model predictive performance was evaluated primarily using area under the receiver operating characteristic curve (AUROC) values. 
RESULTS Among 58 236 total adult patients who received 74 417 major inpatient surgical procedures, the mean (SD) age was 57 (17) years; 29 226 patients (50.2%) were male. Results reported in this article focus primarily on the validation cohort. The validation cohort included 22 300 inpatient surgical procedures involving 19 132 patients (mean [SD] age, 58 [17] years; 9672 [50.6%] male). A total of 2765 patients (14.5%) were Black or African American, 14 777 (77.2%) were White, 1235 (6.5%) were of other races (including American Indian or Alaska Native, Asian, Native Hawaiian or Pacific Islander, and multiracial), and 355 (1.9%) were of unknown race because of missing data; 979 patients (5.1%) were Hispanic, 17 663 (92.3%) were non-Hispanic, and 490 (2.6%) were of unknown ethnicity because of missing data. A greater number of input features was associated with stable or improved model performance. For example, the random forest model trained with 135 input features had the highest AUROC values for predicting acute kidney injury (0.82; 95% CI, 0.82-0.83); cardiovascular complications (0.81; 95% CI, 0.81-0.82); neurological complications, including delirium (0.87; 95% CI, 0.87-0.88); prolonged intensive care unit stay (0.89; 95% CI, 0.88-0.89); prolonged mechanical ventilation (0.91; 95% CI, 0.90-0.91); sepsis (0.86; 95% CI, 0.85-0.87); venous thromboembolism (0.82; 95% CI, 0.81-0.83); wound complications (0.78; 95% CI, 0.78-0.79); 30-day mortality (0.84; 95% CI, 0.82-0.86); and 90-day mortality (0.84; 95% CI, 0.82-0.85), with accuracy similar to surgeons' predictions. Compared with the original web portal, the mobile device application allowed efficient fingerprint login access and loaded data approximately 10 times faster. The application output displayed patient information, risk of postoperative complications, top 3 risk factors for each complication, and patterns of complications for individual surgeons compared with their colleagues. 
CONCLUSIONS AND RELEVANCE In this study, automated real-time predictions of postoperative complications with mobile device outputs had good performance in clinical settings with prospective validation, matching surgeons' predictive accuracy.
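AUROC, the primary performance measure in this study, equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties counted as half). A brute-force stdlib sketch with toy labels and scores:

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic: the fraction of
    positive/negative pairs in which the positive scores higher."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

result = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(result)  # → 0.75
```

Library implementations compute the same quantity from sorted ranks in O(n log n) rather than comparing every pair.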
|
31
|
Physiologic signatures within six hours of hospitalization identify acute illness phenotypes. PLOS Digit Health 2022; 1:e0000110. PMID: 36590701; PMCID: PMC9802629; DOI: 10.1371/journal.pdig.0000110.
Abstract
During the early stages of hospital admission, clinicians use limited information to make decisions as patient acuity evolves. We hypothesized that clustering analysis of vital signs measured within six hours of hospital admission would reveal distinct patient phenotypes with unique pathophysiological signatures and clinical outcomes. We created a longitudinal electronic health record dataset for 75,762 adult patient admissions to a tertiary care center in 2014-2016 lasting six hours or longer. Physiotypes were derived via unsupervised machine learning in a training cohort of 41,502 patients by applying consensus k-means clustering to six vital signs measured within six hours of admission. Reproducibility and correlation with clinical biomarkers and outcomes were assessed in a validation cohort of 17,415 patients and a testing cohort of 16,845 patients. Training, validation, and testing cohorts had similar age (54-55 years) and sex (55% female) distributions. There were four distinct clusters. Physiotype A had physiologic signals consistent with early vasoplegia, hypothermia, and low-grade inflammation and favorable short- and long-term clinical outcomes despite early, severe illness. Physiotype B exhibited early tachycardia, tachypnea, and hypoxemia followed by the highest incidence of prolonged respiratory insufficiency, sepsis, acute kidney injury, and short- and long-term mortality. Physiotype C had minimal early physiological derangement and favorable clinical outcomes. Physiotype D had the greatest prevalence of chronic cardiovascular and kidney disease, presented with severely elevated blood pressure, and had good short-term outcomes but suffered increased 3-year mortality. Comparing sequential organ failure assessment (SOFA) scores across physiotypes demonstrated that clustering did not simply recapitulate previously established acuity assessments.
In a heterogeneous cohort of hospitalized patients, unsupervised machine learning techniques applied to routine, early vital sign data identified physiotypes with unique disease categories and distinct clinical outcomes. This approach has the potential to augment understanding of pathophysiology by distilling thousands of disease states into a few physiological signatures.
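Consensus k-means aggregates many base clustering runs; a common building block is the co-association matrix, the fraction of runs in which each pair of samples lands in the same cluster. The run labels below are hypothetical, and the paper's exact consensus procedure may differ.

```python
def co_association(runs):
    """Build an n-by-n consensus matrix from multiple clustering runs,
    where entry (i, j) is the fraction of runs co-clustering i and j."""
    n = len(runs[0])
    m = [[0.0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += 1.0 / len(runs)
    return m

# three toy k-means runs over three patients (cluster IDs are arbitrary)
runs = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
m = co_association(runs)
print(m[0][1], m[0][2])  # pair (0,1) co-clusters in every run; pair (0,2) never
```

Final consensus clusters are then typically extracted by clustering the consensus matrix itself, yielding assignments that are more stable than any single k-means run.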
|
32
|
Ideal algorithms in healthcare: Explainable, dynamic, precise, autonomous, fair, and reproducible. PLOS Digit Health 2022; 1:e0000006. PMID: 36532301; PMCID: PMC9754299; DOI: 10.1371/journal.pdig.0000006.
Abstract
Established guidelines describe minimum requirements for reporting algorithms in healthcare; it is equally important to objectify the characteristics of ideal algorithms that confer maximum potential benefits to patients, clinicians, and investigators. We propose a framework for ideal algorithms, including 6 desiderata: explainable (convey the relative importance of features in determining outputs), dynamic (capture temporal changes in physiologic signals and clinical events), precise (use high-resolution, multimodal data and aptly complex architecture), autonomous (learn with minimal supervision and execute without human input), fair (evaluate and mitigate implicit bias and social inequity), and reproducible (validated externally and prospectively and shared with academic communities). We present an ideal algorithms checklist and apply it to highly cited algorithms. Strategies and tools such as the predictive, descriptive, relevant (PDR) framework, the Standard Protocol Items: Recommendations for Interventional Trials-Artificial Intelligence (SPIRIT-AI) extension, sparse regression methods, and minimizing concept drift can help healthcare algorithms achieve these objectives, toward ideal algorithms in healthcare.
|
33
|
Abstract
Accurate prediction and monitoring of patient health in the intensive care unit can inform shared decisions regarding appropriateness of care delivery, risk-reduction strategies, and intensive care resource use. Traditionally, algorithmic solutions for patient outcome prediction rely solely on data available from electronic health records (EHR). In this pilot study, we explore the benefits of augmenting existing EHR data with novel measurements from wrist-worn activity sensors as part of a clinical environment known as the Intelligent ICU. We implemented temporal deep learning models based on two distinct sources of patient data: (1) routinely measured vital signs from electronic health records, and (2) activity data collected from wearable sensors. As a proxy for illness severity, our models predicted whether patients leaving the intensive care unit would be successfully or unsuccessfully discharged from the hospital. We overcome the challenge of small sample size in our prospective cohort by applying deep transfer learning using EHR data from a much larger cohort of traditional ICU patients. Our experiments quantify the added utility of non-traditional measurements for predicting patient health, especially when applying a transfer learning procedure to small, novel Intelligent ICU cohorts of critically ill patients.
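The transfer learning procedure described above, pretraining on a large conventional cohort and then fine-tuning on a small novel cohort, can be sketched in miniature. Everything below is illustrative: the data are synthetic and the warm-started logistic regression stands in for the study's actual deep models.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, w=None, lr=0.1, epochs=200):
    """Gradient-descent logistic regression; `w` allows a warm start (transfer)."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

# "Large" source cohort: a synthetic stand-in for routine EHR vitals.
X_src = rng.normal(size=(2000, 8))
true_w = rng.normal(size=8)
y_src = (sigmoid(X_src @ true_w) > 0.5).astype(float)

# "Small" target cohort with a related but shifted outcome.
X_tgt = rng.normal(size=(40, 8))
y_tgt = (sigmoid(X_tgt @ (true_w + 0.3 * rng.normal(size=8))) > 0.5).astype(float)

# Pretrain on the large cohort, then fine-tune on the small one.
w_pre = train_logistic(X_src, y_src)
w_ft = train_logistic(X_tgt, y_tgt, w=w_pre.copy(), lr=0.05, epochs=50)

# Baseline: train on the small cohort from scratch.
w_scratch = train_logistic(X_tgt, y_tgt, lr=0.05, epochs=50)

def accuracy(w, X, y):
    return np.mean((sigmoid(X @ w) > 0.5) == y)

print("fine-tuned:", accuracy(w_ft, X_tgt, y_tgt))
print("scratch:   ", accuracy(w_scratch, X_tgt, y_tgt))
```

The key idea is only the warm start: the fine-tuned model begins from weights learned on the large cohort rather than from zero, which is what lets a 40-patient cohort borrow statistical strength from a 2,000-patient one.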
|
34
|
Reinforcement learning in surgery. Surgery 2021; 170:329-332. [PMID: 33436272 DOI: 10.1016/j.surg.2020.11.040] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2020] [Revised: 11/24/2020] [Accepted: 11/27/2020] [Indexed: 12/30/2022]
Abstract
Patients and physicians make essential decisions regarding diagnostic and therapeutic interventions. These actions must be performed or deferred under time constraints and uncertainty regarding patients' diagnoses and predicted responses to treatment, which may lead to cognitive and judgment errors. Reinforcement learning is a subfield of machine learning that identifies a sequence of actions to increase the probability of achieving a predetermined goal. Reinforcement learning has the potential to assist in surgical decision making by recommending actions at predefined intervals and through its ability to utilize complex input data, including text, image, and temporal data, in the decision-making process. The algorithm mimics a human trial-and-error learning process to calculate optimum recommendation policies. The article provides insight regarding challenges in the development and application of reinforcement learning in the medical field, with an emphasis on surgical decision making. The review focuses on challenges in formulating a reward function that describes the ultimate goal, in determining patient states derived from electronic health records, and in the lack of resources to simulate the potential benefits of suggested actions in response to changing physiological states during and after surgery. Although clinical implementation would require secure, interoperable, livestreaming electronic health record data for use by a virtual model, the development and validation of personalized reinforcement learning models in surgery can contribute to improving care by helping patients and clinicians make better decisions.
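The trial-and-error loop the article describes can be illustrated with minimal tabular Q-learning on a toy, entirely hypothetical patient-state model. The states, actions, transitions, and rewards below are invented for illustration and bear no relation to any validated clinical model.

```python
import random

random.seed(0)

# Toy MDP: 3 patient states (0=stable, 1=deteriorating, 2=critical),
# 2 actions (0=observe, 1=intervene). All dynamics are invented.
N_STATES, N_ACTIONS = 3, 2

def step(state, action):
    """Hypothetical environment: intervening tends to move the patient toward stable."""
    if action == 1:  # intervene
        next_state = max(0, state - 1) if random.random() < 0.8 else state
        reward = 1.0 if next_state == 0 else -0.1
    else:            # observe
        next_state = min(2, state + 1) if random.random() < 0.3 else state
        reward = 0.5 if next_state == 0 else -0.2
    return next_state, reward

# Tabular Q-learning with epsilon-greedy exploration.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, eps = 0.1, 0.9, 0.2

for episode in range(2000):
    s = random.randrange(N_STATES)
    for _ in range(20):
        if random.random() < eps:
            a = random.randrange(N_ACTIONS)            # explore
        else:
            a = max(range(N_ACTIONS), key=lambda x: Q[s][x])  # exploit
        s2, r = step(s, a)
        # Bellman update: nudge Q toward reward plus discounted best next value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print("learned policy (0=observe, 1=intervene):", policy)
```

The two hard parts the review highlights map directly onto this sketch: the `reward` values encode the ultimate goal (here crudely), and the discrete `state` stands in for a patient state that in practice must be derived from high-dimensional EHR data.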
|
35
|
Intelligent ICU for Autonomous Patient Monitoring Using Pervasive Sensing and Deep Learning. Sci Rep 2019; 9:8020. [PMID: 31142754 PMCID: PMC6541714 DOI: 10.1038/s41598-019-44004-w] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2018] [Accepted: 05/07/2019] [Indexed: 11/09/2022] Open
Abstract
Currently, many critical care indices are not captured automatically at a granular level; rather, they are repetitively assessed by overburdened nurses. In this pilot study, we examined the feasibility of using pervasive sensing technology and artificial intelligence for autonomous and granular monitoring in the Intensive Care Unit (ICU). As an exemplary prevalent condition, we characterized delirious patients and their environment. We used wearable sensors, light and sound sensors, and a camera to collect data on patients and their environment. We analyzed the collected data to detect and recognize patients' faces, postures, facial action units and expressions, head pose variation, extremity movements, sound pressure levels, light intensity levels, and visitation frequency. We found that facial expressions, functional status entailing extremity movement and postures, and environmental factors including visitation frequency and light and sound pressure levels at night were significantly different between delirious and non-delirious patients. Our results showed that granular and autonomous monitoring of critically ill patients and their environment is feasible using a noninvasive system, and we demonstrated its potential for characterizing critical care patients and environmental factors.
|
36
|
Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis. IEEE J Biomed Health Inform 2018; 22:1589-1604. [PMID: 29989977 PMCID: PMC6043423 DOI: 10.1109/jbhi.2017.2767063] [Citation(s) in RCA: 409] [Impact Index Per Article: 68.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
Abstract
The past decade has seen an explosion in the amount of digital information stored in electronic health records (EHRs). While EHRs were primarily designed for archiving patient information and performing administrative healthcare tasks such as billing, many researchers have found secondary uses for these records in various clinical informatics applications. Over the same period, the machine learning community has seen widespread advances in the field of deep learning. In this review, we survey the current research on applying deep learning to clinical tasks based on EHR data, where we find a variety of deep learning techniques and frameworks being applied to several types of clinical applications, including information extraction, representation learning, outcome prediction, phenotyping, and deidentification. We identify several limitations of current research involving topics such as model interpretability, data heterogeneity, and a lack of universal benchmarks. We conclude by summarizing the state of the field and identifying avenues of future deep EHR research.
|
37
|
Deep neural network architectures for forecasting analgesic response. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2017; 2016:2966-2969. [PMID: 28268935 DOI: 10.1109/embc.2016.7591352] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 01/24/2023]
Abstract
Response to prescribed analgesic drugs varies between individuals, and choosing the right drug/dose often involves a lengthy, iterative process of trial and error. Furthermore, a significant portion of patients experience adverse events such as postoperative urinary retention (POUR) during inpatient management of acute postoperative pain. To better forecast analgesic responses, we compared conventional machine learning methods with modern neural network architectures to gauge their effectiveness at forecasting temporal patterns of postoperative pain and analgesic use, as well as predicting the risk of POUR. Our results indicate that simpler machine learning approaches might offer superior results; however, all of these techniques may play a promising role for developing smarter postoperative pain management strategies.
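As a flavor of what a "simpler machine learning approach" to temporal forecasting can look like, here is a minimal linear autoregressive baseline on a synthetic pain-score series. The data and lag order are invented for illustration; this does not reproduce the study's actual methods.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic hourly "pain score" series: slow decay plus noise (illustrative only).
t = np.arange(120)
series = 8.0 * np.exp(-t / 60.0) + 0.3 * rng.normal(size=t.size)

def make_lagged(x, n_lags):
    """Stack the previous n_lags values as features for each time step."""
    X = np.column_stack([x[i:len(x) - n_lags + i] for i in range(n_lags)])
    y = x[n_lags:]
    return X, y

X, y = make_lagged(series, n_lags=4)
X_train, y_train = X[:80], y[:80]
X_test, y_test = X[80:], y[80:]

# Ordinary least squares autoregression with an intercept column.
A = np.column_stack([X_train, np.ones(len(X_train))])
coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)

pred = np.column_stack([X_test, np.ones(len(X_test))]) @ coef
mae = np.mean(np.abs(pred - y_test))

# Naive baseline: predict that the next value equals the last observed one.
naive_mae = np.mean(np.abs(X_test[:, -1] - y_test))
print(f"AR(4) MAE: {mae:.3f}  naive MAE: {naive_mae:.3f}")
```

A baseline of this kind is exactly the bar a neural forecaster has to clear, which is the comparison the abstract reports.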
|