1
El Moheb M, Gebran A, Maurer LR, Naar L, El Hechi M, Breen K, Dorken-Gallastegi A, Sinyard R, Bertsimas D, Velmahos G, Kaafarani HMA. Artificial intelligence versus surgeon gestalt in predicting risk of emergency general surgery. J Trauma Acute Care Surg 2023; 95:565-572. [PMID: 37314698 DOI: 10.1097/ta.0000000000004030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
BACKGROUND Artificial intelligence (AI) risk prediction algorithms, such as the smartphone-available Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) calculator for emergency general surgery (EGS), are superior to traditional risk calculators because they account for complex nonlinear interactions between variables, but how they compare with surgeons' gestalt remains unknown. Herein, we sought to (1) compare POTTER with surgeons' estimation of surgical risk and (2) assess how POTTER influences surgeons' risk estimation.
STUDY DESIGN A total of 150 patients who underwent EGS at a large quaternary care center between May 2018 and May 2019 were prospectively followed up for 30-day postoperative outcomes (mortality, septic shock, ventilator dependence, bleeding requiring transfusion, and pneumonia), and clinical cases representing their initial presentation were systematically created. POTTER's outcome predictions for each case were also recorded. Thirty acute care surgeons with diverse practice settings and levels of experience were then randomized into two groups: 15 surgeons (SURG) were asked to predict the outcomes without access to POTTER's predictions, while the remaining 15 (SURG-POTTER) were asked to predict the same outcomes after interacting with POTTER. Using actual patient outcomes as the reference, the area under the receiver operating characteristic curve (AUC) was used to assess the predictive performance of (1) POTTER versus SURG and (2) SURG versus SURG-POTTER.
RESULTS POTTER outperformed SURG in predicting all outcomes (mortality AUC: 0.880 vs. 0.841; ventilator dependence AUC: 0.928 vs. 0.833; bleeding AUC: 0.832 vs. 0.735; pneumonia AUC: 0.837 vs. 0.753) except septic shock (AUC: 0.816 vs. 0.820). SURG-POTTER outperformed SURG in predicting mortality (AUC: 0.870 vs. 0.841), bleeding (AUC: 0.811 vs. 0.735), and pneumonia (AUC: 0.803 vs. 0.753) but not septic shock (AUC: 0.712 vs. 0.820) or ventilator dependence (AUC: 0.834 vs. 0.833).
CONCLUSION The AI risk calculator POTTER outperformed surgeons' gestalt in predicting the postoperative mortality and outcomes of EGS patients and, when used, improved individual surgeons' risk prediction. Artificial intelligence algorithms such as POTTER could prove useful as a bedside adjunct to surgeons when counseling patients preoperatively.
LEVEL OF EVIDENCE Prognostic and Epidemiological; Level II.
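The AUC comparisons above can be illustrated with a minimal sketch, assuming binary 30-day outcomes and one numeric risk estimate per case. It uses the rank-based (Mann-Whitney) form of the AUC: the share of event/non-event pairs in which the patient who had the outcome received the higher estimated risk, with ties counted as one half. The function name `auc` and all data values are hypothetical illustrations, not the study's code or data.

```python
from itertools import product


def auc(risks, outcomes):
    """Rank-based (Mann-Whitney) AUC for binary outcomes.

    Returns the fraction of (event, non-event) pairs in which the event
    case was assigned the higher risk; tied pairs count as 0.5.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("AUC requires at least one event and one non-event")
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(non_events))


# Hypothetical outcomes (1 = died within 30 days) and two sets of risk estimates
outcomes = [1, 0, 0, 1, 0, 0]
calculator_risk = [0.62, 0.10, 0.05, 0.40, 0.35, 0.08]
surgeon_risk = [0.50, 0.20, 0.10, 0.15, 0.30, 0.05]

print(auc(calculator_risk, outcomes))  # 1.0
print(auc(surgeon_risk, outcomes))     # 0.75
```

Pairwise counting grows quadratically with the number of cases, which is unproblematic for 150 patients; for larger datasets a sort-based rank computation yields the same value.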
Affiliation(s)
- Mohamad El Moheb
- From the Division of Trauma, Emergency Surgery, and Surgical Critical Care (M.E.M., A.G., L.R.M., L.N., M.E.H., K.B., A.D.-G., R.S., G.V., H.M.A.K.), Massachusetts General Hospital, Boston; and Massachusetts Institute of Technology (D.B.), Cambridge, Massachusetts
2
Gebran A, Thakur SS, Maurer LR, Bandi H, Sinyard R, Dorken-Gallastegi A, Bokenkamp M, El Moheb M, Naar L, Vapsi A, Daye D, Velmahos GC, Bertsimas D, Kaafarani HMA. Development of a Machine Learning-Based Prescriptive Tool to Address Racial Disparities in Access to Care After Penetrating Trauma. JAMA Surg 2023; 158:1088-1095. [PMID: 37610746 PMCID: PMC10448365 DOI: 10.1001/jamasurg.2023.2293] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/29/2022] [Accepted: 03/11/2023] [Indexed: 08/24/2023]
Abstract
Importance The use of artificial intelligence (AI) in clinical medicine risks perpetuating existing bias in care, such as disparities in access to postinjury rehabilitation services.
Objective To leverage a novel, interpretable AI-based technology to uncover racial disparities in access to postinjury rehabilitation care and to create an AI-based prescriptive tool to address these disparities.
Design, Setting, and Participants This cohort study used data from the 2010-2016 American College of Surgeons Trauma Quality Improvement Program database for Black and White patients with a penetrating mechanism of injury. An interpretable AI methodology called optimal classification trees (OCTs) was applied with an 80:20 derivation/validation split to predict discharge disposition (home vs postacute care [PAC]). The interpretable nature of OCTs allowed examination of the AI logic to identify racial disparities. A prescriptive mixed-integer optimization model using age, injury, and gender data was allowed to "fairness-flip" the recommended discharge destination for a subset of patients while minimizing the ratio of imbalance between Black and White patients. Three OCTs were developed to predict discharge disposition: the first 2 trees used unadjusted data (one without and one with the race variable), and the third used fairness-adjusted data.
Main Outcomes and Measures Disparities and discriminative performance (C statistic) were compared among the fairness-adjusted and unadjusted OCTs.
Results A total of 52 468 patients were included; the median (IQR) age was 29 (22-40) years, 46 189 patients (88.0%) were male, 31 470 (60.0%) were Black, and 20 998 (40.0%) were White. A total of 3800 Black patients (12.1%) were discharged to PAC, compared with 4504 White patients (21.5%; P < .001). Examining the AI logic uncovered significant disparities in access to PAC as a discharge destination, with race playing the second most important role. The prescriptive fairness adjustment recommended flipping the discharge destination of 4.5% of the patients, and the C statistic of the adjusted model increased from 0.79 to 0.87. After fairness adjustment, disparities disappeared, and a similar percentage of Black and White patients (15.8% vs 15.8%; P = .87) had a recommended discharge to PAC.
Conclusions and Relevance In this study, we developed an accurate, machine learning-based, fairness-adjusted model that can identify barriers to discharge to postacute care. Rather than inadvertently encoding bias, interpretable AI methodologies are powerful tools to diagnose and remedy system-related bias in care, such as disparities in access to postinjury rehabilitation care.
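The cohort proportions in the Results follow directly from the reported counts. As a quick arithmetic check, the sketch below recomputes them to two decimal places using only the counts stated in the abstract; the helper name `pct` is a hypothetical convenience, not part of the study.

```python
# Counts stated in the abstract
total = 52_468
black, white = 31_470, 20_998
male = 46_189
black_pac, white_pac = 3_800, 4_504  # patients discharged to postacute care (PAC)


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# The cohort contains only Black and White patients
assert black + white == total

print(pct(male, total))                              # 88.03
print(pct(black, total), pct(white, total))          # 59.98 40.02
print(pct(black_pac, black), pct(white_pac, white))  # 12.07 21.45
```

Note that the PAC percentages use each racial group, not the whole cohort, as the denominator.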
Affiliation(s)
- Anthony Gebran
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston
- Center for Outcomes & Patient Safety in Surgery (COMPASS), Massachusetts General Hospital, Boston
- Department of Surgery, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania
- Lydia R. Maurer
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston
- Center for Outcomes & Patient Safety in Surgery (COMPASS), Massachusetts General Hospital, Boston
- Hari Bandi
- Massachusetts Institute of Technology, Cambridge
- Robert Sinyard
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston
- Center for Outcomes & Patient Safety in Surgery (COMPASS), Massachusetts General Hospital, Boston
- Ander Dorken-Gallastegi
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston
- Center for Outcomes & Patient Safety in Surgery (COMPASS), Massachusetts General Hospital, Boston
- Department of Surgery, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania
- Mary Bokenkamp
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston
- Center for Outcomes & Patient Safety in Surgery (COMPASS), Massachusetts General Hospital, Boston
- Mohamad El Moheb
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston
- Center for Outcomes & Patient Safety in Surgery (COMPASS), Massachusetts General Hospital, Boston
- Leon Naar
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston
- Center for Outcomes & Patient Safety in Surgery (COMPASS), Massachusetts General Hospital, Boston
- Annita Vapsi
- Massachusetts Institute of Technology, Cambridge
- Dania Daye
- Division of Interventional Radiology, Massachusetts General Hospital, Boston
- George C. Velmahos
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston
- Haytham M. A. Kaafarani
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston
- Center for Outcomes & Patient Safety in Surgery (COMPASS), Massachusetts General Hospital, Boston
3
Jogerst KM, Park YS, Anteby R, Sinyard R, Coe TM, Cassidy D, McKinley SK, Petrusa E, Phitayakorn R, Mohapatra A, Gee DW. Impact of Rater Training on Residents Technical Skill Assessments: A Randomized Trial. J Surg Educ 2022; 79:e225-e234. [PMID: 36333174 DOI: 10.1016/j.jsurg.2022.09.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Revised: 08/28/2022] [Accepted: 09/23/2022] [Indexed: 06/16/2023]
Abstract
OBJECTIVE The ACS/APDS Resident Skills Curriculum's Objective Structured Assessment of Technical Skills (OSATS) consists of task-specific checklists and a global rating scale (GRS) completed by raters. Prior work demonstrated a need for rater training. This study evaluates the impact of a rater-training curriculum on scoring discrimination, consistency, and validity for handsewn bowel anastomosis (HBA) and vascular anastomosis (VA).
DESIGN/METHODS A rater-training video model was developed, which included a GRS orientation and anchoring performances representing the range of potential scores. Faculty raters were randomized to rater training or no rater training and were asked to score videos of resident HBA/VA. A consensus score (Gold Score) was assigned to each video using a modified Delphi process. Trained and untrained scores were analyzed for discrimination and score spread and compared with the Gold Score for relative agreement.
RESULTS Eight general surgery and eight vascular surgery faculty members were randomized to score 24 HBA/VA videos. Rater training increased rater discrimination and decreased rating scale shrinkage for both VA (trained: mean score 2.83, variance 1.88; untrained: mean score 3.1, variance 1.14; p = 0.007) and HBA (trained: mean score 3.52, variance 1.44; untrained: mean score 3.42, variance 0.96; p = 0.033). On validity analyses, comparing each rater group with the Gold Score revealed a moderate training impact for VA (trained κ = 0.65 vs untrained κ = 0.57) and no impact for HBA (R1 κ = 0.71 vs R2 κ = 0.73).
CONCLUSION A rater-training curriculum improved raters' ability to differentiate performance levels and to use a wider range of the scoring scale. However, despite rater training, faculty GRS scores continued to disagree, and no group reached the agreement threshold for formative assessment. If technical skill exams are incorporated into high-stakes assessments, consensus ratings via a standard-setting process are likely a more valid option than individual faculty ratings.
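The κ values above measure agreement with the Gold Score beyond what chance alone would produce. The abstract does not state which κ variant was used (a weighted κ is common for ordinal scales such as a GRS), so the sketch below shows only the unweighted Cohen's κ, κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement from each rater's score distribution. The function name and the 1-5 scores for eight videos are hypothetical illustrations.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical ratings."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same score by both raters
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical GRS scores (1-5) for eight videos
gold_score = [3, 4, 2, 5, 3, 1, 4, 2]
faculty_score = [3, 4, 3, 5, 2, 1, 4, 2]

print(round(cohens_kappa(faculty_score, gold_score), 2))  # 0.68
```

Because the unweighted form treats a 2-versus-3 disagreement the same as a 1-versus-5 disagreement, a weighted variant is usually preferred for ordinal ratings.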
Affiliation(s)
- Kristen M Jogerst
- Department of General Surgery, Mayo Clinic Arizona, Phoenix, Arizona; Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts
- Yoon Soo Park
- Department of Emergency Medicine, Massachusetts General Hospital, Boston, Massachusetts
- Roi Anteby
- Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts
- Robert Sinyard
- Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts
- Taylor M Coe
- Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts
- Douglas Cassidy
- Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts
- Sophia K McKinley
- Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts
- Emil Petrusa
- Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts
- Roy Phitayakorn
- Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts
- Abhisekh Mohapatra
- Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts
- Denise W Gee
- Department of Surgery, Massachusetts General Hospital, Boston, Massachusetts
4
Gebran A, Vapsi A, Maurer LR, El Moheb M, Naar L, Thakur SS, Sinyard R, Daye D, Velmahos GC, Bertsimas D, Kaafarani HMA. POTTER-ICU: An artificial intelligence smartphone-accessible tool to predict the need for intensive care after emergency surgery. Surgery 2022; 172:470-475. [PMID: 35489978 DOI: 10.1016/j.surg.2022.03.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Revised: 02/15/2022] [Accepted: 03/15/2022] [Indexed: 11/18/2022]
Abstract
BACKGROUND Delays in admitting high-risk emergency surgery patients to the intensive care unit result in worse outcomes and increased health care costs. We aimed to use interpretable artificial intelligence technology to create a preoperative predictor of postoperative intensive care unit need in emergency surgery patients.
METHODS A novel, interpretable artificial intelligence technology called optimal classification trees was applied to adult emergency surgery patients in the 2007-2017 American College of Surgeons National Surgical Quality Improvement Program database, using an 80:20 train:test split. Demographics, comorbidities, and laboratory values were used to develop, train, and then validate optimal classification tree algorithms to predict the need for postoperative intensive care unit admission. This need was defined as postoperative death or the development of 1 or more postoperative complications warranting critical care (eg, unplanned intubation, ventilator requirement ≥48 hours, cardiac arrest requiring cardiopulmonary resuscitation, and septic shock). An interactive, user-friendly application was created. C statistics were used to measure performance.
RESULTS A total of 464,861 patients were included. The mean age was 55 years, 48% were male, and 11% developed severe postoperative complications warranting critical care. The Predictive OpTimal Trees in Emergency Surgery Risk Intensive Care Unit (POTTER-ICU) application was created as the user-friendly interface to the complex optimal classification tree algorithms. The number of questions (ie, tree depth) needed to predict intensive care unit admission ranged from 2 to 11. POTTER-ICU had excellent discrimination for predicting the need for intensive care unit admission (C statistic: 0.89 train, 0.88 test).
CONCLUSION We recommend POTTER-ICU as an accurate, artificial intelligence-based tool for predicting severe complications warranting intensive care unit admission after emergency surgery. POTTER-ICU can prove useful for triaging patients to the intensive care unit and could potentially decrease failure to rescue in emergency surgery patients.
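The prediction target described in the Methods is a composite label. A minimal sketch of how such a label could be derived, assuming one record per patient with the relevant postoperative events already extracted: the class and field names are hypothetical (they are not NSQIP variable names), and only the four complications the abstract lists are checked, whereas the abstract introduces them with "eg", so the study's full list may be longer.

```python
from dataclasses import dataclass


@dataclass
class PostopCourse:
    """Postoperative events for one patient (hypothetical field names)."""
    died: bool = False
    unplanned_intubation: bool = False
    ventilator_hours: float = 0.0
    cardiac_arrest_with_cpr: bool = False
    septic_shock: bool = False


def needs_icu(course: PostopCourse) -> bool:
    """Composite label: postoperative death or at least one critical-care complication.

    Only the four example complications named in the abstract are checked.
    """
    complications = (
        course.unplanned_intubation,
        course.ventilator_hours >= 48,  # ventilator requirement of 48 hours or more
        course.cardiac_arrest_with_cpr,
        course.septic_shock,
    )
    return course.died or any(complications)


print(needs_icu(PostopCourse()))                     # False
print(needs_icu(PostopCourse(ventilator_hours=72)))  # True
print(needs_icu(PostopCourse(ventilator_hours=24)))  # False
print(needs_icu(PostopCourse(died=True)))            # True
```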
Affiliation(s)
- Anthony Gebran
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston, MA; Center for Outcomes and Patient Safety in Surgery (COMPASS), Massachusetts General Hospital, Boston, MA
- Annita Vapsi
- Massachusetts Institute of Technology, Cambridge, MA
- Lydia R Maurer
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston, MA; Center for Outcomes and Patient Safety in Surgery (COMPASS), Massachusetts General Hospital, Boston, MA
- Mohamad El Moheb
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston, MA; Center for Outcomes and Patient Safety in Surgery (COMPASS), Massachusetts General Hospital, Boston, MA
- Leon Naar
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston, MA; Center for Outcomes and Patient Safety in Surgery (COMPASS), Massachusetts General Hospital, Boston, MA
- Robert Sinyard
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston, MA
- Dania Daye
- Center for Outcomes and Patient Safety in Surgery (COMPASS), Massachusetts General Hospital, Boston, MA; Division of Interventional Radiology, Massachusetts General Hospital, Boston, MA
- George C Velmahos
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston, MA
- Haytham M A Kaafarani
- Division of Trauma, Emergency Surgery, and Surgical Critical Care, Department of Surgery, Massachusetts General Hospital, Boston, MA; Center for Outcomes and Patient Safety in Surgery (COMPASS), Massachusetts General Hospital, Boston, MA