1
Harada Y, Suzuki T, Harada T, Sakamoto T, Ishizuka K, Miyagami T, Kawamura R, Kunitomo K, Nagano H, Shimizu T, Watari T. Performance evaluation of ChatGPT in detecting diagnostic errors and their contributing factors: an analysis of 545 case reports of diagnostic errors. BMJ Open Qual 2024; 13:e002654. PMID: 38830730; PMCID: PMC11149143; DOI: 10.1136/bmjoq-2023-002654.
Abstract
BACKGROUND Manual chart review using validated assessment tools is a standardised methodology for detecting diagnostic errors. However, this requires considerable human resources and time. ChatGPT, a recently developed artificial intelligence chatbot based on a large language model, can effectively classify text based on suitable prompts. ChatGPT could therefore assist manual chart reviews in detecting diagnostic errors. OBJECTIVE This study aimed to clarify whether ChatGPT could correctly detect diagnostic errors and possible factors contributing to them based on case presentations. METHODS We analysed 545 published case reports that included diagnostic errors. We input the texts of case presentations and the final diagnoses, together with original prompts, into ChatGPT (GPT-4) to generate responses, including the judgement of diagnostic errors and the factors contributing to them. Factors contributing to diagnostic errors were coded according to three taxonomies: Diagnosis Error Evaluation and Research (DEER), Reliable Diagnosis Challenges (RDC) and Generic Diagnostic Pitfalls (GDP). The contributing factors coded by ChatGPT were compared with those coded by physicians. RESULTS ChatGPT correctly detected diagnostic errors in 519/545 cases (95%) and coded significantly more contributing factors per case than physicians: DEER (median 5 vs 1, p<0.001), RDC (median 4 vs 2, p<0.001) and GDP (median 4 vs 1, p<0.001). The contributing factors most frequently coded by ChatGPT were 'failure/delay in considering the diagnosis' (315, 57.8%) in DEER, 'atypical presentation' (365, 67.0%) in RDC, and 'atypical presentation' (264, 48.4%) in GDP. CONCLUSION ChatGPT accurately detects diagnostic errors from case presentations. ChatGPT may be more sensitive than manual reviewing in detecting factors contributing to diagnostic errors, especially 'atypical presentation'.
Affiliation(s)
- Yukinori Harada
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga-gun, Tochigi, Japan
- Taku Harada
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga-gun, Tochigi, Japan
- Nerima Hikarigaoka Hospital, Nerima-ku, Tokyo, Japan
- Tetsu Sakamoto
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga-gun, Tochigi, Japan
- Kosuke Ishizuka
- Yokohama City University School of Medicine Graduate School of Medicine, Yokohama, Kanagawa, Japan
- Taiju Miyagami
- Department of General Medicine, Faculty of Medicine, Juntendo University, Bunkyo-ku, Tokyo, Japan
- Ren Kawamura
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga-gun, Tochigi, Japan
- Hiroyuki Nagano
- Department of General Internal Medicine, Tenri Hospital, Tenri, Nara, Japan
- Taro Shimizu
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga-gun, Tochigi, Japan
- Takashi Watari
- Integrated Clinical Education Center, Kyoto University Hospital, Kyoto, Kyoto, Japan
2
Michelson KA, Bachur RG, Dart AH, Chaudhari PP, Cruz AT, Grubenhoff JA, Reeves SD, Monuteaux MC, Finkelstein JA. Identification of delayed diagnosis of paediatric appendicitis in administrative data: a multicentre retrospective validation study. BMJ Open 2023; 13:e064852. PMID: 36854600; PMCID: PMC9980351; DOI: 10.1136/bmjopen-2022-064852.
Abstract
OBJECTIVE To derive and validate a tool that retrospectively identifies delayed diagnosis of appendicitis in administrative data with high accuracy. DESIGN Cross-sectional study. SETTING Five paediatric emergency departments (EDs). PARTICIPANTS 669 patients under 21 years old with possible delayed diagnosis of appendicitis, defined as two ED encounters within 7 days, the second with appendicitis. OUTCOME Delayed diagnosis was defined as appendicitis being present but not diagnosed at the first ED encounter based on standardised record review. The cohort was split into derivation (2/3) and validation (1/3) groups. We derived a prediction rule using logistic regression, with covariates including variables obtainable only from administrative data. The resulting trigger tool was applied to the validation group to determine area under the curve (AUC). Test characteristics were determined at two predicted probability thresholds. RESULTS Delayed diagnosis occurred in 471 (70.4%) patients. The tool had an AUC of 0.892 (95% CI 0.858 to 0.925) in the derivation group and 0.859 (95% CI 0.806 to 0.912) in the validation group. The positive predictive value (PPV) for delay at a maximal accuracy threshold was 84.7% (95% CI 78.2% to 89.8%) and identified 87.3% of delayed cases. The PPV at a stricter threshold was 94.9% (95% CI 87.4% to 98.6%) and identified 46.8% of delayed cases. CONCLUSIONS This tool accurately identified delayed diagnosis of appendicitis. It may be used to screen for potential missed diagnoses or to specifically identify a cohort of children with delayed diagnosis.
Affiliation(s)
- Richard G Bachur
- Division of Emergency Medicine, Boston Children's Hospital, Boston, MA, USA
- Arianna H Dart
- Division of Emergency Medicine, Boston Children's Hospital, Boston, MA, USA
- Pradip P Chaudhari
- Division of Emergency and Transport Medicine, Children's Hospital Los Angeles, Los Angeles, CA, USA
- Andrea T Cruz
- Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
- Joseph A Grubenhoff
- Section of Pediatric Emergency Medicine, University of Colorado School of Medicine, Aurora, CO, USA
- Children's Hospital Colorado, Aurora, CO, USA
- Scott D Reeves
- Division of Pediatric Emergency Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
3
Wiegand AA, Dukhanin V, Sheikh T, Zannath F, Jajodia A, Schrandt S, Haskell H, McDonald KM. Human centered design workshops as a meta-solution to diagnostic disparities. Diagnosis (Berl) 2022; 9:458-467. PMID: 36027891; DOI: 10.1515/dx-2022-0025.
Abstract
OBJECTIVES Diagnostic errors - inaccurate or untimely diagnoses or failures to communicate diagnoses - are harmful and costly for patients and health systems. Diagnostic disparities occur when diagnostic errors are experienced at disproportionate rates by certain patient subgroups based, for example, on patients' age, sex/gender, or race/ethnicity. We aimed to develop and test the feasibility of a human centered design workshop series that engages diverse stakeholders to develop solutions for mitigating diagnostic disparities. METHODS We employed a series of human centered design workshops supplemented by semi-structured interviews and literature evidence scans. Co-creation sessions and rapid prototyping by patient, clinician, and researcher stakeholders were used to generate design challenges, solution concepts, and prototypes. RESULTS A series of four workshops attended by 25 unique participants was convened in 2019-2021. Workshops generated eight design challenges, envisioned 29 solutions, and formulated principles for developing solutions in an equitable, patient-centered manner. Workshops further resulted in the conceptualization of 37 solutions for addressing diagnostic disparities and prototypes for two of the solutions. Participants agreed that the workshop processes were replicable and could be implemented in other settings to allow stakeholders to generate context-specific solutions. CONCLUSIONS The incorporation of human centered design through a series of workshops promises to be a productive way of engaging patient-researcher stakeholders to mitigate and prevent further exacerbation of diagnostic disparities. Healthcare stakeholders can apply human centered design principles to guide thinking about improving diagnostic performance and to center diverse patients' needs and experiences when implementing quality and safety improvements.
Affiliation(s)
- Aaron A Wiegand
- Johns Hopkins University School of Nursing, Baltimore, MD, USA
- Department of Health, Behavior and Society, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, USA
- Vadim Dukhanin
- Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Anushka Jajodia
- Center for Social Design, Maryland Institute College of Art, Baltimore, MD, USA
- Kathryn M McDonald
- Johns Hopkins University School of Nursing, Baltimore, MD, USA
- Department of General Internal Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
4
Malik MA, Motta-Calderon D, Piniella N, Garber A, Konieczny K, Lam A, Plombon S, Carr K, Yoon C, Griffin J, Lipsitz S, Schnipper JL, Bates DW, Dalal AK. A structured approach to EHR surveillance of diagnostic error in acute care: an exploratory analysis of two institutionally-defined case cohorts. Diagnosis (Berl) 2022; 9:446-457. PMID: 35993878; PMCID: PMC9651987; DOI: 10.1515/dx-2022-0032.
Abstract
OBJECTIVES To test a structured electronic health record (EHR) case review process to identify diagnostic errors (DE) and diagnostic process failures (DPFs) in acute care. METHODS We adapted validated tools (Safer Dx, Diagnostic Error Evaluation Research [DEER] Taxonomy) to assess the diagnostic process during the hospital encounter and categorized 13 postulated e-triggers. We created two test cohorts of all preventable cases (n=28) and an equal number of randomly sampled non-preventable cases (n=28) from 365 adult general medicine patients who expired and underwent our institution's mortality case review process. After excluding patients with a length of stay of more than one month, each case was reviewed by two blinded clinicians trained in our process and by an expert panel. Inter-rater reliability was assessed. We compared the frequency of DE contributing to death in both cohorts, as well as mean DPFs and e-triggers for DE-positive and DE-negative cases within each cohort. RESULTS Twenty-seven (96.4%) preventable and 24 (85.7%) non-preventable cases underwent our review process. Inter-rater reliability was moderate between individual reviewers (Cohen's kappa 0.41) and substantial with the expert panel (Cohen's kappa 0.74). The frequency of DE contributing to death was significantly higher in the preventable than in the non-preventable cohort (56% vs. 17%, OR 6.25 [1.68, 23.27], p<0.01). Within each cohort, mean DPFs were significantly higher, and mean e-triggers non-significantly higher, for DE-positive compared with DE-negative cases. CONCLUSIONS We observed substantial agreement between final consensus and expert panel reviews using our structured EHR case review process. DEs contributing to death associated with DPFs were identified in institutionally designated preventable and non-preventable cases. While e-triggers may be useful for discriminating DE-positive from DE-negative cases, larger studies are required for validation. Our approach has potential to augment institutional mortality case review processes with respect to DE surveillance.
Affiliation(s)
- Maria A. Malik
- Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Daniel Motta-Calderon
- Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Nicholas Piniella
- Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Alison Garber
- Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Kaitlyn Konieczny
- Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Alyssa Lam
- Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Savanna Plombon
- Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Kevin Carr
- Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Catherine Yoon
- Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Stuart Lipsitz
- Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Jeffrey L. Schnipper
- Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- David W. Bates
- Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Anuj K. Dalal
- Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
5
Lam D, Dominguez F, Leonard J, Wiersma A, Grubenhoff JA. Use of e-triggers to identify diagnostic errors in the paediatric ED. BMJ Qual Saf 2022; 31:735-743. PMID: 35318272; DOI: 10.1136/bmjqs-2021-013683.
Abstract
BACKGROUND Diagnostic errors (DxEs) are an understudied source of patient harm in children rarely captured in current adverse event reporting systems. Applying electronic triggers (e-triggers) to electronic health records shows promise in identifying DxEs but has not been used in the emergency department (ED) setting. OBJECTIVES To assess the performance of an e-trigger and subsequent manual screening for identifying probable DxEs among children with unplanned admission following a prior ED visit and to compare performance to existing incident reporting systems. DESIGN/METHODS Retrospective single-centre cohort study of children ages 0-22 admitted within 14 days of a previous ED visit between 1 January 2018 and 31 December 2019. Subjects were identified by e-trigger, screened to identify cases where index visit and hospital discharge diagnoses were potentially related but pathophysiologically distinct, and then these screened-in cases were reviewed for DxE using the SaferDx Instrument. Cases of DxE identified by e-trigger were cross-referenced against existing institutional incident reporting systems. RESULTS An e-trigger identified 1915 unplanned admissions (7.7% of 24 849 total admissions) with a preceding index visit. 453 (23.7%) were screened in and underwent review using SaferDx. 92 cases were classified as likely DxEs, representing 0.4% of all hospital admissions, 4.8% among those selected by e-trigger and 20.3% among those screened in for review. Half of cases were reviewed by two reviewers using SaferDx with substantial inter-rater reliability (Cohen's κ=0.65 (95% CI 0.54 to 0.75)). Six (6.5%) cases had been reported elsewhere: two to the hospital's incident reporting system and five to the ED case review team (one reported to both). CONCLUSION An e-trigger coupled with manual screening enriched a cohort of patients at risk for DxEs. Fewer than 10% of DxEs were identified through existing surveillance systems, suggesting that they miss a large proportion of DxEs. Further study is required to identify specific clinical presentations at risk of DxEs.
Affiliation(s)
- Daniel Lam
- Pediatrics, University of Colorado Denver School of Medicine, Aurora, Colorado, USA
- Fidelity Dominguez
- Pediatric Emergency Medicine, Children's Hospital Colorado, Aurora, Colorado, USA
- Jan Leonard
- Section of Pediatric Emergency Medicine, University of Colorado Denver School of Medicine, Aurora, Colorado, USA
- Alexandria Wiersma
- Section of Pediatric Emergency Medicine, University of Colorado Denver School of Medicine, Aurora, Colorado, USA
- Joseph A Grubenhoff
- Section of Pediatric Emergency Medicine, University of Colorado Denver School of Medicine, Aurora, Colorado, USA
6
Stockwell DC, Sharek P. Diagnosing diagnostic errors: it's time to evolve the patient safety research paradigm. BMJ Qual Saf 2022; 31:701-703. PMID: 35508375; DOI: 10.1136/bmjqs-2021-014517.
Affiliation(s)
- David C Stockwell
- Anesthesiology and Critical Care Medicine, Johns Hopkins University, Baltimore, Maryland, USA
- Chief Medical Officer, Johns Hopkins Children's Center, Baltimore, Maryland, USA
- Paul Sharek
- General Pediatrics and Hospital Medicine, University of Washington, Seattle, Washington, USA
- Vice President, Chief Quality and Safety Officer, Seattle Children's Hospital, Seattle, Washington, USA
7
Enayati M, Sir M, Zhang X, Parker SJ, Duffy E, Singh H, Mahajan P, Pasupathy KS. Monitoring Diagnostic Safety Risks in Emergency Departments: Protocol for a Machine Learning Study. JMIR Res Protoc 2021; 10:e24642. PMID: 34125077; PMCID: PMC8240801; DOI: 10.2196/24642.
Abstract
BACKGROUND Diagnostic decision making, especially in emergency departments, is a highly complex cognitive process that involves uncertainty and susceptibility to errors. A combination of factors, including patient factors (eg, history, behaviors, complexity, and comorbidity), provider-care team factors (eg, cognitive load and information gathering and synthesis), and system factors (eg, health information technology, crowding, shift-based work, and interruptions) may contribute to diagnostic errors. Using electronic triggers to identify records of patients with certain patterns of care, such as escalation of care, has been useful to screen for diagnostic errors. Once errors are identified, sophisticated data analytics and machine learning techniques can be applied to existing electronic health record (EHR) data sets to shed light on potential risk factors influencing diagnostic decision making. OBJECTIVE This study aims to identify variables associated with diagnostic errors in emergency departments using large-scale EHR data and machine learning techniques. METHODS This study plans to use trigger algorithms within EHR data repositories to generate a large data set of records that are labeled trigger-positive or trigger-negative, depending on whether they meet certain criteria. Samples from both data sets will be validated using medical record reviews, upon which we expect to find a higher number of diagnostic safety events in the trigger-positive subset. Machine learning will be used to evaluate relationships between certain patient factors, provider-care team factors, and system-level risk factors and diagnostic safety signals in the statistically matched groups of trigger-positive and trigger-negative charts. RESULTS This federally funded study was approved by the institutional review boards of 2 academic medical centers with affiliated community hospitals. Trigger queries are being developed at both organizations, and sample cohorts will be labeled using the triggers. Machine learning techniques such as association rule mining, chi-square automated interaction detection, and classification and regression trees will be used to discover important variables that could be incorporated within future clinical decision support systems to help identify and reduce risks that contribute to diagnostic errors. CONCLUSIONS The use of large EHR data sets and machine learning to investigate risk factors (related to the patient, provider-care team, and system level) in the diagnostic process may help create future mechanisms for monitoring diagnostic safety. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/24642.
Affiliation(s)
- Moein Enayati
- Health Care Delivery Research, Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, United States
- Xingyu Zhang
- Thomas E Starzl Transplantation Institute, University of Pittsburgh Medical Center, Pittsburgh, PA, United States
- Sarah J Parker
- Department of Emergency Medicine, University of Michigan, Ann Arbor, MI, United States
- Elizabeth Duffy
- Department of Emergency Medicine, University of Michigan, Ann Arbor, MI, United States
- Hardeep Singh
- Center for Innovations in Quality, Effectiveness and Safety, Michael E DeBakey Veterans Affairs Medical Center, Baylor College of Medicine, Houston, TX, United States
- Prashant Mahajan
- Department of Emergency Medicine, University of Michigan, Ann Arbor, MI, United States
- Kalyan S Pasupathy
- Health Care Delivery Research, Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, United States
| |