1
Saban M, Alon Y, Luxenburg O, Singer C, Hierath M, Karoussou Schreiner A, Brkljačić B, Sosna J. Comparison of CT referral justification using clinical decision support and large language models in a large European cohort. Eur Radiol 2025:10.1007/s00330-025-11608-y. [PMID: 40287868 DOI: 10.1007/s00330-025-11608-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2024] [Revised: 02/12/2025] [Accepted: 03/24/2025] [Indexed: 04/29/2025]
Abstract
BACKGROUND: Ensuring appropriate use of CT scans is critical for patient safety and resource optimization. Decision support tools and artificial intelligence (AI), such as large language models (LLMs), have the potential to improve CT referral justification, yet require rigorous evaluation against established standards and expert assessments.
AIM: To evaluate the performance of LLMs (Generative Pre-trained Transformer 4 (GPT-4) and Claude-3 Haiku) and independent experts in justifying CT referrals compared to the ESR iGuide clinical decision support system as the reference standard.
METHODS: CT referral data from 6356 patients were retrospectively analyzed. Recommendations were generated by the ESR iGuide, LLMs, and independent experts, and evaluated for accuracy, precision, recall, F1 score, and Cohen's kappa across medical test, organ, and contrast predictions. Statistical analysis included demographic stratification, confidence intervals, and p-values to ensure robust comparisons.
RESULTS: Independent experts achieved the highest accuracy (92.4%) for medical test justification, surpassing GPT-4 (88.8%) and Claude-3 Haiku (85.2%). For organ predictions, LLMs performed comparably to experts, achieving accuracies of 75.3-77.8% versus 82.6%. For contrast predictions, GPT-4 showed the highest accuracy (57.4%) among models, while Claude-3 Haiku demonstrated poor agreement with guidelines (kappa = 0.006).
CONCLUSION: Independent experts remain the most reliable, but LLMs show potential for optimization, particularly in organ prediction. A hybrid human-AI approach could enhance CT referral appropriateness and utilization. Further research should focus on improving LLM performance and exploring their integration into clinical workflows.
KEY POINTS:
Question: Can GPT-4 and Claude-3 Haiku justify CT referrals as accurately as independent experts, using the ESR iGuide as the gold standard?
Findings: Independent experts outperformed large language models in test justification. GPT-4 and Claude-3 Haiku showed comparable organ prediction but struggled with contrast selection, limiting full automation.
Clinical relevance: While independent experts remain most reliable, integrating AI with expert oversight may improve CT referral appropriateness, optimizing resource allocation and enhancing clinical decision-making.
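The agreement metrics named in this abstract can be illustrated with a minimal Python sketch. It computes accuracy and Cohen's kappa for one rater against a reference standard. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def accuracy(reference, predicted):
    """Fraction of items where the prediction matches the reference label."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def cohen_kappa(reference, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(reference)
    observed = accuracy(reference, predicted)
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Chance agreement: sum over labels of the product of marginal proportions.
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy labels (hypothetical): a reference standard and two imaginary raters.
reference = ["contrast", "contrast", "contrast", "no contrast"]
mixed_rater = ["contrast", "no contrast", "contrast", "no contrast"]
constant_rater = ["contrast"] * 4

print(accuracy(reference, mixed_rater), cohen_kappa(reference, mixed_rater))
print(accuracy(reference, constant_rater), cohen_kappa(reference, constant_rater))
```

In the toy data, a rater that always answers "contrast" reaches 75% accuracy but a kappa of 0, which shows how the two metrics can diverge. Whether a similar effect lies behind the low kappa reported for contrast prediction is not stated in the abstract.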
Affiliation(s)
- Mor Saban
  - School of Health Sciences, Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
  - The Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, Tel Hashomer, Ramat-Gan, Israel
- Yaniv Alon
  - School of Health Sciences, Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
- Osnat Luxenburg
  - Medical Technology, Health Information and Research Directorate, Ministry of Health, Jerusalem, Israel
- Clara Singer
  - School of Health Sciences, Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
- Boris Brkljačić
  - Department of Radiology, University Hospital Dubrava, University of Zagreb School of Medicine, Zagreb, Croatia
- Jacob Sosna
  - Department of Radiology, Hadassah Medical Center, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
2
Singer C, Saban M, Luxenburg O, Yellin LB, Hierath M, Sosna J, Karoussou-Schreiner A, Brkljačić B. Computed tomography referral guidelines adherence in Europe: insights from a seven-country audit. Eur Radiol 2025; 35:1166-1177. [PMID: 39384590 PMCID: PMC11835886 DOI: 10.1007/s00330-024-11083-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2024] [Revised: 07/21/2024] [Accepted: 09/09/2024] [Indexed: 10/11/2024]
Abstract
BACKGROUND: Ensuring appropriate computed tomography (CT) utilization optimizes patient care while minimizing radiation exposure. Decision support tools show promise for standardizing appropriateness.
OBJECTIVES: In the current study, we aimed to assess CT appropriateness rates using the European Society of Radiology (ESR) iGuide criteria across seven European countries. Additional objectives were to identify factors associated with appropriateness variability and to examine recommended alternative exams.
METHODS: As part of the European Commission-funded EU-JUST-CT project, 6734 anonymized CT referrals were audited across 125 centers in Belgium, Denmark, Estonia, Finland, Greece, Hungary, and Slovenia. In each country, two blinded radiologists independently scored each exam's appropriateness using the ESR iGuide and noted any recommended alternatives based on the presented indications. Arbitration was used when auditors disagreed. Associations between appropriateness rate and institution type, patient age and sex, inpatient/outpatient status, anatomical area, and referring physician's specialty were statistically examined within each country.
RESULTS: The average appropriateness rate was 75%, ranging from 58% in Greece to 86% in Denmark. Higher rates were associated with public hospitals, inpatient settings, and referrals from specialists. Appropriateness varied by country, anatomical area, patient age, and gender. Common alternative exam recommendations included magnetic resonance imaging, X-ray, and ultrasound.
CONCLUSION: This multi-country evaluation found that even when using a standardized imaging guideline, significant variations in CT appropriateness persist, ranging from 58% to 86% across the participating countries. The study provided valuable insights into real-world utilization patterns and identified opportunities to optimize practices and reduce clinical and demographic disparities in CT use.
KEY POINTS:
Question: Largest multinational study (7 EU countries, 6734 CT referrals) assessed real-world CT appropriateness using ESR iGuide, enabling cross-system comparisons.
Findings: Significant variability in appropriateness rates across institution type, patient status, age, gender, exam area, and physician specialty highlighted opportunities to optimize practices based on local factors.
Clinical relevance: International collaboration on imaging guidelines and decision support can maximize CT benefits while optimizing radiation exposure; ongoing research is crucial for refining evidence-based guidelines globally.
Affiliation(s)
- Clara Singer
  - The Gertner Institute for Epidemiology and Health Policy Research, Chaim Sheba Medical Center, Tel Hashomer, Ramat-Gan, Israel
- Mor Saban
  - The Gertner Institute for Epidemiology and Health Policy Research, Chaim Sheba Medical Center, Tel Hashomer, Ramat-Gan, Israel
  - Nursing Department, School of Health Sciences, Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
- Osnat Luxenburg
  - Medical Technology, Health Information and Research Directorate, Ministry of Health, Jerusalem, Israel
- Lucia Bergovoy Yellin
  - The Gertner Institute for Epidemiology and Health Policy Research, Chaim Sheba Medical Center, Tel Hashomer, Ramat-Gan, Israel
- Jacob Sosna
  - Department of Radiology, Hadassah Medical Center, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Boris Brkljačić
  - Department of Radiology, University Hospital Dubrava, School of Medicine, University of Zagreb, Zagreb, Croatia
3
Tay YX, Foley SJ, Ong ME, Chen RC, Chan LP, Killeen R, Tan EJ, Mak MS, McNulty JP. Using evidence-based imaging referral guidelines to facilitate appropriate imaging: Are they all the same? Eur J Radiol 2025; 183:111933. [PMID: 39864244 DOI: 10.1016/j.ejrad.2025.111933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2024] [Revised: 12/28/2024] [Accepted: 01/14/2025] [Indexed: 01/28/2025]
Abstract
RATIONALE AND OBJECTIVES: Countries worldwide have selected, adopted, adapted, and translated evidence-based imaging referral guidelines from radiology professional bodies. This study establishes the concordance of three imaging referral guidelines from the ACR, ESR, and RCR, and examines the emergency department cervical spine imaging appropriateness rates.
MATERIALS AND METHODS: A retrospective analysis of electronic medical records from October 1st to December 31st, 2022, was performed, evaluating 452 radiography and 153 CT imaging referrals. For each case, the initial clinical diagnosis was integrated with the corresponding clinical notes for analysis. The appropriateness rating was dichotomised to either 'indicated' or 'not indicated' for analytical and practical purposes. The inter-rater agreement for the imaging referral guidelines was calculated using Fleiss' Kappa and Cohen's Kappa.
RESULTS: The overall appropriateness rate of X-ray cervical spine imaging referrals ranged from 13.3% to 75.2%, depending on the imaging referral guidelines utilised. The appropriateness rate of CT cervical spine was 90.8%, identical for all three guidelines. Fleiss' Kappa indicated slight agreement among the guidelines for X-ray of the cervical spine (κ = 0.135; 95% CI, 0.088 to 0.183; p < 0.001) and almost perfect agreement among the guidelines for CT cervical spine (κ = 1.000; 95% CI, 0.909 to 1.091; p < 0.001). For pairwise comparison, ACR AC and ESR iGuide for X-ray demonstrated moderate agreement (κ = 0.765, p < 0.001); however, RCR iRefer showed no agreement with either. For CT, there was almost perfect agreement between all the guidelines.
CONCLUSION: The guidelines demonstrated slight agreement for X-ray cervical spine and almost perfect agreement for CT cervical spine, complicating the audit process and influencing audit output. Multidisciplinary buy-in positively impacts CT cervical spine appropriateness rates. Harmonising and prioritising guideline development for commonly encountered clinical scenarios is required.
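For readers unfamiliar with Fleiss' kappa, the following minimal Python sketch shows the standard calculation for a fixed number of raters per case, here imagined as three guidelines each giving a dichotomised rating. The function name and the toy count tables are hypothetical and do not reproduce the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters that assigned case i to category j."""
    if not counts:
        raise ValueError("need at least one case")
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per case")
    for row in counts:
        if len(row) != n_categories or sum(row) != n_raters:
            raise ValueError("every case needs the same raters and categories")
    n_cases = len(counts)
    # Per-case agreement: share of rater pairs that agree on that case.
    per_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    mean_agreement = sum(per_case) / n_cases
    # Chance agreement from the overall proportion of each category.
    proportions = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    chance = sum(p * p for p in proportions)
    if chance == 1.0:
        return 1.0  # all ratings fall in a single category
    return (mean_agreement - chance) / (1.0 - chance)


# Toy tables (hypothetical): columns are [indicated, not indicated],
# each row is one referral rated by three guidelines.
unanimous = [[3, 0], [0, 3], [3, 0], [0, 3]]
split = [[2, 1], [1, 2], [2, 1], [1, 2]]

print(fleiss_kappa(unanimous))
print(fleiss_kappa(split))
```

The unanimous table gives a kappa of 1, while the table in which one guideline always dissents falls below zero, because the observed agreement is lower than the agreement expected by chance.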
Affiliation(s)
- Yi Xiang Tay
  - Radiography and Diagnostic Imaging, School of Medicine, University College Dublin, Belfield, Dublin 4, Ireland; Radiography Department, Allied Health Division, Singapore General Hospital, Outram Road, Singapore 169608, Singapore
- Shane J Foley
  - Radiography and Diagnostic Imaging, School of Medicine, University College Dublin, Belfield, Dublin 4, Ireland
- Marcus Eh Ong
  - Department of Emergency Medicine, Division of Medicine, Singapore General Hospital, Outram Road, Singapore 169608, Singapore; Duke-NUS Graduate Medical School, 8 College Road, Singapore 169857, Singapore
- Robert Chun Chen
  - Duke-NUS Graduate Medical School, 8 College Road, Singapore 169857, Singapore; Department of Neuroradiology, Division of Radiological Sciences, Singapore General Hospital, Outram Road, Singapore 169608, Singapore; National Neuroscience Institute, 11 Jalan Tan Tock Seng, Singapore 308433, Singapore
- Lai Peng Chan
  - Duke-NUS Graduate Medical School, 8 College Road, Singapore 169857, Singapore; Department of Diagnostic Radiology, Division of Radiological Sciences, Singapore General Hospital, Outram Road, Singapore 169608, Singapore
- Ronan Killeen
  - St Vincent's University Hospital, Elm Park, Dublin 4, Ireland; School of Medicine, University College Dublin, Belfield, Dublin 4, Ireland
- Eu Jin Tan
  - Duke-NUS Graduate Medical School, 8 College Road, Singapore 169857, Singapore; Department of Diagnostic Radiology, Division of Radiological Sciences, Singapore General Hospital, Outram Road, Singapore 169608, Singapore
- May San Mak
  - Duke-NUS Graduate Medical School, 8 College Road, Singapore 169857, Singapore; Department of Diagnostic Radiology, Division of Radiological Sciences, Singapore General Hospital, Outram Road, Singapore 169608, Singapore
- Jonathan P McNulty
  - Radiography and Diagnostic Imaging, School of Medicine, University College Dublin, Belfield, Dublin 4, Ireland
4
Tay YX, Foley S, Killeen R, Ong MEH, Chen RC, Chan LP, Mak MS, McNulty JP. Impact and effect of imaging referral guidelines on patients and radiology services: a systematic review. Eur Radiol 2025; 35:532-541. [PMID: 39002059 PMCID: PMC11632068 DOI: 10.1007/s00330-024-10938-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 05/10/2024] [Accepted: 06/11/2024] [Indexed: 07/15/2024]
Abstract
OBJECTIVES: The objective of this systematic review was to offer a comprehensive overview and explore the associated outcomes from imaging referral guidelines on various key stakeholders, such as patients and radiologists.
MATERIALS AND METHODS: An electronic database search was conducted in Medline, Embase and Web of Science to retrieve citations published between 2013 and 2023. The search was constructed using medical subject headings and keywords. Only full-text articles and reviews written in English were included. The quality of the included papers was assessed using the mixed methods appraisal tool. A narrative synthesis was undertaken for the selected articles.
RESULTS: The search yielded 4384 records. Following abstract screening, full-text screening, and removal of duplicates, 31 studies of varying levels of quality were included in the final analysis. Imaging referral guidelines from the American College of Radiology were most commonly used. Clinical decision support systems were the most evaluated mode of intervention, either integrated or standalone. Interventions showed reduced patient radiation doses and waiting times for imaging. There was a general reduction in radiology workload and utilisation of diagnostic imaging. Low-value imaging utilisation decreased, accompanied by increases in the appropriateness of imaging referrals and ratings and by cost savings. Clinical effectiveness was maintained during the intervention period without notable adverse consequences.
CONCLUSION: Using evidence-based imaging referral guidelines improves the quality of healthcare and outcomes while reducing healthcare costs. Imaging referral guidelines are one essential component of improving the value of radiology in the healthcare system.
CLINICAL RELEVANCE STATEMENT: There is a need for broader dissemination of imaging referral guidelines to healthcare providers globally in tandem with the harmonisation of the application of these guidelines to improve the overall value of radiology within the healthcare system.
KEY POINTS: The application of imaging referral guidelines has an impact and effect on patients, radiologists, and health policymakers. The adoption of imaging referral guidelines in clinical practice can impact healthcare costs and improve healthcare quality and outcomes. Implementing imaging referral guidelines contributes to the attainment of value-based radiology.
Affiliation(s)
- Yi Xiang Tay
  - Radiography and Diagnostic Imaging, School of Medicine, University College Dublin, Dublin, Ireland
  - Radiography Department, Allied Health Division, Singapore General Hospital, Singapore, Singapore
- Shane Foley
  - Radiography and Diagnostic Imaging, School of Medicine, University College Dublin, Dublin, Ireland
- Ronan Killeen
  - St Vincent's University Hospital, Dublin, Ireland
  - School of Medicine, University College Dublin, Dublin, Ireland
- Marcus E H Ong
  - Department of Emergency Medicine, Division of Medicine, Singapore General Hospital, Singapore, Singapore
  - Duke-NUS Graduate Medical School, Singapore, Singapore
- Robert Chun Chen
  - Duke-NUS Graduate Medical School, Singapore, Singapore
  - Department of Neuroradiology, Division of Radiological Sciences, Singapore General Hospital, Singapore, Singapore
  - National Neuroscience Institute, Singapore, Singapore
- Lai Peng Chan
  - Duke-NUS Graduate Medical School, Singapore, Singapore
  - Department of Diagnostic Radiology, Division of Radiological Sciences, Singapore General Hospital, Singapore, Singapore
- May San Mak
  - Duke-NUS Graduate Medical School, Singapore, Singapore
  - Department of Diagnostic Radiology, Division of Radiological Sciences, Singapore General Hospital, Singapore, Singapore
- Jonathan P McNulty
  - Radiography and Diagnostic Imaging, School of Medicine, University College Dublin, Dublin, Ireland
5
Rosen S, Saban M. Evaluating the reliability of ChatGPT as a tool for imaging test referral: a comparative study with a clinical decision support system. Eur Radiol 2024; 34:2826-2837. [PMID: 37828297 DOI: 10.1007/s00330-023-10230-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 02/15/2023] [Revised: 07/28/2023] [Accepted: 08/01/2023] [Indexed: 10/14/2023]
Abstract
OBJECTIVES: As the technology continues to evolve and advance, we can expect to see artificial intelligence (AI) being used in increasingly sophisticated ways to make diagnoses and decisions, such as suggesting the most appropriate imaging referrals. We aim to explore whether Chat Generative Pretrained Transformer (ChatGPT) can provide accurate imaging referrals for clinical use that are at least as good as the ESR iGuide.
METHODS: A comparative study was conducted in a tertiary hospital. Data were collected from 97 consecutive cases that were admitted to the emergency department with abdominal complaints. We compared the imaging test referral recommendations suggested by the ESR iGuide and ChatGPT and analyzed cases of disagreement. In addition, we selected cases where ChatGPT recommended a chest abdominal pelvis (CAP) CT (n = 66), and asked four specialists to grade the appropriateness of the referral.
RESULTS: ChatGPT recommendations were consistent with the recommendations provided by the ESR iGuide. No statistical differences were found between the appropriateness of referrals by age or gender. Using a sub-analysis of CAP cases, a high agreement between ChatGPT and the specialists was found. Cases of disagreement (12.4%) were further analyzed and presented themes of vague recommendations such as "it would be advisable" and "this would help to rule out."
CONCLUSIONS: ChatGPT's ability to guide the selection of appropriate tests may be comparable to some degree with the ESR iGuide. Issues such as the clinical, ethical, and regulatory implications still need to be addressed prior to clinical implementation. Further studies are needed to confirm these findings.
CLINICAL RELEVANCE STATEMENT: The article explores the potential of using advanced language models, such as ChatGPT, in healthcare as a clinical decision support (CDS) tool for selecting appropriate imaging tests. Using ChatGPT can improve the efficiency of the decision-making process.
KEY POINTS:
• ChatGPT recommendations were highly consistent with the recommendations provided by the ESR iGuide.
• ChatGPT's ability in guiding the selection of appropriate tests may be comparable to some degree with ESR iGuide's.
Affiliation(s)
- Shani Rosen
  - Department of Health Technology and Policy Evaluation, Gertner Institute for Epidemiology and Health Policy, Institute of Epidemiology & Health Policy Research, Sheba Medical Center, Tel HaShomer, Ramat-Gan, Israel
  - Nursing Department, School of Health Sciences, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Mor Saban
  - Nursing Department, School of Health Sciences, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
6
Saban M, Sosna J, Singer C, Vaknin S, Myers V, Shaham D, Assaf J, Hershko A, Feder-Bubis P, Wilf-Miron R, Luxenburg O. Clinical decision support system recommendations: how often do radiologists and clinicians accept them? Eur Radiol 2022; 32:4218-4224. [PMID: 35024948 DOI: 10.1007/s00330-021-08479-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/06/2021] [Revised: 11/01/2021] [Accepted: 11/22/2021] [Indexed: 11/29/2022]
Abstract
OBJECTIVE: To assess the acceptance and reliability of clinical decision support system (CDSS) imaging referral scores (ESR iGuide).
METHODS: A pilot study was conducted in a tertiary hospital. Four experts were invited to rate 40 simulated clinical cases on a 5-level scale for their level of agreement with the ESR iGuide's recommended procedures. In cases of disagreement, physicians were asked to indicate the reason. Descriptive measures were calculated for the level of agreement. We also explored the degree of agreement among the four specialists, and examined the cases in which clinicians disagreed with ESR iGuide best practice recommendations.
RESULTS: The mean rating of the four experts for the 40 simulated clinical cases was 4.17 ± 0.65, median 4.25 (on a scale of 1-5). All four raters totally agreed with the system recommendation in 75% of cases. No significant relationship was found between the degree of agreement and the number of indications or the patient's age or gender. In an optimistic scenario, using a binary agree/disagree variable, the Overall Percentage Agreement for the rating of the 40 simulated cases between the four experts was 77.28%. There were a total of 20 disagreements with the ESR iGuide out of 160 ratings (four experts × 40 cases), of which 7 were among the two radiologists.
CONCLUSIONS: CDSS can be an effective tool for guiding the selection of appropriate imaging examinations, thus cutting costs due to unnecessary imaging scans. Since this is a pilot study, further research on a larger scale, preferably at national level, is required.
KEY POINTS:
• The average of the mean rating of the four experts was 4.17 ± 0.65, median 4.25, on a scale of 1-5 where 5 represents total agreement with the CDSS tool.
• In an optimistic scenario, using a binary agree/disagree variable, the Overall Percentage Agreement between the four experts was 77.28%.
• Radiologists had fewer disagreements with the recommendations of the CDSS tool than other physicians, indicating a better fit of the support system to radiology experts' perspective.
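The abstract does not define how the Overall Percentage Agreement was calculated. One plausible reading, shown here as a minimal Python sketch and labelled as an assumption, is the share of rater pairs that give the same binary rating, pooled over all cases. The function name and the toy ratings are hypothetical and will not reproduce the reported 77.28%.

```python
from itertools import combinations


def pairwise_percent_agreement(ratings_per_case):
    """Percentage of rater pairs with identical ratings, pooled over cases.

    ratings_per_case is a list of cases; each case is a list with one
    rating per rater (assumed definition, not confirmed by the abstract).
    """
    agreeing_pairs = 0
    total_pairs = 0
    for case in ratings_per_case:
        for first, second in combinations(case, 2):
            total_pairs += 1
            if first == second:
                agreeing_pairs += 1
    if total_pairs == 0:
        raise ValueError("need at least one case with two or more raters")
    return 100.0 * agreeing_pairs / total_pairs


# Toy data (hypothetical): two cases, four raters each.
toy_ratings = [
    ["agree", "agree", "agree", "agree"],     # 6 of 6 pairs match
    ["agree", "agree", "agree", "disagree"],  # 3 of 6 pairs match
]
print(pairwise_percent_agreement(toy_ratings))
```

With four raters each case contributes six pairs, so a single dissenting rater lowers that case's agreement to 50%, which is why pairwise agreement can sit well below the share of individual ratings that match the reference.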
Affiliation(s)
- Mor Saban
  - The Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, 526210, Ramat Gan, Israel
- Jacob Sosna
  - Department of Radiology, Hadassah Hebrew University Medical Center, Jerusalem, Israel
- Clara Singer
  - The Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, 526210, Ramat Gan, Israel
- Sharona Vaknin
  - The Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, 526210, Ramat Gan, Israel
- Vicki Myers
  - The Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, 526210, Ramat Gan, Israel
- Dorit Shaham
  - Department of Radiology, Hadassah Hebrew University Medical Center, Jerusalem, Israel
- Jacob Assaf
  - Emergency Department, Hadassah Hebrew University Medical Center, Jerusalem, Israel
- Alon Hershko
  - Internal Department, Hadassah Hebrew University Medical Center, Jerusalem, Israel
- Paula Feder-Bubis
  - Department of Health Systems Management, Ben-Gurion University of the Negev, Beersheba, Israel
- Rachel Wilf-Miron
  - The Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, 526210, Ramat Gan, Israel
  - School of Public Health, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Osnat Luxenburg
  - Medical Technology, Health Information and Research Directorate, Ministry of Health, Jerusalem, Israel