1. Kaba E, Çubukçu Y, Solak M, Tabakoğlu S, Beyazal M. Evaluating large language models in supporting Bone-RADS scoring: accuracy and inter-model agreement. Clin Imaging 2025;120:110421. [PMID: 39921937] [DOI: 10.1016/j.clinimag.2025.110421]
Affiliation(s)
- Esat Kaba: Recep Tayyip Erdogan University, Department of Radiology, Rize, 53100, Turkey
- Yusuf Çubukçu: Recep Tayyip Erdogan University, Department of Radiology, Rize, 53100, Turkey
- Merve Solak: Recep Tayyip Erdogan University, Department of Radiology, Rize, 53100, Turkey
- Serdar Tabakoğlu: Recep Tayyip Erdogan University, Department of Radiology, Rize, 53100, Turkey
- Mehmet Beyazal: Recep Tayyip Erdogan University, Department of Radiology, Rize, 53100, Turkey
2. Turan Eİ, Baydemir AE, Balıtatlı AB, Şahin AS. Assessing the accuracy of ChatGPT in interpreting blood gas analysis results: ChatGPT-4 in blood gas analysis. J Clin Anesth 2025;102:111787. [PMID: 39986120] [DOI: 10.1016/j.jclinane.2025.111787]
Abstract
BACKGROUND: Arterial blood gas (ABG) analysis is a critical component of patient management in intensive care units (ICUs), operating rooms, and general wards, providing essential information on acid-base balance, oxygenation, and metabolic status. Interpretation requires a high level of expertise, potentially leading to variability in accuracy. This study explores the feasibility and accuracy of ChatGPT-4, an AI-based model, in interpreting ABG results compared to experienced anesthesiologists.
METHODS: This prospective observational study, approved by the institutional ethics board, included 400 ABG samples from ICU patients, anonymized and assessed by ChatGPT-4. The model analyzed parameters including acid-base status, oxygenation, hemoglobin levels, and metabolic markers, and provided both diagnostic and treatment recommendations. Two anesthesiologists, trained in ABG interpretation, independently evaluated the model's predictions to determine accuracy in potential diagnoses and treatment.
RESULTS: ChatGPT-4 achieved high accuracy across most ABG parameters, with 100% accuracy for pH, oxygenation, sodium, and chloride. Hemoglobin accuracy was 92.5%, while bilirubin interpretation showed limitations at 72.5%. In several cases, the model recommended unnecessary bicarbonate treatment, suggesting an area for improvement in clinical judgment for acid-base balance management. The model's overall performance was statistically significant across most parameters (p < 0.05).
DISCUSSION: ChatGPT-4 demonstrated potential as a supplementary tool for ABG interpretation in high-demand clinical settings, supporting rapid, reliable decision-making. However, the model's limitations in interpreting complex metabolic markers highlight the need for clinician oversight. Future refinements should focus on enhancing AI training for nuanced metabolic interpretation, particularly for markers like bilirubin, to ensure safe and effective application across diverse clinical contexts.
Affiliation(s)
- Engin İhsan Turan: Department of Anesthesiology, Istanbul Health Science University Kanuni Sultan Süleyman Education and Training Hospital, Istanbul, Turkey
- Anıl Berkay Balıtatlı: Department of Anesthesiology, Istanbul Health Science University Kanuni Sultan Süleyman Education and Training Hospital, Istanbul, Turkey
- Ayça Sultan Şahin: Department of Anesthesiology, Istanbul Health Science University Kanuni Sultan Süleyman Education and Training Hospital, Istanbul, Turkey
3. Mohammadi M, Parviz S, Parvaz P, Pirmoradi MM, Afzalimoghaddam M, Mirfazaelian H. Diagnostic performance of ChatGPT in tibial plateau fracture in knee X-ray. Emerg Radiol 2025;32:59-64. [PMID: 39613920] [DOI: 10.1007/s10140-024-02298-y]
Abstract
PURPOSE: Tibial plateau fractures are relatively common and require accurate diagnosis. Chat Generative Pre-Trained Transformer (ChatGPT) has emerged as a tool to improve medical diagnosis. This study aims to investigate the accuracy of this tool in diagnosing tibial plateau fractures.
METHODS: A secondary analysis was performed on 111 knee radiographs from emergency department patients, with 29 fractures confirmed by computed tomography (CT) imaging. The X-rays were reviewed by a board-certified emergency physician (EP) and a radiologist and then analyzed by ChatGPT-4 and ChatGPT-4o. Diagnostic performances were compared using the area under the receiver operating characteristic curve (AUC). Sensitivity, specificity, and likelihood ratios were also calculated.
RESULTS: Sensitivity and negative likelihood ratio were 58.6% (95% CI: 38.9-76.4%) and 0.4 (95% CI: 0.3-0.7) for the EP, 72.4% (95% CI: 52.7-87.2%) and 0.3 (95% CI: 0.2-0.6) for the radiologist, 27.5% (95% CI: 12.7-47.2%) and 0.7 (95% CI: 0.6-0.9) for ChatGPT-4, and 55.1% (95% CI: 35.6-73.5%) and 0.4 (95% CI: 0.3-0.7) for ChatGPT-4o. Specificity and positive likelihood ratio were 85.3% (95% CI: 75.8-92.2%) and 4.0 (95% CI: 2.1-7.3) for the EP, 76.8% (95% CI: 66.2-85.4%) and 3.1 (95% CI: 1.9-4.9) for the radiologist, 95.1% (95% CI: 87.9-98.6%) and 5.6 (95% CI: 1.8-17.3) for ChatGPT-4, and 93.9% (95% CI: 86.3-97.9%) and 9.0 (95% CI: 3.6-22.4) for ChatGPT-4o. The AUC was 0.72 (95% CI: 0.6-0.8) for the EP, 0.75 (95% CI: 0.6-0.8) for the radiologist, 0.61 (95% CI: 0.4-0.7) for ChatGPT-4, and 0.74 (95% CI: 0.6-0.8) for ChatGPT-4o. The EP and radiologist significantly outperformed ChatGPT-4 (p = 0.02 and 0.01, respectively), whereas there was no significant difference between the EP, the radiologist, and ChatGPT-4o.
CONCLUSION: ChatGPT-4o matched the physicians' performance and had the highest specificity. Like the physicians, the ChatGPT chatbots were not suitable for ruling out fracture.
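As a reading aid for the figures above, the reported likelihood ratios follow directly from sensitivity and specificity via the standard definitions; the ChatGPT-4 values from the abstract are recovered as a worked check:

\[ LR^{+} = \frac{\text{sensitivity}}{1 - \text{specificity}}, \qquad LR^{-} = \frac{1 - \text{sensitivity}}{\text{specificity}} \]

\[ \text{ChatGPT-4: } LR^{+} = \frac{0.275}{1 - 0.951} \approx 5.6, \qquad LR^{-} = \frac{1 - 0.275}{0.951} \approx 0.7 \]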
Affiliation(s)
- Sara Parviz: Musculoskeletal Imaging Research Center (MIRC), Tehran University of Medical Sciences, Tehran, Iran
- Parinaz Parvaz: Radiology Department, Tehran University of Medical Sciences, Tehran, Iran
- Mohammad Afzalimoghaddam: Emergency Medicine Department, Tehran University of Medical Sciences, Tehran, Iran; Prehospital and Hospital Emergency Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Hadi Mirfazaelian: Prehospital and Hospital Emergency Research Center, Tehran University of Medical Sciences, Tehran, Iran
4. Young A, Wang KE, Jin MX, Avilla K, Gilotra K, Nguyen P, Ros PR. A Hands-Free Approach With Voice to Text and Generative Artificial Intelligence: Streamlining Radiology Reporting. J Am Coll Radiol 2025;22:200-203. [PMID: 39424018] [DOI: 10.1016/j.jacr.2024.10.004]
Affiliation(s)
- Austin Young: Renaissance School of Medicine at Stony Brook University, Stony Brook, New York
- Katherine E Wang: Renaissance School of Medicine at Stony Brook University, Stony Brook, New York
- Michael X Jin: Renaissance School of Medicine at Stony Brook University, Stony Brook, New York; Department of Radiology, Stony Brook University Hospital, Stony Brook, New York
- Kian Avilla: Renaissance School of Medicine at Stony Brook University, Stony Brook, New York
- Kevin Gilotra: Renaissance School of Medicine at Stony Brook University, Stony Brook, New York
- Pamela Nguyen: Department of Radiology, Columbia University Irving Medical Center, New York, New York
- Pablo R Ros: Renaissance School of Medicine at Stony Brook University, Stony Brook, New York; Vice Chair for Academic Affairs, Department of Radiology, Stony Brook Medical Center, Stony Brook, New York
5. Xie Y, Zhai Y, Lu G. Evolution of artificial intelligence in healthcare: a 30-year bibliometric study. Front Med (Lausanne) 2025;11:1505692. [PMID: 39882522] [PMCID: PMC11775008] [DOI: 10.3389/fmed.2024.1505692]
Abstract
Introduction: In recent years, the development of artificial intelligence (AI) technologies, including machine learning, deep learning, and large language models, has significantly supported clinical work. Concurrently, the integration of artificial intelligence with the medical field has garnered increasing attention from medical experts. This study undertakes a dynamic and longitudinal bibliometric analysis of AI publications within the healthcare sector over the past three decades to investigate the current status and trends of the fusion between medicine and artificial intelligence.
Methods: Following a search on the Web of Science, researchers retrieved all reviews and original articles concerning artificial intelligence in healthcare published between January 1993 and December 2023. The analysis employed Bibliometrix, Biblioshiny, and Microsoft Excel, incorporating the bibliometrix R package for data mining and analysis, and visualized the observed trends in bibliometrics.
Results: A total of 22,950 documents were collected in this study. From 1993 to 2023, there was a discernible upward trajectory in scientific output. The United States and China emerged as primary contributors to medical artificial intelligence research, with Harvard University leading in publication volume among institutions. Notably, the rapid expansion of emerging topics such as COVID-19 and new drug discovery in recent years is noteworthy. Furthermore, the top five most cited papers in 2023 were all pertinent to the theme of ChatGPT.
Conclusion: This study reveals a sustained explosive growth trend in AI technologies within the healthcare sector in recent years, with increasingly profound applications in medicine. Additionally, medical artificial intelligence research is dynamically evolving with the advent of new technologies. Moving forward, concerted efforts to bolster international collaboration and enhance comprehension and utilization of AI technologies are imperative for fostering novel innovations in healthcare.
Affiliation(s)
- Yaojue Xie: Yangjiang Bainian Yanshen Medical Technology Co., Ltd., Yangjiang, China
- Yuansheng Zhai: Department of Cardiology, Heart Center, First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China; NHC Key Laboratory of Assisted Circulation (Sun Yat-sen University), Guangzhou, China
- Guihua Lu: Department of Cardiology, Heart Center, First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China; NHC Key Laboratory of Assisted Circulation (Sun Yat-sen University), Guangzhou, China
6. Kaba E, Akkaya S. Performance of Different Large Language Models in the Sample Test of the European Cardiovascular Radiology Board Examination. Acad Radiol 2024;31:4294-4295. [PMID: 38902112] [DOI: 10.1016/j.acra.2024.06.011]
Affiliation(s)
- Esat Kaba: Recep Tayyip Erdogan University, Department of Radiology, Rize, Turkey
- Selçuk Akkaya: Karadeniz Technical University, Department of Radiology, Trabzon, Turkey
7. AlSaad R, Abd-Alrazaq A, Boughorbel S, Ahmed A, Renault MA, Damseh R, Sheikh J. Multimodal Large Language Models in Health Care: Applications, Challenges, and Future Outlook. J Med Internet Res 2024;26:e59505. [PMID: 39321458] [PMCID: PMC11464944] [DOI: 10.2196/59505]
Abstract
In the complex and multidimensional field of medicine, multimodal data are prevalent and crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types, including medical images (eg, MRI and CT scans), time-series data (eg, sensor data from wearable devices and electronic health records), audio recordings (eg, heart and respiratory sounds and patient interviews), text (eg, clinical notes and research articles), videos (eg, surgical procedures), and omics data (eg, genomics and proteomics). While advancements in large language models (LLMs) have enabled new applications for knowledge retrieval and processing in the medical field, most LLMs remain limited to processing unimodal data, typically text-based content, and often overlook the importance of integrating the diverse data modalities encountered in clinical practice. This paper aims to present a detailed, practical, and solution-oriented perspective on the use of multimodal LLMs (M-LLMs) in the medical field. Our investigation spanned M-LLM foundational principles, current and potential applications, technical and ethical challenges, and future research directions. By connecting these elements, we aimed to provide a comprehensive framework that links diverse aspects of M-LLMs, offering a unified vision for their future in health care. This approach aims to guide both future research and practical implementations of M-LLMs in health care, positioning them as a paradigm shift toward integrated, multimodal data-driven medical practice. We anticipate that this work will spark further discussion and inspire the development of innovative approaches in the next generation of medical M-LLM systems.
Affiliation(s)
- Rawan AlSaad: Weill Cornell Medicine-Qatar, Education City, Doha, Qatar
- Sabri Boughorbel: Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Arfan Ahmed: Weill Cornell Medicine-Qatar, Education City, Doha, Qatar
- Rafat Damseh: Department of Computer Science and Software Engineering, United Arab Emirates University, Al Ain, United Arab Emirates
- Javaid Sheikh: Weill Cornell Medicine-Qatar, Education City, Doha, Qatar
8. Burti S, Zotti A, Banzato T. Role of AI in diagnostic imaging error reduction. Front Vet Sci 2024;11:1437284. [PMID: 39280838] [PMCID: PMC11392848] [DOI: 10.3389/fvets.2024.1437284]
Abstract
The topic of diagnostic imaging error and the tools and strategies for error mitigation are poorly investigated in veterinary medicine. The increasing popularity of diagnostic imaging and the high demand for teleradiology make mitigating diagnostic imaging errors paramount for high-quality services. The different sources of error have been thoroughly investigated in human medicine, and the use of AI-based products is advocated as one of the most promising strategies for error mitigation. At present, AI is still an emerging technology in veterinary medicine and, as such, is attracting increasing interest among board-certified radiologists and general practitioners alike. In this perspective article, the role of AI in mitigating different types of errors, as classified in the human literature, is presented and discussed. Furthermore, some of the weaknesses specific to the veterinary world, such as the absence of a regulatory agency for admitting medical devices to the market, are also discussed.
Affiliation(s)
- Silvia Burti: Department of Animal Medicine, Production and Health, University of Padua, Padua, Italy
- Alessandro Zotti: Department of Animal Medicine, Production and Health, University of Padua, Padua, Italy
- Tommaso Banzato: Department of Animal Medicine, Production and Health, University of Padua, Padua, Italy
9. Ray PP. Need of Fine-Tuned Radiology Aware Open-Source Large Language Models for Neuroradiology. Clin Neuroradiol 2024. [PMID: 39158608] [DOI: 10.1007/s00062-024-01454-8]
Affiliation(s)
- Partha Pratim Ray: Department of Computer Applications, Sikkim University, 6th Mile, PO-Tadong, 737102, Gangtok, Sikkim, India
10. Ahimaz P, Bergner AL, Florido ME, Harkavy N, Bhattacharyya S. Genetic counselors' utilization of ChatGPT in professional practice: A cross-sectional study. Am J Med Genet A 2024;194:e63493. [PMID: 38066714] [DOI: 10.1002/ajmg.a.63493]
Abstract
PURPOSE: The precision medicine era has seen increased utilization of artificial intelligence (AI) in the field of genetics. We sought to explore the ways that genetic counselors (GCs) currently use the publicly accessible AI tool Chat Generative Pre-trained Transformer (ChatGPT) in their work.
METHODS: GCs in North America were surveyed about how ChatGPT is used in different aspects of their work. Descriptive statistics were reported through frequencies and means.
RESULTS: Of 118 GCs who completed the survey, 33.8% (40) reported using ChatGPT in their work; 47.5% (19) use it in clinical practice, 35% (14) use it in education, and 32.5% (13) use it in research. Most GCs (62.7%; 74) felt that it saves time on administrative tasks, but the majority (82.2%; 97) felt that a paramount challenge was the risk of obtaining incorrect information. The majority of GCs not using ChatGPT (58.9%; 46) felt it was not necessary for their work.
CONCLUSION: A considerable number of GCs in the field are using ChatGPT in different ways, but it is primarily helpful with tasks that involve writing. It has potential to streamline workflow issues encountered in clinical genetics, but practitioners need to be informed and uniformly trained about its limitations.
Affiliation(s)
- Priyanka Ahimaz: Genetic Counseling Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA; Department of Pediatrics, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Amanda L Bergner: Genetic Counseling Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA; Department of Genetics and Development, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA; Department of Neurology, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Michelle E Florido: Genetic Counseling Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA; Department of Genetics and Development, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Nina Harkavy: Genetic Counseling Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA; Department of Obstetrics and Gynecology, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Sriya Bhattacharyya: Genetic Counseling Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA; Department of Psychiatry, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
11. Şenoymak MC, Erbatur NH, Şenoymak İ, Fırat SN. The Role of Artificial Intelligence in Endocrine Management: Assessing ChatGPT's Responses to Prolactinoma Queries. J Pers Med 2024;14:330. [PMID: 38672957] [PMCID: PMC11051052] [DOI: 10.3390/jpm14040330]
Abstract
This research investigates the utility of Chat Generative Pre-trained Transformer (ChatGPT) in addressing patient inquiries related to hyperprolactinemia and prolactinoma. A set of 46 commonly asked questions from patients with prolactinoma was presented to ChatGPT, and responses were evaluated for accuracy on a 6-point Likert scale (1: completely inaccurate to 6: completely accurate) and adequacy on a 5-point Likert scale (1: completely inadequate to 5: completely adequate). Two independent endocrinologists assessed the responses, based on international guidelines. Questions were categorized into groups including general information, diagnostic process, treatment process, follow-up, and pregnancy period. The median accuracy score was 6.0 (IQR, 5.4-6.0), and the adequacy score was 4.5 (IQR, 3.5-5.0). The lowest accuracy and adequacy score assigned by both evaluators was two. Significant agreement was observed between the evaluators, demonstrated by a weighted κ of 0.68 (p = 0.08) for accuracy and a κ of 0.66 (p = 0.04) for adequacy. The Kruskal-Wallis tests revealed statistically significant differences among the groups for accuracy (p = 0.005) and adequacy (p = 0.023). The pregnancy period group had the lowest accuracy score, and both the pregnancy period and follow-up groups had the lowest adequacy scores. In conclusion, ChatGPT demonstrated commendable responses in addressing prolactinoma queries; however, certain limitations were observed, particularly in providing accurate information related to the pregnancy period, emphasizing the need for refining its capabilities in medical contexts.
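For context on the agreement statistics above, weighted kappa compares observed disagreement between two raters with the disagreement expected by chance under a weight matrix; a standard formulation (quadratic weights \(w_{ij} = (i-j)^2\) are commonly assumed for ordinal Likert data, though the abstract does not specify the weighting used):

\[ \kappa_w = 1 - \frac{\sum_{i,j} w_{ij}\, p_{ij}}{\sum_{i,j} w_{ij}\, p_{i\cdot}\, p_{\cdot j}} \]

where \(p_{ij}\) is the observed proportion of items rated category \(i\) by one evaluator and \(j\) by the other, and \(p_{i\cdot}\), \(p_{\cdot j}\) are the corresponding marginal proportions.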
Affiliation(s)
- Mustafa Can Şenoymak: Department of Endocrinology and Metabolism, University of Health Sciences, Sultan Abdulhamid Han Training and Research Hospital, Istanbul 34668, Turkey
- Nuriye Hale Erbatur: Department of Endocrinology and Metabolism, University of Health Sciences, Sultan Abdulhamid Han Training and Research Hospital, Istanbul 34668, Turkey
- İrem Şenoymak: Family Medicine Department, Üsküdar State Hospital, Istanbul 34662, Turkey
- Sevde Nur Fırat: Department of Endocrinology and Metabolism, University of Health Sciences, Ankara Training and Research Hospital, Ankara 06230, Turkey