1
Maniaci A, Fakhry N, Chiesa-Estomba C, Lechien JR, Lavalle S. Synergizing ChatGPT and general AI for enhanced medical diagnostic processes in head and neck imaging. Eur Arch Otorhinolaryngol 2024;281:3297-3298. [PMID: 38353768] [DOI: 10.1007/s00405-024-08511-5]
Affiliation(s)
- Antonino Maniaci
  - Faculty of Medicine and Surgery, University of Enna Kore, 94100, Enna, Italy
  - Head & Neck Study Group, Young-Otolaryngologists of the International Federations of Oto-Rhino-Laryngological Societies (YO-IFOS), 13005, Marseille, France
- Nicolas Fakhry
  - Department of Otolaryngology, Head & Neck Surgery, Aix-Marseille University, AP-HM, La Conception Hospital, 147, Boulevard Baille, 13005, Marseille, France
  - Head & Neck Study Group, Young-Otolaryngologists of the International Federations of Oto-Rhino-Laryngological Societies (YO-IFOS), 13005, Marseille, France
- Carlos Chiesa-Estomba
  - Head & Neck Study Group, Young-Otolaryngologists of the International Federations of Oto-Rhino-Laryngological Societies (YO-IFOS), 13005, Marseille, France
  - Department of Otorhinolaryngology, Head and Neck Surgery, Donostia University Hospital, San Sebastian, Spain
- Jerome R Lechien
  - Head & Neck Study Group, Young-Otolaryngologists of the International Federations of Oto-Rhino-Laryngological Societies (YO-IFOS), 13005, Marseille, France
  - Department of Human Anatomy and Experimental Oncology, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium
- Salvatore Lavalle
  - Faculty of Medicine and Surgery, University of Enna Kore, 94100, Enna, Italy
2
Keshavarz P, Bagherieh S, Nabipoorashrafi SA, Chalian H, Rahsepar AA, Kim GHJ, Hassani C, Raman SS, Bedayat A. ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives. Diagn Interv Imaging 2024:S2211-5684(24)00105-0. [PMID: 38679540] [DOI: 10.1016/j.diii.2024.04.003]
Abstract
PURPOSE: To systematically review the reported performances of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications.
MATERIALS AND METHODS: After a comprehensive review of the PubMed, Web of Science, Embase, and Google Scholar databases, published studies using ChatGPT for clinical radiology applications were identified up to January 1, 2024.
RESULTS: Of the 861 studies retrieved, 44 evaluated the performance of ChatGPT; among these, 37 (37/44; 84.1%) demonstrated high performance, while seven (7/44; 15.9%) indicated lower performance in providing information on diagnosis and clinical decision support (6/44; 13.6%) and patient communication and educational content (1/44; 2.3%). Twenty-four (24/44; 54.5%) studies quantified ChatGPT's performance. Among these, 19 (19/24; 79.2%) recorded a median accuracy of 70.5%, and five (5/24; 20.8%) reported a median agreement of 83.6% between ChatGPT outputs and reference standards (radiologists' decisions or guidelines), generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared two recent ChatGPT versions, and in ten (10/11; 90.9%) ChatGPT-4 outperformed ChatGPT-3.5, showing notable enhancements in addressing higher-order thinking questions, better comprehension of radiology terms, and improved accuracy in describing images. Risks and concerns included biased responses, limited originality, and the potential for inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks.
CONCLUSION: Although ChatGPT's effectiveness was shown in 84.1% of radiology studies, multiple pitfalls and limitations remain to be addressed. It is too soon to confirm its complete proficiency and accuracy, and larger multicenter studies using diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.
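The review's headline numbers are simple proportions and medians over per-study outcomes. A minimal sketch of how such summary statistics are computed, using hypothetical data (the actual 44 included studies are tabulated in the paper):

```python
# Hypothetical per-study records (performance category, reported accuracy %).
# The real input would be the 44 studies extracted by the reviewers.
from statistics import median

studies = [
    {"performance": "high", "accuracy": 72.0},
    {"performance": "high", "accuracy": 69.0},
    {"performance": "low",  "accuracy": 55.0},
    {"performance": "high", "accuracy": 80.5},
]

n = len(studies)
high = sum(1 for s in studies if s["performance"] == "high")
prop_high = 100 * high / n          # analogous to 37/44 -> 84.1%

accuracies = [s["accuracy"] for s in studies]
median_acc = median(accuracies)     # analogous to the reported 70.5% median

print(f"high performance: {high}/{n} ({prop_high:.1f}%)")
print(f"median accuracy: {median_acc:.1f}%")
```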
Affiliation(s)
- Pedram Keshavarz
  - Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
  - School of Science and Technology, The University of Georgia, Tbilisi 0171, Georgia
- Sara Bagherieh
  - Independent Clinical Radiology Researcher, Los Angeles, CA 90024, USA
- Hamid Chalian
  - Department of Radiology, Cardiothoracic Imaging, University of Washington, Seattle, WA 98195, USA
- Amir Ali Rahsepar
  - Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Grace Hyun J Kim
  - Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
  - Department of Radiological Sciences, Center for Computer Vision and Imaging Biomarkers, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Cameron Hassani
  - Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Steven S Raman
  - Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Arash Bedayat
  - Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
3
Bera K, O'Connor G, Jiang S, Tirumani SH, Ramaiya N. Analysis of ChatGPT publications in radiology: Literature so far. Curr Probl Diagn Radiol 2024;53:215-225. [PMID: 37891083] [DOI: 10.1067/j.cpradiol.2023.10.013]
Abstract
OBJECTIVE: To perform a detailed qualitative and quantitative analysis of the literature published on ChatGPT and radiology in the nine months since its public release, detailing the scope of the work in that short timeframe.
METHODS: A systematic literature search of the MEDLINE and EMBASE databases was carried out through August 15, 2023 for articles focused on ChatGPT and imaging/radiology. Articles were classified into original research and reviews/perspectives. Quantitative analysis was carried out by two experienced radiologists using objective scoring systems for evaluating original and non-original research.
RESULTS: 51 articles involving ChatGPT and radiology/imaging were published between 26 Jan 2023 and 14 Aug 2023. 23 were original research; the rest were reviews/perspectives or brief communications. For quantitative analysis scored by two readers, 23 original research and 17 non-original research articles were included (after excluding 11 letters written in response to previous articles). The mean score for original research was 3.20 out of 5 (across five questions), while the mean score for non-original research was 1.17 out of 2 (across six questions). The mean score grading ChatGPT's performance in original research was 3.20 out of 5 (across two questions).
DISCUSSION: While it is early days for ChatGPT and its impact on radiology, there has already been a plethora of articles on the multifaceted nature of the tool and how it can affect every aspect of radiology, from patient education, pre-authorization, protocol selection, and generating differentials to structuring radiology reports. Most articles show impressive performance of ChatGPT, which can only improve with more research and improvements in the tool itself. Several articles have also highlighted the limitations of ChatGPT in its current iteration, which will allow radiologists and researchers to improve these areas.
Affiliation(s)
- Kaustav Bera
  - Department of Radiology, University Hospitals Cleveland Medical Center, 11000 Euclid Avenue, Cleveland, OH, 44106, USA
- Gregory O'Connor
  - Department of Radiology, University Hospitals Cleveland Medical Center, 11000 Euclid Avenue, Cleveland, OH, 44106, USA
- Sirui Jiang
  - Department of Radiology, University Hospitals Cleveland Medical Center, 11000 Euclid Avenue, Cleveland, OH, 44106, USA
- Sree Harsha Tirumani
  - Department of Radiology, University Hospitals Cleveland Medical Center, 11000 Euclid Avenue, Cleveland, OH, 44106, USA
- Nikhil Ramaiya
  - Department of Radiology, University Hospitals Cleveland Medical Center, 11000 Euclid Avenue, Cleveland, OH, 44106, USA
4
Zaki HA, Aoun A, Munshi S, Abdel-Megid H, Nazario-Johnson L, Ahn SH. The Application of Large Language Models for Radiologic Decision Making. J Am Coll Radiol 2024:S1546-1440(24)00056-5. [PMID: 38224925] [DOI: 10.1016/j.jacr.2024.01.007]
Abstract
BACKGROUND AND PURPOSE: Large language models (LLMs) have seen explosive growth, but their potential role in medical applications remains underexplored. Our study investigates the capability of LLMs to predict the most appropriate imaging study for specific clinical presentations across radiology subspecialties.
METHODS AND MATERIALS: Chat Generative Pretrained Transformer (ChatGPT, by OpenAI) and Glass AI (by Glass Health) were tested on 1,075 clinical scenarios from 11 ACR expert panels to determine the most appropriate imaging study, benchmarked against the ACR Appropriateness Criteria. Two responses per clinical presentation were generated and averaged for the final clinical presentation score. Clinical presentation scores within each topic area were averaged for that topic's final score, and the topic scores within a panel were averaged for the panel's final score. LLM responses were scored on a scale of 0 to 3, with partial scores given for nonspecific answers. The Pearson correlation coefficient (R-value) was calculated for each panel to determine context-specific performance.
RESULTS: Glass AI scored significantly higher than ChatGPT (2.32 ± 0.67 versus 2.08 ± 0.74, P = .002). Both LLMs performed best in the Polytrauma, Breast, and Vascular panels, and worst in the Neurologic, Musculoskeletal, and Cardiac panels. Glass AI outperformed ChatGPT in 10 of 11 panels, the exception being Obstetrics and Gynecology. Agreement was highest in the Pediatrics, Neurologic, and Thoracic panels, and disagreement was greatest in the Vascular, Breast, and Urologic panels.
CONCLUSION: LLMs can be used to predict appropriate imaging studies, and Glass AI's superior performance indicates the benefit of additional medical-text training. This supports the potential of LLMs in radiologic decision making.
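The hierarchical averaging in that scoring scheme (two responses per presentation, presentations averaged into topics, topics averaged into a panel score) and a per-panel Pearson correlation can be sketched as follows; the data and names here are hypothetical, not the study's:

```python
# Hedged sketch of the described scoring pipeline; responses are scored 0-3.
from math import sqrt
from statistics import mean

def panel_score(panel):
    """panel maps topic -> list of (response1, response2) score pairs."""
    topic_scores = []
    for pairs in panel.values():
        # Average the two responses for each clinical presentation,
        # then average presentations into the topic score.
        presentation_scores = [mean(pair) for pair in pairs]
        topic_scores.append(mean(presentation_scores))
    # The panel score is the mean of its topic scores.
    return mean(topic_scores)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two score series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical panel with two topic areas of two presentations each.
panel = {
    "topic_a": [(3, 2), (2, 2)],
    "topic_b": [(1, 3), (3, 3)],
}
print(panel_score(panel))  # hierarchical mean for this toy panel
```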
Affiliation(s)
- Hossam A Zaki
  - Department of Diagnostic Imaging, The Warren Alpert Medical School of Brown University/Rhode Island Hospital, Providence, Rhode Island
- Andrew Aoun
  - Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, New York
- Saminah Munshi
  - Department of Diagnostic Imaging, The Warren Alpert Medical School of Brown University/Rhode Island Hospital, Providence, Rhode Island
- Hazem Abdel-Megid
  - Department of Diagnostic Imaging, The Warren Alpert Medical School of Brown University/Rhode Island Hospital, Providence, Rhode Island
- Lleayem Nazario-Johnson
  - Department of Diagnostic Imaging, The Warren Alpert Medical School of Brown University/Rhode Island Hospital, Providence, Rhode Island
- Sun Ho Ahn
  - Department of Diagnostic Imaging, The Warren Alpert Medical School of Brown University/Rhode Island Hospital, Providence, Rhode Island (Professor of Diagnostic Imaging; Interventional Radiology Integrated Residency Program Director; Medical Student Radiology Education Co-Director)
5
Koranteng E, Rao A, Flores E, Lev M, Landman A, Dreyer K, Succi M. Empathy and Equity: Key Considerations for Large Language Model Adoption in Health Care. JMIR Med Educ 2023;9:e51199. [PMID: 38153778] [PMCID: PMC10884892] [DOI: 10.2196/51199]
Abstract
The growing presence of large language models (LLMs) in health care applications holds significant promise for innovative advancements in patient care. However, concerns about ethical implications and potential biases have been raised by various stakeholders. Here, we evaluate the ethics of LLMs in medicine along 2 key axes: empathy and equity. We outline the importance of these factors in novel models of care and develop frameworks for addressing these alongside LLM deployment.
Affiliation(s)
- Arya Rao
  - Harvard Medical School, Boston, MA, United States
- Efren Flores
  - Harvard Medical School, Boston, MA, United States
- Michael Lev
  - Harvard Medical School, Boston, MA, United States
- Adam Landman
  - Harvard Medical School, Boston, MA, United States
- Keith Dreyer
  - Harvard Medical School, Boston, MA, United States
- Marc Succi
  - Massachusetts General Hospital, Boston, United States