1
Ozkara BB, Boutet A, Comstock BA, Van Goethem J, Huisman TAGM, Ross JS, Saba L, Shah LM, Wintermark M, Castillo M. Artificial Intelligence-Generated Editorials in Radiology: Can Expert Editors Detect Them? AJNR Am J Neuroradiol 2025;46:559-566. PMID: 39288967. PMCID: PMC11979811. DOI: 10.3174/ajnr.a8505.
Abstract
BACKGROUND AND PURPOSE Artificial intelligence is capable of generating complex texts that may be indistinguishable from those written by humans. We aimed to evaluate the ability of GPT-4 to write radiology editorials and to compare these with human-written counterparts, thereby determining their real-world applicability for scientific writing. MATERIALS AND METHODS Sixteen editorials from 8 journals were included. To generate the artificial intelligence (AI)-written editorials, the summaries of the 16 human-written editorials were fed into GPT-4. Six experienced editors reviewed the articles. First, an unpaired approach was used: the raters evaluated the content of each article using a 1-5 Likert scale across specified metrics and then determined whether each editorial was written by a human or by AI. The articles were then evaluated in pairs to determine which article was generated by AI and which should be published. Finally, the articles were analyzed with an AI detector and checked for plagiarism. RESULTS The human-written articles had a median AI probability score of 2.0%, whereas the AI-written articles had a median of 58%. The median similarity score among AI-written articles was 3%. Fifty-eight percent of unpaired articles were correctly classified regarding authorship; accuracy increased to 70% in the paired setting. AI-written articles received slightly higher scores in most metrics. When stratified by perceived authorship, articles perceived as human-written were rated higher in most categories. In the paired setting, raters strongly preferred publishing the article they perceived as human-written (82%). CONCLUSIONS GPT-4 can write high-quality articles that iThenticate does not flag as plagiarized, that may go undetected by editors, and that AI-detection tools identify only to a limited extent. Editors showed a positive bias toward human-written articles.
Affiliation(s)
- Burak Berksu Ozkara
- From the Department of Neuroradiology (B.B.O., M.W.), The University of Texas MD Anderson Cancer Center, Houston, Texas
- Alexandre Boutet
- Joint Department of Medical Imaging (A.B.), University of Toronto, Toronto, Ontario, Canada
- Bryan A Comstock
- Department of Biostatistics (B.A.C.), University of Washington, Seattle, Washington
- Johan Van Goethem
- Department of Radiology (J.V.G.), Antwerp University Hospital, Antwerp, Belgium
- Thierry A G M Huisman
- Department of Radiology (T.A.G.M.H.), Texas Children's Hospital and Baylor College of Medicine, Houston, Texas
- Jeffrey S Ross
- Department of Radiology (J.S.R.), Mayo Clinic Arizona, Phoenix, Arizona
- Luca Saba
- Department of Radiology (L.S.), University of Cagliari, Cagliari, Italy
- Lubdha M Shah
- Department of Radiology (L.M.S.), University of Utah, Salt Lake City, Utah
- Max Wintermark
- From the Department of Neuroradiology (B.B.O., M.W.), The University of Texas MD Anderson Cancer Center, Houston, Texas
- Mauricio Castillo
- Department of Radiology (M.C.), University of North Carolina School of Medicine, Chapel Hill, North Carolina
2
Kim MS, Chung P, Aghaeepour N, Kim N. Information Extraction from Clinical Texts with Generative Pre-trained Transformer Models. Int J Med Sci 2025;22:1015-1028. PMID: 40027192. PMCID: PMC11866537. DOI: 10.7150/ijms.103332.
Abstract
Purpose: Processing and analyzing clinical texts is challenging because of their unstructured nature. This study compared the performance of GPT (Generative Pre-trained Transformer)-3.5 and GPT-4 in extracting information from clinical text. Materials and Methods: Three types of clinical texts, containing patient characteristics, medical history, and clinical test results extracted from case reports in open-access journals, were used as input. Simple prompts containing queries for information extraction were then applied to both models, using the greedy approach as the decoding strategy. When the GPT models underperformed on certain tasks, we applied alternative decoding strategies or incorporated prompts with task-specific definitions. The outputs generated by the GPT models were scored as true or false to determine the accuracy of information extraction. Results: Clinical texts containing patient characteristics (60 texts), medical history (50 texts), and clinical test results (25 texts) were extracted from 60 case reports. With simple prompts, the GPT models accurately extracted straightforward information from the clinical texts. For sex, GPT-4 demonstrated a significantly higher accuracy rate (95%) than GPT-3.5 (70%), whereas GPT-3.5 (78%) outperformed GPT-4 (57%) in extracting body mass index (BMI). Applying alternative decoding strategies to sex and BMI did not meaningfully improve the performance of either model. In GPT-4, revised prompts that included definitions of each sex category or the BMI formula rectified all incorrect responses regarding sex and BMI generated during the main workflow. Conclusion: The GPT models performed adequately with simple prompts when extracting straightforward information. For complex tasks, incorporating task-specific definitions into the prompts is a more suitable strategy than relying solely on simple prompts. Researchers and clinicians should therefore use their expertise to craft effective prompts and monitor LLM outputs when extracting complex information from clinical texts.
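The prompting strategy this abstract describes can be sketched in a few lines: a bare extraction query versus the same query prefixed with a task-specific definition (here, the BMI formula). All prompt wording and function names below are illustrative, not taken from the study.

```python
# Sketch: a simple extraction prompt vs. one augmented with a
# task-specific definition, as the study did for sex and BMI.

def simple_prompt(field: str, clinical_text: str) -> str:
    """A bare query, adequate for straightforward fields."""
    return f"Extract the patient's {field} from the text below.\n\n{clinical_text}"

def revised_prompt(field: str, definition: str, clinical_text: str) -> str:
    """The same query, prefixed with an explicit task definition."""
    return (
        f"Definition: {definition}\n"
        f"Using this definition, extract the patient's {field} "
        f"from the text below.\n\n{clinical_text}"
    )

text = "A 45-year-old man, 180 cm tall and weighing 81 kg, presented with dyspnea."
bmi_definition = "BMI = weight in kilograms divided by the square of height in meters."

print(simple_prompt("BMI", text))
print(revised_prompt("BMI", bmi_definition, text))
```

Either string would then be sent to the model; the study's finding was that the definition-augmented form corrected GPT-4's BMI and sex errors.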
Affiliation(s)
- Min-Soo Kim
- Department of Anesthesiology and Pain Medicine, Anesthesia and Pain Research Institute, Yonsei University College of Medicine, Seoul, Republic of Korea
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, California, USA
- Philip Chung
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, California, USA
- Nima Aghaeepour
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, California, USA
- Namo Kim
- Department of Anesthesiology and Pain Medicine, Anesthesia and Pain Research Institute, Yonsei University College of Medicine, Seoul, Republic of Korea
3
Kalidindi S, Baradwaj J. Advancing radiology with GPT-4: Innovations in clinical applications, patient engagement, research, and learning. Eur J Radiol Open 2024;13:100589. PMID: 39170856. PMCID: PMC11337693. DOI: 10.1016/j.ejro.2024.100589.
Abstract
The rapid evolution of artificial intelligence (AI) in healthcare, particularly in radiology, marks a transformative era with the potential for enhanced diagnostic precision, increased patient engagement, and streamlined clinical workflows. Among the key developments at the heart of this transformation are large language models such as the Generative Pre-trained Transformer 4 (GPT-4), whose integration into radiological practice could herald a significant leap by assisting in the generation and summarization of radiology reports, aiding in differential diagnoses, and recommending evidence-based treatments. This review examines the multifaceted potential applications of large language models within radiology, using GPT-4 as an example, from improving diagnostic accuracy and reporting efficiency to translating complex medical findings into patient-friendly summaries. The review acknowledges the ethical, privacy, and technical challenges inherent in deploying AI technologies, emphasizing the importance of careful oversight, validation, and adherence to regulatory standards. Through a balanced discussion of the potential and pitfalls of GPT-4 in radiology, the article provides a comprehensive overview of how these models could reshape the future of radiological services, fostering improvements in patient care, educational methodologies, and clinical research.
4
Artsi Y, Sorin V, Glicksberg BS, Nadkarni GN, Klang E. Advancing Clinical Practice: The Potential of Multimodal Technology in Modern Medicine. J Clin Med 2024;13:6246. PMID: 39458196. PMCID: PMC11508674. DOI: 10.3390/jcm13206246.
Abstract
Multimodal technology is poised to revolutionize clinical practice by integrating artificial intelligence with traditional diagnostic modalities. This evolution traces its roots from Hippocrates' humoral theory to sophisticated AI-driven platforms that synthesize data across multiple sensory channels. The interplay between historical medical practices and modern technology challenges conventional patient-clinician interactions and redefines diagnostic accuracy. Applications ranging from neurology to radiology highlight the potential of multimodal technology, suggesting a future in which AI not only supports but enhances human sensory inputs in medical diagnostics. This shift invites the medical community to navigate the ethical, practical, and technological changes reshaping the landscape of clinical medicine.
Affiliation(s)
- Yaara Artsi
- Azrieli Faculty of Medicine, Bar-Ilan University, Zefat 1311502, Israel
- Vera Sorin
- Department of Radiology, Mayo Clinic, Rochester, MN 55905, USA
- Benjamin S. Glicksberg
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Girish N. Nadkarni
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Eyal Klang
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
5
Zangrossi P, Martini M, Guerrini F, De Bonis P, Spena G. Large language model, AI and scientific research: why ChatGPT is only the beginning. J Neurosurg Sci 2024;68:216-224. PMID: 38261307. DOI: 10.23736/s0390-5616.23.06171-4.
Abstract
ChatGPT, a conversational artificial intelligence model based on the generative pre-trained transformer (GPT) architecture, has garnered widespread attention owing to its user-friendly nature and diverse capabilities. This technology enables users of all backgrounds to engage effortlessly in human-like conversations and receive coherent, intelligible responses. Beyond casual interactions, ChatGPT offers compelling prospects for scientific research, facilitating tasks such as literature review and content summarization and ultimately expediting and enhancing academic writing. In medicine and surgery, it has already shown broad potential across many tasks: enhancing decision-making, aiding surgical planning and simulation, providing real-time assistance during surgery, improving postoperative care and rehabilitation, and contributing to training, education, research, and development. However, it is crucial to acknowledge the model's limitations, including knowledge constraints and the potential for erroneous responses, as well as ethical and legal considerations. This paper explores the potential benefits and pitfalls of these innovative technologies in scientific research, shedding light on their transformative impact while addressing concerns surrounding their use.
Affiliation(s)
- Pietro Zangrossi
- Department of Neurosurgery, Sant'Anna University Hospital, Ferrara, Italy
- Department of Translational Medicine, University of Ferrara, Ferrara, Italy
- Massimo Martini
- R&D Department, Gate-away.com, Grottammare, Ascoli Piceno, Italy
- Francesco Guerrini
- Department of Neurosurgery, San Matteo Polyclinic IRCCS Foundation, Pavia, Italy
- Pasquale De Bonis
- Department of Neurosurgery, Sant'Anna University Hospital, Ferrara, Italy
- Department of Translational Medicine, University of Ferrara, Ferrara, Italy
- Unit of Minimally Invasive Neurosurgery, Ferrara University Hospital, Ferrara, Italy
- Giannantonio Spena
- Department of Neurosurgery, San Matteo Polyclinic IRCCS Foundation, Pavia, Italy
6
Lee TL, Ding J, Trivedi HM, Gichoya JW, Moon JT, Li H. Understanding Radiological Journal Views and Policies on Large Language Models in Academic Writing. J Am Coll Radiol 2024;21:678-682. PMID: 37558108. DOI: 10.1016/j.jacr.2023.08.001.
Affiliation(s)
- Tai-Lin Lee
- Department of Radiology and Imaging Science, Emory University School of Medicine, Atlanta, Georgia. https://twitter.com/heyttymonica
- Julia Ding
- Emory University School of Medicine, Atlanta, Georgia. https://twitter.com/_juliading
- Hari M Trivedi
- Co-Director of Healthcare Innovations and Translational Informatics Lab, Emory University School of Medicine, Atlanta, Georgia. https://twitter.com/HariTrivediMD
- Judy W Gichoya
- Co-Director of Healthcare Innovations and Translational Informatics Lab, Emory University School of Medicine, Atlanta, Georgia. https://twitter.com/judywawira
- John T Moon
- Founder and PI of Biodesign & Innovation of Minimally-Invasive Technologies Lab, Emory University School of Medicine, Atlanta, Georgia. https://twitter.com/johntmoon
- Hanzhou Li
- Department of Radiology and Imaging Science, Emory University School of Medicine, Atlanta, Georgia
7
Sorin V, Glicksberg BS, Artsi Y, Barash Y, Konen E, Nadkarni GN, Klang E. Utilizing large language models in breast cancer management: systematic review. J Cancer Res Clin Oncol 2024;150:140. PMID: 38504034. PMCID: PMC10950983. DOI: 10.1007/s00432-024-05678-6.
Abstract
PURPOSE Despite advanced technologies in breast cancer management, challenges remain in efficiently interpreting vast clinical data for patient-specific insights. We reviewed the literature on how large language models (LLMs) such as ChatGPT might offer solutions in this field. METHODS We searched MEDLINE for relevant studies published before December 22, 2023. Keywords included "large language models", "LLM", "GPT", "ChatGPT", "OpenAI", and "breast". The risk of bias was evaluated using the QUADAS-2 tool. RESULTS Six studies, evaluating either ChatGPT-3.5 or GPT-4, met our inclusion criteria. They explored clinical note analysis, guideline-based question answering, and patient management recommendations. Accuracy varied between studies, ranging from 50% to 98%. Higher accuracy was seen in structured tasks such as information retrieval. Half of the studies used real patient data, adding practical clinical value. Challenges included inconsistent accuracy, dependency on how questions are posed (prompt dependency), and, in some cases, missing critical clinical information. CONCLUSION LLMs hold potential in breast cancer care, especially for textual information extraction and guideline-driven clinical question answering. Yet their inconsistent accuracy underscores the need for careful validation of these models and the importance of ongoing supervision.
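The keyword-based search strategy above can be approximated as a single MEDLINE query string. The field tags and Boolean grouping below are assumptions for illustration; the abstract lists only the keywords themselves, not the full query.

```python
# Sketch: assembling a MEDLINE-style search string from the review's keywords.
keywords = ["large language models", "LLM", "GPT", "ChatGPT", "OpenAI"]

# Join the LLM-related terms with OR, then require the breast term with AND.
llm_block = " OR ".join(f'"{k}"[Title/Abstract]' for k in keywords)
query = f'({llm_block}) AND "breast"[Title/Abstract]'

print(query)
```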
Affiliation(s)
- Vera Sorin
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Affiliated to the Sackler School of Medicine, Tel-Aviv University, Emek Haela St. 1, 52621, Ramat Gan, Israel
- DeepVision Lab, Chaim Sheba Medical Center, Tel Hashomer, Israel
- Benjamin S Glicksberg
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yaara Artsi
- Azrieli Faculty of Medicine, Bar-Ilan University, Zefat, Israel
- Yiftach Barash
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Affiliated to the Sackler School of Medicine, Tel-Aviv University, Emek Haela St. 1, 52621, Ramat Gan, Israel
- DeepVision Lab, Chaim Sheba Medical Center, Tel Hashomer, Israel
- Eli Konen
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Affiliated to the Sackler School of Medicine, Tel-Aviv University, Emek Haela St. 1, 52621, Ramat Gan, Israel
- Girish N Nadkarni
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Eyal Klang
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA