1. Wei S, Hu A, Liang Y, Yang J, Yu L, Li W, Yang B, Qiu J. Feasibility study of automatic radiotherapy treatment planning for cervical cancer using a large language model. Radiat Oncol 2025; 20:77. PMID: 40375332; PMCID: PMC12083153; DOI: 10.1186/s13014-025-02660-5.
Abstract
BACKGROUND Radiotherapy treatment planning traditionally involves complex and time-consuming processes, often relying on trial-and-error methods. The emergence of artificial intelligence, particularly Large Language Models (LLMs), which surpass human capabilities and existing algorithms in various domains, presents an opportunity to automate and enhance this optimization process. PURPOSE This study seeks to evaluate the capacity of LLMs to generate radiotherapy treatment plans comparable to those crafted by human medical physicists, focusing on target volume conformity and organs-at-risk (OARs) dose sparing. The goal is to automate the optimization of radiotherapy treatment plans through the use of LLMs. METHODS Multiple LLMs were employed to adjust optimization parameters for radiotherapy treatment plans, using a dataset comprising 35 cervical cancer patients treated with volumetric modulated arc therapy (VMAT). Customized prompts were applied to 5 patients to tailor the LLMs, which were subsequently tested on 30 patients. Evaluation metrics included target volume conformity, dose homogeneity, monitor unit (MU) values, and OARs dose sparing, comparing plans generated by the various LLMs with manual plans. RESULTS With the exception of Gemini-1.5-flash, which faced challenges due to hallucinations, Qwen-2.5-max and Llama-3.2 produced acceptable VMAT plans in 16.3 ± 5.0 and 9.8 ± 2.1 min, respectively, compared with about 20 min for an experienced human physicist. The average conformity index (CI) for Qwen-2.5-max plans, Llama-3.2 plans, and manual plans on the test set was 0.929 ± 0.007, 0.928 ± 0.007, and 0.926 ± 0.007, respectively. The average homogeneity index (HI) was 0.058 ± 0.006, 0.059 ± 0.005, and 0.065 ± 0.006, respectively. While there was a significant difference in target volume conformity between LLM plans and manual plans, OARs dose sparing showed no significant differences. In direct comparisons between the LLMs, no statistically significant differences were observed in PTV dose, OARs dose sparing, or target volume conformity between Qwen-2.5-max and Llama-3.2 plans. CONCLUSIONS Through an assessment of LLM-generated plans and clinical plans in terms of target volume conformity and OARs dose sparing, this study provides preliminary evidence supporting the viability of LLMs for optimizing radiotherapy treatment plans. The implementation of LLMs demonstrates the potential to enhance clinical workflows and reduce the workload associated with treatment planning.
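Note: the abstract reports conformity and homogeneity indices without stating the exact definitions used. As a reminder of how such plan-quality metrics are typically computed, the forms below are the widely used Paddick conformity index and the ICRU Report 83 homogeneity index; these are assumptions, not definitions confirmed by the paper.

```latex
% Common plan-quality indices (assumed forms; the paper does not specify which it used).
% TV      = target volume, PIV = prescription isodose volume,
% TV_PIV  = portion of the target covered by the prescription isodose,
% D_x%    = dose received by x% of the target volume.
\[
\mathrm{CI}_{\text{Paddick}} \;=\; \frac{\left(TV_{PIV}\right)^{2}}{TV \times PIV},
\qquad
\mathrm{HI}_{\text{ICRU83}} \;=\; \frac{D_{2\%} - D_{98\%}}{D_{50\%}}
\]
```

A CI approaching 1 and an HI approaching 0 indicate better conformity and homogeneity, consistent with the values reported above.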
Affiliation(s)
- Shuoyang Wei
- Department of Radiotherapy, Peking Union Medical College Hospital, Beijing, 100730, China
- Ankang Hu
- Department of Engineering Physics, Tsinghua University, Beijing, 100084, China
- Key Laboratory of Particle & Radiation Imaging (Tsinghua University), Ministry of Education, Beijing, 100084, China
- Yongguang Liang
- Department of Radiotherapy, Peking Union Medical College Hospital, Beijing, 100730, China
- Jingru Yang
- Department of Radiotherapy, Peking Union Medical College Hospital, Beijing, 100730, China
- Lang Yu
- Department of Radiotherapy, Peking Union Medical College Hospital, Beijing, 100730, China
- Wenbo Li
- Department of Radiotherapy, Peking Union Medical College Hospital, Beijing, 100730, China
- Bo Yang
- Department of Radiotherapy, Peking Union Medical College Hospital, Beijing, 100730, China
- Jie Qiu
- Department of Radiotherapy, Peking Union Medical College Hospital, Beijing, 100730, China
2. Chuang WK, Kao YS, Liu YT, Lee CY. Assessing ChatGPT for clinical decision-making in radiation oncology, with open-ended questions and images. Pract Radiat Oncol 2025:S1879-8500(25)00115-8. PMID: 40311921; DOI: 10.1016/j.prro.2025.04.009.
Abstract
PURPOSE This study assesses the practicality and correctness of ChatGPT-4's and ChatGPT-4o's answers to clinical inquiries in radiation oncology, and evaluates ChatGPT-4o for staging nasopharyngeal carcinoma (NPC) cases from MR images. METHODS A total of 164 open-ended questions covering representative professional domains (Clinical_G: knowledge of standardized guidelines; Clinical_C: complex clinical scenarios; Nursing: nursing and health education; Technology: radiation technology and dosimetry) were prospectively formulated by experts and presented to ChatGPT-4 and ChatGPT-4o. Each answer was graded as 1 (directly practical for clinical decision-making), 2 (correct but inadequate), 3 (mixed correct and incorrect information), or 4 (completely incorrect). ChatGPT-4o was presented with representative diagnostic MR images of 20 NPC patients across different T stages and asked to determine the T stage of each case. RESULTS The proportions of answers that were practical (Grade 1) varied across professional domains (p < 0.01), higher in the Nursing (GPT-4: 91.9%; GPT-4o: 94.6%) and Clinical_G (GPT-4: 82.2%; GPT-4o: 88.9%) domains than in the Clinical_C (GPT-4: 54.1%; GPT-4o: 62.2%) and Technology (GPT-4: 64.4%; GPT-4o: 77.8%) domains. The proportions of correct (Grade 1 + 2) answers (GPT-4: 89.6%; GPT-4o: 98.8%; p < 0.01) were universally high across all professional domains. However, ChatGPT-4o failed to stage NPC cases from MR images, indiscriminately assigning T4 to all cases, including those that were not T4 (κ = 0; 95% CI: -0.253 to 0.253). CONCLUSIONS ChatGPT could be a safe clinical decision-support tool in radiation oncology, as it correctly answered the vast majority of clinical inquiries across professional domains. However, its clinical practicality should be weighed cautiously, particularly in the Clinical_C and Technology domains. ChatGPT-4o is not yet mature enough to interpret diagnostic images for cancer staging.
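Note: the reported κ = 0 is exactly what Cohen's kappa gives in the degenerate case where the model assigns the same stage to every patient, so agreement never exceeds chance. A minimal sketch with hypothetical stage labels (not the study data) reproduces this:

```python
# Minimal sketch (hypothetical labels): Cohen's kappa for agreement between the
# true T stage and a model that labels every case T4, as described in the abstract.
from sklearn.metrics import cohen_kappa_score

true_stage = ["T1"] * 5 + ["T2"] * 5 + ["T3"] * 5 + ["T4"] * 5  # 20 hypothetical cases
model_stage = ["T4"] * 20                                        # model assigns T4 to all

kappa = cohen_kappa_score(true_stage, model_stage)
print(f"Cohen's kappa = {kappa:.3f}")  # 0.000: no agreement beyond chance
```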
Affiliation(s)
- Wei-Kai Chuang
- Department of Radiation Oncology, Shuang Ho Hospital, Taipei Medical University, New Taipei City 235, Taiwan; Department of Biomedical Imaging and Radiological Sciences, National Yang Ming Chiao Tung University, Taipei 112, Taiwan
- Yung-Shuo Kao
- Department of Radiation Oncology, Taoyuan General Hospital, Ministry of Health and Welfare, Taoyuan 330, Taiwan
- Yen-Ting Liu
- Division of Radiation Oncology, Department of Oncology, National Taiwan University Hospital Yunlin Branch, Yunlin County 632, Taiwan; Department of Biomedical Engineering, National Taiwan University, Taipei 100, Taiwan; Division of Radiation Oncology, Department of Oncology, National Taiwan University Hospital, Taipei 100, Taiwan
- Cho-Yin Lee
- Department of Radiation Oncology, Taoyuan General Hospital, Ministry of Health and Welfare, Taoyuan 330, Taiwan; Department of Biomedical Engineering, National Yang Ming Chiao Tung University, Taipei 112, Taiwan
3. Piras A, Mastroleo F, Colciago RR, Morelli I, D'Aviero A, Longo S, Grassi R, Iorio GC, De Felice F, Boldrini L, Desideri I, Salvestrini V. How Italian radiation oncologists use ChatGPT: a survey by the young group of the Italian association of radiotherapy and clinical oncology (yAIRO). La Radiologia Medica 2025; 130:453-462. PMID: 39690359; DOI: 10.1007/s11547-024-01945-1.
Abstract
PURPOSE To investigate awareness and uptake of ChatGPT and its possible role in both scientific research and clinical practice among young radiation oncologists (ROs). MATERIAL AND METHODS An anonymous online survey via Google Forms (24 questions) was distributed to young (< 40 years old) ROs in Italy through the yAIRO network from March 15 to March 31, 2024. These ROs were officially registered with yAIRO in 2023. We particularly focused on the emerging use of ChatGPT and its future perspectives in clinical practice. RESULTS A total of 76 young physicians answered the survey. Seventy-three participants reported being familiar with ChatGPT, and 71.1% of the surveyed physicians had already used it. Thirty-one (40.8%) participants strongly agreed that AI has the potential to change the medical landscape in the future. Additionally, 79.1% of respondents agreed that AI will be mainly successful in research processes such as literature review and drafting articles and protocols. This belief in ChatGPT's potential translated into direct use in daily practice for 43.4% of respondents, most of whom reported a fair degree of satisfaction (43.2%). A large proportion of participants (69.7%) believe in the implementation of ChatGPT into clinical practice, even though 53.9% fear an overall negative impact. CONCLUSIONS The results of the present survey clearly highlight the attitude of young Italian ROs toward the implementation of ChatGPT into clinical and academic RO practice. ChatGPT is considered a valuable and effective tool that can ease current and future workflows.
Affiliation(s)
- Antonio Piras
- UO Radioterapia Oncologica, Villa Santa Teresa, 90011, Bagheria, Palermo, Italy
- Ri.Med Foundation, 90133, Palermo, Italy
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, Molecular and Clinical Medicine, University of Palermo, 90127, Palermo, Italy
- Radiation Oncology, Mater Olbia Hospital, Olbia, Sassari, Italy
- Federico Mastroleo
- Division of Radiation Oncology, IEO European Institute of Oncology IRCCS, 20141, Milan, Italy
- Department of Oncology and Hemato-Oncology, University of Milan, 20141, Milan, Italy
- Riccardo Ray Colciago
- School of Medicine and Surgery, University of Milano Bicocca, Piazza Dell'Ateneo Nuovo, 1, 20126, Milan, Italy
- Ilaria Morelli
- Radiation Oncology Unit, Department of Experimental and Clinical Biomedical Sciences, Azienda Ospedaliero-Universitaria Careggi, University of Florence, Florence, Italy
- Andrea D'Aviero
- Department of Radiation Oncology, "S.S Annunziata" Chieti Hospital, Chieti, Italy
- Department of Medical, Oral and Biotechnological Sciences, "G. D'Annunzio" University of Chieti, Chieti, Italy
- Silvia Longo
- UOC Radioterapia Oncologica, Fondazione Policlinico Universitario "A. Gemelli" IRCCS, Rome, Italy
- Roberta Grassi
- Department of Precision Medicine, University of Campania "L. Vanvitelli", Naples, Italy
- Francesca De Felice
- Radiation Oncology, Policlinico Umberto I, Department of Radiological, Oncological and Pathological Sciences, "Sapienza" University of Rome, Rome, Italy
- Luca Boldrini
- UOC Radioterapia Oncologica, Fondazione Policlinico Universitario "A. Gemelli" IRCCS, Rome, Italy
- Università Cattolica del Sacro Cuore, Rome, Italy
- Isacco Desideri
- Radiation Oncology Unit, Department of Experimental and Clinical Biomedical Sciences, Azienda Ospedaliero-Universitaria Careggi, University of Florence, Florence, Italy
- Viola Salvestrini
- Radiation Oncology Unit, Department of Experimental and Clinical Biomedical Sciences, Azienda Ospedaliero-Universitaria Careggi, University of Florence, Florence, Italy
4. Busch F, Kaibel L, Nguyen H, Lemke T, Ziegelmayer S, Graf M, Marka AW, Endrös L, Prucker P, Spitzl D, Mergen M, Makowski MR, Bressem KK, Petzoldt S, Adams LC, Landgraf T. Evaluation of a Retrieval-Augmented Generation-Powered Chatbot for Pre-CT Informed Consent: a Prospective Comparative Study. Journal of Imaging Informatics in Medicine 2025. PMID: 40119020; DOI: 10.1007/s10278-025-01483-w.
Abstract
This study aims to investigate the feasibility, usability, and effectiveness of a Retrieval-Augmented Generation (RAG)-powered Patient Information Assistant (PIA) chatbot for pre-CT information counseling compared with the standard physician consultation and informed consent process. This prospective comparative study included 86 patients scheduled for CT imaging between November and December 2024. Patients were randomly assigned to either the PIA group (n = 43), who received pre-CT information via the PIA chat app, or the control group (n = 43), who received a standard doctor-led consultation. Patient satisfaction, information clarity and comprehension, and concerns were assessed using six ten-point Likert-scale questions after information counseling with the PIA or the doctor's consultation. Additionally, consultation duration was measured, PIA group patients were asked about their preference for pre-CT consultation, and two radiologists rated each PIA chat in five categories. Both groups reported similarly high ratings for information clarity (PIA: 8.64 ± 1.69; control: 8.86 ± 1.28; p = 0.82) and overall comprehension (PIA: 8.81 ± 1.40; control: 8.93 ± 1.61; p = 0.35). However, the doctor consultation group showed greater effectiveness in alleviating patient concerns (8.30 ± 2.63 versus 6.46 ± 3.29; p = 0.003). The PIA group demonstrated significantly shorter subsequent consultation times (median: 120 s [interquartile range (IQR): 100-140] versus 195 s [IQR: 170-220]; p = 0.04). Both radiologists rated the PIA chats highly for overall quality, scientific and clinical evidence, clinical usefulness and relevance, consistency, and up-to-dateness. The RAG-powered PIA effectively provided pre-CT information while significantly reducing physician consultation time. While both methods achieved comparable patient satisfaction and comprehension, physicians were more effective at addressing worries or concerns regarding the examination.
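Note: the abstract does not describe the retrieval stack used. As a rough illustration of the general RAG pattern (retrieve relevant passages, then ground the generation prompt in them), the sketch below uses plain TF-IDF retrieval over a small hypothetical CT-information corpus and a placeholder function in place of the LLM call; it is not the authors' system.

```python
# Minimal RAG sketch (assumed corpus and placeholder LLM) - NOT the study's implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Intravenous contrast agents may cause a brief warm sensation.",
    "Patients should report prior allergic reactions to iodinated contrast.",
    "A CT scan typically takes only a few minutes of table time.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question (TF-IDF cosine similarity)."""
    vec = TfidfVectorizer().fit(docs + [question])
    scores = cosine_similarity(vec.transform([question]), vec.transform(docs)).ravel()
    return [docs[i] for i in scores.argsort()[::-1][:k]]

def generate_answer(question: str, context: list[str]) -> str:
    # Placeholder: a real deployment would send this prompt to an LLM endpoint.
    return "Answer using only this context:\n" + "\n".join(context) + f"\nQ: {question}"

question = "Will the contrast injection hurt?"
print(generate_answer(question, retrieve(question, documents)))
```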
Affiliation(s)
- Felix Busch
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Lukas Kaibel
- Institute for Computer Science, Free University of Berlin, Berlin, Germany
- Hai Nguyen
- Institute for Computer Science, Free University of Berlin, Berlin, Germany
- Tristan Lemke
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Sebastian Ziegelmayer
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Markus Graf
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Alexander W Marka
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Lukas Endrös
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Philipp Prucker
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Daniel Spitzl
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Markus Mergen
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Marcus R Makowski
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Keno K Bressem
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- School of Medicine and Health, Institute for Cardiovascular Radiology and Nuclear Medicine, German Heart Center Munich, TUM University Hospital, Technical University of Munich, Munich, Germany
- Sebastian Petzoldt
- Clinic for General, Visceral and Minimally Invasive Surgery, DRK Kliniken Berlin Köpenick, Berlin, Germany
- Lisa C Adams
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Tim Landgraf
- Institute for Computer Science, Free University of Berlin, Berlin, Germany
5. Alfonzetti T, Xia J. Transforming the Landscape of Clinical Information Retrieval Using Generative Artificial Intelligence: An Application in Machine Fault Analysis. Pract Radiat Oncol 2025:S1879-8500(25)00058-X. PMID: 40024439; DOI: 10.1016/j.prro.2025.02.006.
Abstract
In a radiation oncology clinic, machine downtime can be a serious burden to the entire department. This study investigates using increasingly popular generative artificial intelligence (AI) techniques to assist medical physicists in troubleshooting linear accelerator issues. Google's NotebookLM, supplemented with background information on linear accelerator issues/solutions, was used as a machine troubleshooting assistant for this purpose. Two board-certified medical physicists evaluated the large language model's responses based on hallucination, relevancy, correctness, and completeness. Results indicated that responses improved with increasing source data context and more specific prompt construction. Keeping risk mitigation and the inherent limitations of AI in mind, this work offers a viable, low-risk method to improve efficiency in radiation oncology. This work uses a "Machine Troubleshooting Assistance" application to provide an adaptable example of how radiation oncology clinics can begin using generative AI to enhance clinical efficiency.
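Note: the abstract reports that responses improved with more source-data context and more specific prompt construction. The sketch below only illustrates that general grounding pattern with hypothetical interlock notes and fault codes; it is not the authors' NotebookLM workflow, which is driven interactively rather than through code.

```python
# Hedged sketch: grounding a troubleshooting query in local service notes before
# sending it to a generative tool. The fault codes and note text are hypothetical.
source_notes = {
    "INTERLOCK MLC-07": "Leaf motor position mismatch; check leaf calibration log and motor current.",
    "INTERLOCK VAC-02": "Waveguide vacuum out of tolerance; verify ion pump current before clinical use.",
}

def build_prompt(fault_code: str, observation: str) -> str:
    """Assemble a context-grounded prompt for a generative assistant."""
    context = source_notes.get(fault_code, "No matching service note found.")
    return (
        f"Context from service notes:\n{context}\n\n"
        f"Observed fault: {fault_code}. {observation}\n"
        "List likely causes and the checks a physicist should perform, citing the context."
    )

print(build_prompt("INTERLOCK MLC-07", "Occurred twice during VMAT delivery this morning."))
```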
Affiliation(s)
- Tyler Alfonzetti
- Department of Radiation Oncology, Mount Sinai Hospital, New York, New York
- Junyi Xia
- Department of Radiation Oncology, Mount Sinai Hospital, New York, New York
6. Busch F, Hoffmann L, Rueger C, van Dijk EH, Kader R, Ortiz-Prado E, Makowski MR, Saba L, Hadamitzky M, Kather JN, Truhn D, Cuocolo R, Adams LC, Bressem KK. Current applications and challenges in large language models for patient care: a systematic review. Communications Medicine 2025; 5:26. PMID: 39838160; PMCID: PMC11751060; DOI: 10.1038/s43856-024-00717-2.
Abstract
BACKGROUND The introduction of large language models (LLMs) into clinical practice promises to improve patient education and empowerment, thereby personalizing medical care and broadening access to medical knowledge. Despite the popularity of LLMs, there is a significant gap in systematized information on their use in patient care. Therefore, this systematic review aims to synthesize current applications and limitations of LLMs in patient care. METHODS We systematically searched 5 databases for qualitative, quantitative, and mixed methods articles on LLMs in patient care published between 2022 and 2023. From 4349 initial records, 89 studies across 29 medical specialties were included. Quality assessment was performed using the Mixed Methods Appraisal Tool 2018. A data-driven convergent synthesis approach was applied for thematic syntheses of LLM applications and limitations using free line-by-line coding in Dedoose. RESULTS We show that most studies investigate Generative Pre-trained Transformers (GPT)-3.5 (53.2%, n = 66 of 124 different LLMs examined) and GPT-4 (26.6%, n = 33/124) in answering medical questions, followed by patient information generation, including medical text summarization or translation, and clinical documentation. Our analysis delineates two primary domains of LLM limitations: design and output. Design limitations include 6 second-order and 12 third-order codes, such as lack of medical domain optimization, data transparency, and accessibility issues, while output limitations include 9 second-order and 32 third-order codes, for example, non-reproducibility, non-comprehensiveness, incorrectness, unsafety, and bias. CONCLUSIONS This review systematically maps LLM applications and limitations in patient care, providing a foundational framework and taxonomy for their implementation and evaluation in healthcare settings.
Affiliation(s)
- Felix Busch
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Lena Hoffmann
- Department of Neuroradiology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany
- Christopher Rueger
- Department of Neuroradiology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany
- Elon Hc van Dijk
- Department of Ophthalmology, Leiden University Medical Center, Leiden, The Netherlands
- Department of Ophthalmology, Sir Charles Gairdner Hospital, Perth, Australia
- Rawen Kader
- Division of Surgery and Interventional Sciences, University College London, London, United Kingdom
- Esteban Ortiz-Prado
- One Health Research Group, Faculty of Health Science, Universidad de Las Américas, Quito, Ecuador
- Marcus R Makowski
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Luca Saba
- Department of Radiology, Azienda Ospedaliero Universitaria (A.O.U.), Cagliari, Italy
- Martin Hadamitzky
- School of Medicine and Health, Institute for Cardiovascular Radiology and Nuclear Medicine, German Heart Center Munich, TUM University Hospital, Technical University of Munich, Munich, Germany
- Jakob Nikolas Kather
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany
- Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, Technical University Dresden, Dresden, Germany
- Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany
- Renato Cuocolo
- Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, Italy
- Lisa C Adams
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- Keno K Bressem
- School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany
- School of Medicine and Health, Institute for Cardiovascular Radiology and Nuclear Medicine, German Heart Center Munich, TUM University Hospital, Technical University of Munich, Munich, Germany
7. Wong J, Kriegler C, Shrivastava A, Duimering A, Le C. Utility of Chatbot Literature Search in Radiation Oncology. Journal of Cancer Education 2024. PMID: 39673022; DOI: 10.1007/s13187-024-02547-1.
Abstract
Artificial intelligence and natural language processing tools have shown promise in oncology by assisting with medical literature retrieval and providing patient support. The potential for these technologies to generate inaccurate yet seemingly correct information poses significant challenges. This study evaluates the effectiveness, benefits, and limitations of ChatGPT for clinical use in conducting literature reviews of radiation oncology treatments. This cross-sectional study used ChatGPT version 3.5 to generate literature searches on radiotherapy options for seven tumor sites, with prompts issued five times per site to generate up to 50 publications per tumor type. The publications were verified using the Scopus database and categorized as correct, irrelevant, or non-existent. Statistical analysis with one-way ANOVA compared the impact factors and citation counts across different tumor sites. Among the 350 publications generated, there were 44 correct, 298 non-existent, and 8 irrelevant papers. The average publication year of all generated papers was 2011, compared to 2009 for the correct papers. The average impact factor of all generated papers was 38.8, compared to 113.8 for the correct papers. There were significant differences in the publication year, impact factor, and citation counts between tumor sites for both correct and non-existent papers. Our study highlights both the potential utility and significant limitations of using AI, specifically ChatGPT 3.5, in radiation oncology literature reviews. The findings emphasize the need for verification of AI outputs, development of standardized quality assurance protocols, and continued research into AI biases to ensure reliable integration into clinical practice.
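Note: the study verified ChatGPT-generated citations against Scopus. As a freely accessible stand-in for that verification step, the hedged sketch below checks a citation string against the public Crossref API; the citation shown is a hypothetical LLM output, not one from the study.

```python
# Hedged sketch: checking whether a generated reference resolves to a real record.
# Uses the public Crossref works endpoint instead of Scopus (an assumption for illustration).
import requests

def find_candidate(citation: str):
    """Return the closest Crossref record for a free-text citation, or None."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation, "rows": 1},
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return items[0] if items else None

generated = "Hypofractionated radiotherapy for glioblastoma in elderly patients, 2011"  # hypothetical output
hit = find_candidate(generated)
if hit:
    title = (hit.get("title") or ["<no title>"])[0]
    print("Closest real record:", title, "| DOI:", hit.get("DOI"))
else:
    print("No plausible match - the reference may be fabricated.")
```

Because fuzzy bibliographic search almost always returns some candidate, a human reviewer still has to compare the candidate's title and authors against the generated citation before declaring it genuine or fabricated.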
Affiliation(s)
- Justina Wong
- Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada
- Conley Kriegler
- Division of Radiation Oncology, Department of Oncology, University of Alberta, Cross Cancer Institute, 11560 University Ave, Edmonton, AB, T6G 1Z2, Canada
- Ananya Shrivastava
- Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada
- Adele Duimering
- Division of Radiation Oncology, Department of Oncology, University of Alberta, Cross Cancer Institute, 11560 University Ave, Edmonton, AB, T6G 1Z2, Canada
- Connie Le
- Division of Radiation Oncology, Department of Oncology, University of Alberta, Cross Cancer Institute, 11560 University Ave, Edmonton, AB, T6G 1Z2, Canada
8. Chow R, Hasan S, Zheng A, Gao C, Valdes G, Yu F, Chhabra A, Raman S, Choi JI, Lin H, Simone CB. The Accuracy of Artificial Intelligence ChatGPT in Oncology Examination Questions. J Am Coll Radiol 2024; 21:1800-1804. PMID: 39098369; DOI: 10.1016/j.jacr.2024.07.011.
Abstract
The aim of this study is to assess the accuracy of Chat Generative Pretrained Transformer (ChatGPT) in response to oncology examination questions in the setting of one-shot learning. Consecutive national radiation oncology in-service multiple-choice examinations were collected and input into ChatGPT 4o and ChatGPT 3.5 to determine ChatGPT's answers. ChatGPT's answers were then compared with the answer keys to determine whether ChatGPT answered each question correctly or incorrectly and whether the newer ChatGPT version showed improved responses. A total of 600 consecutive questions were input into ChatGPT. ChatGPT 4o answered 72.2% of questions correctly, whereas ChatGPT 3.5 answered 53.8% of questions correctly. There was a significant difference in performance by question category (P < .01). ChatGPT performed worse on questions about landmark studies and on treatment recommendations and planning. ChatGPT is a promising technology, with the latest version showing marked improvement. Although it still has limitations, with further evolution it may be considered a reliable resource for medical training and decision making in the oncology space.
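Note: the abstract does not state which statistical test was used. Because both models answered the same 600 questions, a paired comparison such as McNemar's test would be a natural choice; the discordant counts below are illustrative only, chosen to match the reported 72.2% and 53.8% marginal accuracies.

```python
# Sketch (assumed counts): McNemar's test for paired per-question correctness of two models.
from statsmodels.stats.contingency_tables import mcnemar

# Rows: GPT-4o correct / incorrect; columns: GPT-3.5 correct / incorrect.
table = [[300, 133],   # both correct / only GPT-4o correct
         [ 23, 144]]   # only GPT-3.5 correct / both incorrect
result = mcnemar(table, exact=False, correction=True)
print(f"chi2 = {result.statistic:.1f}, p = {result.pvalue:.2e}")
```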
Affiliation(s)
- Ronald Chow
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada; New York Proton Center, New York, New York
- Ajay Zheng
- New York Proton Center, New York, New York
- Chenxi Gao
- New York Proton Center, New York, New York
- Gilmer Valdes
- University of California San Francisco School of Medicine, University of California San Francisco, San Francisco, California; Vice Chair, Department of Machine Learning, Moffitt Cancer Center, Tampa, Florida
- Francis Yu
- New York Proton Center, New York, New York
- Arpit Chhabra
- Director of Education, New York Proton Center, New York, New York
- Srinivas Raman
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- J Isabelle Choi
- Director of Research, New York Proton Center, New York, New York
- Haibo Lin
- Director of Medical Physics, New York Proton Center, New York, New York
- Charles B Simone
- Chief Medical Officer and Research Professor, New York Proton Center, New York, New York; Member, Memorial Sloan Kettering Cancer Center
9. Ruiz Sarrias O, Martínez del Prado MP, Sala Gonzalez MÁ, Azcuna Sagarduy J, Casado Cuesta P, Figaredo Berjano C, Galve-Calvo E, López de San Vicente Hernández B, López-Santillán M, Nuño Escolástico M, Sánchez Togneri L, Sande Sardina L, Pérez Hoyos MT, Abad Villar MT, Zabalza Zudaire M, Sayar Beristain O. Leveraging Large Language Models for Precision Monitoring of Chemotherapy-Induced Toxicities: A Pilot Study with Expert Comparisons and Future Directions. Cancers (Basel) 2024; 16:2830. PMID: 39199603; PMCID: PMC11352281; DOI: 10.3390/cancers16162830.
Abstract
INTRODUCTION Large Language Models (LLMs), such as the GPT model family from OpenAI, have demonstrated transformative potential across various fields, especially in medicine. These models can understand and generate contextual text, adapting to new tasks without specific training. This versatility can revolutionize clinical practices by enhancing documentation, patient interaction, and decision-making processes. In oncology, LLMs offer the potential to significantly improve patient care through the continuous monitoring of chemotherapy-induced toxicities, which is a task that is often unmanageable for human resources alone. However, existing research has not sufficiently explored the accuracy of LLMs in identifying and assessing subjective toxicities based on patient descriptions. This study aims to fill this gap by evaluating the ability of LLMs to accurately classify these toxicities, facilitating personalized and continuous patient care. METHODS This comparative pilot study assessed the ability of an LLM to classify subjective toxicities from chemotherapy. Thirteen oncologists evaluated 30 fictitious cases created using expert knowledge and OpenAI's GPT-4. These evaluations, based on the CTCAE v.5 criteria, were compared to those of a contextualized LLM model. Metrics such as mode and mean of responses were used to gauge consensus. The accuracy of the LLM was analyzed in both general and specific toxicity categories, considering types of errors and false alarms. The study's results are intended to justify further research involving real patients. RESULTS The study revealed significant variability in oncologists' evaluations due to the lack of interaction with fictitious patients. The LLM model achieved an accuracy of 85.7% in general categories and 64.6% in specific categories using mean evaluations with mild errors at 96.4% and severe errors at 3.6%. False alarms occurred in 3% of cases. When comparing the LLM's performance to that of expert oncologists, individual accuracy ranged from 66.7% to 89.2% for general categories and 57.0% to 76.0% for specific categories. The 95% confidence intervals for the median accuracy of oncologists were 81.9% to 86.9% for general categories and 67.6% to 75.6% for specific categories. These benchmarks highlight the LLM's potential to achieve expert-level performance in classifying chemotherapy-induced toxicities. DISCUSSION The findings demonstrate that LLMs can classify subjective toxicities from chemotherapy with accuracy comparable to expert oncologists. The LLM achieved 85.7% accuracy in general categories and 64.6% in specific categories. While the model's general category performance falls within expert ranges, specific category accuracy requires improvement. The study's limitations include the use of fictitious cases, lack of patient interaction, and reliance on audio transcriptions. Nevertheless, LLMs show significant potential for enhancing patient monitoring and reducing oncologists' workload. Future research should focus on the specific training of LLMs for medical tasks, conducting studies with real patients, implementing interactive evaluations, expanding sample sizes, and ensuring robustness and generalization in diverse clinical settings. CONCLUSIONS This study concludes that LLMs can classify subjective toxicities from chemotherapy with accuracy comparable to expert oncologists. The LLM's performance in general toxicity categories is within the expert range, but there is room for improvement in specific categories. 
LLMs have the potential to enhance patient monitoring, enable early interventions, and reduce severe complications, improving care quality and efficiency. Future research should involve specific training of LLMs, validation with real patients, and the incorporation of interactive capabilities for real-time patient interactions. Ethical considerations, including data accuracy, transparency, and privacy, are crucial for the safe integration of LLMs into clinical practice.
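Note: the abstract quotes 95% confidence intervals for the median accuracy of the thirteen oncologists. One common way to obtain such intervals is a percentile bootstrap, sketched below with hypothetical per-rater accuracies rather than the study data.

```python
# Hedged sketch: percentile-bootstrap 95% CI for the median accuracy across raters.
# The thirteen accuracy values are hypothetical, not taken from the study.
import numpy as np

rng = np.random.default_rng(0)
accuracies = np.array([0.82, 0.85, 0.79, 0.84, 0.86, 0.83, 0.81,
                       0.88, 0.80, 0.84, 0.85, 0.82, 0.87])

boot_medians = np.array([
    np.median(rng.choice(accuracies, size=accuracies.size, replace=True))
    for _ in range(10_000)
])
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"median = {np.median(accuracies):.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```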
Affiliation(s)
- Oskitz Ruiz Sarrias
- Department of Mathematics and Statistics, NNBi 2020 SL, 31110 Noain, Navarra, Spain
- María Purificación Martínez del Prado
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- María Ángeles Sala Gonzalez
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Josune Azcuna Sagarduy
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Pablo Casado Cuesta
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Covadonga Figaredo Berjano
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Elena Galve-Calvo
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Borja López de San Vicente Hernández
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- María López-Santillán
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Maitane Nuño Escolástico
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Laura Sánchez Togneri
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Laura Sande Sardina
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- María Teresa Pérez Hoyos
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- María Teresa Abad Villar
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
10. Panettieri V, Gagliardi G. Artificial Intelligence and the future of radiotherapy planning: The Australian radiation therapists prepare to be ready. J Med Radiat Sci 2024; 71:174-176. PMID: 38641984; PMCID: PMC11177026; DOI: 10.1002/jmrs.791.
Abstract
The use of artificial intelligence (AI) solutions is rapidly changing how radiation therapy tasks that have traditionally relied on human skills are approached, by enabling fast automation. This evolution represents a paradigm shift in all aspects of the profession, particularly for treatment planning applications, opening up opportunities but also raising concerns for the future of the multidisciplinary team. In Australia, radiation therapists (RTs), largely responsible for both treatment planning and delivery, are discussing the impact of the introduction of AI and the potential developments in the future of their role. As medical physicists who are part of the multidisciplinary team, in this editorial we reflect on the considerations raised by RTs and on the implications of this transition to AI.
Affiliation(s)
- Vanessa Panettieri
- Department of Physical Sciences, Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
- Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, Victoria, Australia
- Central Clinical School, Monash University, Melbourne, Victoria, Australia
- Department of Medical Imaging and Radiation Sciences, Monash University, Clayton, Victoria, Australia
- Giovanna Gagliardi
- Medical Radiation Physics Department, Karolinska University Hospital, Stockholm, Sweden
- Department of Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden
11. Pandey VK, Munshi A, Mohanti BK, Bansal K, Rastogi K. Evaluating ChatGPT to test its robustness as an interactive information database of radiation oncology and to assess its responses to common queries from radiotherapy patients: A single institution investigation. Cancer Radiother 2024; 28:258-264. PMID: 38866652; DOI: 10.1016/j.canrad.2023.11.005.
Abstract
PURPOSE Commercial vendors have created artificial intelligence (AI) tools for use in all aspects of life and medicine, including radiation oncology. AI innovations will likely disrupt workflows in the field of radiation oncology. However, limited data exist on the quality of radiation oncology information provided by AI-based chatbots. This study aims to assess the accuracy of ChatGPT, an AI-based chatbot, in answering patients' questions during their first visit to the radiation oncology outpatient department, and to test ChatGPT's knowledge of radiation oncology. MATERIAL AND METHODS A set of ten standard questions commonly asked by patients in outpatient department practice was compiled. A blinded expert opinion was obtained for these ten questions, and the same questions were posed to ChatGPT version 3.5. The answers from the expert and from ChatGPT were independently evaluated for accuracy by three scientific reviewers. Additionally, the similarity between ChatGPT's and the expert's answers was assessed by scoring each response. Word count, Flesch-Kincaid readability score, and Flesch-Kincaid grade were calculated for the responses from the expert and from ChatGPT, and the answers were also compared on a Likert scale. As a second component of the study, we tested the technical knowledge of ChatGPT: ten multiple-choice questions of increasing difficulty (basic, intermediate, and advanced) were framed, and ChatGPT's responses were evaluated. Statistical testing was done using SPSS version 27. RESULTS After expert review, the accuracy of the expert opinion was 100%, and ChatGPT's was 80% (8/10) for regular questions encountered in outpatient department visits. A noticeable difference was observed in word count and readability between the expert's and ChatGPT's answers. On the ten multiple-choice questions assessing the radiation oncology knowledge base, ChatGPT had an accuracy of 90% (9 of 10). One answer to a basic-level question was incorrect, whereas all answers to intermediate- and advanced-level questions were correct. CONCLUSION ChatGPT provides reasonably accurate information about routine questions encountered in the patient's first outpatient department visit and also demonstrated sound knowledge of the subject. The results of our study can inform the future development of educational tools in radiation oncology and may have implications in other medical fields. This is the first study to provide insight into two capabilities of ChatGPT: first, its responses to common patient queries at outpatient department visits, and second, an assessment of its radiation oncology knowledge base.
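Note: for readers unfamiliar with the readability metrics used here, the sketch below computes the standard Flesch reading-ease and Flesch-Kincaid grade formulas. The syllable counter is a crude vowel-group heuristic, so published tools will give slightly different numbers.

```python
# Sketch of the Flesch-Kincaid metrics mentioned in the abstract.
# Syllables are approximated by counting vowel groups, which is only a rough heuristic.
import re

def count_syllables(word: str) -> int:
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid(text: str) -> tuple[float, float]:
    """Return (reading ease, grade level) for a block of text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / sentences
    syllables_per_word = syllables / len(words)
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade

print(flesch_kincaid("Radiotherapy uses high-energy beams to treat cancer. Sessions are short."))
```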
Affiliation(s)
- V K Pandey
- Radiation Oncology, Manipal Hospital Dwarka, Delhi, India
- A Munshi
- Radiation Oncology, Manipal Hospital Dwarka, Delhi, India
- B K Mohanti
- Radiation Oncology, Kalinga Institute of Medical Sciences, Bhubaneswar, Odisha, India
- K Bansal
- Radiation Oncology, Narayana Hospital, Gurugram, Haryana, India
- K Rastogi
- Radiation Oncology, Sterling Hospital, Gandhidham, Gujarat, India