1
Banyi N, Ma B, Amanian A, Bur A, Abdalkhani A. Applications of Natural Language Processing in Otolaryngology: A Scoping Review. Laryngoscope 2025. PMID: 40309961; DOI: 10.1002/lary.32198. Received 08/05/2024; revised 02/17/2025; accepted 03/14/2025.
Abstract
OBJECTIVE To review the current literature on the applications of natural language processing (NLP) within the field of otolaryngology. DATA SOURCES MEDLINE, EMBASE, SCOPUS, Cochrane Library, Web of Science, and CINAHL. METHODS The Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist was followed. Databases were searched from their dates of inception up to December 26, 2023. Original articles on the application of language-based models to otolaryngology patient care and research, regardless of publication date, were included. Studies were classified under the 2011 Oxford CEBM levels of evidence. RESULTS One hundred sixty-six papers with a median publication year of 2024 (range 1982-2024) were included. Sixty-one percent (102/166) of studies used ChatGPT and were published in 2023 or 2024. Sixty studies used NLP for clinical education and decision support, 42 for patient education, 14 for electronic medical record improvement, 5 for triaging, 4 for trainee education, 4 for patient monitoring, 3 for telemedicine, and 1 for medical translation. For research, 37 studies used NLP for extraction, classification, or analysis of data, 17 for thematic analysis, 5 for evaluating scientific reporting, and 4 for manuscript preparation. CONCLUSION The role of NLP in otolaryngology is evolving, with ChatGPT passing OHNS board simulations, though its clinical application requires improvement. NLP shows potential in patient education and post-treatment monitoring, and it is effective at extracting data from unstructured or large data sets. There is limited research on NLP in trainee education and administrative tasks. Guidelines for NLP use in research are critical.
Affiliation(s)
- Norbert Banyi
- The University of British Columbia, Faculty of Medicine, Vancouver, Canada
- Brian Ma
- Department of Cellular & Physiological Sciences, University of British Columbia, Vancouver, Canada
- Ameen Amanian
- Division of Otolaryngology-Head and Neck Surgery, Department of Surgery, University of British Columbia, Vancouver, Canada
- Andrés Bur
- Department of Otolaryngology-Head and Neck Surgery, University of Kansas Medical Center, Kansas City, Kansas, USA
- Arman Abdalkhani
- Division of Otolaryngology-Head and Neck Surgery, Department of Surgery, University of British Columbia, Vancouver, Canada
2
Rao KN, Fernandez-Alvarez V, Guntinas-Lichius O, Sreeram MP, de Bree R, Kowalski LP, Forastiere A, Pace-Asciak P, Rodrigo JP, Saba NF, Ronen O, Florek E, Randolph GW, Sanabria A, Vermorken JB, Hanna EY, Ferlito A. The Limitations of Artificial Intelligence in Head and Neck Oncology. Adv Ther 2025. PMID: 40299277; DOI: 10.1007/s12325-025-03198-4. Received 01/27/2025; accepted 04/04/2025.
Abstract
Artificial intelligence (AI) is revolutionizing head and neck oncology, offering innovations in tumor detection, treatment planning, and patient management. However, its integration into clinical practice is hindered by several limitations. These include clinician mistrust due to a lack of understanding of AI mechanisms, biases in algorithm development, and the potential over-reliance on technology, which may undermine clinical expertise. Data-related challenges, such as inconsistent quality and limited representativeness of datasets, further complicate AI's application. Ethical, legal, and privacy concerns also pose significant barriers. Addressing these issues through transparent AI systems, clinician education, and clear regulations is essential for ensuring responsible, equitable use in head and neck oncology. This manuscript explores the limitations of AI in head and neck oncology.
Affiliation(s)
- Karthik N Rao
- Department of Head and Neck Oncology, Sri Shankara Cancer Foundation, Bangalore, India.
- Veronica Fernandez-Alvarez
- Department of Vascular and Endovascular Surgery, Hospital Universitario Central de Asturias, Oviedo, Spain
- M P Sreeram
- Department of Head and Neck Oncology, Sri Shankara Cancer Foundation, Bangalore, India
- Remco de Bree
- Department of Head and Neck Surgical Oncology, University Medical Center Utrecht, Utrecht, The Netherlands
- Luiz P Kowalski
- Head and Neck Surgery and Otorhinolaryngology Department, A C Camargo Cancer Center, Sao Paulo, Brazil
- Head and Neck Surgery, Faculty of Medicine, University of Sao Paulo, Sao Paulo, Brazil
- Arlene Forastiere
- The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, USA
- Pia Pace-Asciak
- Department of Otolaryngology, Head and Neck Surgery, University of Toronto, Toronto, ON, Canada
- Juan P Rodrigo
- Instituto Universitario de Oncología del Principado de Asturias, University of Oviedo, Oviedo, Spain
- Department of Otolaryngology, Hospital Universitario Central de Asturias, Oviedo, Spain
- CIBERONC, Madrid, Spain
- Nabil F Saba
- Department of Hematology and Medical Oncology, Winship Cancer Institute, Emory University School of Medicine, Atlanta, Georgia
- Ohad Ronen
- Department of Otolaryngology, Head and Neck Surgery, Galilee Medical Center, Affiliated with Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Ewa Florek
- Laboratory of Environmental Research, Department of Toxicology, Poznan University of Medical Sciences, 60-806, Poznan, Poland
- Gregory W Randolph
- Massachusetts Eye and Ear Infirmary, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Alvaro Sanabria
- Department of Surgery, School of Medicine, Universidad de Antioquia/Hospital Universitario San Vicente Fundación-CEXCA Centro de Excelencia en Enfermedades de Cabeza y Cuello, Medellín, Colombia
- Jan B Vermorken
- Department of Medical Oncology, Antwerp University Hospital, Edegem, Belgium
- Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium
- Ehab Y Hanna
- Department of Head and Neck Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Alfio Ferlito
- Coordinator of the International Head and Neck Scientific Group, 35100, Padua, Italy
3
Chen D, Avison K, Alnassar S, Huang RS, Raman S. Medical accuracy of artificial intelligence chatbots in oncology: a scoping review. Oncologist 2025; 30:oyaf038. PMID: 40285677; PMCID: PMC12032582; DOI: 10.1093/oncolo/oyaf038. Received 12/28/2024; accepted 01/03/2025. Open access.
Abstract
BACKGROUND Recent advances in large language models (LLMs) have enabled human-like natural language competency. Applied to oncology, LLMs have been proposed to serve as an information resource and to interpret vast amounts of data as a clinical decision-support tool to improve clinical outcomes. OBJECTIVE This review aims to describe the current status of medical accuracy of oncology-related LLM applications and research trends for further areas of investigation. METHODS A scoping literature search was conducted on Ovid Medline for peer-reviewed studies published since 2000. We included primary research studies that evaluated the medical accuracy of a large language model applied in oncology settings. Study characteristics and primary outcomes of included studies were extracted to describe the landscape of oncology-related LLMs. RESULTS Sixty studies were included based on the inclusion and exclusion criteria. The majority of studies evaluated LLMs in oncology as a health information resource in question-answer style examinations (48%), followed by diagnosis (20%) and management (17%). The number of studies that evaluated the utility of fine-tuning and prompt-engineering LLMs increased over time from 2022 to 2024. Studies reported the advantages of LLMs as an accurate information resource, reduction of clinician workload, and improved accessibility and readability of clinical information, while noting disadvantages such as poor reliability, hallucinations, and the need for clinician oversight. DISCUSSION There exists significant interest in the application of LLMs in clinical oncology, with a particular focus as a medical information resource and clinical decision-support tool. However, further research is needed to validate these tools in external hold-out datasets for generalizability and to improve medical accuracy across diverse clinical scenarios, underscoring the need for clinician supervision of these tools.
Affiliation(s)
- David Chen
- Princess Margaret Hospital Cancer Centre, Radiation Medicine Program, Toronto, ON M5G 2C4, Canada
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON M5S 3K3, Canada
- Kate Avison
- Princess Margaret Hospital Cancer Centre, Radiation Medicine Program, Toronto, ON M5G 2C4, Canada
- Department of Systems Design Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Saif Alnassar
- Princess Margaret Hospital Cancer Centre, Radiation Medicine Program, Toronto, ON M5G 2C4, Canada
- Department of Systems Design Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Ryan S Huang
- Princess Margaret Hospital Cancer Centre, Radiation Medicine Program, Toronto, ON M5G 2C4, Canada
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON M5S 3K3, Canada
- Srinivas Raman
- Princess Margaret Hospital Cancer Centre, Radiation Medicine Program, Toronto, ON M5G 2C4, Canada
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON M5S 3K3, Canada
- Department of Radiation Oncology, University of Toronto, Toronto, ON M5T 1P5, Canada
- Department of Radiation Oncology, BC Cancer, Vancouver, BC V5Z 1G1, Canada
- Division of Radiation Oncology, University of British Columbia, Vancouver, BC V5Z 1M9, Canada
4
Vaira LA, Lechien JR, Abbate V, Gabriele G, Frosolini A, De Vito A, Maniaci A, Mayo-Yáñez M, Boscolo-Rizzo P, Saibene AM, Maglitto F, Salzano G, Califano G, Troise S, Chiesa-Estomba CM, De Riu G. Enhancing AI Chatbot Responses in Health Care: The SMART Prompt Structure in Head and Neck Surgery. OTO Open 2025; 9:e70075. PMID: 39822375; PMCID: PMC11736147; DOI: 10.1002/oto2.70075. Received 11/15/2024; revised 12/19/2024; accepted 12/27/2024. Open access.
Abstract
Objective This study aims to evaluate the impact of prompt construction on the quality of artificial intelligence (AI) chatbot responses in the context of head and neck surgery. Study Design Observational and evaluative study. Setting An international collaboration involving 16 researchers from 11 European centers specializing in head and neck surgery. Methods A total of 24 questions, divided into clinical scenarios, theoretical questions, and patient inquiries, were developed. These questions were entered into ChatGPT-4o both with and without the use of a structured prompt format, known as SMART (Seeker, Mission, AI Role, Register, Targeted Question). The AI-generated responses were evaluated by experienced head and neck surgeons using the Quality Analysis of Medical Artificial Intelligence instrument (QAMAI), which assesses accuracy, clarity, relevance, completeness, source quality, and usefulness. Results The responses generated using the SMART prompt scored significantly higher across all QAMAI dimensions compared to those without contextualized prompts. Median QAMAI scores for SMART prompts were 27.5 (interquartile range [IQR] 25-29) versus 24 (IQR 21.8-25) for unstructured prompts (P < .001). Clinical scenarios and patient inquiries showed the most significant improvements, while theoretical questions also benefited, but to a lesser extent. The AI's source quality improved notably with the SMART prompt, particularly in theoretical questions. Conclusion This study suggests that the structured SMART prompt format significantly enhances the quality of AI chatbot responses in head and neck surgery. This approach improves the accuracy, relevance, and completeness of AI-generated information, underscoring the importance of well-constructed prompts in clinical applications. Further research is warranted to explore the applicability of SMART prompts across different medical specialties and AI platforms.
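The SMART structure evaluated above is, operationally, a fixed set of prompt fields prepended to the question. As an illustrative sketch only (the field names come from the abstract; the example content is hypothetical and not taken from the study), such a prompt might be assembled like this:

```python
def build_smart_prompt(seeker: str, mission: str, ai_role: str,
                       register: str, targeted_question: str) -> str:
    """Assemble a SMART-structured prompt (Seeker, Mission, AI Role,
    Register, Targeted Question) into a single contextualized message."""
    return (
        f"Seeker: {seeker}\n"
        f"Mission: {mission}\n"
        f"AI Role: {ai_role}\n"
        f"Register: {register}\n"
        f"Targeted Question: {targeted_question}"
    )

# Hypothetical example in the spirit of the study's clinical scenarios:
prompt = build_smart_prompt(
    seeker="Head and neck surgeon",
    mission="Support a treatment-planning discussion",
    ai_role="Act as a clinical decision-support assistant",
    register="Formal, technical language for a specialist audience",
    targeted_question="What are the management options for an early glottic carcinoma?",
)
print(prompt.splitlines()[0])  # Seeker: Head and neck surgeon
```

The point of the structure, per the study, is that the contextual fields (who is asking, for what purpose, in what register) travel with every question rather than being left implicit.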
Affiliation(s)
- Luigi Angelo Vaira
- Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy
- Jerome R Lechien
- Department of Surgery, Mons School of Medicine, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium
- Department of Otolaryngology-Head and Neck Surgery, Elsan Hospital, Paris, France
- Vincenzo Abbate
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Guido Gabriele
- Maxillofacial Surgery Unit, Department of Medical Biotechnologies, University of Siena, Siena, Italy
- Andrea Frosolini
- Maxillofacial Surgery Unit, Department of Medical Biotechnologies, University of Siena, Siena, Italy
- Andrea De Vito
- Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy
- Antonino Maniaci
- Department of Medicine and Surgery, University of Enna Kore, Enna, Italy
- Miguel Mayo-Yáñez
- Otorhinolaryngology, Head and Neck Surgery Department, Complexo Hospitalario Universitario A Coruña (CHUAC), A Coruña, Spain
- Paolo Boscolo-Rizzo
- Section of Otolaryngology, Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy
- Alberto Maria Saibene
- Otolaryngology Unit, Department of Health Sciences, Santi Paolo e Carlo Hospital, University of Milan, Milan, Italy
- Fabio Maglitto
- Maxillo-Facial Surgery Unit, University of Bari "Aldo Moro", Bari, Italy
- Giovanni Salzano
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Maxillo-Facial Surgery Unit, University of Bari "Aldo Moro", Bari, Italy
- Gianluigi Califano
- Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Stefania Troise
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Giacomo De Riu
- Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy
5
Abou-Abdallah M, Dar T, Mahmudzade Y, Michaels J, Talwar R, Tornari C. The quality and readability of patient information provided by ChatGPT: can AI reliably explain common ENT operations? Eur Arch Otorhinolaryngol 2024; 281:6147-6153. PMID: 38530460; DOI: 10.1007/s00405-024-08598-w. Received 10/03/2023; accepted 03/04/2024.
Abstract
PURPOSE Access to high-quality and comprehensible patient information is crucial. However, information provided by increasingly prevalent artificial intelligence tools has not been thoroughly investigated. This study assesses the quality and readability of information from ChatGPT regarding three index ENT operations: tonsillectomy, adenoidectomy, and grommets. METHODS We asked ChatGPT standard and simplified questions. Readability was calculated using the Flesch-Kincaid Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), and Simple Measure of Gobbledygook (SMOG) scores. We assessed quality using the DISCERN instrument and compared these with ENT UK patient leaflets. RESULTS ChatGPT readability was poor, with mean FRES of 38.9 and 55.1 pre- and post-simplification, respectively. Simplified information from ChatGPT was 43.6% more readable (FRES) but scored 11.6% lower for quality. ENT UK patient information readability and quality were consistently higher. CONCLUSIONS ChatGPT can simplify information at the expense of quality, resulting in shorter answers with important omissions. Limitations in knowledge and insight curb its reliability for healthcare information. Patients should use reputable sources from professional organisations, alongside clear communication with their clinicians, to support well-informed consent and decision-making.
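The two Flesch indices used in this study are closed-form formulas over word, sentence, and syllable counts. As an illustration (the standard published formulas, not the authors' code — in practice a tokenizer and syllable counter would supply the counts):

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score (FRES): higher = easier to read.
    The ChatGPT mean of 38.9 reported above falls in the 'difficult' band."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (FKGL): approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Example: a 100-word text with 5 sentences and 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 1))   # 59.6 ("fairly difficult")
print(round(flesch_kincaid_grade(100, 5, 150), 1))  # 9.9 (about 10th grade)
```

Both indices penalize long sentences and polysyllabic words, which is why simplification raised the FRES scores reported above.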
Affiliation(s)
- Michel Abou-Abdallah
- Ear, Nose and Throat Department, Luton and Dunstable University Hospital, Lewsey Rd, Luton, LU4 0DZ, UK.
- Talib Dar
- Ear, Nose and Throat Department, Luton and Dunstable University Hospital, Lewsey Rd, Luton, LU4 0DZ, UK
- Yasamin Mahmudzade
- Foundation Programme, East and North Hertfordshire NHS Trust, Stevenage, UK
- Joshua Michaels
- Ear, Nose and Throat Department, Luton and Dunstable University Hospital, Lewsey Rd, Luton, LU4 0DZ, UK
- Rishi Talwar
- Ear, Nose and Throat Department, Luton and Dunstable University Hospital, Lewsey Rd, Luton, LU4 0DZ, UK
- Chrysostomos Tornari
- Ear, Nose and Throat Department, Luton and Dunstable University Hospital, Lewsey Rd, Luton, LU4 0DZ, UK
6
Aydin S, Karabacak M, Vlachos V, Margetis K. Large language models in patient education: a scoping review of applications in medicine. Front Med (Lausanne) 2024; 11:1477898. PMID: 39534227; PMCID: PMC11554522; DOI: 10.3389/fmed.2024.1477898. Received 08/08/2024; accepted 10/03/2024. Open access.
Abstract
Introduction Large Language Models (LLMs) are sophisticated algorithms that analyze and generate vast amounts of textual data, mimicking human communication. Notable LLMs include GPT-4o by OpenAI, Claude 3.5 Sonnet by Anthropic, and Gemini by Google. This scoping review aims to synthesize the current applications and potential uses of LLMs in patient education and engagement. Materials and methods Following the PRISMA-ScR checklist and the methodologies of Arksey, O'Malley, and Levac, we conducted a scoping review. We searched PubMed in June 2024, using keywords and MeSH terms related to LLMs and patient education. Two authors conducted the initial screening, and discrepancies were resolved by consensus. We employed thematic analysis to address our primary research question. Results The review identified 201 studies, predominantly from the United States (58.2%). Six themes emerged: generating patient education materials, interpreting medical information, providing lifestyle recommendations, supporting customized medication use, offering perioperative care instructions, and optimizing doctor-patient interaction. LLMs were found to provide accurate responses to patient queries, enhance existing educational materials, and translate medical information into patient-friendly language. However, challenges such as readability, accuracy, and potential biases were noted. Discussion LLMs demonstrate significant potential in patient education and engagement by creating accessible educational materials, interpreting complex medical information, and enhancing communication between patients and healthcare providers. Nonetheless, issues related to the accuracy and readability of LLM-generated content, as well as ethical concerns, require further research and development. Future studies should focus on improving LLMs and ensuring content reliability while addressing ethical considerations.
Affiliation(s)
- Serhat Aydin
- School of Medicine, Koç University, Istanbul, Türkiye
- Mert Karabacak
- Department of Neurosurgery, Mount Sinai Health System, New York, NY, United States
- Victoria Vlachos
- College of Human Ecology, Cornell University, Ithaca, NY, United States
7
Carl N, Schramm F, Haggenmüller S, Kather JN, Hetz MJ, Wies C, Michel MS, Wessels F, Brinker TJ. Large language model use in clinical oncology. NPJ Precis Oncol 2024; 8:240. PMID: 39443582; PMCID: PMC11499929; DOI: 10.1038/s41698-024-00733-4. Received 05/29/2024; accepted 10/12/2024. Open access.
Abstract
Large language models (LLMs) are undergoing intensive research for various healthcare domains. This systematic review and meta-analysis assesses current applications, methodologies, and the performance of LLMs in clinical oncology. A mixed-methods approach was used to extract, summarize, and compare methodological approaches and outcomes. This review includes 34 studies. LLMs are primarily evaluated on their ability to answer oncologic questions across various domains. The meta-analysis highlights a significant performance variance, influenced by diverse methodologies and evaluation criteria. Furthermore, differences in inherent model capabilities, prompting strategies, and oncological subdomains contribute to heterogeneity. The lack of standardized, LLM-specific reporting protocols leads to methodological disparities, which must be addressed to ensure comparability in LLM research and ultimately leverage the reliable integration of LLM technologies into clinical practice.
Affiliation(s)
- Nicolas Carl
- Department of Digital Prevention, Diagnostics and Therapy Guidance, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Urology and Urological Surgery, University Medical Center Mannheim, Ruprecht-Karls University Heidelberg, Mannheim, Germany
- Franziska Schramm
- Department of Digital Prevention, Diagnostics and Therapy Guidance, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Sarah Haggenmüller
- Department of Digital Prevention, Diagnostics and Therapy Guidance, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Jakob Nikolas Kather
- Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, Technical University Dresden, Dresden, Germany
- Martin J Hetz
- Department of Digital Prevention, Diagnostics and Therapy Guidance, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Christoph Wies
- Department of Digital Prevention, Diagnostics and Therapy Guidance, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Medical Faculty, Ruprecht-Karls University Heidelberg, Heidelberg, Germany
- Maurice Stephan Michel
- Department of Urology and Urological Surgery, University Medical Center Mannheim, Ruprecht-Karls University Heidelberg, Mannheim, Germany
- Frederik Wessels
- Department of Urology and Urological Surgery, University Medical Center Mannheim, Ruprecht-Karls University Heidelberg, Mannheim, Germany
- Titus J Brinker
- Department of Digital Prevention, Diagnostics and Therapy Guidance, German Cancer Research Center (DKFZ), Heidelberg, Germany
8
Yüceler Kaçmaz H, Kahraman H, Akutay S, Dağdelen D. Development and Validation of an Artificial Intelligence-Assisted Patient Education Material for Ostomy Patients: A Methodological Study. J Adv Nurs 2024. PMID: 39422196; DOI: 10.1111/jan.16542. Received 06/24/2024; revised 08/26/2024; accepted 10/02/2024.
Abstract
AIM To develop and test the validity of an artificial intelligence-assisted patient education material for ostomy patients. DESIGN A methodological study. METHODS The study was carried out in two main stages and five steps: (1) determining the information needs of ostomy patients, (2) creating educational content, (3) converting the educational content into patient education material, (4) validating the patient education material through expert review, and (5) measuring the readability of the patient education material. We used ChatGPT 4.0 to determine the information needs and create the patient education material content, and Publuu Online Flipbook Maker to convert the educational content into patient education material. Understandability and actionability scores were assessed using the Patient Education Materials Assessment Tool, submitted to 10 expert reviewers. Inter-rater reliability was determined via the intraclass correlation coefficient. Readability was analysed using the Flesch-Kincaid Grade Level, Gunning Fog Index, and Simple Measure of Gobbledygook formulas. RESULTS The mean Patient Education Materials Assessment Tool understandability score of the patient education material was 81.91%, and the mean actionability score was 85.33%. The readability indicators were Flesch-Kincaid Grade Level 8.53, Gunning Fog 10.9, and Simple Measure of Gobbledygook 7.99. CONCLUSIONS The AI-assisted patient education material for ostomy patients provided accurate information with understandable and actionable responses, but it is written at too high a reading level for patients. IMPLICATIONS FOR THE PROFESSION AND PATIENT CARE Artificial intelligence-assisted patient education materials are easy to put into practice and could significantly improve how well patients are informed within the health system. However, artificial intelligence alone is not yet a reliable option for creating patient education material, and its impact on patients is not fully known. REPORTING METHOD The study followed the STROBE checklist guidelines. PATIENT OR PUBLIC CONTRIBUTION No patient or public contributions.
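The Gunning Fog and SMOG indices reported in this study are likewise closed-form formulas; a sketch using the standard published formulas (not the study's own code), with counts supplied directly:

```python
import math

def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog Index: 0.4 * (avg sentence length + % complex words),
    where 'complex' words have three or more syllables."""
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))

def smog_index(polysyllables: int, sentences: int) -> float:
    """SMOG grade, from the count of 3+-syllable words, normalized
    to a 30-sentence sample."""
    return 1.0430 * math.sqrt(polysyllables * (30.0 / sentences)) + 3.1291

# Example: 100 words, 5 sentences, 10 polysyllabic (complex) words.
print(round(gunning_fog(100, 5, 10), 1))  # 12.0
print(round(smog_index(10, 5), 1))        # 11.2
```

The two indices differ mainly in weighting: Gunning Fog mixes sentence length with word complexity, while SMOG depends only on the density of polysyllabic words, which is why a text can score differently on each (as in the 10.9 vs 7.99 result above).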
Affiliation(s)
- Hatice Yüceler Kaçmaz
- Department of Surgical Nursing, Faculty of Health Sciences, Erciyes University, Kayseri, Turkey
- Hilal Kahraman
- Department of Surgical Nursing, Faculty of Health Sciences, Erciyes University, Kayseri, Turkey
- Seda Akutay
- Department of Surgical Nursing, Faculty of Health Sciences, Erciyes University, Kayseri, Turkey
- Derya Dağdelen
- Department of Public Health Nursing, Faculty of Health Sciences, Erciyes University, Kayseri, Turkey
9
Burnette H, Pabani A, von Itzstein MS, Switzer B, Fan R, Ye F, Puzanov I, Naidoo J, Ascierto PA, Gerber DE, Ernstoff MS, Johnson DB. Use of artificial intelligence chatbots in clinical management of immune-related adverse events. J Immunother Cancer 2024; 12:e008599. PMID: 38816231; PMCID: PMC11141185; DOI: 10.1136/jitc-2023-008599. Accepted 05/14/2024. Open access.
Abstract
BACKGROUND Artificial intelligence (AI) chatbots have become a major source of general and medical information, though their accuracy and completeness are still being assessed. Their utility to answer questions surrounding immune-related adverse events (irAEs), common and potentially dangerous toxicities from cancer immunotherapy, are not well defined. METHODS We developed 50 distinct questions with answers in available guidelines surrounding 10 irAE categories and queried two AI chatbots (ChatGPT and Bard), along with an additional 20 patient-specific scenarios. Experts in irAE management scored answers for accuracy and completion using a Likert scale ranging from 1 (least accurate/complete) to 4 (most accurate/complete). Answers across categories and across engines were compared. RESULTS Overall, both engines scored highly for accuracy (mean scores for ChatGPT and Bard were 3.87 vs 3.5, p<0.01) and completeness (3.83 vs 3.46, p<0.01). Scores of 1-2 (completely or mostly inaccurate or incomplete) were particularly rare for ChatGPT (6/800 answer-ratings, 0.75%). Of the 50 questions, all eight physician raters gave ChatGPT a rating of 4 (fully accurate or complete) for 22 questions (for accuracy) and 16 questions (for completeness). In the 20 patient scenarios, the average accuracy score was 3.725 (median 4) and the average completeness was 3.61 (median 4). CONCLUSIONS AI chatbots provided largely accurate and complete information regarding irAEs, and wildly inaccurate information ("hallucinations") was uncommon. However, until accuracy and completeness increases further, appropriate guidelines remain the gold standard to follow.
Affiliation(s)
- Hannah Burnette
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Aliyah Pabani
- Department of Oncology, Johns Hopkins University, Baltimore, Maryland, USA
- Mitchell S von Itzstein
- Harold C Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Benjamin Switzer
- Department of Medicine, Roswell Park Comprehensive Cancer Center, Buffalo, New York, USA
- Run Fan
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Fei Ye
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Igor Puzanov
- Department of Medicine, Roswell Park Comprehensive Cancer Center, Buffalo, New York, USA
- Paolo A Ascierto
- Department of Melanoma, Cancer Immunotherapy and Development Therapeutics, Istituto Nazionale Tumori IRCCS Fondazione Pascale, Napoli, Campania, Italy
- David E Gerber
- Harold C Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Marc S Ernstoff
- ImmunoOncology Branch (IOB), Developmental Therapeutics Program, Cancer Therapy and Diagnosis Division, National Cancer Institute (NCI), National Institutes of Health, Bethesda, Maryland, USA
- Douglas B Johnson
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA