1. Lone MR, Sohail SS. Comment on "Evaluation of responses to cardiac imaging questions by the artificial intelligence large language model ChatGPT". Clin Imaging 2024; 114:110272. [PMID: 39243497] [DOI: 10.1016/j.clinimag.2024.110272]
Affiliation(s)
- Mohd Rafi Lone
- VIT Bhopal University, Bhopal-Indore Highway, Kothrikalan, Sehore, Madhya Pradesh 466114, India.
- Shahab Saquib Sohail
- VIT Bhopal University, Bhopal-Indore Highway, Kothrikalan, Sehore, Madhya Pradesh 466114, India
2. Villarreal-Espinosa JB, Berreta RS, Allende F, Garcia JR, Ayala S, Familiari F, Chahla J. Accuracy assessment of ChatGPT responses to frequently asked questions regarding anterior cruciate ligament surgery. Knee 2024; 51:84-92. [PMID: 39241674] [DOI: 10.1016/j.knee.2024.08.014]
Abstract
BACKGROUND The emergence of artificial intelligence (AI) has given users chat-like access to large sources of information. We therefore sought to evaluate the accuracy of ChatGPT-4's responses to the 10 patient questions most frequently asked (FAQs) regarding anterior cruciate ligament (ACL) surgery. METHODS A list of the top 10 FAQs pertaining to ACL surgery was created after conducting a search through all Sports Medicine Fellowship Institutions listed on the Arthroscopy Association of North America (AANA) and American Orthopaedic Society for Sports Medicine (AOSSM) websites. Two sports medicine fellowship-trained surgeons graded response accuracy on a Likert scale, and Cohen's kappa was used to assess inter-rater agreement. Reproducibility of the responses over time was also assessed. RESULTS Five of the 10 responses were graded 'completely accurate' by both fellowship-trained surgeons, and three additional replies received a 'completely accurate' grade from at least one. Moreover, the inter-rater reliability assessment revealed moderate agreement between the fellowship-trained attending physicians (weighted kappa = 0.57, 95% confidence interval 0.15-0.99). Additionally, 80% of the responses were reproducible over time. CONCLUSION ChatGPT can be considered an accurate additional tool for answering general patient questions regarding ACL surgery. Nonetheless, patient-surgeon interaction should not be deferred and must remain the driving force for information retrieval; the general recommendation is therefore to address any questions in the presence of a qualified specialist.
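The weighted kappa reported above is a standard ordinal-agreement statistic and can be reproduced with common tooling. A minimal sketch, assuming the two raters' Likert grades are encoded as integers (the example data below are hypothetical, not the study's):

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical Likert accuracy grades (1-4) from two raters for 10 responses
    rater_a = [4, 4, 3, 4, 2, 4, 3, 4, 3, 2]
    rater_b = [4, 3, 3, 4, 3, 4, 4, 4, 2, 2]

    # Weighted kappa penalizes larger disagreements more heavily;
    # linear or quadratic weights are common choices for ordinal scales.
    kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
    print(f"Weighted Cohen's kappa: {kappa:.2f}")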
Affiliation(s)
- Felicitas Allende
- Department of Orthopedics, Rush University Medical Center, Chicago, IL, USA
- José Rafael Garcia
- Department of Orthopedics, Rush University Medical Center, Chicago, IL, USA
- Salvador Ayala
- Department of Orthopedics, Rush University Medical Center, Chicago, IL, USA
- Jorge Chahla
- Department of Orthopedics, Rush University Medical Center, Chicago, IL, USA.
3. Warrier A, Singh R, Haleem A, Zaki H, Eloy JA. The Comparative Diagnostic Capability of Large Language Models in Otolaryngology. Laryngoscope 2024; 134:3997-4002. [PMID: 38563415] [DOI: 10.1002/lary.31434]
Abstract
OBJECTIVES To evaluate and compare the ability of large language models (LLMs) to diagnose various ailments in otolaryngology. METHODS We collected all 100 clinical vignettes from the second edition of Otolaryngology Cases-The University of Cincinnati Clinical Portfolio by Pensak et al. With the addition of the prompt "Provide a diagnosis given the following history," we prompted ChatGPT-3.5, Google Bard, and Bing-GPT4 to provide a diagnosis for each vignette. These diagnoses were compared against the portfolio for accuracy and recorded. All queries were run in June 2023. RESULTS ChatGPT-3.5 was the most accurate model (89% success rate), followed by Google Bard (82%) and Bing-GPT4 (74%). A chi-squared test revealed a significant difference between the three LLMs in providing correct diagnoses (p = 0.023). Of the 100 vignettes, seven required additional testing results (e.g., biopsy, non-contrast CT) for accurate clinical diagnosis. When these vignettes were omitted, the revised success rates were 95.7% for ChatGPT-3.5, 88.17% for Google Bard, and 78.72% for Bing-GPT4 (p = 0.002). CONCLUSIONS ChatGPT-3.5 offers the most accurate diagnoses when given established clinical vignettes, as compared to Google Bard and Bing-GPT4. LLMs may accurately offer assessments for common otolaryngology conditions but currently require detailed prompt information and critical supervision from clinicians. There is vast potential in the clinical applicability of LLMs; however, practitioners should be wary of possible "hallucinations" and misinformation in responses. LEVEL OF EVIDENCE 3.
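The chi-squared comparison above can be sketched directly from the reported success rates. A minimal example with SciPy, assuming correct/incorrect counts out of 100 vignettes per model:

    from scipy.stats import chi2_contingency

    # Correct vs. incorrect diagnosis counts per model (from the reported rates)
    #        ChatGPT-3.5  Google Bard  Bing-GPT4
    table = [[89, 82, 74],   # correct
             [11, 18, 26]]   # incorrect

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p:.3f}, dof = {dof}")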
Affiliation(s)
- Akshay Warrier
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
- Rohan Singh
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
- Afash Haleem
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
- Haider Zaki
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
- Jean Anderson Eloy
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
- Center for Skull Base and Pituitary Surgery, Neurological Institute of New Jersey, Rutgers New Jersey Medical School, Newark, New Jersey, U.S.A
4. Gupta M, Gupta P, Ho C, Wood J, Guleria S, Virostko J. Can generative AI improve the readability of patient education materials at a radiology practice? Clin Radiol 2024:S0009-9260(24)00431-8. [PMID: 39266371] [DOI: 10.1016/j.crad.2024.08.019]
Abstract
AIM This study evaluated the readability of existing patient education materials and explored the potential of generative AI tools, such as ChatGPT-4 and Google Gemini, to simplify these materials to a sixth-grade reading level, in accordance with guidelines. MATERIALS AND METHODS Seven patient education documents were selected from a major radiology group. ChatGPT-4 and Gemini were provided with the documents and asked to reformulate them to target a sixth-grade reading level. Average reading level (ARL) and proportional word count (PWC) change were calculated, and 1-sample t-tests were conducted (significance threshold p=0.05). Three radiologists assessed the materials on a Likert scale for appropriateness, relevance, clarity, and information retention. RESULTS The original materials had an ARL of 11.72. ChatGPT ARL was 7.32 ± 0.76 (6/7 significant) and Gemini ARL was 6.55 ± 0.51 (7/7 significant). ChatGPT reduced word count by 15% ± 7%, with 95% retaining at least 75% of information; Gemini reduced word count by 33% ± 7%, with 68% retaining at least 75% of information. ChatGPT outputs were rated more appropriate (95% vs. 57%), clearer (92% vs. 67%), and more relevant (95% vs. 76%) than Gemini's. Interrater agreement differed substantially between ChatGPT (0.91) and Gemini (0.46). CONCLUSION Generative AI significantly enhances the readability of patient education materials, which in their original form did not achieve the recommended sixth-grade ARL. Radiologist evaluations confirmed the appropriateness and relevance of the AI-simplified texts. This study emphasizes the capabilities of generative AI tools and the necessity for ongoing expert review to maintain content accuracy and suitability.
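The ARL comparison described above reduces to a one-sample t-test of the simplified documents' reading levels against the sixth-grade target. A minimal SciPy sketch, with hypothetical per-document grade levels standing in for the study's measurements:

    from scipy.stats import ttest_1samp

    # Hypothetical average reading levels of the seven simplified documents
    simplified_arl = [7.1, 6.8, 8.3, 7.5, 6.5, 7.9, 7.2]

    # Test whether the mean ARL differs from the sixth-grade target
    t_stat, p_value = ttest_1samp(simplified_arl, popmean=6.0)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")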
Affiliation(s)
- M Gupta
- The University of Texas at Austin, Dell Medical School, Department of Diagnostic Medicine, Austin, TX, USA.
- P Gupta
- The University of Texas at Austin, Austin, TX, USA
- C Ho
- The University of Texas at Austin, Dell Medical School, Department of Diagnostic Medicine, Austin, TX, USA
- J Wood
- The University of Texas at Austin, Dell Medical School, Department of Diagnostic Medicine, Austin, TX, USA
- S Guleria
- The University of Texas at Austin, Dell Medical School, Department of Diagnostic Medicine, Austin, TX, USA
- J Virostko
- The University of Texas at Austin, Dell Medical School, Department of Diagnostic Medicine, Austin, TX, USA; The University of Texas at Austin, Dell Medical School, Livestrong Cancer Institutes, USA; The University of Texas at Austin, Dell Medical School, Department of Oncology, USA; The University of Texas at Austin, Oden Institute for Computational Engineering and Sciences, USA
5. Spina A, Andalib S, Flores D, Vermani R, Halaseh FF, Nelson AM. Evaluation of Generative Language Models in Personalizing Medical Information: Instrument Validation Study. JMIR AI 2024; 3:e54371. [PMID: 39137416] [DOI: 10.2196/54371]
Abstract
BACKGROUND Although uncertainties exist regarding implementation, artificial intelligence-driven generative language models (GLMs) have enormous potential in medicine. Deployment of GLMs could improve patient comprehension of clinical texts and address low health literacy. OBJECTIVE The goal of this study was to evaluate the potential of ChatGPT-3.5 and GPT-4 to tailor the complexity of medical information to a patient-specified education level, which is crucial if they are to serve as tools for addressing low health literacy. METHODS Input templates related to 2 prevalent chronic diseases-type II diabetes and hypertension-were designed. Each clinical vignette was adjusted for hypothetical patient education levels to evaluate output personalization. To assess the success of a GLM (GPT-3.5 and GPT-4) in tailoring output writing, the readability of pre- and posttransformation outputs was quantified using the Flesch reading ease score (FKRE) and the Flesch-Kincaid grade level (FKGL). RESULTS Responses (n=80) were generated using GPT-3.5 and GPT-4 across 2 clinical vignettes. For GPT-3.5, FKRE means were 57.75 (SD 4.75), 51.28 (SD 5.14), 32.28 (SD 4.52), and 28.31 (SD 5.22) for 6th grade, 8th grade, high school, and bachelor's, respectively; FKGL mean scores were 9.08 (SD 0.90), 10.27 (SD 1.06), 13.4 (SD 0.80), and 13.74 (SD 1.18). GPT-3.5 aligned with the prespecified education level only at the bachelor's degree. Conversely, GPT-4's FKRE mean scores were 74.54 (SD 2.6), 71.25 (SD 4.96), 47.61 (SD 6.13), and 13.71 (SD 5.77), with FKGL mean scores of 6.3 (SD 0.73), 6.7 (SD 1.11), 11.09 (SD 1.26), and 17.03 (SD 1.11) for the same respective education levels. GPT-4 met the target readability for all groups except the 6th-grade FKRE average. Both GLMs produced outputs with statistically significant differences between mean FKRE and FKGL across input education levels (FKRE: 6th grade P<.001; 8th grade P<.001; high school P<.001; bachelor's P=.003; FKGL: 6th grade P=.001; 8th grade P<.001; high school P<.001; bachelor's P<.001). CONCLUSIONS GLMs can change the structure and readability of medical text outputs according to the input-specified education level. However, GLMs categorize the input education designation into 3 broad tiers of output readability: easy (6th and 8th grade), medium (high school), and difficult (bachelor's degree). This is the first result to suggest that there are broad boundaries in the success of GLMs at output text simplification. Future research must establish how GLMs can reliably personalize medical texts to prespecified education levels to enable a broader impact on health care literacy.
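Both readability metrics above are closed-form functions of word, sentence, and syllable counts. A minimal sketch of the standard Flesch formulas, using a naive vowel-group syllable counter (an approximation; production tools use dictionaries or better heuristics):

    import re

    def count_syllables(word: str) -> int:
        # Naive approximation: count groups of consecutive vowels
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_scores(text: str) -> tuple[float, float]:
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        wps = len(words) / sentences   # words per sentence
        spw = syllables / len(words)   # syllables per word
        reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
        grade_level = 0.39 * wps + 11.8 * spw - 15.59
        return reading_ease, grade_level

    ease, grade = flesch_scores("The doctor will check your blood pressure. It is quick.")
    print(f"Reading ease: {ease:.1f}, grade level: {grade:.1f}")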
Affiliation(s)
- Aidin Spina
- School of Medicine, University of California, Irvine, Irvine, CA, United States
- Saman Andalib
- School of Medicine, University of California, Irvine, Irvine, CA, United States
- Daniel Flores
- School of Medicine, University of California, Irvine, Irvine, CA, United States
- Rishi Vermani
- School of Medicine, University of California, Irvine, Irvine, CA, United States
- Faris F Halaseh
- School of Medicine, University of California, Irvine, Irvine, CA, United States
- Ariana M Nelson
- School of Medicine, University of California, Irvine, Irvine, CA, United States
- Department of Anesthesiology and Perioperative Care, University of California, Irvine, Irvine, CA, United States
6. Yong LPX, Tung JYM, Lee ZY, Kuan WS, Chua MT. Performance of Large Language Models in Patient Complaint Resolution: Web-Based Cross-Sectional Survey. J Med Internet Res 2024; 26:e56413. [PMID: 39121468] [PMCID: PMC11344182] [DOI: 10.2196/56413]
Abstract
BACKGROUND Patient complaints are a perennial challenge faced by health care institutions globally, requiring extensive time and effort from health care workers. Despite these efforts, patient dissatisfaction remains high. Recent studies on the use of large language models (LLMs), such as the GPT models developed by OpenAI, in the health care sector have shown great promise, with the ability to provide more detailed and empathetic responses than physicians. LLMs could potentially be used in responding to patient complaints to improve patient satisfaction and complaint response time. OBJECTIVE This study aims to evaluate the performance of LLMs in addressing patient complaints received by a tertiary health care institution, with the goal of enhancing patient satisfaction. METHODS Anonymized patient complaint emails and the associated responses from the patient relations department were obtained. ChatGPT-4.0 (OpenAI, Inc) was provided with the same complaint email and tasked to generate a response. The complaints and the respective responses were uploaded onto a web-based questionnaire. Respondents were asked to rate both responses on a 10-point Likert scale for 4 items: appropriateness, completeness, empathy, and satisfaction. Participants were also asked to choose a preferred response at the end of each scenario. RESULTS There was a total of 188 respondents, of whom 115 (61.2%) were health care workers. A majority of the respondents, including both health care and non-health care workers, preferred the replies from ChatGPT (ranging from 164/188, 87.2% to 183/188, 97.3% across scenarios). GPT-4.0 responses were rated higher on all 4 assessed items, with median scores of 8 (IQR 7-9) for each, compared to human responses (appropriateness 5, IQR 3-7; empathy 4, IQR 3-6; quality 5, IQR 3-6; satisfaction 5, IQR 3-6; P<.001), and had higher average word counts (238 vs 76 words). Regression analyses showed that a higher word count was a statistically significant predictor of a higher score on all 4 items, with every 1-word increment associated with an increase in scores of between 0.015 and 0.019 (all P<.001). However, on subgroup analysis by authorship, this held true only for responses written by patient relations department staff and not for those generated by ChatGPT, which received consistently high scores irrespective of response length. CONCLUSIONS This study provides significant evidence supporting the effectiveness of LLMs in the resolution of patient complaints. ChatGPT demonstrated superiority in terms of response appropriateness, empathy, quality, and overall satisfaction when compared against actual human responses to patient complaints. Future research can measure the degree of improvement that artificial intelligence generated responses can bring in terms of time savings, cost-effectiveness, patient satisfaction, and stress reduction for the health care system.
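The word-count regression described above is a simple linear model of item score on response length. A minimal sketch with statsmodels, using hypothetical ratings rather than the study's data:

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical response word counts and satisfaction ratings (1-10)
    word_counts = np.array([60, 75, 90, 120, 150, 180, 210, 240])
    scores = np.array([4, 5, 5, 6, 7, 7, 8, 9])

    # Fit: score = intercept + slope * word_count
    X = sm.add_constant(word_counts)
    model = sm.OLS(scores, X).fit()
    print(model.params)    # the slope estimates the per-word score increment
    print(model.pvalues)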
Affiliation(s)
- Lorraine Pei Xian Yong
- Emergency Medicine Department, National University Hospital, National University Health System, Singapore, Singapore
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Urgent Care Centre, Alexandra Hospital, National University Health System, Singapore, Singapore
- Zi Yao Lee
- Emergency Medicine Department, National University Hospital, National University Health System, Singapore, Singapore
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Urgent Care Centre, Alexandra Hospital, National University Health System, Singapore, Singapore
- Win Sen Kuan
- Emergency Medicine Department, National University Hospital, National University Health System, Singapore, Singapore
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Urgent Care Centre, Alexandra Hospital, National University Health System, Singapore, Singapore
- Mui Teng Chua
- Emergency Medicine Department, National University Hospital, National University Health System, Singapore, Singapore
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Urgent Care Centre, Alexandra Hospital, National University Health System, Singapore, Singapore
7. Swisher AR, Wu AW, Liu GC, Lee MK, Carle TR, Tang DM. Enhancing Health Literacy: Evaluating the Readability of Patient Handouts Revised by ChatGPT's Large Language Model. Otolaryngol Head Neck Surg 2024. [PMID: 39105460] [DOI: 10.1002/ohn.927]
Abstract
OBJECTIVE To use an artificial intelligence (AI)-powered large language model (LLM) to improve the readability of patient handouts. STUDY DESIGN Review of online material modified by AI. SETTING Academic center. METHODS Five handout materials obtained from the American Rhinologic Society (ARS) and the American Academy of Facial Plastic and Reconstructive Surgery websites were assessed using validated readability metrics. The handouts were inputted into OpenAI's ChatGPT-4 with the prompt: "Rewrite the following at a 6th-grade reading level." The understandability and actionability of both native and LLM-revised versions were evaluated using the Patient Education Materials Assessment Tool (PEMAT). Results were compared using Wilcoxon rank-sum tests. RESULTS The mean readability scores of the standard (ARS, American Academy of Facial Plastic and Reconstructive Surgery) materials corresponded to "difficult," with reading categories ranging between high school and university grade levels. Conversely, the LLM-revised handouts had an average seventh-grade reading level. LLM-revised handouts had better readability on nearly all metrics tested: Flesch-Kincaid Reading Ease (70.8 vs 43.9; P < .05), Gunning Fog Score (10.2 vs 14.42; P < .05), Simple Measure of Gobbledygook (9.9 vs 13.1; P < .05), Coleman-Liau (8.8 vs 12.6; P < .05), and Automated Readability Index (8.2 vs 10.7; P = .06). PEMAT scores were significantly higher in the LLM-revised handouts for understandability (91 vs 74%; P < .05), with similar actionability (42 vs 34%; P = .15), when compared to the standard materials. CONCLUSION Patient-facing handouts can be augmented by ChatGPT with simple prompting to tailor information with improved readability. This study demonstrates the utility of LLMs in rewriting patient handouts; such models may serve as a tool to help optimize education materials. LEVEL OF EVIDENCE Level VI.
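The pre/post comparisons above rely on the Wilcoxon rank-sum test. A minimal SciPy sketch, with hypothetical Flesch Reading Ease scores for the original and revised handouts:

    from scipy.stats import ranksums

    # Hypothetical Flesch Reading Ease scores (higher = easier to read)
    original = [43.1, 40.5, 47.2, 44.8, 42.9]
    revised = [69.5, 72.3, 70.1, 68.7, 73.4]

    stat, p_value = ranksums(revised, original)
    print(f"rank-sum statistic = {stat:.2f}, p = {p_value:.4f}")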
Affiliation(s)
- Austin R Swisher
- Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona, USA
- Arthur W Wu
- Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Gene C Liu
- Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Matthew K Lee
- Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Taylor R Carle
- Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Dennis M Tang
- Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
8. Triana Rodriguez GA, Rojas-Rojas MM, Sotomayor K, Ovalle JP, Cardona Ortegón JD. Generative Artificial Intelligence: A Promising Instrument for Daily Living and Clinical Practice. J Am Coll Radiol 2024; 21:1158-1159. [PMID: 38237881] [DOI: 10.1016/j.jacr.2023.12.030]
Affiliation(s)
- María M Rojas-Rojas
- Radiologist, Centro Hospitalario Serena del Mar, Cartagena de Indias, Provincia de Cartagena, Bolívar
- Katherine Sotomayor
- Radiologist, Centro Hospitalario Serena del Mar, Cartagena de Indias, Provincia de Cartagena, Bolívar
- Juan P Ovalle
- Radiologist, Centro Hospitalario Serena del Mar, Cartagena de Indias, Provincia de Cartagena, Bolívar
9. Gumilar KE, Indraprasta BR, Hsu YC, Yu ZY, Chen H, Irawan B, Tambunan Z, Wibowo BM, Nugroho H, Tjokroprawiro BA, Dachlan EG, Mulawardhana P, Rahestyningtyas E, Pramuditya H, Putra VGE, Waluyo ST, Tan NR, Folarin R, Ibrahim IH, Lin CH, Hung TY, Lu TF, Chen YF, Shih YH, Wang SJ, Huang J, Yates CC, Lu CH, Liao LN, Tan M. Disparities in medical recommendations from AI-based chatbots across different countries/regions. Sci Rep 2024; 14:17052. [PMID: 39048640] [PMCID: PMC11269683] [DOI: 10.1038/s41598-024-67689-0]
Abstract
This study explores disparities and opportunities in healthcare information provided by AI chatbots. We focused on recommendations for adjuvant therapy in endometrial cancer, analyzing responses across four regions (Indonesia, Nigeria, Taiwan, USA) and three platforms (Bard, Bing, ChatGPT-3.5). Utilizing previously published cases, we asked identical questions to chatbots from each location within a 24-h window. Responses were evaluated in a double-blinded manner on relevance, clarity, depth, focus, and coherence by ten experts in endometrial cancer. Our analysis revealed significant variations across different countries/regions (p < 0.001). Interestingly, Bing's responses in Nigeria consistently outperformed others (p < 0.05), excelling in all evaluation criteria (p < 0.001). Bard also performed better in Nigeria compared to other regions (p < 0.05), consistently surpassing them across all categories (p < 0.001, with relevance reaching p < 0.01). Notably, Bard's overall scores were significantly higher than those of ChatGPT-3.5 and Bing in all locations (p < 0.001). These findings highlight disparities and opportunities in the quality of AI-powered healthcare information based on user location and platform. This emphasizes the necessity for more research and development to guarantee equal access to trustworthy medical information through AI technologies.
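The cross-region comparison described above is a multi-group test on ordinal expert ratings, for which the Kruskal-Wallis H-test is a standard choice (the paper's exact test is not stated here; this is a sketch, and the per-region ratings below are hypothetical):

    from scipy.stats import kruskal

    # Hypothetical expert ratings (1-10) of chatbot responses per region
    indonesia = [6, 7, 5, 6, 7, 6, 5, 6, 7, 6]
    nigeria = [8, 9, 8, 9, 8, 9, 9, 8, 8, 9]
    taiwan = [6, 6, 7, 5, 6, 7, 6, 6, 5, 7]
    usa = [7, 6, 7, 7, 6, 7, 6, 7, 7, 6]

    h_stat, p_value = kruskal(indonesia, nigeria, taiwan, usa)
    print(f"H = {h_stat:.2f}, p = {p_value:.4f}")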
Affiliation(s)
- Khanisyah E Gumilar
- Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan.
- Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga-Faculty of Medicine, Universitas Airlangga, Jl. Dharmahusada Permai, Mulyorejo, Kec. Mulyorejo, Surabaya, Jawa Timur, 60115, Indonesia.
- Birama R Indraprasta
- Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Yu-Cheng Hsu
- Department of Public Health, China Medical University, No. 100, Sec. 1, Jingmao Rd, Beitun Dist, Taichung, 406040, Taiwan, ROC
- School of Chinese Medicine, China Medical University, Taichung, Taiwan
- Zih-Ying Yu
- Department of Public Health, China Medical University, No. 100, Sec. 1, Jingmao Rd, Beitun Dist, Taichung, 406040, Taiwan, ROC
- Hong Chen
- Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan
- Budi Irawan
- Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Zulkarnain Tambunan
- Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Bagus M Wibowo
- Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Hari Nugroho
- Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Brahmana A Tjokroprawiro
- Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Erry G Dachlan
- Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Pungky Mulawardhana
- Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga-Faculty of Medicine, Universitas Airlangga, Jl. Dharmahusada Permai, Mulyorejo, Kec. Mulyorejo, Surabaya, Jawa Timur, 60115, Indonesia
- Eccita Rahestyningtyas
- Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga-Faculty of Medicine, Universitas Airlangga, Jl. Dharmahusada Permai, Mulyorejo, Kec. Mulyorejo, Surabaya, Jawa Timur, 60115, Indonesia
- Herlangga Pramuditya
- Department of Obstetrics and Gynecology, Dr. Ramelan Naval Hospital, Surabaya, Indonesia
- Very Great E Putra
- Department of Obstetrics and Gynecology, Dr. Kariadi Central General Hospital, Semarang, Indonesia
- Setyo T Waluyo
- Department of Obstetrics and Gynecology, Ulin General Hospital, Banjarmasin, Indonesia
- Nathan R Tan
- Department of Modern and Classical Languages and Literature, University of South Alabama, Mobile, AL, USA
- Royhaan Folarin
- Department of Anatomy, Faculty of Basic Medical Sciences, Olabisi Onabanjo University, Sagamu, Nigeria
- Ibrahim H Ibrahim
- Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan
- Cheng-Han Lin
- Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan
- Tai-Yu Hung
- Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan
- Ting-Fang Lu
- Department of Obstetrics and Gynecology, Taichung Veteran General Hospital, 1650 Taiwan Boulevard Sector. 4, Taichung, 40705, Taiwan, ROC
- Yen-Fu Chen
- Department of Obstetrics and Gynecology, Taichung Veteran General Hospital, 1650 Taiwan Boulevard Sector. 4, Taichung, 40705, Taiwan, ROC
- Yu-Hsiang Shih
- Department of Obstetrics and Gynecology, Taichung Veteran General Hospital, 1650 Taiwan Boulevard Sector. 4, Taichung, 40705, Taiwan, ROC
- Shao-Jing Wang
- Department of Obstetrics and Gynecology, Taichung Veteran General Hospital, 1650 Taiwan Boulevard Sector. 4, Taichung, 40705, Taiwan, ROC
- Jingshan Huang
- School of Computing and College of Medicine, University of South Alabama, Mobile, AL, USA
- Clayton C Yates
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
- Chien-Hsing Lu
- Department of Obstetrics and Gynecology, Taichung Veteran General Hospital, 1650 Taiwan Boulevard Sector. 4, Taichung, 40705, Taiwan, ROC.
- Li-Na Liao
- Department of Public Health, China Medical University, No. 100, Sec. 1, Jingmao Rd, Beitun Dist, Taichung, 406040, Taiwan, ROC.
- Ming Tan
- Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan.
- Institute of Biochemistry and Molecular Biology, Graduate Institute of Biomedical Sciences, China Medical University (Taiwan), No. 100, Sec. 1, Jingmao Rd, Beitun Dist, Taichung, 406040, Taiwan, ROC.
10. Keshavarz P, Bagherieh S, Nabipoorashrafi SA, Chalian H, Rahsepar AA, Kim GHJ, Hassani C, Raman SS, Bedayat A. ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives. Diagn Interv Imaging 2024; 105:251-265. [PMID: 38679540] [DOI: 10.1016/j.diii.2024.04.003]
Abstract
PURPOSE The purpose of this study was to systematically review the reported performance of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications. MATERIALS AND METHODS After a comprehensive review of the PubMed, Web of Science, Embase, and Google Scholar databases, a cohort of published studies utilizing ChatGPT for clinical radiology applications was identified up to January 1, 2024. RESULTS Of the 861 studies retrieved, 44 evaluated the performance of ChatGPT; among these, 37 (37/44; 84.1%) demonstrated high performance, and seven (7/44; 15.9%) indicated lower performance in providing information on diagnosis and clinical decision support (6/44; 13.6%) and patient communication and educational content (1/44; 2.3%). Twenty-four (24/44; 54.5%) studies reported quantitative measures of ChatGPT's performance. Among these, 19 (19/24; 79.2%) recorded a median accuracy of 70.5%, and five (5/24; 20.8%) recorded a median agreement of 83.6% between ChatGPT outcomes and reference standards (radiologists' decisions or guidelines), generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared two recent ChatGPT versions, and in ten (10/11; 90.9%), GPT-4 outperformed GPT-3.5, showing notable enhancements in addressing higher-order thinking questions, better comprehension of radiology terms, and improved accuracy in describing images. Risks and concerns about using ChatGPT included biased responses, limited originality, and the potential for inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks. CONCLUSION Although ChatGPT's effectiveness has been shown in 84.1% of radiology studies, there are still multiple pitfalls and limitations to address. It is too soon to confirm its complete proficiency and accuracy, and more extensive multicenter studies utilizing diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.
Affiliation(s)
- Pedram Keshavarz
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; School of Science and Technology, The University of Georgia, Tbilisi 0171, Georgia
- Sara Bagherieh
- Independent Clinical Radiology Researcher, Los Angeles, CA 90024, USA
- Hamid Chalian
- Department of Radiology, Cardiothoracic Imaging, University of Washington, Seattle, WA 98195, USA
- Amir Ali Rahsepar
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Grace Hyun J Kim
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; Department of Radiological Sciences, Center for Computer Vision and Imaging Biomarkers, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Cameron Hassani
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Steven S Raman
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Arash Bedayat
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA.
11. Cardona Ortegón JD, Serrano S, Romero Cortes D. Re: Michael Eppler, Conner Ganjavi, Lorenzo Storino Ramacciotti, et al. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol 2024;85:146-53. Eur Urol 2024; 86:e22. [PMID: 38644147] [DOI: 10.1016/j.eururo.2024.02.033]
Affiliation(s)
- José David Cardona Ortegón
- Department of Diagnostic Imaging, Fundación Santa Fe de Bogotá, Bogotá, Colombia; School of Medicine, El Bosque University, Bogotá, Colombia.
- Samuel Serrano
- School of Medicine, El Bosque University, Bogotá, Colombia; Department of Urology, El Bosque University, Bogotá, Colombia
- Daniel Romero Cortes
- School of Medicine, El Bosque University, Bogotá, Colombia; Department of Urology, El Bosque University, Bogotá, Colombia
12. Ye H. Other possible perspectives for solving the negative outcome penalty paradox in the application of artificial intelligence in clinical diagnostics. J Med Ethics 2024:jme-2024-109968. [PMID: 38871400] [DOI: 10.1136/jme-2024-109968]
Abstract
Artificial intelligence (AI), represented by machine learning, artificial neural networks, and deep learning, is affecting all areas of medicine, including translational research (from bench to bedside to health policy), clinical medicine (including diagnosis, treatment, prognosis, and healthcare resource allocation), and public health. At a time when almost everyone is focused on how to better realise the promise of AI to transform the entire healthcare system, Dr Appel calls for public attention to AI in medicine and the negative outcome penalty paradox. Raising this topic has deepened our thinking about the application of AI in clinical diagnostics and prompted us to look for ways to integrate AI more effectively into future clinical practice. In addition to Dr Appel's insightful advice, I hope to offer three other possible perspectives, namely changing public perceptions, re-engineering clinical practice processes, and introducing more stakeholders, to further the discussion on this topic.
Affiliation(s)
- Hongnan Ye
- Beijing Alumni Association of China Medical University, Beijing, China
13. Marti-Aguado D, Pazó J, Diaz-Gonzalez A, de Las Heras Páez de la Cadena B, Conthe A, Gallego Duran R, Rodríguez-Gandía MA, Turnes J, Romero-Gomez M. LiverAI: New tool in the landscape for liver health. Gastroenterol Hepatol 2024; 47:646-648. [PMID: 38582150] [DOI: 10.1016/j.gastrohep.2024.04.001]
Affiliation(s)
- David Marti-Aguado
- Digestive Disease Department, Clinic University Hospital, INCLIVA Health Research Institute, Valencia, Spain.
- Javier Pazó
- AI and IT Solutions Manager, Spanish Association for the Study of the Liver (AEEH), Spain
- Alvaro Diaz-Gonzalez
- Gastroenterology and Hepatology Department, Clinical and Translational Research in Digestive Diseases Group, Valdecilla Research Institute (IDIVAL), Marqués de Valdecilla University Hospital, Santander, Spain
- Andres Conthe
- Department of Gastroenterology and Hepatology, Hospital General Universitario Gregorio Marañón, Madrid, Spain
- Rocio Gallego Duran
- Digestive Diseases Unit and CIBERehd, Virgen del Rocío University Hospital, Institute of Biomedicine of Seville (HUVR/CSIC/US), University of Seville, Seville, Spain
- Miguel A Rodríguez-Gandía
- Department of Gastroenterology and Hepatology, Hospital Universitario Ramón y Cajal, Instituto Ramón y Cajal de Investigación Sanitaria, Madrid, Spain
- Juan Turnes
- Department of Gastroenterology and Hepatology, Complejo Hospitalario Universitario Pontevedra & IIS Galicia Sur, Spain
- Manuel Romero-Gomez
- Digestive Diseases Unit and CIBERehd, Virgen del Rocío University Hospital, Institute of Biomedicine of Seville (HUVR/CSIC/US), University of Seville, Seville, Spain
14. Tu W, Joe BN. The Era of ChatGPT and Large Language Models: Can We Advance Patient-centered Communications Appropriately and Safely? Radiol Imaging Cancer 2024; 6:e240038. [PMID: 38668641] [PMCID: PMC11148828] [DOI: 10.1148/rycan.240038]
Affiliation(s)
- Wendy Tu
- From the Department of Medical Imaging, University of Alberta, 116th St & 85th Ave, Edmonton, AB, Canada T6G 2R3; and Department of Radiology and Biomedical Imaging, University of California at San Francisco, San Francisco, Calif
- Bonnie N. Joe
- From the Department of Medical Imaging, University of Alberta, 116th St & 85th Ave, Edmonton, AB, Canada T6G 2R3; and Department of Radiology and Biomedical Imaging, University of California at San Francisco, San Francisco, Calif
15. Kaba E, Hürsoy N, Solak M, Çeliker FB. Accuracy of Large Language Models in Thyroid Nodule-Related Questions Based on the Korean Thyroid Imaging Reporting and Data System (K-TIRADS). Korean J Radiol 2024; 25:499-500. [PMID: 38685738] [PMCID: PMC11058430] [DOI: 10.3348/kjr.2024.0229]
Affiliation(s)
- Esat Kaba
- Department of Radiology, Recep Tayyip Erdogan University, Rize, Turkey.
- Nur Hürsoy
- Department of Radiology, Recep Tayyip Erdogan University, Rize, Turkey
- Merve Solak
- Department of Radiology, Recep Tayyip Erdogan University, Rize, Turkey
16. Humbsch P, Horn E, Bohm K, Gintrowicz R. [ChatGPT for use in technology-enhanced learning in anesthesiology and emergency medicine and potential clinical application of AI language models: Between hype and reality around artificial intelligence in medical use]. Die Anaesthesiologie 2024; 73:324-335. [PMID: 38691128] [PMCID: PMC11076380] [DOI: 10.1007/s00101-024-01403-7]
Abstract
BACKGROUND The use of AI language models in education and academia is currently a subject of research, and applications in clinical settings are also being tested. Studies by various research groups have demonstrated that language models can answer medical board examination questions, and there are potential applications of these models in medical education as well. RESEARCH QUESTION This study investigates the extent to which current language models prove effective for addressing medical inquiries, their potential utility in medical education, and the challenges that still exist in the functioning of AI language models. METHODS ChatGPT, based on GPT-3.5, had to answer 1025 questions from the second part (M2) of the medical board examination. The study examined whether errors occurred and, if so, what types. Additionally, the language model was asked to generate essays on the learning objectives outlined in the standard curriculum for specialist training in anesthesiology and the supplementary qualification in emergency medicine; these essays were then analyzed and checked for errors and anomalies. RESULTS ChatGPT was able to answer the questions correctly with an accuracy rate exceeding 69%, even when the questions included references to visual aids. This represented an improvement in accuracy on board examination questions compared with a study conducted in March; when it came to generating essays, however, a high error rate was observed. DISCUSSION Considering the current pace of improvement in AI language models, widespread clinical implementation, especially in emergency departments as well as emergency and intensive care medicine with the assistance of medical trainees, is a plausible scenario. These models can provide insights that support medical professionals in their work, as long as clinicians do not rely solely on the language model. Although the use of these models in education holds promise, it currently requires a significant amount of supervision. Owing to hallucinations caused by inadequate training environments for the language model, the generated texts might deviate from the current state of scientific knowledge. Direct deployment in patient care settings without permanent physician supervision does not yet appear to be achievable.
Affiliation(s)
- Philipp Humbsch
- Pépinière INP gGmbH, Frankfurt (Oder), Germany.
- Department of Anesthesiology, Naemi-Wilke-Stift, Guben, Germany.
- Department of Anesthesiology, Intensive Care Medicine and Perioperative Pain Therapy, Helios Klinikum Bad Saarow, Pieskower Straße 33, 15526 Bad Saarow, Germany.
- Institute of Health and Nursing Science, Charité Berlin, Berlin, Germany.
- Evelyn Horn
- Department of Anesthesiology, Naemi-Wilke-Stift, Guben, Germany
- Konrad Bohm
- Pépinière INP gGmbH, Frankfurt (Oder), Germany
- Robert Gintrowicz
- Department of Anesthesiology with a focus on Operative Intensive Care Medicine, and Dean's Office for Study and Teaching, Charité Universitätsmedizin, Berlin, Germany
17. Doğan L, Özçakmakcı GB, Yılmaz İE. The Performance of Chatbots and the AAPOS Website as a Tool for Amblyopia Education. J Pediatr Ophthalmol Strabismus 2024:1-7. [PMID: 38661309] [DOI: 10.3928/01913913-20240409-01]
Abstract
PURPOSE To evaluate the understandability, actionability, and readability of responses provided by the website of the American Association for Pediatric Ophthalmology and Strabismus (AAPOS), ChatGPT-3.5, Bard, and Bing Chat about amblyopia, and the appropriateness of the responses generated by the chatbots. METHODS Twenty-five questions provided by the AAPOS website were directed three times each to fresh ChatGPT-3.5, Bard, and Bing Chat interfaces. Two experienced pediatric ophthalmologists categorized the chatbots' responses in terms of their appropriateness. The Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), and Coleman-Liau Index (CLI) were used to evaluate the readability of the responses of the AAPOS website and the chatbots. Furthermore, understandability scores were evaluated using the Patient Education Materials Assessment Tool (PEMAT). RESULTS The appropriateness of the chatbots' responses was 84.0% for ChatGPT-3.5 and Bard and 80% for Bing Chat (P > .05). For understandability (mean PEMAT-U score: AAPOS website 81.5%, Bard 77.6%, ChatGPT-3.5 76.1%, and Bing Chat 71.5%; P < .05) and actionability (mean PEMAT-A score: AAPOS website 74.6%, Bard 69.2%, ChatGPT-3.5 67.8%, and Bing Chat 64.8%; P < .05), the AAPOS website scored better than the chatbots. Three readability analyses showed that Bard had the highest mean score, followed by the AAPOS website, Bing Chat, and ChatGPT-3.5, and these scores were more challenging than the recommended level. CONCLUSIONS Chatbots have the potential to provide detailed and appropriate responses at acceptable levels. The AAPOS website has the advantage of providing information that is more understandable and actionable. The AAPOS website and the chatbots, especially ChatGPT, provided difficult-to-read material for patient education regarding amblyopia. [J Pediatr Ophthalmol Strabismus. 20XX;X(X):XXX-XXX.].
18. Till T, Tschauner S, Singer G, Lichtenegger K, Till H. Development and optimization of AI algorithms for wrist fracture detection in children using a freely available dataset. Front Pediatr 2023; 11:1291804. [PMID: 38188914] [PMCID: PMC10768054] [DOI: 10.3389/fped.2023.1291804]
Abstract
Introduction In the field of pediatric trauma, computer-aided detection (CADe) and computer-aided diagnosis (CADx) systems have emerged, offering a promising avenue for improved patient care. Children with wrist fractures may particularly benefit from machine learning (ML) solutions, since some of these lesions may be overlooked on conventional X-ray due to minimal compression without dislocation, or mistaken for cartilaginous growth plates. In this article, we describe the development and optimization of AI algorithms for wrist fracture detection in children. Methods A team of IT specialists, pediatric radiologists, and pediatric surgeons used the freely available GRAZPEDWRI-DX dataset containing annotated pediatric trauma wrist radiographs of 6,091 patients, comprising 10,643 studies (20,327 images). First, a basic object detection model, a You Only Look Once object detector of the seventh generation (YOLOv7), was trained and tested on these data. Then, team decisions were taken to adjust data preparation, the image sizes used for training and testing, and the configuration of the detection model. Furthermore, we investigated each of these models using an Explainable Artificial Intelligence (XAI) method called Gradient Class Activation Mapping (Grad-CAM), which uses saliency maps to visualize where a model directs its attention before classifying and regressing a certain class. Results Mean average precision (mAP) improved when applying optimizations to the pre-processing of the dataset images (maximum increases of +25.51% mAP@0.5 and +39.78% mAP@[0.5:0.95]), as well as to the object detection model itself (maximum increases of +13.36% mAP@0.5 and +27.01% mAP@[0.5:0.95]). Generally, when analyzed with XAI methods, model variations that scored higher in terms of mAP paid attention to broader regions of the image, prioritizing detection accuracy over precision compared with the less accurate models. Discussion This paper supports the implementation of ML solutions for pediatric trauma care. Optimization of a large X-ray dataset and the YOLOv7 model improves the model's ability to detect objects and provide valid diagnostic support to health care specialists. Such optimization protocols must be understood and advocated before comparing ML performance against health care specialists.
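The mAP@0.5 and mAP@[0.5:0.95] figures above are built on intersection-over-union (IoU) matching between predicted and ground-truth boxes. A minimal sketch of the underlying IoU computation (boxes assumed to be in [x1, y1, x2, y2] pixel coordinates; not the study's code):

    def iou(box_a, box_b):
        # Boxes as [x1, y1, x2, y2]; compute the overlap rectangle first
        x1 = max(box_a[0], box_b[0])
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    # At mAP@0.5 a prediction counts as a true positive when IoU >= 0.5
    print(iou([10, 10, 50, 50], [20, 20, 60, 60]))  # ~0.39, below the threshold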
Affiliation(s)
- Tristan Till
- Department of Applied Computer Sciences, FH JOANNEUM - University of Applied Sciences, Graz, Austria
- Division of Pediatric Radiology, Department of Radiology, Medical University of Graz, Graz, Austria
- Sebastian Tschauner
- Division of Pediatric Radiology, Department of Radiology, Medical University of Graz, Graz, Austria
- Georg Singer
- Department of Pediatric and Adolescent Surgery, Medical University of Graz, Graz, Austria
- Klaus Lichtenegger
- Department of Applied Computer Sciences, FH JOANNEUM - University of Applied Sciences, Graz, Austria
- Holger Till
- Department of Pediatric and Adolescent Surgery, Medical University of Graz, Graz, Austria
19. Coraci D, Maccarone MC, Regazzo G, Accordi G, Papathanasiou JV, Masiero S. ChatGPT in the development of medical questionnaires. The example of the low back pain. Eur J Transl Myol 2023; 33:12114. [PMID: 38112605] [PMCID: PMC10811646] [DOI: 10.4081/ejtm.2023.12114]
Abstract
Over the last year, Chat Generative Pre-Trained Transformer (ChatGPT), a web-based software application built on artificial intelligence, has shown high potential in every field of knowledge. In medicine, its possible applications are the subject of many studies with promising results. We performed the current study to investigate the possible usefulness of ChatGPT in assessing low back pain. We asked ChatGPT to generate a questionnaire about this clinical condition, and we compared the questions and the results obtained with those from validated questionnaires: the Oswestry Disability Index, the Quebec Back Pain Disability Scale, the Roland-Morris Disability Questionnaire, and the Numeric Rating Scale for pain. We enrolled 20 subjects with low back pain and found substantial consistency among the validated questionnaires. The ChatGPT questionnaire showed an acceptable, significant correlation only with the Oswestry Disability Index and the Quebec Back Pain Disability Scale. ChatGPT showed some peculiarities, especially in the assessment of quality of life and of medical consultation and treatments. Our study shows that ChatGPT can help evaluate patients, including from multiple perspectives. However, its power is limited, and further research and validation are required.
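Agreement between a new questionnaire and validated instruments is commonly checked with rank correlation. A minimal SciPy sketch, using hypothetical total scores (the study enrolled 20 subjects; only a few example values are shown here):

    from scipy.stats import spearmanr

    # Hypothetical total scores per subject on two questionnaires
    chatgpt_scores = [12, 25, 18, 30, 22, 15, 28, 20]
    oswestry_scores = [14, 27, 16, 33, 21, 17, 30, 19]

    rho, p_value = spearmanr(chatgpt_scores, oswestry_scores)
    print(f"Spearman rho = {rho:.2f}, p = {p_value:.4f}")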
Affiliation(s)
- Daniele Coraci
- Department of Neuroscience, Section of Rehabilitation, University of Padova, Padua.
- Gianluca Regazzo
- Department of Neuroscience, Section of Rehabilitation, University of Padova, Padua.
- Giorgia Accordi
- Department of Neuroscience, Section of Rehabilitation, University of Padova, Padua.
- Jannis V Papathanasiou
- Department of Kinesiotherapy, Faculty of Public Health, Medical University of Sofia, Sofia, Bulgaria; Department of Medical Imaging, Allergology and Physiotherapy, Faculty of Dental Medicine, Medical University of Plovdiv, Plovdiv.
- Stefano Masiero
- Department of Neuroscience, Section of Rehabilitation, University of Padova, Padua.