1
Monroe CL, Abdelhafez YG, Atsina K, Aman E, Nardo L, Madani MH. Evaluation of responses to cardiac imaging questions by the artificial intelligence large language model ChatGPT. Clin Imaging 2024; 112:110193. [PMID: 38820977] [DOI: 10.1016/j.clinimag.2024.110193]
Abstract
PURPOSE To assess ChatGPT's ability as a resource for educating patients on various aspects of cardiac imaging, including diagnosis, imaging modalities, indications, interpretation of radiology reports, and management. METHODS 30 questions were posed to ChatGPT-3.5 and ChatGPT-4 three times each in three separate chat sessions. Responses were scored as correct, incorrect, or clinically misleading by three observers: two board-certified cardiologists and one board-certified radiologist with cardiac imaging subspecialization. Consistency of responses across the three sessions was also evaluated. Final categorization was based on a majority vote of at least two of the three observers. RESULTS ChatGPT-3.5 answered 17 of 28 questions correctly (61%) by majority vote, and ChatGPT-4 answered 21 of 28 correctly (75%). A majority vote for correctness was not achieved for two questions. ChatGPT-3.5 answered 26 of 30 questions consistently (87%), and ChatGPT-4 answered 29 of 30 consistently (97%). ChatGPT-3.5 gave both consistent and correct responses to 17 of 28 questions (61%); ChatGPT-4 did so for 20 of 28 questions (71%). CONCLUSION ChatGPT-4 performed better overall than ChatGPT-3.5 in answering cardiac imaging questions with regard to correctness and consistency of responses. While both ChatGPT-3.5 and ChatGPT-4 answered over half of the cardiac imaging questions correctly, inaccurate, clinically misleading, and inconsistent responses suggest the need for further refinement before their application to educating patients about cardiac imaging.
Affiliation(s)
- Cynthia L Monroe
- College of Medicine, California Northstate University, 9700 W Taron Dr, Elk Grove, CA 95757, USA
- Yasser G Abdelhafez
- Department of Radiology, University of California, Davis Medical Center, 4860 Y St, Suite 3100, Sacramento, CA 95817, USA
- Kwame Atsina
- Division of Cardiovascular Medicine, University of California, Davis Medical Center, 4860 Y St, Suite 0200, Sacramento, CA 95817, USA
- Edris Aman
- Division of Cardiovascular Medicine, University of California, Davis Medical Center, 4860 Y St, Suite 0200, Sacramento, CA 95817, USA
- Lorenzo Nardo
- Department of Radiology, University of California, Davis Medical Center, 4860 Y St, Suite 3100, Sacramento, CA 95817, USA
- Mohammad H Madani
- Department of Radiology, University of California, Davis Medical Center, 4860 Y St, Suite 3100, Sacramento, CA 95817, USA.
2
Kooraki S, Hosseiny M, Jalili MH, Rahsepar AA, Imanzadeh A, Kim GH, Hassani C, Abtin F, Moriarty JM, Bedayat A. Evaluation of ChatGPT-Generated Educational Patient Pamphlets for Common Interventional Radiology Procedures. Acad Radiol 2024:S1076-6332(24)00307-6. [PMID: 38839458] [DOI: 10.1016/j.acra.2024.05.024]
Abstract
RATIONALE AND OBJECTIVES This study aimed to evaluate the accuracy and reliability of educational patient pamphlets created by ChatGPT, a large language model, for common interventional radiology (IR) procedures. METHODS AND MATERIALS Twenty frequently performed IR procedures were selected, and five users independently asked ChatGPT to generate an educational patient pamphlet for each procedure using identical commands. Two independent radiologists then assessed the content, quality, and accuracy of the pamphlets, focusing on potential errors, inaccuracies, and the consistency of the pamphlets. RESULTS In a thorough analysis of the educational pamphlets, we identified shortcomings in 30% (30/100) of pamphlets, with a total of 34 specific inaccuracies, including missing information about sedation for the procedure (10/34) and inaccuracies related to procedure-specific complications (8/34). A keyword co-occurrence network showed consistent themes within each group of pamphlets, while a line-by-line comparison at the level of users and across different procedures showed statistically significant inconsistencies (P < 0.001). CONCLUSION ChatGPT-generated educational pamphlets demonstrated potential clinical relevance and fairly consistent terminology; however, the pamphlets were not entirely accurate and exhibited some shortcomings and inter-user structural variability. To ensure patient safety, future improvements and refinements in large language models are warranted, along with continued human supervision and expert validation.
Affiliation(s)
- Soheil Kooraki
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA.
- Melina Hosseiny
- Department of Radiology, University of California, San Diego (UCSD), San Diego, CA.
- Mohammad H Jalili
- Department of Radiology and Biomedical Imaging, Yale New Haven Health, Bridgeport Hospital, CT.
- Amir Ali Rahsepar
- Department of Radiology, Feinberg School of Medicine, Northwestern University, Chicago, IL.
- Amir Imanzadeh
- Department of Radiology, University of California, Irvine (UCI), Irvine, CA.
- Grace Hyun Kim
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA.
- Cameron Hassani
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA.
- Fereidoun Abtin
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA.
- John M Moriarty
- Department of Radiological Sciences, Division of Interventional Radiology, David Geffen School of Medicine at UCLA, Los Angeles, CA.
- Arash Bedayat
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA.
3
Daraqel B, Wafaie K, Mohammed H, Cao L, Mheissen S, Liu Y, Zheng L. The performance of artificial intelligence models in generating responses to general orthodontic questions: ChatGPT vs Google Bard. Am J Orthod Dentofacial Orthop 2024; 165:652-662. [PMID: 38493370] [DOI: 10.1016/j.ajodo.2024.01.012]
Abstract
INTRODUCTION This study aimed to evaluate and compare the performance of 2 artificial intelligence (AI) models, Chat Generative Pretrained Transformer-3.5 (ChatGPT-3.5; OpenAI, San Francisco, Calif) and Google Bidirectional Encoder Representations from Transformers (Google Bard; Bard Experiment, Google, Mountain View, Calif), in terms of response accuracy, completeness, generation time, and response length when answering general orthodontic questions. METHODS A team of orthodontic specialists developed a set of 100 questions in 10 orthodontic domains. One author submitted the questions to both ChatGPT and Google Bard. The AI-generated responses from both models were randomly assigned into 2 forms and sent to 5 blinded, independent assessors. The quality of the AI-generated responses was evaluated using a newly developed tool for accuracy of information and completeness. In addition, response generation time and length were recorded. RESULTS The accuracy and completeness of responses were high in both AI models. The median accuracy score was 9 (interquartile range [IQR]: 8-9) for ChatGPT and 8 (IQR: 8-9) for Google Bard (median difference: 1; P < 0.001). The median completeness score was similar in both models, at 8 (IQR: 8-9) for ChatGPT and 8 (IQR: 7-9) for Google Bard. The odds of accuracy and completeness were higher by 31% and 23%, respectively, in ChatGPT than in Google Bard. Google Bard's response generation time was significantly shorter than that of ChatGPT, by 10.4 seconds per question. However, the two models generated responses of similar length. CONCLUSIONS Responses generated by both ChatGPT and Google Bard were rated as highly accurate and complete for the posed general orthodontic questions. However, acquiring answers was generally faster with Google Bard.
Affiliation(s)
- Baraa Daraqel
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University Chongqing Key Laboratory of Oral Disease and Biomedical Sciences Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China; Oral Health Research and Promotion Unit, Al-Quds University, Jerusalem, Palestine.
- Khaled Wafaie
- Department of Orthodontics, Faculty of Dentistry, First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Li Cao
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University Chongqing Key Laboratory of Oral Disease and Biomedical Sciences Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
- Yang Liu
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University Chongqing Key Laboratory of Oral Disease and Biomedical Sciences Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
- Leilei Zheng
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University Chongqing Key Laboratory of Oral Disease and Biomedical Sciences Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China.
4
Moll M, Heilemann G, Georg D, Kauer-Dorner D, Kuess P. The role of artificial intelligence in informed patient consent for radiotherapy treatments-a case report. Strahlenther Onkol 2024; 200:544-548. [PMID: 38180493] [DOI: 10.1007/s00066-023-02190-7]
Abstract
Recent advancements in large language models (LLMs; e.g., ChatGPT (OpenAI, San Francisco, California, USA)) have led to their widespread use in various fields, including healthcare. This case study reports on the first use of an LLM in a pretreatment discussion and in obtaining informed consent for a radiation oncology treatment; the reproducibility of the replies by ChatGPT 3.5 was also analyzed. A breast cancer patient, following legal consultation, engaged in a conversation with ChatGPT 3.5 regarding her radiotherapy treatment. The patient posed questions about side effects, prevention, activities, medications, and late effects. While some answers contained inaccuracies, the responses closely resembled doctors' replies. In a final evaluation discussion, however, the patient stated that she preferred the presence of a physician and expressed concerns about the source of the provided information. Reproducibility was tested over ten iterations. Future guidelines for using such models in radiation oncology should be driven by medical professionals. While artificial intelligence (AI) can support essential tasks, human interaction remains crucial.
Affiliation(s)
- M Moll
- Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria.
- G Heilemann
- Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
- Dietmar Georg
- Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
- D Kauer-Dorner
- Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
- P Kuess
- Department of Radiation Oncology, Comprehensive Cancer Center Vienna, Medical University Vienna, Vienna, Austria
5
Patil NS, Huang R, Mihalache A, Kisilevsky E, Kwok J, Popovic MM, Nassrallah G, Chan C, Mallipatna A, Kertes PJ, Muni RH. The Ability of Artificial Intelligence Chatbots ChatGPT and Google Bard to Accurately Convey Preoperative Information for Patients Undergoing Ophthalmic Surgeries. Retina 2024; 44:950-953. [PMID: 38215455] [DOI: 10.1097/iae.0000000000004044]
Abstract
INTRODUCTION To determine whether the two popular artificial intelligence chatbots, ChatGPT and Bard, can provide high-quality information concerning procedure description, risks, benefits, and alternatives of various ophthalmic surgeries. METHODS ChatGPT and Bard were prompted with questions pertaining to the description, potential risks, benefits, alternatives, and implications of not proceeding with various surgeries in different subspecialties of ophthalmology. Six common ophthalmic procedures were included in the authors' analysis. Two comprehensive ophthalmologists and one subspecialist graded each response independently using a 5-point Likert scale. RESULTS Likert grading for accuracy was significantly higher for ChatGPT in comparison with Bard (4.5 ± 0.6 vs. 3.8 ± 0.8, P < 0.0001). Generally, ChatGPT performed better than Bard even when questions were stratified by the type of ophthalmic surgery. There was no significant difference between ChatGPT and Bard for response length (2,104.7 ± 271.4 characters vs. 2,441.0 ± 633.9 characters, P = 0.12). ChatGPT responded significantly slower than Bard (46.0 ± 3.0 vs. 6.6 ± 1.2 seconds, P < 0.0001). CONCLUSION Both ChatGPT and Bard may offer accessible and high-quality information relevant to the informed consent process for various ophthalmic procedures. Nonetheless, both artificial intelligence chatbots overlooked the probability of adverse events, hence limiting their potential and introducing patients to information that may be difficult to interpret.
Affiliation(s)
- Nikhil S Patil
- Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada
- Ryan Huang
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Andrew Mihalache
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Eli Kisilevsky
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Unity Health, St. Joseph's Health Centre, University of Toronto, Toronto, Ontario, Canada
- Jason Kwok
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Marko M Popovic
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Georges Nassrallah
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Ophthalmology, Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada
- Clara Chan
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Ashwin Mallipatna
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Ophthalmology, Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada
- Peter J Kertes
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- John and Liz Tory Eye Centre, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Rajeev H Muni
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, Ontario, Canada
6
Protocol for the development of the Chatbot Assessment Reporting Tool (CHART) for clinical advice. BMJ Open 2024; 14:e081155. [PMID: 38772889] [PMCID: PMC11110548] [DOI: 10.1136/bmjopen-2023-081155]
Abstract
INTRODUCTION Large language model (LLM)-linked chatbots are being increasingly applied in healthcare due to their impressive functionality and public availability. Studies have assessed the ability of LLM-linked chatbots to provide accurate clinical advice. However, the methods applied in these Chatbot Assessment Studies are inconsistent due to the lack of reporting standards available, which obscures the interpretation of their study findings. This protocol outlines the development of the Chatbot Assessment Reporting Tool (CHART) reporting guideline. METHODS AND ANALYSIS The development of the CHART reporting guideline will consist of three phases, led by the Steering Committee. During phase one, the team will identify relevant reporting guidelines with artificial intelligence extensions that are published or in development by searching preprint servers, protocol databases, and the Enhancing the Quality and Transparency of health research Network. During phase two, we will conduct a scoping review to identify studies that have addressed the performance of LLM-linked chatbots in summarising evidence and providing clinical advice. The Steering Committee will identify methodology used in previous Chatbot Assessment Studies. Finally, the study team will use checklist items from prior reporting guidelines and findings from the scoping review to develop a draft reporting checklist. We will then perform a Delphi consensus and host two synchronous consensus meetings with an international, multidisciplinary group of stakeholders to refine reporting checklist items and develop a flow diagram. ETHICS AND DISSEMINATION We will publish the final CHART reporting guideline in peer-reviewed journals and will present findings at peer-reviewed meetings. Ethical approval was submitted to the Hamilton Integrated Research Ethics Board and deemed "not required" in accordance with the Tri-Council Policy Statement (TCPS2) for the development of the CHART reporting guideline (#17025). REGISTRATION This study protocol is preregistered with Open Science Framework: https://doi.org/10.17605/OSF.IO/59E2Q.
7
Pressman SM, Borna S, Gomez-Cabello CA, Haider SA, Forte AJ. AI in Hand Surgery: Assessing Large Language Models in the Classification and Management of Hand Injuries. J Clin Med 2024; 13:2832. [PMID: 38792374] [PMCID: PMC11122623] [DOI: 10.3390/jcm13102832]
Abstract
Background: OpenAI's ChatGPT (San Francisco, CA, USA) and Google's Gemini (Mountain View, CA, USA) are two large language models that show promise for improving and expediting medical decision making in hand surgery. Evaluating the applications of these models within the field of hand surgery is warranted. This study aims to evaluate ChatGPT-4 and Gemini in classifying hand injuries and recommending treatment. Methods: Gemini and ChatGPT were each given 68 fictionalized clinical vignettes of hand injuries twice. The models were asked to use a specific classification system and to recommend surgical or nonsurgical treatment. Classifications were scored for correctness. Results were analyzed using descriptive statistics, a paired two-tailed t-test, and sensitivity testing. Results: Gemini, correctly classifying 70.6% of hand injuries, demonstrated superior classification ability over ChatGPT (mean score 1.46 vs. 0.87, p < 0.001). For management, ChatGPT demonstrated higher sensitivity than Gemini in recommending surgical intervention (98.0% vs. 88.8%) but lower specificity (68.4% vs. 94.7%). Gemini also demonstrated greater response replicability than ChatGPT. Conclusions: Large language models like ChatGPT and Gemini show promise in assisting medical decision making, particularly in hand surgery, with Gemini generally outperforming ChatGPT. These findings emphasize the importance of considering the strengths and limitations of different models when integrating them into clinical practice.
Affiliation(s)
- Sahar Borna
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Syed Ali Haider
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Antonio Jorge Forte
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
8
Jedrzejczak WW, Skarzynski PH, Raj-Koziak D, Sanfins MD, Hatzopoulos S, Kochanek K. ChatGPT for Tinnitus Information and Support: Response Accuracy and Retest after Three and Six Months. Brain Sci 2024; 14:465. [PMID: 38790444] [PMCID: PMC11118795] [DOI: 10.3390/brainsci14050465]
Abstract
Testing of ChatGPT has recently been performed over a diverse range of topics, but most of these assessments have been based on broad domains of knowledge. Here, we test ChatGPT's knowledge of tinnitus, an important but specialized aspect of audiology and otolaryngology. Testing involved evaluating ChatGPT's answers to a defined set of 10 questions on tinnitus. Because the technology is advancing quickly, we re-evaluated the responses to the same 10 questions 3 and 6 months later. The accuracy of the responses was rated by 6 experts (the authors) using a Likert scale ranging from 1 to 5. Most of ChatGPT's responses were rated as satisfactory or better. However, we did detect a few instances where the responses were not accurate and might be considered somewhat misleading. Over the first 3 months, the ratings generally improved, but there was no further significant improvement at 6 months. In our judgment, ChatGPT provided unexpectedly good responses, given that the questions were quite specific. Although no potentially harmful errors were identified, some mistakes could be seen as somewhat misleading. ChatGPT shows great potential if further developed by experts in specific areas, but for now, it is not yet ready for serious application.
Affiliation(s)
- W. Wiktor Jedrzejczak
- Department of Experimental Audiology, World Hearing Center, Institute of Physiology and Pathology of Hearing, 05-830 Kajetany, Poland;
- Piotr H. Skarzynski
- Department of Teleaudiology and Screening, World Hearing Center, Institute of Physiology and Pathology of Hearing, 05-830 Kajetany, Poland
- Institute of Sensory Organs, 05-830 Kajetany, Poland
- Heart Failure and Cardiac Rehabilitation Department, Faculty of Medicine, Medical University of Warsaw, 03-242 Warsaw, Poland
- Danuta Raj-Koziak
- Tinnitus Department, World Hearing Center, Institute of Physiology and Pathology of Hearing, 05-830 Kajetany, Poland
- Milaine Dominici Sanfins
- Department of Teleaudiology and Screening, World Hearing Center, Institute of Physiology and Pathology of Hearing, 05-830 Kajetany, Poland
- Speech-Hearing-Language Department, Audiology Discipline, Universidade Federal de São Paulo, São Paulo 04023062, Brazil
- Stavros Hatzopoulos
- ENT and Audiology Unit, Department of Neurosciences and Rehabilitation, University of Ferrara, 44121 Ferrara, Italy
- Krzysztof Kochanek
- Department of Experimental Audiology, World Hearing Center, Institute of Physiology and Pathology of Hearing, 05-830 Kajetany, Poland
9
Tu W, Joe BN. The Era of ChatGPT and Large Language Models: Can We Advance Patient-centered Communications Appropriately and Safely? Radiol Imaging Cancer 2024; 6:e240038. [PMID: 38668641] [PMCID: PMC11148828] [DOI: 10.1148/rycan.240038]
Affiliation(s)
- Wendy Tu
- From the Department of Medical Imaging, University of Alberta, 116th St & 85th Ave, Edmonton, AB, Canada T6G 2R3; and Department of Radiology and Biomedical Imaging, University of California at San Francisco, San Francisco, Calif
- Bonnie N. Joe
- From the Department of Medical Imaging, University of Alberta, 116th St & 85th Ave, Edmonton, AB, Canada T6G 2R3; and Department of Radiology and Biomedical Imaging, University of California at San Francisco, San Francisco, Calif
10
Cesur T, Güneş YC. Optimizing Diagnostic Performance of ChatGPT: The Impact of Prompt Engineering on Thoracic Radiology Cases. Cureus 2024; 16:e60009. [PMID: 38854352] [PMCID: PMC11162509] [DOI: 10.7759/cureus.60009]
Abstract
Background Recent studies have highlighted the diagnostic performance of ChatGPT 3.5 and GPT-4 in a text-based format, demonstrating their radiological knowledge across different areas. Our objective is to investigate the impact of prompt engineering on the diagnostic performance of ChatGPT 3.5 and GPT-4 in diagnosing thoracic radiology cases, highlighting how the complexity of prompts influences model performance. Methodology We conducted a retrospective cross-sectional study using 124 publicly available Case of the Month examples from the Thoracic Society of Radiology website. We initially input the cases into the ChatGPT versions without prompting. Then, we employed five different prompts, ranging from basic task-oriented to complex role-specific formulations to measure the diagnostic accuracy of ChatGPT versions. The differential diagnosis lists generated by the models were compared against the radiological diagnoses listed on the Thoracic Society of Radiology website, with a scoring system in place to comprehensively assess the accuracy. Diagnostic accuracy and differential diagnosis scores were analyzed using the McNemar, Chi-square, Kruskal-Wallis, and Mann-Whitney U tests. Results Without any prompts, ChatGPT 3.5's accuracy was 25% (31/124), which increased to 56.5% (70/124) with the most complex prompt (P < 0.001). GPT-4 showed a high baseline accuracy at 53.2% (66/124) without prompting. This accuracy increased to 59.7% (74/124) with complex prompts (P = 0.09). Notably, there was no statistical difference in peak performance between ChatGPT 3.5 (70/124) and GPT-4 (74/124) (P = 0.55). Conclusions This study emphasizes the critical influence of prompt engineering on enhancing the diagnostic performance of ChatGPT versions, especially ChatGPT 3.5.
Affiliation(s)
- Turay Cesur
- Radiology, Ankara Mamak State Hospital, Ankara, TUR
11
Niko MM, Karbasi Z, Kazemi M, Zahmatkeshan M. Comparing ChatGPT and Bing, in response to the Home Blood Pressure Monitoring (HBPM) knowledge checklist. Hypertens Res 2024; 47:1401-1409. [PMID: 38438722] [DOI: 10.1038/s41440-024-01624-8]
Abstract
High blood pressure is a major public health problem worldwide. Given the rapid increase in the number of users of artificial intelligence tools such as ChatGPT and Bing, patients are expected to use these tools as a source of information about high blood pressure. The purpose of this study was to assess the accuracy, completeness, and reproducibility of answers provided by ChatGPT and Bing to a home blood pressure monitoring (HBPM) knowledge checklist. ChatGPT's and Bing's responses to the 10-question HBPM knowledge checklist on blood pressure measurement were independently reviewed by three cardiologists. The mean accuracy and completeness scores of ChatGPT were 5.96 (SD = 0.17) and 2.93 (SD = 0.25), respectively, indicating that the responses were highly accurate overall, with the vast majority receiving the top score; the corresponding scores for Bing were 5.31 (SD = 0.67) and 2.13 (SD = 0.53). As artificial intelligence applications expand, patients can use new tools such as ChatGPT and Bing to search for health information. We found that the answers obtained from ChatGPT are reliable and valuable for patients; Bing is also a powerful tool, but it has more limitations than ChatGPT, and its answers should be interpreted with caution.
Affiliation(s)
- Zahra Karbasi
- Department of Health Information Sciences, Faculty of Management and Medical Information Sciences, Kerman University of Medical Sciences, Kerman, Iran
- Maryam Kazemi
- Noncommunicable Diseases Research Center, Fasa University of Medical Sciences, Fasa, Iran
- Maryam Zahmatkeshan
- Noncommunicable Diseases Research Center, Fasa University of Medical Sciences, Fasa, Iran.
- School of Allied Medical Sciences, Fasa University of Medical Sciences, Fasa, Iran.
12
Al-Sharif EM, Penteado RC, Dib El Jalbout N, Topilow NJ, Shoji MK, Kikkawa DO, Liu CY, Korn BS. Evaluating the Accuracy of ChatGPT and Google BARD in Fielding Oculoplastic Patient Queries: A Comparative Study on Artificial versus Human Intelligence. Ophthalmic Plast Reconstr Surg 2024; 40:303-311. [PMID: 38215452] [DOI: 10.1097/iop.0000000000002567]
Abstract
PURPOSE This study evaluates and compares the accuracy of responses from 2 artificial intelligence platforms to patients' oculoplastics-related questions. METHODS Questions directed toward oculoplastic surgeons were collected, rephrased, and input independently into the ChatGPT-3.5 and BARD chatbots, using the prompt: "As an oculoplastic surgeon, how can I respond to my patient's question?" Responses were independently evaluated by 4 experienced oculoplastic specialists as comprehensive, correct but inadequate, mixed correct and incorrect/outdated data, or completely incorrect. Additionally, the empathy level, length, and automated readability index of the responses were assessed. RESULTS A total of 112 patient questions underwent evaluation. The rates of comprehensive, correct but inadequate, mixed, and completely incorrect answers for ChatGPT were 71.4%, 12.9%, 10.5%, and 5.1%, respectively, compared with 53.1%, 18.3%, 18.1%, and 10.5%, respectively, for BARD. ChatGPT showed more empathy (48.9%) than BARD (13.2%). All graders found that ChatGPT outperformed BARD in the question categories of postoperative healing, medical eye conditions, and medications. When questions were categorized by anatomy, ChatGPT excelled at answering lacrimal questions (83.8%), while BARD performed best in the eyelid group (60.4%). ChatGPT's answers were longer and potentially more challenging to comprehend than BARD's. CONCLUSION This study emphasizes the promising role of artificial intelligence-powered chatbots in oculoplastic patient education and support. With continued development, these chatbots may assist physicians and offer patients accurate information, ultimately contributing to improved patient care while alleviating surgeon burnout. However, it is crucial to highlight that although artificial intelligence may be good at answering questions, physician oversight remains essential to ensure the highest standard of care and address complex medical cases.
Affiliation(s)
- Eman M Al-Sharif
- Division of Oculofacial Plastic and Reconstructive Surgery, Viterbi Family Department of Ophthalmology, UC San Diego Shiley Eye Institute, La Jolla, California, U.S.A
- Clinical Sciences Department, College of Medicine, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Rafaella C Penteado
- Division of Oculofacial Plastic and Reconstructive Surgery, Viterbi Family Department of Ophthalmology, UC San Diego Shiley Eye Institute, La Jolla, California, U.S.A
- Nahia Dib El Jalbout
- Division of Oculofacial Plastic and Reconstructive Surgery, Viterbi Family Department of Ophthalmology, UC San Diego Shiley Eye Institute, La Jolla, California, U.S.A
- Nicole J Topilow
- Division of Oculofacial Plastic and Reconstructive Surgery, Viterbi Family Department of Ophthalmology, UC San Diego Shiley Eye Institute, La Jolla, California, U.S.A
- Marissa K Shoji
- Division of Oculofacial Plastic and Reconstructive Surgery, Viterbi Family Department of Ophthalmology, UC San Diego Shiley Eye Institute, La Jolla, California, U.S.A
- Don O Kikkawa
- Division of Oculofacial Plastic and Reconstructive Surgery, Viterbi Family Department of Ophthalmology, UC San Diego Shiley Eye Institute, La Jolla, California, U.S.A
- Division of Plastic and Reconstructive Surgery, Department of Surgery, UC San Diego School of Medicine, La Jolla, California, U.S.A
- Catherine Y Liu
- Division of Oculofacial Plastic and Reconstructive Surgery, Viterbi Family Department of Ophthalmology, UC San Diego Shiley Eye Institute, La Jolla, California, U.S.A
- Bobby S Korn
- Division of Oculofacial Plastic and Reconstructive Surgery, Viterbi Family Department of Ophthalmology, UC San Diego Shiley Eye Institute, La Jolla, California, U.S.A
- Division of Plastic and Reconstructive Surgery, Department of Surgery, UC San Diego School of Medicine, La Jolla, California, U.S.A
13
Yang J, Ardavanis KS, Slack KE, Fernando ND, Della Valle CJ, Hernandez NM. Chat Generative Pretrained Transformer (ChatGPT) and Bard: Artificial Intelligence Does not yet Provide Clinically Supported Answers for Hip and Knee Osteoarthritis. J Arthroplasty 2024; 39:1184-1190. [PMID: 38237878] [DOI: 10.1016/j.arth.2024.01.029]
Abstract
BACKGROUND Advancements in artificial intelligence (AI) have led to the creation of large language models (LLMs), such as Chat Generative Pretrained Transformer (ChatGPT) and Bard, that analyze online resources to synthesize responses to user queries. Despite their popularity, the accuracy of LLM responses to medical questions remains unknown. This study aimed to compare the responses of ChatGPT and Bard regarding treatments for hip and knee osteoarthritis with the American Academy of Orthopaedic Surgeons (AAOS) Evidence-Based Clinical Practice Guidelines (CPGs) recommendations. METHODS Both ChatGPT (OpenAI) and Bard (Google) were queried regarding 20 treatments (10 for hip and 10 for knee osteoarthritis) from the AAOS CPGs. Responses were classified by 2 reviewers as being in "Concordance," "Discordance," or "No Concordance" with the AAOS CPGs. Cohen's kappa coefficient was used to assess inter-rater reliability, and Chi-squared analyses were used to compare responses between LLMs. RESULTS Overall, ChatGPT and Bard provided responses that were concordant with the AAOS CPGs for 16 (80%) and 12 (60%) treatments, respectively. Notably, ChatGPT and Bard encouraged the use of non-recommended treatments in 30% and 60% of queries, respectively. There were no differences in performance when evaluating by joint or by recommended versus non-recommended treatments. Studies were referenced in 6 (30%) of the Bard responses and in none (0%) of the ChatGPT responses. Of the 6 Bard responses that referenced studies, the cited study could be identified for only 1 (16.7%); of the remaining responses, 2 (33.3%) cited studies in journals that did not exist, 2 (33.3%) cited studies that could not be found with the information given, and 1 (16.7%) provided links to unrelated studies. CONCLUSIONS Both ChatGPT and Bard do not consistently provide responses that align with the AAOS CPGs. Consequently, physicians and patients should temper expectations about the guidance AI platforms can currently provide.
Affiliation(s)
- JaeWon Yang
- Department of Orthopaedic Surgery, University of Washington, Seattle, Washington
- Kyle S Ardavanis
- Department of Orthopaedic Surgery, Madigan Medical Center, Tacoma, Washington
- Katherine E Slack
- Elson S. Floyd College of Medicine, Washington State University, Spokane, Washington
- Navin D Fernando
- Department of Orthopaedic Surgery, University of Washington, Seattle, Washington
- Craig J Della Valle
- Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, Illinois
- Nicholas M Hernandez
- Department of Orthopaedic Surgery, University of Washington, Seattle, Washington
14
Schlussel L, Samaan JS, Chan Y, Chang B, Yeo YH, Ng WH, Rezaie A. Evaluating the accuracy and reproducibility of ChatGPT-4 in answering patient questions related to small intestinal bacterial overgrowth. Artif Intell Gastroenterol 2024; 5:90503. [DOI: 10.35712/aig.v5.i1.90503]
Abstract
BACKGROUND Small intestinal bacterial overgrowth (SIBO) poses diagnostic and treatment challenges due to its complex management and evolving guidelines. Patients often seek online information related to their health, prompting interest in large language models, like GPT-4, as potential sources of patient education.
AIM To investigate ChatGPT-4's accuracy and reproducibility in responding to patient questions related to SIBO.
METHODS A total of 27 patient questions related to SIBO were curated from professional societies, Facebook groups, and Reddit threads. Each question was entered into GPT-4 twice, on separate days, to examine the reproducibility of accuracy across separate occasions. The GPT-4-generated responses were independently evaluated for accuracy and reproducibility by two motility fellowship-trained gastroenterologists, and a third senior fellowship-trained gastroenterologist resolved disagreements. The accuracy of responses was graded using the following scale: (1) Comprehensive; (2) Correct but inadequate; (3) Some correct and some incorrect; or (4) Completely incorrect.
RESULTS In evaluating GPT-4's effectiveness at answering SIBO-related questions, it provided responses containing correct information for 18/27 (66.7%) questions, with 16/27 (59.3%) responses graded as comprehensive and 2/27 (7.4%) graded as correct but inadequate. The model provided responses containing incorrect information for 9/27 (33.3%) questions, with 4/27 (14.8%) responses graded as completely incorrect and 5/27 (18.5%) graded as mixed correct and incorrect data. Accuracy varied by question category: questions related to “basic knowledge” achieved the highest proportion of comprehensive responses (90%) and no incorrect responses, whereas “treatment”-related questions yielded the lowest proportion of comprehensive responses (33.3%) and the highest proportion of completely incorrect responses (33.3%). A total of 77.8% of questions yielded reproducible responses.
CONCLUSION Though GPT-4 shows promise as a supplementary tool for SIBO-related patient education, the model requires further refinement and validation in subsequent iterations prior to its integration into patient care.
Affiliation(s)
- Lauren Schlussel
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Jamil S Samaan
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Yin Chan
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Bianca Chang
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Yee Hui Yeo
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Wee Han Ng
- Bristol Medical School, University of Bristol, BS8 1TH, Bristol, United Kingdom
- Ali Rezaie
- Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Medically Associated Science and Technology Program, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
15
Keshavarz P, Bagherieh S, Nabipoorashrafi SA, Chalian H, Rahsepar AA, Kim GHJ, Hassani C, Raman SS, Bedayat A. ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives. Diagn Interv Imaging 2024:S2211-5684(24)00105-0. [PMID: 38679540] [DOI: 10.1016/j.diii.2024.04.003]
Abstract
PURPOSE The purpose of this study was to systematically review the reported performance of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications. MATERIALS AND METHODS The PubMed, Web of Science, Embase, and Google Scholar databases were comprehensively searched to identify published studies, up to January 1, 2024, that utilized ChatGPT for clinical radiology applications. RESULTS Of the 861 studies retrieved, 44 evaluated the performance of ChatGPT; among these, 37 (37/44; 84.1%) demonstrated high performance, and seven (7/44; 15.9%) indicated lower performance in providing information for diagnosis and clinical decision support (6/44; 13.6%) or for patient communication and educational content (1/44; 2.3%). Twenty-four (24/44; 54.5%) studies reported ChatGPT's performance as a proportion: 19 (19/24; 79.2%) of these recorded a median accuracy of 70.5%, and five (5/24; 20.8%) reported a median agreement of 83.6% between ChatGPT outcomes and reference standards (radiologists' decisions or guidelines), generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared two recent ChatGPT versions, and in ten (10/11; 90.9%), ChatGPT-4 outperformed ChatGPT-3.5, showing notable enhancements in addressing higher-order thinking questions, better comprehension of radiology terms, and improved accuracy in describing images. Risks and concerns about using ChatGPT included biased responses, limited originality, and the potential for inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks. CONCLUSION Although ChatGPT's effectiveness was demonstrated in 84.1% of radiology studies, multiple pitfalls and limitations remain to be addressed. It is too soon to confirm its complete proficiency and accuracy, and more extensive multicenter studies utilizing diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.
Affiliation(s)
- Pedram Keshavarz
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; School of Science and Technology, The University of Georgia, Tbilisi 0171, Georgia
- Sara Bagherieh
- Independent Clinical Radiology Researcher, Los Angeles, CA 90024, USA
- Hamid Chalian
- Department of Radiology, Cardiothoracic Imaging, University of Washington, Seattle, WA 98195, USA
- Amir Ali Rahsepar
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Grace Hyun J Kim
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; Department of Radiological Sciences, Center for Computer Vision and Imaging Biomarkers, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Cameron Hassani
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Steven S Raman
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Arash Bedayat
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA.
16
Lv X, Zhang X, Li Y, Ding X, Lai H, Shi J. Leveraging Large Language Models for Improved Patient Access and Self-Management: Assessor-Blinded Comparison Between Expert- and AI-Generated Content. J Med Internet Res 2024; 26:e55847. [PMID: 38663010] [PMCID: PMC11082737] [DOI: 10.2196/55847]
Abstract
BACKGROUND While large language models (LLMs) such as ChatGPT and Google Bard have shown significant promise in various fields, their broader impact on enhancing patient health care access and quality, particularly in specialized domains such as oral health, requires comprehensive evaluation. OBJECTIVE This study aims to assess the effectiveness of Google Bard, ChatGPT-3.5, and ChatGPT-4 in offering recommendations for common oral health issues, benchmarked against responses from human dental experts. METHODS This comparative analysis used 40 questions derived from patient surveys on prevalent oral diseases, which were administered in a simulated clinical environment. Responses, obtained from both human experts and LLMs, were subject to a blinded evaluation process by experienced dentists and lay users, focusing on readability, appropriateness, harmlessness, comprehensiveness, intent capture, and helpfulness. Additionally, the stability of artificial intelligence responses was assessed by submitting each question 3 times under consistent conditions. RESULTS Google Bard excelled in readability but lagged in appropriateness when compared to human experts (mean 8.51, SD 0.37 vs mean 9.60, SD 0.33; P=.03). ChatGPT-3.5 and ChatGPT-4, however, performed comparably with human experts in terms of appropriateness (mean 8.96, SD 0.35 and mean 9.34, SD 0.47, respectively), with ChatGPT-4 demonstrating the highest stability and reliability. Furthermore, all 3 LLMs received harmlessness scores comparable to those of human experts, and lay users found minimal differences in helpfulness and intent capture between the artificial intelligence models and human responses. CONCLUSIONS LLMs, particularly ChatGPT-4, show potential in oral health care, providing patient-centric information that can enhance patient education and clinical care. The observed performance variations underscore the need for ongoing refinement and ethical considerations in health care settings. Future research should focus on developing strategies for the safe integration of LLMs in health care settings.
Affiliation(s)
- Xiaolei Lv
- Department of Oral and Maxillofacial Implantology, Shanghai PerioImplant Innovation Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- College of Stomatology, Shanghai Jiao Tong University, Shanghai, China
- National Center for Stomatology, Shanghai, China
- National Clinical Research Center for Oral Diseases, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Shanghai, China
- Shanghai Research Institute of Stomatology, Shanghai, China
- Xiaomeng Zhang
- Department of Oral and Maxillofacial Implantology, Shanghai PerioImplant Innovation Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- College of Stomatology, Shanghai Jiao Tong University, Shanghai, China
- National Center for Stomatology, Shanghai, China
- National Clinical Research Center for Oral Diseases, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Shanghai, China
- Shanghai Research Institute of Stomatology, Shanghai, China
- Yuan Li
- Department of Oral and Maxillofacial Implantology, Shanghai PerioImplant Innovation Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- College of Stomatology, Shanghai Jiao Tong University, Shanghai, China
- National Center for Stomatology, Shanghai, China
- National Clinical Research Center for Oral Diseases, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Shanghai, China
- Shanghai Research Institute of Stomatology, Shanghai, China
- Xinxin Ding
- Department of Oral and Maxillofacial Implantology, Shanghai PerioImplant Innovation Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- College of Stomatology, Shanghai Jiao Tong University, Shanghai, China
- National Center for Stomatology, Shanghai, China
- National Clinical Research Center for Oral Diseases, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Shanghai, China
- Shanghai Research Institute of Stomatology, Shanghai, China
- Hongchang Lai
- Department of Oral and Maxillofacial Implantology, Shanghai PerioImplant Innovation Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- College of Stomatology, Shanghai Jiao Tong University, Shanghai, China
- National Center for Stomatology, Shanghai, China
- National Clinical Research Center for Oral Diseases, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Shanghai, China
- Shanghai Research Institute of Stomatology, Shanghai, China
- Junyu Shi
- Department of Oral and Maxillofacial Implantology, Shanghai PerioImplant Innovation Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- College of Stomatology, Shanghai Jiao Tong University, Shanghai, China
- National Center for Stomatology, Shanghai, China
- National Clinical Research Center for Oral Diseases, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Shanghai, China
- Shanghai Research Institute of Stomatology, Shanghai, China
17
Bhayana R, Biswas S, Cook TS, Kim W, Kitamura FC, Gichoya J, Yi PH. From Bench to Bedside With Large Language Models: AJR Expert Panel Narrative Review. AJR Am J Roentgenol 2024. [PMID: 38598354] [DOI: 10.2214/ajr.24.30928]
Abstract
Large language models (LLMs) hold immense potential to revolutionize radiology. However, their integration into practice requires careful consideration. Artificial intelligence (AI) chatbots and general-purpose LLMs have potential pitfalls related to privacy, transparency, and accuracy, limiting their current clinical readiness. Thus, LLM-based tools must be optimized for radiology practice to overcome these limitations. While research and validation for radiology applications remain in their infancy, commercial products incorporating LLMs are becoming available alongside promises of transforming practice. To help radiologists navigate this landscape, this AJR Expert Panel Narrative Review provides a multidimensional perspective on LLMs, encompassing considerations from bench (development and optimization) to bedside (use in practice). At present, LLMs are not autonomous entities that can replace expert decision-making, and radiologists remain responsible for the content of their reports. Patient-facing tools, particularly medical AI chatbots, require additional guardrails to ensure safety and prevent misuse. Still, if responsibly implemented, LLMs are well-positioned to transform efficiency and quality in radiology. Radiologists must be well-informed and proactively involved in guiding the implementation of LLMs in practice to mitigate risks and maximize benefits to patient care.
Affiliation(s)
- Rajesh Bhayana
- University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, University of Toronto, Toronto, ON, Canada
- Som Biswas
- Department of Radiology, Le Bonheur Children's Hospital, University of Tennessee Health Science Center, Memphis, TN, USA
- Tessa S Cook
- Department of Radiology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Woojin Kim
- Department of Radiology, Palo Alto VA Medical Center, Palo Alto, CA
- Felipe C Kitamura
- Department of Diagnostic Imaging, Universidade Federal de São Paulo, São Paulo, Brazil
- Dasa, São Paulo, Brazil
- Judy Gichoya
- Department of Radiology, Emory University School of Medicine, Georgia, U.S.A
- Paul H Yi
- Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, Baltimore, MD
18
Alanezi F. Examining the role of ChatGPT in promoting health behaviors and lifestyle changes among cancer patients. Nutr Health 2024:2601060241244563. [PMID: 38567408] [DOI: 10.1177/02601060241244563]
Abstract
Purpose: This study aims to investigate the role of ChatGPT in promoting health behavior changes among cancer patients. Methods: A quasi-experimental design with a qualitative approach was adopted, as ChatGPT is a novel technology and many people are unaware of it. The participants were outpatients at a public hospital. In the experiment, participants used ChatGPT to seek cancer-related information for two weeks, followed by focus group (FG) discussions; a total of 72 outpatients participated in ten focus groups. Results: Three main themes with 14 sub-themes were identified, reflecting the role of ChatGPT in promoting health behavior changes. Its most prominent role was in developing health literacy and promoting self-management of conditions through emotional, informational, and motivational support. Three challenges were identified: privacy, lack of personalization, and reliability issues. Conclusion: Although ChatGPT has great potential for promoting health behavior changes among cancer patients, its usefulness is limited by several factors, such as regulatory, reliability, and privacy issues. Further evidence is needed to generalize the results across regions.
Affiliation(s)
- Fahad Alanezi
- College of Business Administration, Department Management Information Systems, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| |
|
19
|
Lehnen NC, Dorn F, Wiest IC, Zimmermann H, Radbruch A, Kather JN, Paech D. Data Extraction from Free-Text Reports on Mechanical Thrombectomy in Acute Ischemic Stroke Using ChatGPT: A Retrospective Analysis. Radiology 2024; 311:e232741. [PMID: 38625006 DOI: 10.1148/radiol.232741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/17/2024]
Abstract
Background Procedural details of mechanical thrombectomy in patients with ischemic stroke are important predictors of clinical outcome and are collected for prospective studies or national stroke registries. To date, these data are collected manually by human readers, a labor-intensive task that is prone to errors. Purpose To evaluate the use of the large language models (LLMs) GPT-4 and GPT-3.5 to extract data from neuroradiology reports on mechanical thrombectomy in patients with ischemic stroke. Materials and Methods This retrospective study included consecutive reports from patients with ischemic stroke who underwent mechanical thrombectomy between November 2022 and September 2023 at institution 1 and between September 2016 and December 2019 at institution 2. A set of 20 reports was used to optimize the prompt, and the ability of the LLMs to extract procedural data from the reports was compared using the McNemar test. Data manually extracted by an interventional neuroradiologist served as the reference standard. Results A total of 100 internal reports from 100 patients (mean age, 74.7 years ± 13.2 [SD]; 53 female) and 30 external reports from 30 patients (mean age, 72.7 years ± 13.5; 18 male) were included. All reports were successfully processed by GPT-4 and GPT-3.5. Of 2800 data entries, 2631 (94.0% [95% CI: 93.0, 94.8]; range per category, 61%-100%) data points were correctly extracted by GPT-4 without the need for further postprocessing. With 1788 of 2800 correct data entries, GPT-3.5 produced fewer correct data entries than did GPT-4 (63.9% [95% CI: 62.0, 65.6]; range per category, 14%-99%; P < .001). For the external reports, GPT-4 extracted 760 of 840 (90.5% [95% CI: 88.3, 92.4]) correct data entries, while GPT-3.5 extracted 539 of 840 (64.2% [95% CI: 60.8, 67.4]; P < .001). Conclusion Compared with GPT-3.5, GPT-4 more frequently extracted correct procedural data from free-text reports on mechanical thrombectomy performed in patients with ischemic stroke. © RSNA, 2024 Supplemental material is available for this article.
Affiliation(s)
- Nils C Lehnen
- From the Department of Neuroradiology, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Venusberg-Campus 1, 53127 Bonn, Germany (N.C.L., F.D., A.R., D.P.); Research Group Clinical Neuroimaging, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany (N.C.L., A.R.); Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany (I.C.W.); Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, Technical University Dresden, Dresden, Germany (I.C.W., J.N.K.); Institute of Neuroradiology, University Hospital, LMU Munich, Munich, Germany (H.Z.); and Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, Mass (D.P.)
| | - Franziska Dorn
- From the Department of Neuroradiology, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Venusberg-Campus 1, 53127 Bonn, Germany (N.C.L., F.D., A.R., D.P.); Research Group Clinical Neuroimaging, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany (N.C.L., A.R.); Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany (I.C.W.); Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, Technical University Dresden, Dresden, Germany (I.C.W., J.N.K.); Institute of Neuroradiology, University Hospital, LMU Munich, Munich, Germany (H.Z.); and Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, Mass (D.P.)
| | - Isabella C Wiest
- From the Department of Neuroradiology, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Venusberg-Campus 1, 53127 Bonn, Germany (N.C.L., F.D., A.R., D.P.); Research Group Clinical Neuroimaging, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany (N.C.L., A.R.); Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany (I.C.W.); Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, Technical University Dresden, Dresden, Germany (I.C.W., J.N.K.); Institute of Neuroradiology, University Hospital, LMU Munich, Munich, Germany (H.Z.); and Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, Mass (D.P.)
| | - Hanna Zimmermann
- From the Department of Neuroradiology, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Venusberg-Campus 1, 53127 Bonn, Germany (N.C.L., F.D., A.R., D.P.); Research Group Clinical Neuroimaging, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany (N.C.L., A.R.); Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany (I.C.W.); Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, Technical University Dresden, Dresden, Germany (I.C.W., J.N.K.); Institute of Neuroradiology, University Hospital, LMU Munich, Munich, Germany (H.Z.); and Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, Mass (D.P.)
| | - Alexander Radbruch
- From the Department of Neuroradiology, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Venusberg-Campus 1, 53127 Bonn, Germany (N.C.L., F.D., A.R., D.P.); Research Group Clinical Neuroimaging, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany (N.C.L., A.R.); Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany (I.C.W.); Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, Technical University Dresden, Dresden, Germany (I.C.W., J.N.K.); Institute of Neuroradiology, University Hospital, LMU Munich, Munich, Germany (H.Z.); and Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, Mass (D.P.)
| | - Jakob Nikolas Kather
- From the Department of Neuroradiology, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Venusberg-Campus 1, 53127 Bonn, Germany (N.C.L., F.D., A.R., D.P.); Research Group Clinical Neuroimaging, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany (N.C.L., A.R.); Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany (I.C.W.); Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, Technical University Dresden, Dresden, Germany (I.C.W., J.N.K.); Institute of Neuroradiology, University Hospital, LMU Munich, Munich, Germany (H.Z.); and Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, Mass (D.P.)
| | - Daniel Paech
- From the Department of Neuroradiology, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Venusberg-Campus 1, 53127 Bonn, Germany (N.C.L., F.D., A.R., D.P.); Research Group Clinical Neuroimaging, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany (N.C.L., A.R.); Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany (I.C.W.); Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, Technical University Dresden, Dresden, Germany (I.C.W., J.N.K.); Institute of Neuroradiology, University Hospital, LMU Munich, Munich, Germany (H.Z.); and Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, Mass (D.P.)
| |
|
20
|
Kim H, Kim P, Joo I, Kim JH, Park CM, Yoon SH. ChatGPT Vision for Radiological Interpretation: An Investigation Using Medical School Radiology Examinations. Korean J Radiol 2024; 25:403-406. [PMID: 38528699 PMCID: PMC10973733 DOI: 10.3348/kjr.2024.0017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2024] [Revised: 01/11/2024] [Accepted: 01/14/2024] [Indexed: 03/27/2024] Open
Affiliation(s)
- Hyungjin Kim
- Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Paul Kim
- Graduate School of Education, Stanford University, Stanford, CA, USA
| | - Ijin Joo
- Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Jung Hoon Kim
- Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Chang Min Park
- Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Soon Ho Yoon
- Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea.
| |
|
21
|
Cozzi A, Pinker K, Hidber A, Zhang T, Bonomo L, Lo Gullo R, Christianson B, Curti M, Rizzo S, Del Grande F, Mann RM, Schiaffino S, Panzer A. BI-RADS Category Assignments by GPT-3.5, GPT-4, and Google Bard: A Multilanguage Study. Radiology 2024; 311:e232133. [PMID: 38687216 PMCID: PMC11070611 DOI: 10.1148/radiol.232133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Revised: 03/08/2024] [Accepted: 03/12/2024] [Indexed: 05/02/2024]
Abstract
Background The performance of publicly available large language models (LLMs) remains unclear for complex clinical tasks. Purpose To evaluate the agreement between human readers and LLMs for Breast Imaging Reporting and Data System (BI-RADS) categories assigned based on breast imaging reports written in three languages and to assess the impact of discordant category assignments on clinical management. Materials and Methods This retrospective study included reports for women who underwent MRI, mammography, and/or US for breast cancer screening or diagnostic purposes at three referral centers. Reports with findings categorized as BI-RADS 1-5 and written in Italian, English, or Dutch were collected between January 2000 and October 2023. Board-certified breast radiologists and the LLMs GPT-3.5 and GPT-4 (OpenAI) and Bard, now called Gemini (Google), assigned BI-RADS categories using only the findings described by the original radiologists. Agreement between human readers and LLMs for BI-RADS categories was assessed using the Gwet agreement coefficient (AC1 value). Frequencies were calculated for changes in BI-RADS category assignments that would affect clinical management (ie, BI-RADS 0 vs BI-RADS 1 or 2 vs BI-RADS 3 vs BI-RADS 4 or 5) and compared using the McNemar test. Results Across 2400 reports, agreement between the original and reviewing radiologists was almost perfect (AC1 = 0.91), while agreement between the original radiologists and GPT-4, GPT-3.5, and Bard was moderate (AC1 = 0.52, 0.48, and 0.42, respectively). Across human readers and LLMs, differences were observed in the frequency of BI-RADS category upgrades or downgrades that would result in changed clinical management (118 of 2400 [4.9%] for human readers, 611 of 2400 [25.5%] for Bard, 573 of 2400 [23.9%] for GPT-3.5, and 435 of 2400 [18.1%] for GPT-4; P < .001) and that would negatively impact clinical management (37 of 2400 [1.5%] for human readers, 435 of 2400 [18.1%] for Bard, 344 of 2400 [14.3%] for GPT-3.5, and 255 of 2400 [10.6%] for GPT-4; P < .001). Conclusion LLMs achieved moderate agreement with human reader-assigned BI-RADS categories across reports written in three languages but also yielded a high percentage of discordant BI-RADS categories that would negatively impact clinical management. © RSNA, 2024 Supplemental material is available for this article.
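For readers unfamiliar with the Gwet agreement coefficient reported above, a minimal Python sketch of AC1 for two raters follows; it is illustrative only and not the study's analysis code.

from collections import Counter

def gwet_ac1(ratings_a, ratings_b):
    # ratings_a / ratings_b: BI-RADS categories assigned by two readers to the same reports
    n = len(ratings_a)
    categories = sorted(set(ratings_a) | set(ratings_b))
    pa = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n  # observed agreement
    counts = Counter(ratings_a) + Counter(ratings_b)
    shares = [counts[c] / (2 * n) for c in categories]  # average share of each category
    pe = sum(s * (1 - s) for s in shares) / (len(categories) - 1)  # chance agreement
    return (pa - pe) / (1 - pe)

# toy example: original radiologist versus an LLM on six hypothetical reports
print(gwet_ac1([1, 2, 4, 3, 5, 2], [1, 2, 3, 3, 5, 4]))

Unlike Cohen kappa, AC1 remains stable when one category dominates, which is one reason it is often preferred for skewed BI-RADS distributions.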
Affiliation(s)
| | | | - Andri Hidber
- From the Imaging Institute of Southern Switzerland (IIMSI), Ente Ospedaliero Cantonale, Via Tesserete 46, 6900 Lugano, Switzerland (A.C., L.B., M.C., S.R., F.D.G., S.S.); Breast Imaging Service, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY (K.P., R.L.G., B.C.); Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland (A.H., S.R., F.D.G., S.S.); Department of Radiology, Netherlands Cancer Institute, Amsterdam, the Netherlands (T.Z., R.M.M.); Department of Diagnostic Imaging, Radboud University Medical Center, Nijmegen, the Netherlands (T.Z., R.M.M.); and GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, the Netherlands (T.Z.)
| | - Tianyu Zhang
- From the Imaging Institute of Southern Switzerland (IIMSI), Ente Ospedaliero Cantonale, Via Tesserete 46, 6900 Lugano, Switzerland (A.C., L.B., M.C., S.R., F.D.G., S.S.); Breast Imaging Service, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY (K.P., R.L.G., B.C.); Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland (A.H., S.R., F.D.G., S.S.); Department of Radiology, Netherlands Cancer Institute, Amsterdam, the Netherlands (T.Z., R.M.M.); Department of Diagnostic Imaging, Radboud University Medical Center, Nijmegen, the Netherlands (T.Z., R.M.M.); and GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, the Netherlands (T.Z.)
| | - Luca Bonomo
- From the Imaging Institute of Southern Switzerland (IIMSI), Ente Ospedaliero Cantonale, Via Tesserete 46, 6900 Lugano, Switzerland (A.C., L.B., M.C., S.R., F.D.G., S.S.); Breast Imaging Service, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY (K.P., R.L.G., B.C.); Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland (A.H., S.R., F.D.G., S.S.); Department of Radiology, Netherlands Cancer Institute, Amsterdam, the Netherlands (T.Z., R.M.M.); Department of Diagnostic Imaging, Radboud University Medical Center, Nijmegen, the Netherlands (T.Z., R.M.M.); and GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, the Netherlands (T.Z.)
| | - Roberto Lo Gullo
- From the Imaging Institute of Southern Switzerland (IIMSI), Ente Ospedaliero Cantonale, Via Tesserete 46, 6900 Lugano, Switzerland (A.C., L.B., M.C., S.R., F.D.G., S.S.); Breast Imaging Service, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY (K.P., R.L.G., B.C.); Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland (A.H., S.R., F.D.G., S.S.); Department of Radiology, Netherlands Cancer Institute, Amsterdam, the Netherlands (T.Z., R.M.M.); Department of Diagnostic Imaging, Radboud University Medical Center, Nijmegen, the Netherlands (T.Z., R.M.M.); and GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, the Netherlands (T.Z.)
| | - Blake Christianson
- From the Imaging Institute of Southern Switzerland (IIMSI), Ente Ospedaliero Cantonale, Via Tesserete 46, 6900 Lugano, Switzerland (A.C., L.B., M.C., S.R., F.D.G., S.S.); Breast Imaging Service, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY (K.P., R.L.G., B.C.); Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland (A.H., S.R., F.D.G., S.S.); Department of Radiology, Netherlands Cancer Institute, Amsterdam, the Netherlands (T.Z., R.M.M.); Department of Diagnostic Imaging, Radboud University Medical Center, Nijmegen, the Netherlands (T.Z., R.M.M.); and GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, the Netherlands (T.Z.)
| | - Marco Curti
- From the Imaging Institute of Southern Switzerland (IIMSI), Ente Ospedaliero Cantonale, Via Tesserete 46, 6900 Lugano, Switzerland (A.C., L.B., M.C., S.R., F.D.G., S.S.); Breast Imaging Service, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY (K.P., R.L.G., B.C.); Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland (A.H., S.R., F.D.G., S.S.); Department of Radiology, Netherlands Cancer Institute, Amsterdam, the Netherlands (T.Z., R.M.M.); Department of Diagnostic Imaging, Radboud University Medical Center, Nijmegen, the Netherlands (T.Z., R.M.M.); and GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, the Netherlands (T.Z.)
| | - Stefania Rizzo
- From the Imaging Institute of Southern Switzerland (IIMSI), Ente Ospedaliero Cantonale, Via Tesserete 46, 6900 Lugano, Switzerland (A.C., L.B., M.C., S.R., F.D.G., S.S.); Breast Imaging Service, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY (K.P., R.L.G., B.C.); Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland (A.H., S.R., F.D.G., S.S.); Department of Radiology, Netherlands Cancer Institute, Amsterdam, the Netherlands (T.Z., R.M.M.); Department of Diagnostic Imaging, Radboud University Medical Center, Nijmegen, the Netherlands (T.Z., R.M.M.); and GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, the Netherlands (T.Z.)
| | - Filippo Del Grande
- From the Imaging Institute of Southern Switzerland (IIMSI), Ente Ospedaliero Cantonale, Via Tesserete 46, 6900 Lugano, Switzerland (A.C., L.B., M.C., S.R., F.D.G., S.S.); Breast Imaging Service, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY (K.P., R.L.G., B.C.); Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland (A.H., S.R., F.D.G., S.S.); Department of Radiology, Netherlands Cancer Institute, Amsterdam, the Netherlands (T.Z., R.M.M.); Department of Diagnostic Imaging, Radboud University Medical Center, Nijmegen, the Netherlands (T.Z., R.M.M.); and GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, the Netherlands (T.Z.)
| | | | | | - Ariane Panzer
- From the Imaging Institute of Southern Switzerland (IIMSI), Ente Ospedaliero Cantonale, Via Tesserete 46, 6900 Lugano, Switzerland (A.C., L.B., M.C., S.R., F.D.G., S.S.); Breast Imaging Service, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY (K.P., R.L.G., B.C.); Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland (A.H., S.R., F.D.G., S.S.); Department of Radiology, Netherlands Cancer Institute, Amsterdam, the Netherlands (T.Z., R.M.M.); Department of Diagnostic Imaging, Radboud University Medical Center, Nijmegen, the Netherlands (T.Z., R.M.M.); and GROW Research Institute for Oncology and Reproduction, Maastricht University, Maastricht, the Netherlands (T.Z.)
| |
|
22
|
Peled T, Sela HY, Weiss A, Grisaru-Granovsky S, Agrawal S, Rottenstreich M. Evaluating the validity of ChatGPT responses on common obstetric issues: Potential clinical applications and implications. Int J Gynaecol Obstet 2024. [PMID: 38523565 DOI: 10.1002/ijgo.15501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Revised: 02/29/2024] [Accepted: 03/10/2024] [Indexed: 03/26/2024]
Abstract
OBJECTIVE To evaluate the quality of ChatGPT responses to common issues in obstetrics and assess its ability to provide reliable responses to pregnant individuals. The study aimed to examine the responses based on expert opinions using predetermined criteria, including "accuracy," "completeness," and "safety." METHODS We curated 15 common and potentially clinically significant questions that pregnant women frequently ask. Two native English-speaking women were asked to reframe the questions in their own words, and the ChatGPT language model was used to generate responses to the questions. To evaluate the accuracy, completeness, and safety of the generated responses, we developed a questionnaire with a 1-to-5 scale, and obstetrics and gynecology experts from different countries were invited to rate the responses accordingly. The ratings were analyzed to evaluate the average level of agreement and the percentage of positive ratings (≥4) for each criterion. RESULTS Of the 42 experts invited, 20 responded to the questionnaire. The combined score for all responses yielded a mean rating of 4, with 75% of responses receiving a positive rating (≥4). Among the specific criteria, the ChatGPT responses performed best on accuracy, with a mean rating of 4.2; 80% of the questions received a positive rating. The responses scored lower on completeness, with a mean rating of 3.8; 46.7% of questions received a positive rating. For safety, the mean rating was 3.9, and 53.3% of questions received a positive rating. No response had an average rating below three. CONCLUSION This study demonstrates promising results regarding the potential use of ChatGPT in providing accurate responses to obstetric clinical questions posed by pregnant women. However, caution is warranted when addressing inquiries concerning the safety of the fetus or the mother.
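As a small illustration of the rating summary described above, and not the study's code, the mean rating and share of positive ratings (>=4) per criterion could be tallied as follows; the scores are invented.

import statistics

# Hypothetical ratings (1-5) from several experts for one ChatGPT response
ratings = {
    "accuracy": [5, 4, 4, 3, 5],
    "completeness": [4, 3, 4, 4, 2],
    "safety": [4, 4, 3, 5, 4],
}

for criterion, scores in ratings.items():
    mean_score = statistics.mean(scores)
    positive = 100 * sum(s >= 4 for s in scores) / len(scores)
    print(f"{criterion}: mean {mean_score:.1f}, positive ratings {positive:.0f}%")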
Affiliation(s)
- Tzuria Peled
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
| | - Hen Y Sela
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
| | - Ari Weiss
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
| | - Sorina Grisaru-Granovsky
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
| | - Swati Agrawal
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Hamilton Health Sciences, McMaster University, Hamilton, Ontario, Canada
| | - Misgav Rottenstreich
- Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Hamilton Health Sciences, McMaster University, Hamilton, Ontario, Canada
- Department of Nursing, Jerusalem College of Technology, Jerusalem, Israel
| |
|
23
|
Şenoymak MC, Erbatur NH, Şenoymak İ, Fırat SN. The Role of Artificial Intelligence in Endocrine Management: Assessing ChatGPT's Responses to Prolactinoma Queries. J Pers Med 2024; 14:330. [PMID: 38672957 PMCID: PMC11051052 DOI: 10.3390/jpm14040330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2024] [Revised: 03/15/2024] [Accepted: 03/21/2024] [Indexed: 04/28/2024] Open
Abstract
This research investigates the utility of Chat Generative Pre-trained Transformer (ChatGPT) in addressing patient inquiries related to hyperprolactinemia and prolactinoma. A set of 46 commonly asked questions from patients with prolactinoma were presented to ChatGPT and responses were evaluated for accuracy with a 6-point Likert scale (1: completely inaccurate to 6: completely accurate) and adequacy with a 5-point Likert scale (1: completely inadequate to 5: completely adequate). Two independent endocrinologists assessed the responses, based on international guidelines. Questions were categorized into groups including general information, diagnostic process, treatment process, follow-up, and pregnancy period. The median accuracy score was 6.0 (IQR, 5.4-6.0), and the adequacy score was 4.5 (IQR, 3.5-5.0). The lowest accuracy and adequacy score assigned by both evaluators was two. Significant agreement was observed between the evaluators, demonstrated by a weighted κ of 0.68 (p = 0.08) for accuracy and a κ of 0.66 (p = 0.04) for adequacy. The Kruskal-Wallis tests revealed statistically significant differences among the groups for accuracy (p = 0.005) and adequacy (p = 0.023). The pregnancy period group had the lowest accuracy score and both pregnancy period and follow-up groups had the lowest adequacy score. In conclusion, ChatGPT demonstrated commendable responses in addressing prolactinoma queries; however, certain limitations were observed, particularly in providing accurate information related to the pregnancy period, emphasizing the need for refining its capabilities in medical contexts.
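A hedged sketch of the two analyses named above, the inter-rater weighted kappa and the Kruskal-Wallis comparison across question groups, is given below with made-up scores; it is not the study's code.

from scipy.stats import kruskal
from sklearn.metrics import cohen_kappa_score

# Hypothetical 6-point accuracy scores from the two endocrinologist evaluators
rater1 = [6, 5, 6, 4, 6, 5, 3, 6]
rater2 = [6, 6, 5, 4, 6, 4, 2, 6]
print("weighted kappa:", round(cohen_kappa_score(rater1, rater2, weights="linear"), 2))

# Hypothetical per-question scores grouped by category
general_information = [6, 6, 5, 6]
treatment_process = [5, 6, 6, 5]
pregnancy_period = [4, 3, 5, 4]
h_stat, p_value = kruskal(general_information, treatment_process, pregnancy_period)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_value:.3f}")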
Affiliation(s)
- Mustafa Can Şenoymak
- Department of Endocrinology and Metabolism, University of Health Sciences, Sultan Abdulhamid Han Training and Research Hospital, Istanbul 34668, Turkey
| | - Nuriye Hale Erbatur
- Department of Endocrinology and Metabolism, University of Health Sciences, Sultan Abdulhamid Han Training and Research Hospital, Istanbul 34668, Turkey
| | - İrem Şenoymak
- Family Medicine Department, Üsküdar State Hospital, Istanbul 34662, Turkey
| | - Sevde Nur Fırat
- Department of Endocrinology and Metabolism, University of Health Sciences, Ankara Training and Research Hospital, Ankara 06230, Turkey
| |
|
24
|
Chervonski E, Harish KB, Rockman CB, Sadek M, Teter KA, Jacobowitz GR, Berland TL, Lohr J, Moore C, Maldonado TS. Generative artificial intelligence chatbots may provide appropriate informational responses to common vascular surgery questions by patients. Vascular 2024:17085381241240550. [PMID: 38500300 DOI: 10.1177/17085381241240550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/20/2024]
Abstract
OBJECTIVES Generative artificial intelligence (AI) has emerged as a promising tool to engage with patients. The objective of this study was to assess the quality of AI responses to common patient questions regarding vascular surgery disease processes. METHODS OpenAI's ChatGPT-3.5 and Google Bard were queried with 24 mock patient questions spanning seven vascular surgery disease domains. Six experienced vascular surgery faculty at a tertiary academic center independently graded AI responses on their accuracy (rated 1-4 from completely inaccurate to completely accurate), completeness (rated 1-4 from totally incomplete to totally complete), and appropriateness (binary). Responses were also evaluated with three readability scales. RESULTS ChatGPT responses were rated, on average, more accurate than Bard responses (3.08 ± 0.33 vs 2.82 ± 0.40, p < .01). ChatGPT responses were scored, on average, more complete than Bard responses (2.98 ± 0.34 vs 2.62 ± 0.36, p < .01). Most ChatGPT responses (75.0%, n = 18) and almost half of Bard responses (45.8%, n = 11) were unanimously deemed appropriate. Almost one-third of Bard responses (29.2%, n = 7) were deemed inappropriate by at least two reviewers, and two Bard responses (8.4%) were considered inappropriate by the majority. The mean Flesch Reading Ease, Flesch-Kincaid Grade Level, and Gunning Fog Index of ChatGPT responses were 29.4 ± 10.8, 14.5 ± 2.2, and 17.7 ± 3.1, respectively, indicating that responses were readable with a post-secondary education. Bard's mean readability scores were 58.9 ± 10.5, 8.2 ± 1.7, and 11.0 ± 2.0, respectively, indicating that responses were readable with a high-school education (p < .0001 for all three metrics). ChatGPT's mean response length (332 ± 79 words) was longer than Bard's mean response length (183 ± 53 words, p < .001). There was no difference in the accuracy, completeness, readability, or response length of ChatGPT or Bard between disease domains (p > .05 for all analyses). CONCLUSIONS AI offers a novel means of educating patients that avoids the inundation of information from "Dr Google" and the time barriers of physician-patient encounters. ChatGPT provides largely valid, though imperfect, responses to myriad patient questions at the expense of readability. While Bard responses are more readable and concise, their quality is poorer. Further research is warranted to better understand failure points for large language models in vascular surgery patient education.
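The readability scales reported above can be reproduced for any chatbot answer with an off-the-shelf package; the sketch below uses textstat, which is an assumption since the study does not name its software.

import textstat

answer = (
    "An abdominal aortic aneurysm is a bulge in the main artery that carries "
    "blood from your heart to the rest of your body. Small aneurysms are "
    "usually monitored with regular ultrasound scans rather than surgery."
)

print("Flesch Reading Ease:", textstat.flesch_reading_ease(answer))
print("Flesch-Kincaid Grade Level:", textstat.flesch_kincaid_grade(answer))
print("Gunning Fog Index:", textstat.gunning_fog(answer))
print("Response length (words):", len(answer.split()))

Higher Flesch Reading Ease values indicate easier text, whereas the Flesch-Kincaid and Gunning Fog values approximate the school grade level needed to follow the response.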
Affiliation(s)
- Ethan Chervonski
- New York University Grossman School of Medicine, New York, NY, USA
| | - Keerthi B Harish
- New York University Grossman School of Medicine, New York, NY, USA
| | - Caron B Rockman
- Division of Vascular & Endovascular Surgery, Department of Surgery, New York University Langone Health, New York, NY, USA
| | - Mikel Sadek
- Division of Vascular & Endovascular Surgery, Department of Surgery, New York University Langone Health, New York, NY, USA
| | - Katherine A Teter
- Division of Vascular & Endovascular Surgery, Department of Surgery, New York University Langone Health, New York, NY, USA
| | - Glenn R Jacobowitz
- Division of Vascular & Endovascular Surgery, Department of Surgery, New York University Langone Health, New York, NY, USA
| | - Todd L Berland
- Division of Vascular & Endovascular Surgery, Department of Surgery, New York University Langone Health, New York, NY, USA
| | - Joann Lohr
- Dorn Veterans Affairs Medical Center, Columbia, SC, USA
| | | | - Thomas S Maldonado
- Division of Vascular & Endovascular Surgery, Department of Surgery, New York University Langone Health, New York, NY, USA
| |
|
25
|
Carlà MM, Gambini G, Baldascino A, Giannuzzi F, Boselli F, Crincoli E, D'Onofrio NC, Rizzo S. Exploring AI-chatbots' capability to suggest surgical planning in ophthalmology: ChatGPT versus Google Gemini analysis of retinal detachment cases. Br J Ophthalmol 2024:bjo-2023-325143. [PMID: 38448201 DOI: 10.1136/bjo-2023-325143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2023] [Accepted: 02/16/2024] [Indexed: 03/08/2024]
Abstract
BACKGROUND We aimed to define the capability of three publicly available large language models, Chat Generative Pretrained Transformer (ChatGPT-3.5), ChatGPT-4, and Google Gemini, in analysing retinal detachment cases and suggesting the best possible surgical planning. METHODS Analysis of 54 retinal detachment records entered into the ChatGPT and Gemini interfaces. After asking 'Specify what kind of surgical planning you would suggest and the eventual intraocular tamponade.' and collecting the given answers, we assessed the level of agreement with the common opinion of three expert vitreoretinal surgeons. Moreover, ChatGPT and Gemini answers were graded 1-5 (from poor to excellent quality) according to the Global Quality Score (GQS). RESULTS After excluding 4 controversial cases, 50 cases were included. Overall, the surgical choices of ChatGPT-3.5, ChatGPT-4, and Google Gemini agreed with those of the vitreoretinal surgeons in 40/50 (80%), 42/50 (84%), and 35/50 (70%) of cases, respectively. Google Gemini was not able to respond in five cases. Contingency analysis showed significant differences between ChatGPT-4 and Gemini (p=0.03). ChatGPT's GQS values were 3.9±0.8 and 4.2±0.7 for versions 3.5 and 4, while Gemini scored 3.5±1.1. There was no statistical difference between the two ChatGPT versions (p=0.22), while both outperformed Gemini (p=0.03 and p=0.002, respectively). The main source of error was endotamponade choice (14% for ChatGPT-3.5 and 4, and 12% for Google Gemini). Only ChatGPT-4 was able to suggest a combined phacovitrectomy approach. CONCLUSION In conclusion, Google Gemini and ChatGPT evaluated vitreoretinal patient records in a coherent manner, showing a good level of agreement with expert surgeons. According to the GQS, ChatGPT's recommendations were more accurate and precise.
Affiliation(s)
- Matteo Mario Carlà
- Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy
- Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
| | - Gloria Gambini
- Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy
- Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
| | - Antonio Baldascino
- Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy
- Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
| | - Federico Giannuzzi
- Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy
- Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
| | - Francesco Boselli
- Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy
- Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
| | - Emanuele Crincoli
- Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy
- Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
| | - Nicola Claudio D'Onofrio
- Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy
- Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
| | - Stanislao Rizzo
- Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy
- Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
| |
|
26
|
Wu RC, Li DX, Feng DC. Re: Michael Eppler, Conner Ganjavi, Lorenzo Storino Ramacciotti, et al. Awareness and Use of ChatGPT and Large Language Models: A Prospective Cross-sectional Global Survey in Urology. Eur Urol. 2024;85:146-53. Eur Urol 2024; 85:e87-e88. [PMID: 38151444 DOI: 10.1016/j.eururo.2023.11.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2023] [Accepted: 11/23/2023] [Indexed: 12/29/2023]
Affiliation(s)
- Rui-Cheng Wu
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, China
| | - Deng-Xiong Li
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, China
| | - De-Chao Feng
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, China.
| |
|
27
|
Wu SH, Tong WJ, Li MD, Hu HT, Lu XZ, Huang ZR, Lin XX, Lu RF, Lu MD, Chen LD, Wang W. Collaborative Enhancement of Consistency and Accuracy in US Diagnosis of Thyroid Nodules Using Large Language Models. Radiology 2024; 310:e232255. [PMID: 38470237 DOI: 10.1148/radiol.232255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/13/2024]
Abstract
Background Large language models (LLMs) hold substantial promise for medical imaging interpretation. However, there is a lack of studies on their feasibility in handling reasoning questions associated with medical diagnosis. Purpose To investigate the viability of leveraging three publicly available LLMs to enhance consistency and diagnostic accuracy in medical imaging based on standardized reporting, with pathology as the reference standard. Materials and Methods US images of thyroid nodules with pathologic results were retrospectively collected from a tertiary referral hospital between July 2022 and December 2022 and used to evaluate malignancy diagnoses generated by three LLMs-OpenAI's ChatGPT 3.5, ChatGPT 4.0, and Google's Bard. Inter- and intra-LLM agreement of diagnosis were evaluated. Then, diagnostic performance, including accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC), was evaluated and compared for the LLMs and three interactive approaches: human reader combined with LLMs, image-to-text model combined with LLMs, and an end-to-end convolutional neural network model. Results A total of 1161 US images of thyroid nodules (498 benign, 663 malignant) from 725 patients (mean age, 42.2 years ± 14.1 [SD]; 516 women) were evaluated. ChatGPT 4.0 and Bard displayed substantial to almost perfect intra-LLM agreement (κ range, 0.65-0.86 [95% CI: 0.64, 0.86]), while ChatGPT 3.5 showed fair to substantial agreement (κ range, 0.36-0.68 [95% CI: 0.36, 0.68]). ChatGPT 4.0 had an accuracy of 78%-86% (95% CI: 76%, 88%) and sensitivity of 86%-95% (95% CI: 83%, 96%), compared with 74%-86% (95% CI: 71%, 88%) and 74%-91% (95% CI: 71%, 93%), respectively, for Bard. Moreover, with ChatGPT 4.0, the image-to-text-LLM strategy exhibited an AUC (0.83 [95% CI: 0.80, 0.85]) and accuracy (84% [95% CI: 82%, 86%]) comparable to those of the human-LLM interaction strategy with two senior readers and one junior reader and exceeding those of the human-LLM interaction strategy with one junior reader. Conclusion LLMs, particularly integrated with image-to-text approaches, show potential in enhancing diagnostic medical imaging. ChatGPT 4.0 was optimal for consistency and diagnostic accuracy when compared with Bard and ChatGPT 3.5. © RSNA, 2024 Supplemental material is available for this article.
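A minimal sketch of how the diagnostic metrics and the run-to-run (intra-LLM) agreement reported above could be computed is shown below; the labels are invented and this is not the study's pipeline.

from sklearn.metrics import cohen_kappa_score, confusion_matrix

# 1 = malignant, 0 = benign; pathology is the reference standard
pathology = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # hypothetical
llm_run_1 = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]   # hypothetical LLM diagnoses, first run
llm_run_2 = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]   # same cases, repeated run

tn, fp, fn, tp = confusion_matrix(pathology, llm_run_1).ravel()
print("accuracy:", (tp + tn) / (tp + tn + fp + fn))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("intra-LLM agreement (kappa):", cohen_kappa_score(llm_run_1, llm_run_2))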
Affiliation(s)
- Shao-Hong Wu
- From the Department of Medical Ultrasonics, Ultrasomics Artificial Intelligence X-Laboratory, Institute of Diagnostic and Interventional Ultrasound, First Affiliated Hospital of Sun Yat-sen University, No. 58 Zhongshan Rd 2, Guangzhou 510080, People's Republic of China (S.H.W., W.J.T., M.D. Li, H.T.H., Z.R.H., X.X.L., R.F.L., M.D. Lu, L.D.C., W.W.); and Department of Traditional Chinese Medicine, First Affiliated Hospital of Sun Yat-sen University, Guangzhou, People's Republic of China (X.Z.L.)
| | - Wen-Juan Tong
- From the Department of Medical Ultrasonics, Ultrasomics Artificial Intelligence X-Laboratory, Institute of Diagnostic and Interventional Ultrasound, First Affiliated Hospital of Sun Yat-sen University, No. 58 Zhongshan Rd 2, Guangzhou 510080, People's Republic of China (S.H.W., W.J.T., M.D. Li, H.T.H., Z.R.H., X.X.L., R.F.L., M.D. Lu, L.D.C., W.W.); and Department of Traditional Chinese Medicine, First Affiliated Hospital of Sun Yat-sen University, Guangzhou, People's Republic of China (X.Z.L.)
| | - Ming-De Li
- From the Department of Medical Ultrasonics, Ultrasomics Artificial Intelligence X-Laboratory, Institute of Diagnostic and Interventional Ultrasound, First Affiliated Hospital of Sun Yat-sen University, No. 58 Zhongshan Rd 2, Guangzhou 510080, People's Republic of China (S.H.W., W.J.T., M.D. Li, H.T.H., Z.R.H., X.X.L., R.F.L., M.D. Lu, L.D.C., W.W.); and Department of Traditional Chinese Medicine, First Affiliated Hospital of Sun Yat-sen University, Guangzhou, People's Republic of China (X.Z.L.)
| | - Hang-Tong Hu
- From the Department of Medical Ultrasonics, Ultrasomics Artificial Intelligence X-Laboratory, Institute of Diagnostic and Interventional Ultrasound, First Affiliated Hospital of Sun Yat-sen University, No. 58 Zhongshan Rd 2, Guangzhou 510080, People's Republic of China (S.H.W., W.J.T., M.D. Li, H.T.H., Z.R.H., X.X.L., R.F.L., M.D. Lu, L.D.C., W.W.); and Department of Traditional Chinese Medicine, First Affiliated Hospital of Sun Yat-sen University, Guangzhou, People's Republic of China (X.Z.L.)
| | - Xiao-Zhou Lu
- From the Department of Medical Ultrasonics, Ultrasomics Artificial Intelligence X-Laboratory, Institute of Diagnostic and Interventional Ultrasound, First Affiliated Hospital of Sun Yat-sen University, No. 58 Zhongshan Rd 2, Guangzhou 510080, People's Republic of China (S.H.W., W.J.T., M.D. Li, H.T.H., Z.R.H., X.X.L., R.F.L., M.D. Lu, L.D.C., W.W.); and Department of Traditional Chinese Medicine, First Affiliated Hospital of Sun Yat-sen University, Guangzhou, People's Republic of China (X.Z.L.)
| | - Ze-Rong Huang
- From the Department of Medical Ultrasonics, Ultrasomics Artificial Intelligence X-Laboratory, Institute of Diagnostic and Interventional Ultrasound, First Affiliated Hospital of Sun Yat-sen University, No. 58 Zhongshan Rd 2, Guangzhou 510080, People's Republic of China (S.H.W., W.J.T., M.D. Li, H.T.H., Z.R.H., X.X.L., R.F.L., M.D. Lu, L.D.C., W.W.); and Department of Traditional Chinese Medicine, First Affiliated Hospital of Sun Yat-sen University, Guangzhou, People's Republic of China (X.Z.L.)
| | - Xin-Xin Lin
- From the Department of Medical Ultrasonics, Ultrasomics Artificial Intelligence X-Laboratory, Institute of Diagnostic and Interventional Ultrasound, First Affiliated Hospital of Sun Yat-sen University, No. 58 Zhongshan Rd 2, Guangzhou 510080, People's Republic of China (S.H.W., W.J.T., M.D. Li, H.T.H., Z.R.H., X.X.L., R.F.L., M.D. Lu, L.D.C., W.W.); and Department of Traditional Chinese Medicine, First Affiliated Hospital of Sun Yat-sen University, Guangzhou, People's Republic of China (X.Z.L.)
| | - Rui-Fang Lu
- From the Department of Medical Ultrasonics, Ultrasomics Artificial Intelligence X-Laboratory, Institute of Diagnostic and Interventional Ultrasound, First Affiliated Hospital of Sun Yat-sen University, No. 58 Zhongshan Rd 2, Guangzhou 510080, People's Republic of China (S.H.W., W.J.T., M.D. Li, H.T.H., Z.R.H., X.X.L., R.F.L., M.D. Lu, L.D.C., W.W.); and Department of Traditional Chinese Medicine, First Affiliated Hospital of Sun Yat-sen University, Guangzhou, People's Republic of China (X.Z.L.)
| | - Ming-De Lu
- From the Department of Medical Ultrasonics, Ultrasomics Artificial Intelligence X-Laboratory, Institute of Diagnostic and Interventional Ultrasound, First Affiliated Hospital of Sun Yat-sen University, No. 58 Zhongshan Rd 2, Guangzhou 510080, People's Republic of China (S.H.W., W.J.T., M.D. Li, H.T.H., Z.R.H., X.X.L., R.F.L., M.D. Lu, L.D.C., W.W.); and Department of Traditional Chinese Medicine, First Affiliated Hospital of Sun Yat-sen University, Guangzhou, People's Republic of China (X.Z.L.)
| | - Li-Da Chen
- From the Department of Medical Ultrasonics, Ultrasomics Artificial Intelligence X-Laboratory, Institute of Diagnostic and Interventional Ultrasound, First Affiliated Hospital of Sun Yat-sen University, No. 58 Zhongshan Rd 2, Guangzhou 510080, People's Republic of China (S.H.W., W.J.T., M.D. Li, H.T.H., Z.R.H., X.X.L., R.F.L., M.D. Lu, L.D.C., W.W.); and Department of Traditional Chinese Medicine, First Affiliated Hospital of Sun Yat-sen University, Guangzhou, People's Republic of China (X.Z.L.)
| | - Wei Wang
- From the Department of Medical Ultrasonics, Ultrasomics Artificial Intelligence X-Laboratory, Institute of Diagnostic and Interventional Ultrasound, First Affiliated Hospital of Sun Yat-sen University, No. 58 Zhongshan Rd 2, Guangzhou 510080, People's Republic of China (S.H.W., W.J.T., M.D. Li, H.T.H., Z.R.H., X.X.L., R.F.L., M.D. Lu, L.D.C., W.W.); and Department of Traditional Chinese Medicine, First Affiliated Hospital of Sun Yat-sen University, Guangzhou, People's Republic of China (X.Z.L.)
| |
|
28
|
Coskun BN, Yagiz B, Ocakoglu G, Dalkilic E, Pehlivan Y. Assessing the accuracy and completeness of artificial intelligence language models in providing information on methotrexate use. Rheumatol Int 2024; 44:509-515. [PMID: 37747564 DOI: 10.1007/s00296-023-05473-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2023] [Accepted: 09/14/2023] [Indexed: 09/26/2023]
Abstract
We aimed to assess the accuracy and completeness of the large language models (LLMs) ChatGPT-3.5, ChatGPT-4, BARD, and Bing when answering methotrexate (MTX)-related questions for treating rheumatoid arthritis. We employed 23 questions from an earlier study related to MTX concerns. These questions were entered into the LLMs, and the responses generated by each model were evaluated by two reviewers using Likert scales to assess accuracy and completeness. The GPT models achieved a 100% correct answer rate, while BARD and Bing scored 73.91%. In terms of accuracy of the outputs (completely correct responses), GPT-4 achieved a score of 100%, GPT-3.5 secured 86.96%, and BARD and Bing each scored 60.87%. BARD produced 17.39% incorrect responses and 8.7% non-responses, while Bing recorded 13.04% incorrect and 13.04% non-responses. The ChatGPT models produced significantly more accurate responses than Bing for the "mechanism of action" category, and the GPT-4 model showed significantly higher accuracy than BARD in the "side effects" category. There were no statistically significant differences among the models for the "lifestyle" category. GPT-4 achieved a comprehensive output rate of 100%, followed by GPT-3.5 at 86.96%, BARD at 60.86%, and Bing at 0%. In the "mechanism of action" category, both ChatGPT models and BARD produced significantly more comprehensive outputs than Bing. For the "side effects" and "lifestyle" categories, the ChatGPT models showed significantly higher completeness than Bing. The GPT models, particularly GPT-4, demonstrated superior performance in providing accurate and comprehensive patient information about MTX use. However, the study also identified inaccuracies and shortcomings in the generated responses.
Affiliation(s)
- Belkis Nihan Coskun
- Division of Rheumatology, Department of Internal Medicine, Faculty of Medicine, Bursa Uludag University, Bursa, Turkey.
| | - Burcu Yagiz
- Division of Rheumatology, Department of Internal Medicine, Faculty of Medicine, Bursa Uludag University, Bursa, Turkey
| | - Gokhan Ocakoglu
- Department of Biostatistics, Faculty of Medicine, Bursa Uludag University, Bursa, Turkey
| | - Ediz Dalkilic
- Division of Rheumatology, Department of Internal Medicine, Faculty of Medicine, Bursa Uludag University, Bursa, Turkey
| | - Yavuz Pehlivan
- Division of Rheumatology, Department of Internal Medicine, Faculty of Medicine, Bursa Uludag University, Bursa, Turkey
| |
|
29
|
Abi-Rafeh J, Mroueh VJ, Bassiri-Tehrani B, Marks J, Kazan R, Nahai F. Complications Following Body Contouring: Performance Validation of Bard, a Novel AI Large Language Model, in Triaging and Managing Postoperative Patient Concerns. Aesthetic Plast Surg 2024; 48:953-976. [PMID: 38273152 DOI: 10.1007/s00266-023-03819-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2023] [Accepted: 12/14/2023] [Indexed: 01/27/2024]
Abstract
INTRODUCTION Large language models (LLMs) have revolutionized the way humans interact with artificial intelligence (AI) technology, with marked potential for applications in esthetic surgery. The present study evaluates the performance of Bard, a novel LLM, in identifying and managing postoperative patient concerns about complications following body contouring surgery. METHODS The American Society of Plastic Surgeons' website was queried to identify and simulate all potential postoperative complications following body contouring across different acuities and severities. Bard's accuracy was assessed in providing a differential diagnosis, soliciting a history, suggesting the most likely diagnosis, recommending an appropriate disposition and treatments/interventions to begin at home, and identifying red-flag signs/symptoms indicating deterioration or requiring urgent emergency department (ED) presentation. RESULTS Twenty-two simulated body contouring complications were examined. Overall, Bard demonstrated 59% accuracy in listing relevant diagnoses on its differentials, with a 52% incidence of incorrect or misleading diagnoses. Following history-taking, Bard demonstrated an overall accuracy of 44% in identifying the most likely diagnosis and a 55% accuracy in suggesting the indicated medical disposition. Helpful treatments/interventions to begin at home were suggested with 40% accuracy, whereas red-flag signs/symptoms indicating deterioration were identified with 48% accuracy. A detailed analysis of performance, stratified according to latency of postoperative presentation (<48 hours, 48 hours to 1 month, or >1 month postoperatively) and according to acuity and indicated medical disposition, is presented herein. CONCLUSIONS Despite the promising potential of LLMs and AI in healthcare-related applications, Bard's performance in the present study falls significantly short of accepted clinical standards, indicating a need for further research and development prior to adoption. LEVEL OF EVIDENCE IV This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266 .
Affiliation(s)
- Jad Abi-Rafeh
- Division of Plastic, Reconstructive, and Aesthetic Surgery, McGill University Health Centre, Montreal, QC, Canada
| | - Vanessa J Mroueh
- Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | | | - Jacob Marks
- Manhattan Eye, Ear, and Throat Hospital, New York, NY, USA
| | - Roy Kazan
- Division of Plastic, Reconstructive, and Aesthetic Surgery, McGill University Health Centre, Montreal, QC, Canada
| | - Foad Nahai
- Department of Surgery, Emory University, Atlanta, GA, USA.
| |
|
30
|
Rahsepar AA. Large Language Models for Enhancing Radiology Report Impressions: Improve Readability While Decreasing Burnout. Radiology 2024; 310:e240498. [PMID: 38530179 DOI: 10.1148/radiol.240498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/27/2024]
Affiliation(s)
- Amir Ali Rahsepar
- From the Department of Radiology, Northwestern Memorial Hospital, 676 N Saint Clair St, Arkes Family Pavilion Suite 800, Chicago, IL 60611
| |
|
31
|
Bera K, O'Connor G, Jiang S, Tirumani SH, Ramaiya N. Analysis of ChatGPT publications in radiology: Literature so far. Curr Probl Diagn Radiol 2024; 53:215-225. [PMID: 37891083 DOI: 10.1067/j.cpradiol.2023.10.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2023] [Accepted: 10/18/2023] [Indexed: 10/29/2023]
Abstract
OBJECTIVE To perform a detailed qualitative and quantitative analysis of the published literature on ChatGPT and radiology in the nine months since its public release, detailing the scope of the work in this short timeframe. METHODS A systematic literature search of the MEDLINE and EMBASE databases was carried out through August 15, 2023 for articles focused on ChatGPT and imaging/radiology. Articles were classified into original research and reviews/perspectives. Quantitative analysis was carried out by two experienced radiologists using objective scoring systems for evaluating original and non-original research. RESULTS 51 articles involving ChatGPT and radiology/imaging were published between 26 Jan 2023 and 14 Aug 2023. 23 articles were original research, while the rest were reviews/perspectives or brief communications. For the quantitative analysis, scored by two readers, we included 23 original research and 17 non-original research articles (after excluding 11 letters written in response to previous articles). The mean score for original research was 3.20 out of 5 (across five questions), while the mean score for non-original research was 1.17 out of 2 (across six questions). The mean score grading ChatGPT's performance in original research was 3.20 out of 5 (across two questions). DISCUSSION Although it is early days for ChatGPT and its impact on radiology, a plethora of articles has already discussed the multifaceted nature of the tool and how it can affect every aspect of radiology, from patient education, pre-authorization, protocol selection, and generating differentials to structuring radiology reports. Most articles show impressive performance by ChatGPT, which can only improve with more research and improvements in the tool itself. Several articles have also highlighted the limitations of ChatGPT in its current iteration, which will allow radiologists and researchers to improve these areas.
Affiliation(s)
- Kaustav Bera
- Department of Radiology, University Hospitals Cleveland Medical Center, 11000 Euclid Avenue, Cleveland, OH, 44106, USA.
| | - Gregory O'Connor
- Department of Radiology, University Hospitals Cleveland Medical Center, 11000 Euclid Avenue, Cleveland, OH, 44106, USA
| | - Sirui Jiang
- Department of Radiology, University Hospitals Cleveland Medical Center, 11000 Euclid Avenue, Cleveland, OH, 44106, USA
| | - Sree Harsha Tirumani
- Department of Radiology, University Hospitals Cleveland Medical Center, 11000 Euclid Avenue, Cleveland, OH, 44106, USA
| | - Nikhil Ramaiya
- Department of Radiology, University Hospitals Cleveland Medical Center, 11000 Euclid Avenue, Cleveland, OH, 44106, USA
| |
|
32
|
Doddi S, Hibshman T, Salichs O, Bera K, Tippareddy C, Ramaiya N, Tirumani SH. Assessing appropriate responses to ACR urologic imaging scenarios using ChatGPT and Bard. Curr Probl Diagn Radiol 2024; 53:226-229. [PMID: 37891086 DOI: 10.1067/j.cpradiol.2023.10.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 10/29/2023]
Abstract
Artificial intelligence (AI) has recently become a trending tool and topic for productivity, especially with publicly available free services such as ChatGPT and Bard. In this report, we investigate whether two widely available chatbots, ChatGPT and Bard, can consistently recommend the most appropriate imaging modality for urologic clinical scenarios in line with the American College of Radiology (ACR) Appropriateness Criteria (AC). All clinical scenarios provided by the ACR were input into ChatGPT and Bard, and the results were compared with the ACR AC and recorded. Both chatbots recommended the appropriate imaging modality in 62% of scenarios, and no significant difference in the proportion of correct imaging modalities was found overall between the two services (p>0.05). Our study found that ChatGPT and Bard are similar in their ability to suggest the most appropriate imaging modality in a variety of urologic scenarios based on the ACR AC. Nonetheless, both chatbots lack consistent accuracy, and further development is necessary before implementation in clinical settings. For these AI services to be used properly in clinical decision making, further development is needed to support physician workflows.
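A hedged sketch of the overall comparison described above, testing whether the two chatbots' rates of ACR-appropriate recommendations differ, might use a two-proportion z-test as below; the counts are illustrative, and because both chatbots answered the same scenarios, a paired McNemar test would be an equally reasonable choice.

from statsmodels.stats.proportion import proportions_ztest

appropriate = [31, 31]  # hypothetical counts of ACR-appropriate answers (ChatGPT, Bard)
scenarios = [50, 50]    # hypothetical number of ACR scenarios posed to each chatbot

z_stat, p_value = proportions_ztest(appropriate, scenarios)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")  # p > 0.05 -> no significant difference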
Affiliation(s)
- Sishir Doddi
- University of Toledo College of Medicine, Toledo, OH, United States.
| | - Taryn Hibshman
- University of Toledo College of Medicine, Toledo, OH, United States
| | - Oscar Salichs
- University of Toledo College of Medicine, Toledo, OH, United States
| | - Kaustav Bera
- Department of Radiology, University Hospitals Cleveland Medical Center, Cleveland, OH
| | - Charit Tippareddy
- Department of Radiology, University Hospitals Cleveland Medical Center, Cleveland, OH
| | - Nikhil Ramaiya
- Department of Radiology, University Hospitals Cleveland Medical Center, Cleveland, OH
| | - Sree Harsha Tirumani
- Department of Radiology, University Hospitals Cleveland Medical Center, Cleveland, OH
| |
|
33
|
Hu Y, Hu Z, Liu W, Gao A, Wen S, Liu S, Lin Z. Exploring the potential of ChatGPT as an adjunct for generating diagnosis based on chief complaint and cone beam CT radiologic findings. BMC Med Inform Decis Mak 2024; 24:55. [PMID: 38374067 PMCID: PMC10875853 DOI: 10.1186/s12911-024-02445-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2023] [Accepted: 01/28/2024] [Indexed: 02/21/2024] Open
Abstract
AIM This study aimed to assess the performance of OpenAI's ChatGPT in generating diagnoses based on the chief complaint and cone beam computed tomography (CBCT) radiologic findings. MATERIALS AND METHODS 102 CBCT reports (48 with dental diseases (DD) and 54 with neoplastic/cystic diseases (N/CD)) were collected. ChatGPT was provided with the chief complaint and CBCT radiologic findings. Diagnostic outputs from ChatGPT were scored on a five-point Likert scale. For diagnosis accuracy, scoring was based on the accuracy of the chief-complaint-related diagnosis and chief-complaint-unrelated diagnoses (1-5 points); for diagnosis completeness, scoring was based on how many accurate diagnoses were included in ChatGPT's output for one case (1-5 points); for text quality, scoring was based on how many text errors were included in ChatGPT's output for one case (1-5 points). For the 54 N/CD cases, the consistency of the diagnosis generated by ChatGPT with the pathological diagnosis was also calculated. The composition of text errors in ChatGPT's outputs was evaluated. RESULTS After subjective ratings by expert reviewers on a five-point Likert scale, the final scores for diagnosis accuracy, diagnosis completeness, and text quality of ChatGPT were 3.7, 4.5, and 4.6 for the 102 cases. For diagnostic accuracy, it performed significantly better on N/CD (3.8/5) than on DD (3.6/5). Of the 54 N/CD cases, 21 (38.9%) had a first diagnosis completely consistent with the pathological diagnosis. No text errors were observed in 88.7% of all 390 text items. CONCLUSION ChatGPT showed potential in generating radiographic diagnoses based on the chief complaint and radiologic findings. However, its performance varied with task complexity, necessitating professional oversight due to a certain error rate.
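The Likert-based scoring described above lends itself to a simple tabular aggregation. The sketch below is a minimal, hypothetical illustration of how per-case ratings could be averaged overall and by disease group; the values are invented and do not reproduce the study's 102-case dataset:

```python
import pandas as pd

# Hypothetical per-case five-point Likert ratings mirroring the three scoring
# dimensions used in the study; not the actual CBCT data.
df = pd.DataFrame({
    "group":        ["N/CD", "N/CD", "DD", "DD", "N/CD", "DD"],
    "accuracy":     [4, 5, 3, 4, 4, 3],
    "completeness": [5, 4, 4, 5, 5, 4],
    "text_quality": [5, 5, 4, 5, 5, 4],
})

# Overall mean scores (analogous to the reported 3.7 / 4.5 / 4.6) and the
# N/CD vs. DD accuracy comparison.
print(df[["accuracy", "completeness", "text_quality"]].mean().round(1))
print(df.groupby("group")["accuracy"].mean().round(1))
```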
Collapse
Affiliation(s)
- Yanni Hu
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
| | - Ziyang Hu
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Department of Stomatology, Shenzhen Longhua District Central Hospital, Shenzhen, People's Republic of China
| | - Wenjing Liu
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
| | - Antian Gao
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
| | - Shanhui Wen
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
| | - Shu Liu
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
| | - Zitong Lin
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China.
| |
Collapse
|
34
|
Peng W, Feng Y, Yao C, Zhang S, Zhuo H, Qiu T, Zhang Y, Tang J, Gu Y, Sun Y. Evaluating AI in medicine: a comparative analysis of expert and ChatGPT responses to colorectal cancer questions. Sci Rep 2024; 14:2840. [PMID: 38310152 PMCID: PMC10838275 DOI: 10.1038/s41598-024-52853-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2023] [Accepted: 01/24/2024] [Indexed: 02/05/2024] Open
Abstract
Colorectal cancer (CRC) is a global health challenge, and patient education plays a crucial role in its early detection and treatment. Despite progress in AI technology, as exemplified by transformer-based models such as ChatGPT, there remains a lack of in-depth understanding of their efficacy for medical purposes. We aimed to assess the proficiency of ChatGPT in the field of popular science, specifically in answering questions related to CRC diagnosis and treatment, using the book "Colorectal Cancer: Your Questions Answered" as a reference. In total, 131 valid questions from the book were manually input into ChatGPT. Responses were evaluated by clinical physicians in the relevant fields based on comprehensiveness and accuracy of information, and scores were standardized for comparison. Not surprisingly, ChatGPT showed high reproducibility in its responses, with high uniformity in comprehensiveness, accuracy, and final scores. However, the mean scores of ChatGPT's responses were significantly lower than the benchmarks, indicating that it has not reached an expert level of competence in CRC. While it could provide accurate information, it lacked comprehensiveness. Notably, ChatGPT performed well in the domains of radiation therapy, interventional therapy, stoma care, venous care, and pain control, almost rivaling the benchmarks, but fell short in the basic information, surgery, and internal medicine domains. While ChatGPT demonstrated promise in specific domains, its overall proficiency in providing CRC information falls short of expert standards, indicating the need for further advancements and improvements in AI technology for patient education in healthcare.
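The abstract states that standardized ChatGPT scores were compared against expert benchmarks but does not specify the test here. As a hedged illustration only, the sketch below shows one simple way such a comparison could be made, testing hypothetical standardized scores in a single domain against a benchmark value with a one-sample t-test; the numbers are invented and the authors' actual method may differ:

```python
from scipy.stats import ttest_1samp

# Hypothetical standardized scores for ChatGPT responses in one domain, tested
# against an expert benchmark value; not the study's data or necessarily its test.
chatgpt_scores = [72, 68, 75, 70, 66, 74, 69, 71]
benchmark = 80

t, p = ttest_1samp(chatgpt_scores, popmean=benchmark)
print(f"t = {t:.2f}, p = {p:.4f}")  # p < 0.05 -> mean differs from the benchmark
```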
Collapse
Affiliation(s)
- Wen Peng
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Yifei Feng
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Cui Yao
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Sheng Zhang
- Department of Radiotherapy, The First Affiliated Hospital with Nanjing Medical University, Nanjing, People's Republic of China
| | - Han Zhuo
- Department of Intervention, The First Affiliated Hospital with Nanjing Medical University, Nanjing, People's Republic of China
| | - Tianzhu Qiu
- Department of Oncology, The First Affiliated Hospital with Nanjing Medical University, Nanjing, People's Republic of China
| | - Yi Zhang
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China
| | - Junwei Tang
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China.
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China.
| | - Yanhong Gu
- Department of Oncology, The First Affiliated Hospital with Nanjing Medical University, Nanjing, People's Republic of China.
| | - Yueming Sun
- Department of General Surgery, The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, Jiangsu, People's Republic of China.
- The First School of Clinical Medicine, Nanjing Medical University, Nanjing, China.
| |
Collapse
|
35
|
Toyama Y, Harigai A, Abe M, Nagano M, Kawabata M, Seki Y, Takase K. Performance evaluation of ChatGPT, GPT-4, and Bard on the official board examination of the Japan Radiology Society. Jpn J Radiol 2024; 42:201-207. [PMID: 37792149 PMCID: PMC10811006 DOI: 10.1007/s11604-023-01491-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Accepted: 09/12/2023] [Indexed: 10/05/2023]
Abstract
PURPOSE Herein, we assessed the accuracy of large language models (LLMs) in generating responses to questions in clinical radiology practice. We compared the performance of ChatGPT, GPT-4, and Google Bard using questions from the Japan Radiology Board Examination (JRBE). MATERIALS AND METHODS In total, 103 questions from the JRBE 2022 were used with permission from the Japan Radiological Society. These questions were categorized by pattern, required level of thinking, and topic. McNemar's test was used to compare the proportion of correct responses between the LLMs. Fisher's exact test was used to assess the performance of GPT-4 for each topic category. RESULTS ChatGPT, GPT-4, and Google Bard correctly answered 40.8% (42 of 103), 65.0% (67 of 103), and 38.8% (40 of 103) of the questions, respectively. GPT-4 significantly outperformed ChatGPT by 24.2% (p < 0.001) and Google Bard by 26.2% (p < 0.001). In the categorical analysis by level of thinking, GPT-4 correctly answered 79.7% of the lower-order questions, which was significantly higher than ChatGPT or Google Bard (p < 0.001). The categorical analysis by question pattern revealed GPT-4's superiority over ChatGPT (67.4% vs. 46.5%, p = 0.004) and Google Bard (39.5%, p < 0.001) in the single-answer questions. The categorical analysis by topic revealed that GPT-4 outperformed ChatGPT (40%, p = 0.013) and Google Bard (26.7%, p = 0.004). No significant differences were observed between the LLMs in the categories not mentioned above. The performance of GPT-4 was significantly better in nuclear medicine (93.3%) than in diagnostic radiology (55.8%; p < 0.001). GPT-4 also performed better on lower-order questions than on higher-order questions (79.7% vs. 45.5%, p < 0.001). CONCLUSION ChatGPT Plus based on GPT-4 scored 65% when answering Japanese questions from the JRBE, outperforming ChatGPT and Google Bard. This highlights the potential of using LLMs to address advanced clinical questions in the field of radiology in Japan.
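McNemar's test, which the abstract names for comparing paired correct/incorrect responses on the same questions, can be computed from the discordant pairs alone. The sketch below is a minimal illustration with invented per-question outcomes (not the JRBE data), using an exact binomial form of the test:

```python
from scipy.stats import binomtest

# Hypothetical per-question outcomes (1 = correct, 0 = incorrect) for two models
# answering the same exam questions; the actual JRBE responses are not reproduced.
gpt4    = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
chatgpt = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1]

# McNemar's test uses only the discordant pairs (one model right, the other wrong).
b = sum(1 for x, y in zip(gpt4, chatgpt) if x == 1 and y == 0)
c = sum(1 for x, y in zip(gpt4, chatgpt) if x == 0 and y == 1)

# Exact McNemar's test: under H0, the discordant pairs split 50/50.
result = binomtest(b, n=b + c, p=0.5)
print(f"discordant pairs b={b}, c={c}; exact McNemar p = {result.pvalue:.3f}")
```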
Collapse
Affiliation(s)
- Yoshitaka Toyama
- Department of Diagnostic Radiology, Tohoku University Hospital, 1-1 Seiryo-Machi, Aoba-Ku, Sendai, 980-8575, Japan.
| | - Ayaka Harigai
- Department of Diagnostic Radiology, Tohoku University Hospital, 1-1 Seiryo-Machi, Aoba-Ku, Sendai, 980-8575, Japan
- Department of Radiology, Tohoku Medical and Pharmaceutical University, Sendai, Japan
- Department of Diagnostic Radiology, Tohoku University Graduate School of Medicine, Sendai, Japan
| | - Mirei Abe
- Department of Diagnostic Radiology, Tohoku University Hospital, 1-1 Seiryo-Machi, Aoba-Ku, Sendai, 980-8575, Japan
| | | | - Masahiro Kawabata
- Department of Diagnostic Radiology, Tohoku University Hospital, 1-1 Seiryo-Machi, Aoba-Ku, Sendai, 980-8575, Japan
| | - Yasuhiro Seki
- Department of Radiation Oncology, Tohoku University Hospital, Sendai, Japan
| | - Kei Takase
- Department of Diagnostic Radiology, Tohoku University Graduate School of Medicine, Sendai, Japan
| |
Collapse
|
36
|
Kim S, Lee CK, Kim SS. Large Language Models: A Guide for Radiologists. Korean J Radiol 2024; 25:126-133. [PMID: 38288895 PMCID: PMC10831297 DOI: 10.3348/kjr.2023.0997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2023] [Revised: 11/27/2023] [Accepted: 12/18/2023] [Indexed: 02/01/2024] Open
Abstract
Large language models (LLMs) have revolutionized the global landscape of technology beyond natural language processing. Owing to their extensive pre-training on vast datasets, contemporary LLMs can handle tasks ranging from general functionalities to domain-specific areas, such as radiology, without additional fine-tuning. General-purpose chatbots based on LLMs can optimize the efficiency of radiologists in terms of their professional work and research endeavors. Importantly, these LLMs are on a trajectory of rapid evolution, wherein challenges such as "hallucination," high training cost, and efficiency issues are addressed, along with the inclusion of multimodal inputs. In this review, we aim to offer conceptual knowledge and actionable guidance to radiologists interested in utilizing LLMs through a succinct overview of the topic and a summary of radiology-specific aspects, from the beginning to potential future directions.
Collapse
Affiliation(s)
- Sunkyu Kim
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
- AIGEN Sciences, Seoul, Republic of Korea
| | - Choong-Kun Lee
- Division of Medical Oncology, Department of Internal Medicine, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Seung-Seob Kim
- Department of Radiology and Research Institute of Radiological Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea.
| |
Collapse
|
37
|
Patil NS, Huang RS, Caterine S, Yao J, Larocque N, van der Pol CB, Stubbs E. Artificial Intelligence Chatbots' Understanding of the Risks and Benefits of Computed Tomography and Magnetic Resonance Imaging Scenarios. Can Assoc Radiol J 2024:8465371231220561. [PMID: 38183235 DOI: 10.1177/08465371231220561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2024] Open
Abstract
PURPOSE Patients may seek online information to better understand medical imaging procedures. The purpose of this study was to assess the accuracy of information provided by 2 popular artificial intelligence (AI) chatbots pertaining to the risks, benefits, and alternatives of common imaging scenarios. METHODS Fourteen imaging-related scenarios pertaining to computed tomography (CT) or magnetic resonance imaging (MRI) were used. Factors including the use of intravenous contrast, the presence of renal disease, and whether the patient was pregnant were included in the analysis. For each scenario, 3 prompts for outlining the (1) risks, (2) benefits, and (3) alternative imaging choices or potential implications of not using contrast were inputted into ChatGPT and Bard. A grading rubric and a 5-point Likert scale were used by 2 independent reviewers to grade responses. Prompt variability and chatbot context dependency were also assessed. RESULTS ChatGPT's performance was superior to Bard's in accurately responding to prompts per Likert grading (4.36 ± 0.63 vs 3.25 ± 1.03, P < .0001). There was substantial agreement between independent reviewer grading for ChatGPT (κ = 0.621) and Bard (κ = 0.684). Response text length was not statistically different between ChatGPT and Bard (2087 ± 256 characters vs 2162 ± 369 characters, P = .24). Response time was longer for ChatGPT (34 ± 2 vs 8 ± 1 seconds, P < .0001). CONCLUSIONS ChatGPT outperformed Bard at outlining risks, benefits, and alternatives for common imaging scenarios. Generally, context dependency and prompt variability did not change chatbot response content. Due to the lack of detailed scientific reasoning and inability to provide patient-specific information, both AI chatbots have limitations as a patient information resource.
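Inter-rater agreement of the kind reported above (κ = 0.621 and 0.684) is typically quantified with Cohen's kappa. The sketch below is a small, hypothetical illustration of how two reviewers' Likert grades could be compared; the grades are invented and not taken from the study:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical 5-point Likert grades from two independent reviewers for the same
# set of chatbot responses; the study's actual grades are not reproduced here.
reviewer_1 = [5, 4, 5, 3, 4, 5, 2, 4, 5, 4]
reviewer_2 = [5, 4, 4, 3, 4, 5, 3, 4, 5, 5]

kappa = cohen_kappa_score(reviewer_1, reviewer_2)
print(f"Cohen's kappa = {kappa:.3f}")  # 0.61-0.80 is conventionally 'substantial'
```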
Collapse
Affiliation(s)
- Nikhil S Patil
- Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada
| | - Ryan S Huang
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
| | - Scott Caterine
- Department of Radiology, McMaster University, Hamilton, ON, Canada
| | - Jason Yao
- Department of Radiology, McMaster University, Hamilton, ON, Canada
| | - Natasha Larocque
- Department of Radiology, McMaster University, Hamilton, ON, Canada
| | | | - Euan Stubbs
- Department of Radiology, McMaster University, Hamilton, ON, Canada
| |
Collapse
|
38
|
Mediboina A, Badam RK, Chodavarapu S. Assessing the Accuracy of Information on Medication Abortion: A Comparative Analysis of ChatGPT and Google Bard AI. Cureus 2024; 16:e51544. [PMID: 38318564 PMCID: PMC10840059 DOI: 10.7759/cureus.51544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/01/2024] [Indexed: 02/07/2024] Open
Abstract
Background and objective ChatGPT and Google Bard AI are widely used conversational chatbots, even in healthcare. While they have several strengths, they can generate seemingly correct but erroneous responses, warranting caution in medical contexts. In an era where access to abortion care is diminishing, patients may increasingly rely on online resources and AI-driven language models for information on medication abortions. In light of this, the present study aimed to compare the accuracy and comprehensiveness of responses generated by ChatGPT 3.5 and Google Bard AI to medical queries about medication abortions. Methods Fourteen open-ended questions about medication abortion were formulated based on the Frequently Asked Questions (FAQs) from the National Abortion Federation (NAF) and the Reproductive Health Access Project (RHAP) websites. These questions were answered using ChatGPT version 3.5 and Google Bard AI on October 7, 2023. The accuracy of the responses was analyzed by cross-referencing the generated answers against the information provided by NAF and RHAP. Any discrepancies were further verified against the guidelines from the American College of Obstetricians and Gynecologists (ACOG). A rating scale used by Johnson et al. was employed for assessment, utilizing a 6-point Likert scale [ranging from 1 (completely incorrect) to 6 (correct)] to evaluate accuracy and a 3-point scale [ranging from 1 (incomplete) to 3 (comprehensive)] to assess completeness. Questions that did not yield answers were assigned a score of 0 and omitted from the correlation analysis. Data analysis and visualization were done using R Software version 4.3.1. Statistical significance was determined by employing Spearman's R and Mann-Whitney U tests. Results All questions were entered sequentially into both chatbots by the same author. On the initial attempt, ChatGPT successfully generated relevant responses for all questions, while Google Bard AI failed to provide answers for five questions. Repeating the same question in Google Bard AI yielded an answer for one; two were answered with different phrasing; and two remained unanswered despite rephrasing. ChatGPT showed a median accuracy score of 5 (mean: 5.26, SD: 0.73) and a median completeness score of 3 (mean: 2.57, SD: 0.51). It showed the highest accuracy score in six responses and the highest completeness score in eight responses. In contrast, Google Bard AI had a median accuracy score of 5 (mean: 4.5, SD: 2.03) and a median completeness score of 2 (mean: 2.14, SD: 1.03). It achieved the highest accuracy score in five responses and the highest completeness score in six responses. Spearman's correlation coefficient revealed no significant correlation between accuracy and completeness for ChatGPT (rs = -0.46771, p = 0.09171). However, Google Bard AI showed a marginally significant correlation (rs = 0.5738, p = 0.05108). The Mann-Whitney U test indicated no statistically significant differences between ChatGPT and Google Bard AI concerning accuracy (U = 82, p>0.05) or completeness (U = 78, p>0.05). Conclusion While both chatbots showed similar levels of accuracy, minor errors were noted, pertaining to finer aspects that demand specialized knowledge of abortion care. This could explain the lack of a significant correlation between accuracy and completeness. Ultimately, AI-driven language models have the potential to provide information on medication abortions, but there is a need for continual refinement and oversight.
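The abstract names Spearman's correlation and the Mann-Whitney U test as the main analyses. As a hedged illustration with invented scores (not the study's data), the sketch below shows how those two tests could be run on per-question accuracy and completeness ratings:

```python
from scipy.stats import spearmanr, mannwhitneyu

# Hypothetical per-question accuracy scores (1-6) for the two chatbots and
# completeness scores (1-3) for ChatGPT; unanswered questions (scored 0) would be
# dropped before analysis, as described in the abstract. Not the study's data.
chatgpt_acc  = [6, 5, 6, 5, 4, 6, 5, 6, 5, 5, 6, 4, 5, 6]
chatgpt_comp = [3, 3, 2, 3, 2, 3, 3, 3, 2, 3, 3, 2, 2, 3]
bard_acc     = [6, 5, 6, 4, 6, 5, 5, 4, 6, 3, 5, 6]  # answered questions only

rs, p_rs = spearmanr(chatgpt_acc, chatgpt_comp)
print(f"Spearman rs = {rs:.3f}, p = {p_rs:.3f}")

u, p_u = mannwhitneyu(chatgpt_acc, bard_acc)
print(f"Mann-Whitney U = {u:.1f}, p = {p_u:.3f}")
```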
Collapse
Affiliation(s)
- Anjali Mediboina
- Community Medicine, Alluri Sita Ramaraju Academy of Medical Sciences, Eluru, IND
| | - Rajani Kumari Badam
- Obstetrics and Gynaecology, Sri Venkateswara Medical College, Tirupathi, IND
| | - Sailaja Chodavarapu
- Obstetrics and Gynaecology, Government Medical College, Rajamahendravaram, IND
| |
Collapse
|
39
|
Park SH. Noteworthy Developments in the Korean Journal of Radiology in 2023 and for 2024. Korean J Radiol 2024; 25:1-5. [PMID: 38184762 PMCID: PMC10788598 DOI: 10.3348/kjr.2023.1172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2023] [Accepted: 11/22/2023] [Indexed: 01/08/2024] Open
Affiliation(s)
- Seong Ho Park
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.
| |
Collapse
|
40
|
Bhayana R. Chatbots and Large Language Models in Radiology: A Practical Primer for Clinical and Research Applications. Radiology 2024; 310:e232756. [PMID: 38226883 DOI: 10.1148/radiol.232756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2024]
Abstract
Although chatbots have existed for decades, the emergence of transformer-based large language models (LLMs) has captivated the world through the most recent wave of artificial intelligence chatbots, including ChatGPT. Transformers are a type of neural network architecture that enables better contextual understanding of language and efficient training on massive amounts of unlabeled data, such as unstructured text from the internet. As LLMs have increased in size, their improved performance and emergent abilities have revolutionized natural language processing. Since language is integral to human thought, applications based on LLMs have transformative potential in many industries. In fact, LLM-based chatbots have demonstrated human-level performance on many professional benchmarks, including in radiology. LLMs offer numerous clinical and research applications in radiology, several of which have been explored in the literature with encouraging results. Multimodal LLMs can simultaneously interpret text and images to generate reports, closely mimicking current diagnostic pathways in radiology. Thus, from requisition to report, LLMs have the opportunity to positively impact nearly every step of the radiology journey. Yet, these impressive models are not without limitations. This article reviews the limitations of LLMs and mitigation strategies, as well as potential uses of LLMs, including multimodal models. Also reviewed are existing LLM-based applications that can enhance efficiency in supervised settings.
Collapse
Affiliation(s)
- Rajesh Bhayana
- From University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Mount Sinai Hospital, and Women's College Hospital, University of Toronto, Toronto General Hospital, 200 Elizabeth St, Peter Munk Bldg, 1st Fl, Toronto, ON, Canada M5G 24C
| |
Collapse
|
41
|
Gan RK, Ogbodo JC, Wee YZ, Gan AZ, González PA. Performance of Google bard and ChatGPT in mass casualty incidents triage. Am J Emerg Med 2024; 75:72-78. [PMID: 37967485 DOI: 10.1016/j.ajem.2023.10.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2023] [Revised: 10/03/2023] [Accepted: 10/24/2023] [Indexed: 11/17/2023] Open
Abstract
AIM The objective of our research was to evaluate and compare the performance of ChatGPT, Google Bard, and medical students in performing START triage during mass casualty situations. METHOD We conducted a cross-sectional analysis to compare ChatGPT, Google Bard, and medical students in mass casualty incident (MCI) triage using the Simple Triage And Rapid Treatment (START) method. A validated questionnaire with 15 diverse MCI scenarios was used to assess triage accuracy and content analysis in four categories: "Walking wounded," "Respiration," "Perfusion," and "Mental Status." Statistical analysis compared the results. RESULT Google Bard demonstrated a notably higher accuracy of 60%, while ChatGPT achieved an accuracy of 26.67% (p = 0.002). Comparatively, medical students performed at an accuracy rate of 64.3% in a previous study. However, no significant difference was observed between Google Bard and medical students (p = 0.211). Qualitative content analysis of "walking wounded," "respiration," "perfusion," and "mental status" indicated that Google Bard outperformed ChatGPT. CONCLUSION Google Bard was found to be superior to ChatGPT in correctly performing mass casualty incident triage, achieving an accuracy of 60% versus 26.67% for ChatGPT; this difference was statistically significant (p = 0.002).
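At the scenario level, the reported accuracies correspond to roughly 9 of 15 correct for Google Bard and 4 of 15 for ChatGPT. The sketch below shows how such a 2x2 comparison could be set up with Fisher's exact test; this is only an illustration, and the published p = 0.002 presumably reflects a different or finer-grained analysis than these scenario-level counts:

```python
from scipy.stats import fisher_exact

# Scenario-level counts derived from the abstract's percentages (9/15 vs 4/15
# correct), arranged as a 2x2 table of correct vs. incorrect triage decisions.
table = [[9, 6],    # Google Bard
         [4, 11]]   # ChatGPT

odds_ratio, p = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.3f}")
```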
Collapse
Affiliation(s)
- Rick Kye Gan
- Unit for Research in Emergency and Disaster, Faculty of Medicine and Health Sciences, University of Oviedo, Oviedo 33006, Spain.
| | - Jude Chukwuebuka Ogbodo
- Unit for Research in Emergency and Disaster, Faculty of Medicine and Health Sciences, University of Oviedo, Oviedo 33006, Spain; Department of Primary Care and Population Health, Medical School, University of Nicosia, Nicosia 2408, Cyprus
| | - Yong Zheng Wee
- Faculty of Computing & Informatics, Multimedia University, 63100 Cyberjaya, Selangor, Malaysia
| | - Ann Zee Gan
- Tenghilan Health Clinic, Tuaran 89208, Sabah, Malaysia; Hospital Universiti Sains Malaysia, 16150 Kota Bharu, Malaysia
| | - Pedro Arcos González
- Unit for Research in Emergency and Disaster, Faculty of Medicine and Health Sciences, University of Oviedo, Oviedo 33006, Spain
| |
Collapse
|
42
|
Indran IR, Paranthaman P, Gupta N, Mustafa N. Twelve tips to leverage AI for efficient and effective medical question generation: A guide for educators using Chat GPT. MEDICAL TEACHER 2023:1-6. [PMID: 38146711 DOI: 10.1080/0142159x.2023.2294703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/04/2023] [Accepted: 12/11/2023] [Indexed: 12/27/2023]
Abstract
BACKGROUND Crafting quality assessment questions in medical education is a crucial yet time-consuming, expertise-driven undertaking that calls for innovative solutions. Large language models (LLMs), such as ChatGPT (Chat Generative Pre-trained Transformer), present a promising yet underexplored avenue for such innovations. AIMS This study explores the utility of ChatGPT to generate diverse, high-quality medical questions, focusing on multiple-choice questions (MCQs) as an illustrative example, to increase educators' productivity and enable self-directed learning for students. DESCRIPTION Leveraging 12 strategies, we demonstrate how ChatGPT can be effectively used to generate assessment questions aligned with Bloom's taxonomy and core knowledge domains while promoting best practices in assessment design. CONCLUSION Integrating LLM tools such as ChatGPT into the generation of medical assessment questions such as MCQs augments, but does not replace, human expertise. With continual instruction refinement, AI can produce high-standard questions. Yet the onus of ensuring ultimate quality and accuracy remains with subject matter experts, affirming the irreplaceable value of human involvement in the artificial intelligence-driven education paradigm.
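To make this kind of workflow concrete, the sketch below shows one way an educator could request a Bloom's-taxonomy-aligned MCQ programmatically. It assumes the OpenAI Python client and an available chat model; the prompt wording and model name are our own illustrative choices, not the article's 12 strategies verbatim:

```python
from openai import OpenAI  # assumes the OpenAI Python client is installed and configured

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative prompt targeting a Bloom's level and a defined knowledge domain.
prompt = (
    "Write one single-best-answer MCQ on beta-blocker pharmacology for "
    "second-year medical students, targeting the 'apply' level of Bloom's "
    "taxonomy. Provide 4 options, the correct answer, and a brief explanation."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any available chat model could be substituted
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

As the conclusion above notes, any question produced this way would still require review by a subject matter expert before use in assessment.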
Collapse
Affiliation(s)
- Inthrani Raja Indran
- Department of Pharmacology, National University of Singapore, Yong Loo Lin School of Medicine, Singapore, Singapore
| | - Priya Paranthaman
- Department of Pharmacology, National University of Singapore, Yong Loo Lin School of Medicine, Singapore, Singapore
| | - Neelima Gupta
- Department of Pharmacology, National University of Singapore, Yong Loo Lin School of Medicine, Singapore, Singapore
| | - Nurulhuda Mustafa
- Department of Pharmacology, National University of Singapore, Yong Loo Lin School of Medicine, Singapore, Singapore
| |
Collapse
|
43
|
Huo B, Cacciamani GE, Collins GS, McKechnie T, Lee Y, Guyatt G. Reporting standards for the use of large language model-linked chatbots for health advice. Nat Med 2023; 29:2988. [PMID: 37957381 DOI: 10.1038/s41591-023-02656-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2023]
Affiliation(s)
- Bright Huo
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada.
| | - Giovanni E Cacciamani
- USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- AI Center at USC Urology, University of Southern California, Los Angeles, CA, USA
| | - Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK
- UK EQUATOR Centre, University of Oxford, Oxford, UK
| | - Tyler McKechnie
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
| | - Yung Lee
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
| | - Gordon Guyatt
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
| |
Collapse
|
44
|
Alanzi TM, Alzahrani W, Albalawi NS, Allahyani T, Alghamdi A, Al-Zahrani H, Almutairi A, Alzahrani H, Almulhem L, Alanzi N, Al Moarfeg A, Farhah N. Public Awareness of Obesity as a Risk Factor for Cancer in Central Saudi Arabia: Feasibility of ChatGPT as an Educational Intervention. Cureus 2023; 15:e50781. [PMID: 38239542 PMCID: PMC10795720 DOI: 10.7759/cureus.50781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/17/2023] [Indexed: 01/22/2024] Open
Abstract
BACKGROUND While the link between obesity and chronic diseases such as diabetes and cardiovascular disorders is well documented, there is a growing body of evidence connecting obesity with an increased risk of cancer. However, public awareness of this connection remains limited. STUDY PURPOSE To analyze public awareness of overweight/obesity as a risk factor for cancer and to analyze public perceptions of the feasibility of ChatGPT, an artificial intelligence-based conversational agent, as an educational intervention tool. METHODS A mixed-methods approach was used in this study: a deductive, quantitative, cross-sectional approach to draw precise conclusions based on empirical evidence on public awareness of the link between obesity and cancer, and an inductive, qualitative approach to interpret public perceptions of using ChatGPT to create awareness of obesity, cancer, and its risk factors. Participants were adult residents of Saudi Arabia; 486 individuals were included in the survey and 21 in the semi-structured interviews. RESULTS About 65% of the participants were not completely aware of cancer and its risk factors. Significant differences in awareness were observed across age groups (p < .0001), socio-economic status (p = .041), and regional distribution (p = .0351). A total of 10 themes were analyzed from the interview data, comprising five positive factors (accessibility, personalization, cost-effectiveness, anonymity and privacy, and multi-language support) and five negative factors (information inaccuracy, lack of emotional intelligence, dependency and overreliance, data privacy and security, and inability to provide physical support or diagnosis). CONCLUSION This study underscores the potential of leveraging ChatGPT as a valuable public awareness tool for cancer in Saudi Arabia.
Collapse
Affiliation(s)
- Turki M Alanzi
- Department of Health Information Management and Technology, College of Public Health, Imam Abdulrahman Bin Faisal University, Dammam, SAU
| | - Wala Alzahrani
- Department of Clinical Nutrition, College of Applied Medical Sciences, King Abdulaziz University, Jeddah, SAU
| | | | - Taif Allahyani
- College of Applied Medical Sciences, Umm Al-Qura University, Makkah, SAU
| | | | - Haneen Al-Zahrani
- Department of Hematology, Armed Forces Hospital at King Abdulaziz Airbase Dhahran, Dhahran, SAU
| | - Awatif Almutairi
- Department of Clinical Laboratories Sciences, College of Applied Medical Sciences, Jouf University, Jouf, SAU
| | | | | | - Nouf Alanzi
- Department of Clinical Laboratories Sciences, College of Applied Medical Sciences, Jouf University, Jouf, SAU
| | | | - Nesren Farhah
- Department of Health Informatics, College of Health Sciences, Saudi Electronic University, Riyadh, SAU
| |
Collapse
|
45
|
Zhang C, Xu J, Tang R, Yang J, Wang W, Yu X, Shi S. Novel research and future prospects of artificial intelligence in cancer diagnosis and treatment. J Hematol Oncol 2023; 16:114. [PMID: 38012673 PMCID: PMC10680201 DOI: 10.1186/s13045-023-01514-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2023] [Accepted: 11/20/2023] [Indexed: 11/29/2023] Open
Abstract
Research into the potential benefits of artificial intelligence for comprehending the intricate biology of cancer has grown as a result of the widespread use of deep learning and machine learning in the healthcare sector and the availability of highly specialized cancer datasets. Here, we review new artificial intelligence approaches and how they are being used in oncology. We describe how artificial intelligence might be used in the detection, prognosis, and administration of cancer treatments and introduce the use of the latest large language models such as ChatGPT in oncology clinics. We highlight artificial intelligence applications for omics data types, and we offer perspectives on how the various data types might be combined to create decision-support tools. We also evaluate the present constraints and challenges to applying artificial intelligence in precision oncology. Finally, we discuss how current challenges may be surmounted to make artificial intelligence useful in clinical settings in the future.
Collapse
Affiliation(s)
- Chaoyi Zhang
- Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China
- Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China
- Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
| | - Jin Xu
- Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China
- Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China
- Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
| | - Rong Tang
- Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China
- Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China
- Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
| | - Jianhui Yang
- Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China
- Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China
- Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
| | - Wei Wang
- Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China
- Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China
- Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
| | - Xianjun Yu
- Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China.
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China.
- Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China.
- Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China.
| | - Si Shi
- Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China.
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China.
- Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China.
- Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China.
| |
Collapse
|
46
|
Iannantuono GM, Bracken-Clarke D, Karzai F, Choo-Wosoba H, Gulley JL, Floudas CS. Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.10.31.23297825. [PMID: 38076813 PMCID: PMC10705618 DOI: 10.1101/2023.10.31.23297825] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/19/2023]
Abstract
Background The capability of large language models (LLMs) to understand and generate human-readable text has prompted the investigation of their potential as educational and management tools for cancer patients and healthcare providers. Materials and Methods We conducted a cross-sectional study aimed at evaluating the ability of ChatGPT-4, ChatGPT-3.5, and Google Bard to answer questions related to four domains of immuno-oncology (Mechanisms, Indications, Toxicities, and Prognosis). We generated 60 open-ended questions (15 for each section). Questions were manually submitted to the LLMs, and responses were collected on June 30th, 2023. Two reviewers evaluated the answers independently. Results ChatGPT-4 and ChatGPT-3.5 answered all questions, whereas Google Bard answered only 53.3% (p < 0.0001). The proportion of questions with reproducible answers was higher for ChatGPT-4 (95%) and ChatGPT-3.5 (88.3%) than for Google Bard (50%) (p < 0.0001). In terms of accuracy, the proportion of answers deemed fully correct was 75.4%, 58.5%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (p = 0.03). Furthermore, the proportion of responses deemed highly relevant was 71.9%, 77.4%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (p = 0.04). Regarding readability, the proportion of responses deemed highly readable was higher for ChatGPT-4 (98.1%) and ChatGPT-3.5 (100%) than for Google Bard (87.5%) (p = 0.02). Conclusion ChatGPT-4 and ChatGPT-3.5 are potentially powerful tools in immuno-oncology, whereas Google Bard demonstrated relatively poorer performance. However, the risk of inaccuracy or incompleteness in the responses was evident in all three LLMs, highlighting the importance of expert-driven verification of the outputs returned by these technologies.
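The abstract compares proportions of fully correct answers across the three models without naming the test used for that comparison here. As a hedged illustration, the sketch below sets up a chi-square test of independence on invented counts chosen to roughly match the reported percentages; it is not the study's analysis:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts of answers rated fully correct vs. not fully correct for
# the three models, roughly in line with the reported percentages; not raw data.
table = [[43, 14],   # ChatGPT-4
         [34, 24],   # ChatGPT-3.5
         [14, 18]]   # Google Bard (answered questions only)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```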
Collapse
Affiliation(s)
- Giovanni Maria Iannantuono
- Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
| | - Dara Bracken-Clarke
- Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
| | - Fatima Karzai
- Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
| | - Hyoyoung Choo-Wosoba
- Biostatistics and Data Management Section, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
| | - James L. Gulley
- Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
| | - Charalampos S. Floudas
- Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
| |
Collapse
|
47
|
Hu JM, Liu FC, Chu CM, Chang YT. Health Care Trainees' and Professionals' Perceptions of ChatGPT in Improving Medical Knowledge Training: Rapid Survey Study. J Med Internet Res 2023; 25:e49385. [PMID: 37851495 PMCID: PMC10620632 DOI: 10.2196/49385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2023] [Revised: 07/13/2023] [Accepted: 09/29/2023] [Indexed: 10/19/2023] Open
Abstract
BACKGROUND ChatGPT is a powerful pretrained large language model. It has both demonstrated potential and raised concerns related to knowledge translation and knowledge transfer. To apply and improve knowledge transfer in the real world, it is essential to assess the perceptions and acceptance of the users of ChatGPT-assisted training. OBJECTIVE We aimed to investigate the perceptions of health care trainees and professionals on ChatGPT-assisted training, using biomedical informatics as an example. METHODS We used purposeful sampling to include all health care undergraduate trainees and graduate professionals (n=195) from January to May 2023 in the School of Public Health at the National Defense Medical Center in Taiwan. Subjects were asked to watch a 2-minute video introducing 5 scenarios about ChatGPT-assisted training in biomedical informatics and then answer a self-designed online (web- and mobile-based) questionnaire according to the Kirkpatrick model. The survey responses were used to develop 4 constructs: "perceived knowledge acquisition," "perceived training motivation," "perceived training satisfaction," and "perceived training effectiveness." The study used structural equation modeling (SEM) to evaluate and test the structural model and hypotheses. RESULTS The online questionnaire response rate was 152 of 195 (78%); 88 of 152 participants (58%) were undergraduate trainees and 90 of 152 participants (59%) were women. The ages ranged from 18 to 53 years (mean 23.3, SD 6.0 years). There was no statistical difference in perceptions of training evaluation between men and women. Most participants were enthusiastic about the ChatGPT-assisted training, while the graduate professionals were more enthusiastic than undergraduate trainees. Nevertheless, some concerns were raised about potential cheating on training assessment. The average scores for knowledge acquisition, training motivation, training satisfaction, and training effectiveness were 3.84 (SD 0.80), 3.76 (SD 0.93), 3.75 (SD 0.87), and 3.72 (SD 0.91), respectively (Likert scale 1-5: strongly disagree to strongly agree). Knowledge acquisition had the highest score and training effectiveness the lowest. In the SEM results, training effectiveness was influenced predominantly by knowledge acquisition and partially met the hypotheses in the research framework. Knowledge acquisition had a direct effect on training effectiveness, training satisfaction, and training motivation, with β coefficients of .80, .87, and .97, respectively (all P<.001). CONCLUSIONS Most health care trainees and professionals perceived ChatGPT-assisted training as an aid in knowledge transfer. However, to improve training effectiveness, it should be combined with empirical experts for proper guidance and dual interaction. In a future study, we recommend using a larger sample size for evaluation of internet-connected large language models in medical knowledge transfer.
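The abstract reports standardized path coefficients from a structural equation model (for example, knowledge acquisition -> training effectiveness). As a greatly simplified, hypothetical illustration, the sketch below fits just that single path as an ordinary least-squares regression on simulated Likert-style construct scores; it does not reproduce the study's SEM or its data:

```python
import numpy as np
import statsmodels.api as sm

# Toy stand-in for the survey constructs: simulate construct scores on a 1-5 scale
# with a strong "knowledge acquisition -> training effectiveness" relationship.
rng = np.random.default_rng(0)
knowledge = rng.uniform(2.5, 5.0, size=150)
effectiveness = 0.8 * knowledge + rng.normal(0.0, 0.3, size=150)

# The study fit a full structural equation model; this OLS fit only illustrates
# how a single path coefficient is estimated in a one-equation simplification.
X = sm.add_constant(knowledge)
fit = sm.OLS(effectiveness, X).fit()
print(fit.params)  # intercept and slope; the slope plays the role of the path coefficient
```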
Collapse
Affiliation(s)
- Je-Ming Hu
- Division of Colorectal Surgery, Department of Surgery, Tri-service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Graduate Institute of Medical Sciences, National Defense Medical Center, Taipei, Taiwan
- School of Medicine, National Defense Medical Center, Taipei, Taiwan
| | - Feng-Cheng Liu
- Division of Rheumatology/Immunology and Allergy, Department of Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
| | - Chi-Ming Chu
- Graduate Institute of Medical Sciences, National Defense Medical Center, Taipei, Taiwan
- School of Public Health, National Defense Medical Center, Taipei, Taiwan
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei, Taiwan
- Big Data Research Center, College of Medicine, Fu-Jen Catholic University, New Taipei City, Taiwan
- Department of Public Health, Kaohsiung Medical University, Kaohsiung, Taiwan
- Department of Public Health, China Medical University, Taichung, Taiwan
| | - Yu-Tien Chang
- School of Public Health, National Defense Medical Center, Taipei, Taiwan
| |
Collapse
|
48
|
Talyshinskii A, Naik N, Hameed BMZ, Zhanbyrbekuly U, Khairli G, Guliev B, Juilebø-Jones P, Tzelves L, Somani BK. Expanding horizons and navigating challenges for enhanced clinical workflows: ChatGPT in urology. Front Surg 2023; 10:1257191. [PMID: 37744723 PMCID: PMC10512827 DOI: 10.3389/fsurg.2023.1257191] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023] Open
Abstract
Purpose of review ChatGPT has emerged as a potential tool for facilitating doctors' workflows. However, few studies have examined its application in a urological context. Thus, our objective was to analyze the pros and cons of ChatGPT use and how it can be applied by urologists. Recent findings ChatGPT can facilitate clinical documentation and note-taking, patient communication and support, medical education, and research. In urology, ChatGPT has shown potential as a virtual healthcare aide for benign prostatic hyperplasia, an educational and prevention tool for prostate cancer, educational support for urological residents, and an assistant in writing urological papers and academic work. However, several concerns about its use have been raised, such as the lack of web crawling, the risk of accidental plagiarism, and concerns about patient data privacy. Summary These limitations underscore the need for further improvement of ChatGPT, such as ensuring the privacy of patient data, expanding the training dataset to include medical databases, and developing guidance on its appropriate use. Urologists can also help by conducting studies to determine the effectiveness of ChatGPT in clinical scenarios and nosologies beyond those listed above.
Collapse
Affiliation(s)
- Ali Talyshinskii
- Department of Urology, Astana Medical University, Astana, Kazakhstan
| | - Nithesh Naik
- Department of Mechanical and Industrial Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
| | | | | | - Gafur Khairli
- Department of Urology, Astana Medical University, Astana, Kazakhstan
| | - Bakhman Guliev
- Department of Urology, Mariinsky Hospital, St Petersburg, Russia
| | | | - Lazaros Tzelves
- Department of Urology, National and Kapodistrian University of Athens, Sismanogleion Hospital, Athens, Marousi, Greece
| | - Bhaskar Kumar Somani
- Department of Urology, University Hospital Southampton NHS Trust, Southampton, United Kingdom
| |
Collapse
|
49
|
Iannantuono GM, Bracken-Clarke D, Floudas CS, Roselli M, Gulley JL, Karzai F. Applications of large language models in cancer care: current evidence and future perspectives. Front Oncol 2023; 13:1268915. [PMID: 37731643 PMCID: PMC10507617 DOI: 10.3389/fonc.2023.1268915] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2023] [Accepted: 08/21/2023] [Indexed: 09/22/2023] Open
Abstract
The development of large language models (LLMs) is a recent success in the field of generative artificial intelligence (AI). They are computer models able to perform a wide range of natural language processing tasks, including content generation, question answering, or language translation. In recent months, a growing number of studies aimed to assess their potential applications in the field of medicine, including cancer care. In this mini review, we described the present published evidence for using LLMs in oncology. All the available studies assessed ChatGPT, an advanced language model developed by OpenAI, alone or compared to other LLMs, such as Google Bard, Chatsonic, and Perplexity. Although ChatGPT could provide adequate information on the screening or the management of specific solid tumors, it also demonstrated a significant error rate and a tendency toward providing obsolete data. Therefore, an accurate, expert-driven verification process remains mandatory to avoid the potential for misinformation and incorrect evidence. Overall, although this new generative AI-based technology has the potential to revolutionize the field of medicine, including that of cancer care, it will be necessary to develop rules to guide the application of these tools to maximize benefits and minimize risks.
Collapse
Affiliation(s)
- Giovanni Maria Iannantuono
- Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Medical Oncology Unit, Department of Systems Medicine, University of Rome Tor Vergata, Rome, Italy
| | - Dara Bracken-Clarke
- Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
| | - Charalampos S. Floudas
- Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
| | - Mario Roselli
- Medical Oncology Unit, Department of Systems Medicine, University of Rome Tor Vergata, Rome, Italy
| | - James L. Gulley
- Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
| | - Fatima Karzai
- Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
| |
Collapse
|
50
|
Tippareddy C, Jiang S, Bera K, Ramaiya N. Radiology Reading Room for the Future: Harnessing the Power of Large Language Models Like ChatGPT. Curr Probl Diagn Radiol 2023:S0363-0188(23)00133-0. [PMID: 37758604 DOI: 10.1067/j.cpradiol.2023.08.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Revised: 08/28/2023] [Accepted: 08/28/2023] [Indexed: 09/29/2023]
Abstract
Radiology has usually been the field of medicine at the forefront of technological advances, often the first to embrace them wholeheartedly. From digitization to cloud-based architecture, radiology has led the way in adopting the latest advances. With the advent of large language models (LLMs), especially the unprecedented explosion of the freely available ChatGPT, the time is ripe for radiology and radiologists to find novel ways to use the technology to improve their workflow. Toward this end, we believe these LLMs have a key role in the radiology reading room, not only to expedite processes and simplify mundane and archaic tasks but also to increase the knowledge base of radiologists and radiology trainees at a far faster pace. In this article, we discuss some of the ways we believe ChatGPT and similar models can be harnessed in the reading room.
Collapse
Affiliation(s)
- Charit Tippareddy
- Department of Radiology, University Hospitals Cleveland Medical Center, Cleveland, OH
| | - Sirui Jiang
- Department of Radiology, University Hospitals Cleveland Medical Center, Cleveland, OH
| | - Kaustav Bera
- Department of Radiology, University Hospitals Cleveland Medical Center, Cleveland, OH.
| | - Nikhil Ramaiya
- Department of Radiology, University Hospitals Cleveland Medical Center, Cleveland, OH
| |
Collapse
|