1
Jo MH, Kim MJ, Oh HK, Choi MJ, Shin HR, Lee TG, Ahn HM, Kim DW, Kang SB. Communicative competence of generative artificial intelligence in responding to patient queries about colorectal cancer surgery. Int J Colorectal Dis 2024; 39:94. PMID: 38902500; PMCID: PMC11189990; DOI: 10.1007/s00384-024-04670-3.
Abstract
PURPOSE To examine the ability of generative artificial intelligence (GAI) to answer patients' questions regarding colorectal cancer (CRC). METHODS Ten clinically relevant questions about CRC were selected from top-rated hospitals' websites and patient surveys and presented to three GAI tools (Chatbot Generative Pre-Trained Transformer [GPT-4], Google Bard, and CLOVA X). Their responses were compared with answers from a CRC information book. Responses were evaluated by two groups, one consisting of five healthcare professionals (HCPs) and the other of five patients. Each question was scored on a 1-5 Likert scale across four evaluation criteria (maximum score, 20 points per question). RESULTS In the analysis including only HCPs, the information book scored 11.8 ± 1.2, GPT-4 scored 13.5 ± 1.1, Google Bard scored 11.5 ± 0.7, and CLOVA X scored 12.2 ± 1.4 (P = 0.001). The score of GPT-4 was significantly higher than those of the information book (P = 0.020) and Google Bard (P = 0.001). In the analysis including only patients, the information book scored 14.1 ± 1.4, GPT-4 scored 15.2 ± 1.8, Google Bard scored 15.5 ± 1.8, and CLOVA X scored 14.4 ± 1.8, with no significant differences (P = 0.234). When both groups of evaluators were combined, the information book scored 13.0 ± 0.9, GPT-4 scored 14.4 ± 1.2, Google Bard scored 13.5 ± 1.0, and CLOVA X scored 13.3 ± 1.5 (P = 0.070). CONCLUSION The three GAIs demonstrated communicative competence similar to or better than the information book for questions related to CRC surgery in Korean. If high-quality medical information generated by GAI is properly supervised by HCPs and published as an information book, it could help patients obtain accurate information and make informed decisions.
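A minimal sketch of the scoring scheme described in the methods (not the authors' code): four 1-5 Likert criterion ratings are summed into a 20-point score per question, and a one-way ANOVA is one plausible way to produce the single P value reported across the four sources. All scores below are invented placeholders, not study data.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)

def question_totals(n_questions: int) -> np.ndarray:
    # Each question total is the sum of four 1-5 Likert criterion ratings,
    # so totals range from 4 to 20, matching the 20-point maximum above.
    criteria = rng.integers(1, 6, size=(n_questions, 4))
    return criteria.sum(axis=1)

# Placeholder totals for the 10 questions, one array per information source.
sources = {name: question_totals(10)
           for name in ("information_book", "gpt4", "bard", "clova_x")}

f_stat, p_value = f_oneway(*sources.values())
print(f"One-way ANOVA across the four sources: F={f_stat:.2f}, p={p_value:.3f}")
```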
Affiliation(s)
- Min Hyeong Jo
- Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea
- Min-Jun Kim
- Department of Surgery, Seoul National University College of Medicine, Seoul, South Korea
- Heung-Kwon Oh
- Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea
- Department of Surgery, Seoul National University College of Medicine, Seoul, South Korea
- Mi Jeong Choi
- Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea
- Hye-Rim Shin
- Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea
- Tae-Gyun Lee
- Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea
- Hong-Min Ahn
- Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea
- Duck-Woo Kim
- Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea
- Department of Surgery, Seoul National University College of Medicine, Seoul, South Korea
- Sung-Bum Kang
- Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea
- Department of Surgery, Seoul National University College of Medicine, Seoul, South Korea
2
Kooraki S, Hosseiny M, Jalili MH, Rahsepar AA, Imanzadeh A, Kim GH, Hassani C, Abtin F, Moriarty JM, Bedayat A. Evaluation of ChatGPT-Generated Educational Patient Pamphlets for Common Interventional Radiology Procedures. Acad Radiol 2024:S1076-6332(24)00307-6. PMID: 38839458; DOI: 10.1016/j.acra.2024.05.024.
Abstract
RATIONALE AND OBJECTIVES This study aimed to evaluate the accuracy and reliability of educational patient pamphlets created by ChatGPT, a large language model, for common interventional radiology (IR) procedures. METHODS AND MATERIALS Twenty frequently performed IR procedures were selected, and five users independently asked ChatGPT to generate an educational patient pamphlet for each procedure using identical commands. Two independent radiologists then assessed the content, quality, and accuracy of the pamphlets, focusing on errors, inaccuracies, and the consistency of the pamphlets. RESULTS In a thorough analysis of the educational pamphlets, we identified shortcomings in 30% (30/100), with a total of 34 specific inaccuracies, including missing information about procedural sedation (10/34) and inaccuracies related to procedure-specific complications (8/34). A key-word co-occurrence network showed consistent themes within each group of pamphlets, while a line-by-line comparison across users and procedures showed statistically significant inconsistencies (P < 0.001). CONCLUSION ChatGPT-generated educational pamphlets demonstrated potential clinical relevance and fairly consistent terminology; however, they were not entirely accurate and exhibited some shortcomings and inter-user structural variability. To ensure patient safety, future improvements and refinements in large language models are warranted, along with continued human supervision and expert validation.
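The key-word co-occurrence network mentioned in the results can be sketched roughly as follows (an assumed construction, not the authors' pipeline): keywords appearing in the same pamphlet are linked, with edge weight equal to their co-occurrence count. The keyword sets below are invented placeholders.

```python
from itertools import combinations
from collections import Counter

# Placeholder keyword sets, one per generated pamphlet.
pamphlets = [
    {"sedation", "consent", "bleeding", "recovery"},
    {"sedation", "bleeding", "infection"},
    {"consent", "recovery", "infection"},
]

edges = Counter()
for keywords in pamphlets:
    for a, b in combinations(sorted(keywords), 2):
        edges[(a, b)] += 1  # count co-occurrence within one pamphlet

# The heaviest edges indicate the most consistent shared themes.
for (a, b), weight in edges.most_common(5):
    print(f"{a} -- {b}: {weight}")
```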
Affiliation(s)
- Soheil Kooraki
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA.
- Melina Hosseiny
- Department of Radiology, University of California, San Diego (UCSD), San Diego, CA.
- Mohammad H Jalili
- Department of Radiology and Biomedical Imaging, Yale New Haven Health, Bridgeport Hospital, CT.
- Amir Ali Rahsepar
- Department of Radiology, Feinberg School of Medicine, Northwestern University, Chicago, IL.
- Amir Imanzadeh
- Department of Radiology, University of California, Irvine (UCI), Irvine, CA.
- Grace Hyun Kim
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA.
- Cameron Hassani
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA.
- Fereidoun Abtin
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA.
- John M Moriarty
- Department of Radiological Sciences, Division of Interventional Radiology, David Geffen School of Medicine at UCLA, Los Angeles, CA.
- Arash Bedayat
- Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA.
3
Daraqel B, Wafaie K, Mohammed H, Cao L, Mheissen S, Liu Y, Zheng L. The performance of artificial intelligence models in generating responses to general orthodontic questions: ChatGPT vs Google Bard. Am J Orthod Dentofacial Orthop 2024; 165:652-662. PMID: 38493370; DOI: 10.1016/j.ajodo.2024.01.012.
Abstract
INTRODUCTION This study aimed to evaluate and compare the performance of 2 artificial intelligence (AI) models, Chat Generative Pretrained Transformer-3.5 (ChatGPT-3.5; OpenAI, San Francisco, Calif) and Google Bard (Bard Experiment, Google, Mountain View, Calif), in terms of response accuracy, completeness, generation time, and response length when answering general orthodontic questions. METHODS A team of orthodontic specialists developed a set of 100 questions across 10 orthodontic domains. One author submitted the questions to both ChatGPT and Google Bard. The AI-generated responses from both models were randomly assigned into 2 forms and sent to 5 blinded, independent assessors, who evaluated the quality of the responses for accuracy of information and completeness using a newly developed tool. In addition, response generation time and length were recorded. RESULTS The accuracy and completeness of responses were high in both AI models. The median accuracy score was 9 (interquartile range [IQR], 8-9) for ChatGPT and 8 (IQR, 8-9) for Google Bard (median difference, 1; P <0.001). The median completeness score was similar in both models: 8 (IQR, 8-9) for ChatGPT and 8 (IQR, 7-9) for Google Bard. The odds of accuracy and completeness were 31% and 23% higher, respectively, in ChatGPT than in Google Bard. Google Bard's response generation time was significantly shorter than ChatGPT's, by 10.4 seconds per question; however, response length was similar between models. CONCLUSIONS Responses generated by both ChatGPT and Google Bard were rated with a high level of accuracy and completeness for the posed general orthodontic questions. However, answers were generally obtained faster with Google Bard.
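A minimal sketch of the kind of summary reported above (an assumed analysis, not the authors' code): per-question ordinal scores are summarized as median (IQR), and the two models are compared with a paired nonparametric test, since both answered the same questions. The scores below are invented placeholders, not the study's 100-question ratings.

```python
import numpy as np
from scipy.stats import wilcoxon

# Placeholder per-question accuracy scores for the two models.
chatgpt = np.array([9, 8, 9, 9, 8, 9, 9, 8, 9, 9])
bard    = np.array([8, 8, 9, 8, 7, 8, 9, 8, 8, 8])

for name, scores in (("ChatGPT", chatgpt), ("Google Bard", bard)):
    q1, median, q3 = np.percentile(scores, [25, 50, 75])
    print(f"{name}: median {median:.0f} (IQR {q1:.0f}-{q3:.0f})")

stat, p = wilcoxon(chatgpt, bard)  # paired test on the same questions
print(f"Wilcoxon signed-rank: statistic={stat:.1f}, p={p:.3f}")
```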
Affiliation(s)
- Baraa Daraqel
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University; Chongqing Key Laboratory of Oral Disease and Biomedical Sciences; Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China; Oral Health Research and Promotion Unit, Al-Quds University, Jerusalem, Palestine.
- Khaled Wafaie
- Department of Orthodontics, Faculty of Dentistry, First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Li Cao
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University; Chongqing Key Laboratory of Oral Disease and Biomedical Sciences; Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
- Yang Liu
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University; Chongqing Key Laboratory of Oral Disease and Biomedical Sciences; Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
- Leilei Zheng
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University; Chongqing Key Laboratory of Oral Disease and Biomedical Sciences; Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China.
4
Jedrzejczak WW, Kochanek K. Comparison of the Audiological Knowledge of Three Chatbots: ChatGPT, Bing Chat, and Bard. Audiol Neurootol 2024:1-7. PMID: 38710158; DOI: 10.1159/000538983.
Abstract
INTRODUCTION The purpose of this study was to evaluate three chatbots - OpenAI ChatGPT, Microsoft Bing Chat (currently Copilot), and Google Bard (currently Gemini) - in terms of their responses to a defined set of audiological questions. METHODS Each chatbot was presented with the same 10 questions, and the authors rated the responses on a Likert scale ranging from 1 to 5. Additional features, such as the number of inaccuracies or errors and the provision of references, were also examined. RESULTS Most responses from all three chatbots were rated as satisfactory or better; however, every chatbot generated at least a few errors or inaccuracies. ChatGPT achieved the highest overall score, while Bard scored lowest. Bard was also the only chatbot unable to provide a response to one of the questions, and ChatGPT was the only chatbot that did not provide information about its sources. CONCLUSIONS Chatbots are an intriguing tool for accessing basic information in a specialized area like audiology. Nevertheless, caution is needed, as correct information is not infrequently mixed in with errors that are hard to detect unless the user is well versed in the field.
Affiliation(s)
- W Wiktor Jedrzejczak
- Institute of Physiology and Pathology of Hearing, Warsaw, Poland
- World Hearing Center, Kajetany, Poland
- Krzysztof Kochanek
- Institute of Physiology and Pathology of Hearing, Warsaw, Poland
- World Hearing Center, Kajetany, Poland
5
Freire Y, Santamaría Laorden A, Orejas Pérez J, Gómez Sánchez M, Díaz-Flores García V, Suárez A. ChatGPT performance in prosthodontics: Assessment of accuracy and repeatability in answer generation. J Prosthet Dent 2024; 131:659.e1-659.e6. PMID: 38310063; DOI: 10.1016/j.prosdent.2024.01.018.
Abstract
STATEMENT OF PROBLEM The artificial intelligence (AI) software program ChatGPT is based on large language models (LLMs) and is widely accessible. However, in prosthodontics, little is known about its performance in generating answers. PURPOSE The purpose of this study was to determine the performance of ChatGPT in generating answers about removable dental prostheses (RDPs) and tooth-supported fixed dental prostheses (FDPs). MATERIAL AND METHODS Thirty short questions were designed about RDPs and tooth-supported FDPs, and 30 answers were generated for each question using ChatGPT-4 in October 2023. The 900 generated answers were independently graded by experts on a 3-point Likert scale. The relative frequency and absolute percentage of answers were described. Accuracy was assessed using the Wald binomial method, and repeatability was evaluated using percentage agreement, the Brennan and Prediger coefficient, Conger's generalized Cohen kappa, the Fleiss kappa, Gwet's AC, and the Krippendorff alpha. Confidence intervals were set at 95%. Statistical analysis was performed with the STATA software program. RESULTS The performance of ChatGPT in generating answers related to RDPs and tooth-supported FDPs was limited: the answers showed an accuracy of 25.6%, with a 95% confidence interval of 22.9% to 28.6%, and repeatability ranged from substantial to moderate. CONCLUSIONS The results show that ChatGPT currently has a limited ability to generate answers related to RDPs and tooth-supported FDPs. Therefore, ChatGPT cannot replace a dentist, and professionals who use it should be aware of its limitations.
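A minimal sketch of the Wald binomial interval named in the methods: for k accurate answers out of n, p̂ = k/n and the 95% interval is p̂ ± 1.96·√(p̂(1-p̂)/n). The count below is inferred from the reported point estimate (25.6% of 900 answers); the paper's exact counts and any adjustments may differ, which is why the interval here does not exactly match the reported one.

```python
import math

def wald_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    # Plain Wald interval: point estimate plus/minus z * standard error.
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, p - half, p + half

p, lo, hi = wald_ci(k=230, n=900)  # 230/900 ≈ 25.6%, an assumed count
print(f"accuracy {p:.1%}, 95% Wald CI [{lo:.1%}, {hi:.1%}]")
```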
Affiliation(s)
- Yolanda Freire
- Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
- Andrea Santamaría Laorden
- Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
- Jaime Orejas Pérez
- Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
- Margarita Gómez Sánchez
- Assistant Professor, Vice Dean of Dentistry, Department of Pre-Clinic Dentistry and Clinical Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
- Víctor Díaz-Flores García
- Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
- Ana Suárez
- Associate Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
6
Ray PP. Advancing AI in rheumatology: critical reflections and proposals for future research using large language models. Rheumatol Int 2024; 44:573-574. PMID: 37891327; DOI: 10.1007/s00296-023-05488-y.
Affiliation(s)
- Partha Pratim Ray
- Department of Computer Applications, Sikkim University, 6th Mile, PO-Tadong, Gangtok, 737102, Sikkim, India.
7
Venerito V, Gupta L. Large language models: rheumatologists' newest colleagues? Nat Rev Rheumatol 2024; 20:75-76. PMID: 38177451; DOI: 10.1038/s41584-023-01070-9.
Affiliation(s)
- Vincenzo Venerito
- Rheumatology Unit, Department of Precision and Regenerative Medicine and Ionian Area (DiMePRe-J), University of Bari Aldo Moro, Bari, Italy
- Latika Gupta
- Department of Rheumatology, Royal Wolverhampton Hospitals NHS Trust, Wolverhampton, UK
- Division of Musculoskeletal and Dermatological Sciences, Centre for Musculoskeletal Research, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK