51
Lim ZW, Pushpanathan K, Yew SME, Lai Y, Sun CH, Lam JSH, Chen DZ, Goh JHL, Tan MCJ, Sheng B, Cheng CY, Koh VTC, Tham YC. Benchmarking large language models' performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard. EBioMedicine 2023; 95:104770. [PMID: 37625267] [PMCID: PMC10470220] [DOI: 10.1016/j.ebiom.2023.104770]
Abstract
BACKGROUND Large language models (LLMs) are garnering wide interest due to their human-like and contextually relevant responses. However, LLMs' accuracy in specific medical domains has yet to be thoroughly evaluated. Myopia is a frequent topic about which patients and parents commonly seek information online. Our study evaluated the performance of three LLMs, namely ChatGPT-3.5, ChatGPT-4.0, and Google Bard, in delivering accurate responses to common myopia-related queries. METHODS We curated thirty-one commonly asked myopia care-related questions, which were categorised into six domains: pathogenesis, risk factors, clinical presentation, diagnosis, treatment and prevention, and prognosis. Each question was posed to the LLMs, and their responses were independently graded by three consultant-level paediatric ophthalmologists on a three-point accuracy scale (poor, borderline, good). A majority consensus approach was used to determine the final rating for each response. 'Good' rated responses were further evaluated for comprehensiveness on a five-point scale. Conversely, 'poor' rated responses were further prompted for self-correction and then re-evaluated for accuracy. FINDINGS ChatGPT-4.0 demonstrated superior accuracy, with 80.6% of responses rated as 'good', compared with 61.3% for ChatGPT-3.5 and 54.8% for Google Bard (Pearson's chi-squared test, all p ≤ 0.009). All three LLM chatbots showed high mean comprehensiveness scores (Google Bard: 4.35; ChatGPT-4.0: 4.23; ChatGPT-3.5: 4.11, out of a maximum of 5). All LLM chatbots also demonstrated substantial self-correction capabilities: 66.7% (2 in 3) of ChatGPT-4.0's, 40% (2 in 5) of ChatGPT-3.5's, and 60% (3 in 5) of Google Bard's responses improved after self-correction. The LLM chatbots performed consistently across domains, except for 'treatment and prevention'. Even in this domain, however, ChatGPT-4.0 performed best, receiving 70% 'good' ratings, compared with 40% for ChatGPT-3.5 and 45% for Google Bard (Pearson's chi-squared test, all p ≤ 0.001). INTERPRETATION Our findings underscore the potential of LLMs, particularly ChatGPT-4.0, for delivering accurate and comprehensive responses to myopia-related queries. Continuous strategies and evaluations to improve LLMs' accuracy remain crucial. FUNDING Dr Yih-Chung Tham was supported by the National Medical Research Council of Singapore (NMRC/MOH/HCSAINV21nov-0001).
Affiliation(s)
- Zhi Wei Lim
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Krithi Pushpanathan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore
- Samantha Min Er Yew
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore
- Yien Lai
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
- Chen-Hsin Sun
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
- Janice Sing Harn Lam
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
- David Ziyou Chen
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
- Marcus Chun Jin Tan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
- Bin Sheng
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; Department of Endocrinology and Metabolism, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China; MoE Key Lab of Artificial Intelligence, Artificial Intelligence Institute, Shanghai Jiao Tong University, Shanghai, China
- Ching-Yu Cheng
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Eye Academic Clinical Program (Eye ACP), Duke NUS Medical School, Singapore
- Victor Teck Chang Koh
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
- Yih-Chung Tham
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Eye Academic Clinical Program (Eye ACP), Duke NUS Medical School, Singapore.
52
Chow JCL, Wong V, Sanders L, Li K. Developing an AI-Assisted Educational Chatbot for Radiotherapy Using the IBM Watson Assistant Platform. Healthcare (Basel) 2023; 11:2417. [PMID: 37685452] [PMCID: PMC10487627] [DOI: 10.3390/healthcare11172417]
Abstract
Objectives: This study aims to make healthcare knowledge about radiotherapy accessible to the general public by developing an AI-powered chatbot. The interactive nature of the chatbot is expected to facilitate better understanding of information on radiotherapy through communication with users. Methods: The chatbot was constructed on the IBM Watson Assistant platform on IBM Cloud, following a pre-designed flowchart that outlines the conversation flow. This approach kept the development focused and allowed the conversation to be tracked effectively. The chatbot is equipped to furnish users with information on radiotherapy and quizzes to assess their understanding of the subject. Results: By adopting a question-and-answer approach, the chatbot can engage in human-like communication with users seeking information about radiotherapy. As some users may feel anxious and struggle to articulate their queries, the chatbot is designed to be user-friendly and reassuring, providing a list of questions for the user to choose from. Feedback on the chatbot's content was mostly positive, despite a few limitations. The chatbot performed well and successfully conveyed knowledge as intended. Conclusions: The chatbot's conversation approach needs enhancement to improve user interaction. Including translation capabilities to cater to individuals with different first languages would also be advantageous. Lastly, the newly launched ChatGPT could potentially be developed into a medical chatbot to facilitate knowledge transfer.
Affiliation(s)
- James C. L. Chow
- Radiation Medicine Program, Princess Margaret Cancer Centre, University Health Network, Toronto, ON M5G 1X6, Canada
- Department of Radiation Oncology, University of Toronto, Toronto, ON M5T 1P5, Canada
- Valerie Wong
- Department of Physics, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada
- Leslie Sanders
- Department of Humanities, York University, Toronto, ON M3J 1P3, Canada
- Kay Li
- Department of English, University of Toronto, Toronto, ON M5R 2M8, Canada
53
Nazir T, Ahmad U, Mal M, Rehman MU, Saeed R, Kalia J. Microsoft Bing vs Google Bard in Neurology: A Comparative Study of AI-Generated Patient Education Material. Preprint. [DOI: 10.1101/2023.08.25.23294641]
Abstract
BACKGROUND Patient education is an essential component of healthcare, and artificial intelligence (AI) language models such as Google Bard and Microsoft Bing have the potential to improve information transmission and enhance patient care. However, it is crucial to evaluate the quality, accuracy, and understandability of the materials generated by these models before applying them in medical practice. This study aimed to assess and compare the quality of patient education materials produced by Google Bard and Microsoft Bing in response to questions related to neurological conditions. METHODS A cross-sectional study design was used to evaluate and compare the ability of Google Bard and Microsoft Bing to generate patient education materials. The study included the top ten prevalent neurological diseases based on WHO prevalence data. Ten board-certified neurologists and four neurology residents evaluated the responses generated by the models on six quality metrics. The scores for each model were compiled and averaged across all measures, and the significance of any observed variations was assessed using an independent t-test. RESULTS Google Bard performed better than Microsoft Bing on all six quality metrics, with overall mean scores of 79% and 69%, respectively. Google Bard outperformed Microsoft Bing on all measures for eight questions, while Microsoft Bing performed marginally better in terms of objectivity and clarity for the epilepsy query. CONCLUSION This study showed that Google Bard performs better than Microsoft Bing in generating patient education materials for neurological diseases. However, healthcare professionals should take into account both AI models' advantages and disadvantages when providing support for health information requirements. Future studies can help determine the underlying causes of these variations and guide cooperative initiatives to create more user-focused AI-generated patient education materials. Finally, researchers should consider patients' perceptions of AI-generated patient education material and its impact on implementing these solutions in healthcare settings.
54
Watters C, Lemanski MK. Universal skepticism of ChatGPT: a review of early literature on chat generative pre-trained transformer. Front Big Data 2023; 6:1224976. [PMID: 37680954] [PMCID: PMC10482048] [DOI: 10.3389/fdata.2023.1224976]
Abstract
ChatGPT, a new language model developed by OpenAI, has garnered significant attention in various fields since its release. This literature review provides an overview of early ChatGPT literature across multiple disciplines, exploring its applications, limitations, and ethical considerations. The review encompasses Scopus-indexed publications from November 2022 to April 2023 and includes 156 articles related to ChatGPT. The findings reveal a predominance of negative sentiment across disciplines, though subject-specific attitudes must be considered. The review highlights the implications of ChatGPT in many fields, including healthcare, raising concerns about employment opportunities and ethical considerations. While ChatGPT holds promise for improved communication, further research is needed to address its capabilities and limitations. This literature review provides insights into early research on ChatGPT, informing future investigations and practical applications of chatbot technology, as well as the development and use of generative AI.
Affiliation(s)
- Casey Watters
- Faculty of Law, Bond University, Gold Coast, QLD, Australia
57
Rawashdeh B, Kim J, AlRyalat SA, Prasad R, Cooper M. ChatGPT and Artificial Intelligence in Transplantation Research: Is It Always Correct? Cureus 2023; 15:e42150. [PMID: 37602076] [PMCID: PMC10438857] [DOI: 10.7759/cureus.42150]
Abstract
INTRODUCTION ChatGPT (OpenAI, San Francisco, California, United States) is a chatbot powered by language-based artificial intelligence (AI). It generates text based on the information provided by users and is currently being evaluated in medical research, publishing, and healthcare. However, its ability to assist in kidney transplant research has not previously been evaluated. This feasibility study aimed to evaluate the application and accuracy of ChatGPT in the field of kidney transplantation. METHODS On two separate dates, February 21 and March 2, 2023, ChatGPT 3.5 was questioned regarding the medical treatment of kidney transplants and related scientific facts. The responses provided by the chatbot were compiled, and a panel of two specialists reviewed the correctness of each answer. RESULTS We found that ChatGPT possessed substantial general knowledge of kidney transplantation; however, its responses omitted relevant details and contained inaccuracies on questions requiring a deeper understanding of the topic. Moreover, ChatGPT failed to provide references for any of the scientific data it provided regarding kidney transplants, and when asked for references, it provided inaccurate ones. CONCLUSION The results of this short feasibility study indicate that ChatGPT may be able to assist in data collection when a particular query is posed. However, caution should be exercised, and it should not be used in isolation to support research or healthcare decisions, because challenges with data accuracy and missing information remain.
Affiliation(s)
- Badi Rawashdeh
- Transplant Surgery, Medical College of Wisconsin, Milwaukee, USA
- Joohyun Kim
- Transplant Surgery, Medical College of Wisconsin, Milwaukee, USA
- Raj Prasad
- Transplant Surgery, Medical College of Wisconsin, Milwaukee, USA
- Matthew Cooper
- Transplant Surgery, Medical College of Wisconsin, Milwaukee, USA
59
Abstract
The OpenAI chatbot ChatGPT is an artificial intelligence (AI) application that uses state-of-the-art language processing AI. It can perform a vast number of tasks, from writing poetry and explaining complex quantum mechanics to translating languages and writing research articles with a human-like understanding and legitimacy. Since its initial release to the public in November 2022, ChatGPT has garnered considerable attention due to its ability to mimic the patterns of human language, and it has attracted billion-dollar investments from Microsoft and PricewaterhouseCoopers. The scope of ChatGPT and other large language models appears infinite, but there are several important limitations. This editorial provides an introduction to the basic functionality of ChatGPT and other large language models, their current applications and limitations, and the associated implications for clinical practice and research.
Affiliation(s)
- Kyle N Kunze
- Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, USA
- Seong J Jang
- Weill Cornell Medical College, New York, New York, USA
- Jonathan M Vigdorchik
- Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, USA
- Adult Reconstruction and Joint Replacement Service, Hospital for Special Surgery, New York, New York, USA
- Fares S Haddad
- The Bone & Joint Journal, London, UK
- University College London Hospitals, and The NIHR Biomedical Research Centre at UCLH, London, UK
- Princess Grace Hospital, London, UK