1
Wang Y, Liu C, Zhou K, Zhu T, Han X. Towards regulatory generative AI in ophthalmology healthcare: a security and privacy perspective. Br J Ophthalmol 2024:bjo-2024-325167. PMID: 38834290. DOI: 10.1136/bjo-2024-325167.
Abstract
As the healthcare community increasingly harnesses the power of generative artificial intelligence (AI), critical issues of security, privacy and regulation take centre stage. In this paper, we explore the security and privacy risks of generative AI from model-level and data-level perspectives, and we elucidate the potential consequences through case studies within the domain of ophthalmology. Model-level risks include knowledge leakage from the model and model safety under AI-specific attacks, while data-level risks involve unauthorised data collection and data accuracy concerns. Within the healthcare context, these risks can bear severe consequences, encompassing potential breaches of sensitive information, violations of privacy rights and threats to patient safety. This paper not only highlights these challenges but also elucidates governance-driven solutions that adhere to AI and healthcare regulations. We advocate for preparedness against potential threats, call for transparency enhancements and underscore the necessity of clinical validation before real-world implementation. Improving the security and privacy of generative AI warrants an emphasis on the role of ophthalmologists and other healthcare providers, as well as the timely introduction of comprehensive regulations.
Affiliation(s)
- Yueye Wang
- Sun Yat-sen University Zhongshan Ophthalmic Center State Key Laboratory of Ophthalmology, Guangzhou, Guangdong, China
- Chi Liu
- Faculty of Data Science, City University of Macau, Macao SAR, China
- Keyao Zhou
- Department of Ophthalmology, Guangdong Provincial People's Hospital, Guangzhou, Guangdong, China
- Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, China
- Tianqing Zhu
- Faculty of Data Science, City University of Macau, Macao SAR, China
- Xiaotong Han
- Sun Yat-sen University Zhongshan Ophthalmic Center State Key Laboratory of Ophthalmology, Guangzhou, Guangdong, China
2
Roldan-Vasquez E, Mitri S, Bhasin S, Bharani T, Capasso K, Haslinger M, Sharma R, James TA. Reliability of artificial intelligence chatbot responses to frequently asked questions in breast surgical oncology. J Surg Oncol 2024. PMID: 38837375. DOI: 10.1002/jso.27715.
Abstract
INTRODUCTION Artificial intelligence (AI)-driven chatbots, capable of simulating human-like conversations, are becoming more prevalent in healthcare. While this technology offers potential benefits in patient engagement and information accessibility, it raises concerns about potential misuse, misinformation, inaccuracies, and ethical challenges. METHODS This study evaluated a publicly available AI chatbot, ChatGPT, in its responses to nine questions related to breast cancer surgery selected from the American Society of Breast Surgeons' frequently asked questions (FAQ) patient education website. Four breast surgical oncologists assessed the responses for accuracy and reliability using a five-point Likert scale and the Patient Education Materials Assessment Tool (PEMAT). RESULTS The average reliability score for ChatGPT in answering breast cancer surgery questions was 3.98 out of 5.00. Surgeons unanimously found the responses understandable and actionable per the PEMAT criteria. The consensus was that ChatGPT's overall performance was appropriate, with minor or no inaccuracies. CONCLUSION ChatGPT demonstrates good reliability in responding to breast cancer surgery queries, with minor, nonharmful inaccuracies. Its answers are accurate, clear, and easy to comprehend. Notably, ChatGPT acknowledged its informational role and did not attempt to replace medical advice or discourage users from seeking input from a healthcare professional.
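For readers who want to prototype this kind of rater aggregation, the sketch below (Python, with invented ratings rather than the study's data) shows how a mean five-point Likert reliability score and PEMAT-style agreement percentages might be computed.

```python
import numpy as np

# Hypothetical 5-point Likert ratings: 4 surgeons x 9 FAQ responses
# (illustrative values only, not the study's data).
likert = np.array([
    [4, 4, 5, 4, 3, 4, 4, 5, 4],
    [4, 5, 4, 4, 4, 3, 4, 4, 4],
    [5, 4, 4, 3, 4, 4, 5, 4, 4],
    [4, 4, 4, 4, 4, 4, 4, 4, 5],
])

print(f"Mean reliability: {likert.mean():.2f} / 5.00")   # overall average
print("Per-question means:", np.round(likert.mean(axis=0), 2))

# PEMAT-style binary judgements (1 = understandable/actionable):
# 4 raters x 9 responses; unanimity corresponds to 100% agreement.
pemat = np.ones((4, 9), dtype=int)
print(f"Understandability agreement: {pemat.mean():.0%}")
```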
Affiliation(s)
- Estefania Roldan-Vasquez
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Samir Mitri
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Shreya Bhasin
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- School of Medicine and Dentistry, University of Rochester, Rochester, New York, USA
- Tina Bharani
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Department of Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Kathryn Capasso
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Michelle Haslinger
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Ranjna Sharma
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Ted A James
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
3
Nguyen TP, Carvalho B, Sukhdeo H, Joudi K, Guo N, Chen M, Wolpaw JT, Kiefer JJ, Byrne M, Jamroz T, Mootz AA, Reale SC, Zou J, Sultan P. Comparison of artificial intelligence large language model chatbots in answering frequently asked questions in anaesthesia. BJA Open 2024; 10:100280. PMID: 38764485. PMCID: PMC11099318. DOI: 10.1016/j.bjao.2024.100280.
Abstract
Background Patients are increasingly using artificial intelligence (AI) chatbots to seek answers to medical queries. Methods Ten frequently asked questions in anaesthesia were posed to three AI chatbots: ChatGPT4 (OpenAI), Bard (Google), and Bing Chat (Microsoft). Each chatbot's answers were evaluated in a randomised, blinded order by five residency programme directors from 15 medical institutions in the USA. Three medical content quality categories (accuracy, comprehensiveness, safety) and three communication quality categories (understandability, empathy/respect, and ethics) were scored between 1 and 5 (1 representing worst, 5 representing best). Results ChatGPT4 and Bard outperformed Bing Chat (median [inter-quartile range] scores: 4 [3-4], 4 [3-4], and 3 [2-4], respectively; P<0.001 with all metrics combined). All AI chatbots performed poorly in accuracy (score of ≥4 by 58%, 48%, and 36% of experts for ChatGPT4, Bard, and Bing Chat, respectively), comprehensiveness (score ≥4 by 42%, 30%, and 12% of experts for ChatGPT4, Bard, and Bing Chat, respectively), and safety (score ≥4 by 50%, 40%, and 28% of experts for ChatGPT4, Bard, and Bing Chat, respectively). Notably, answers from ChatGPT4, Bard, and Bing Chat differed statistically in comprehensiveness (ChatGPT4, 3 [2-4] vs Bing Chat, 2 [2-3], P<0.001; and Bard 3 [2-4] vs Bing Chat, 2 [2-3], P=0.002). All large language model chatbots performed well with no statistical difference for understandability (P=0.24), empathy (P=0.032), and ethics (P=0.465). Conclusions In answering patients' frequently asked questions about anaesthesia, the chatbots performed well on communication metrics but were suboptimal on medical content metrics. Overall, ChatGPT4 and Bard were comparable to each other, both outperforming Bing Chat.
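The abstract reports medians with inter-quartile ranges and pairwise P values for ordinal 1-5 ratings. The exact test sequence is not stated in the abstract, but a common non-parametric workflow for such data, sketched below on invented ratings, is a Kruskal-Wallis omnibus test followed by pairwise Mann-Whitney U tests.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical 1-5 expert ratings for three chatbots (illustrative only).
gpt4 = rng.integers(2, 6, 50)
bard = rng.integers(2, 6, 50)
bing = rng.integers(1, 5, 50)

def med_iqr(x):
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return f"{med:.0f} [{q1:.0f}-{q3:.0f}]"

for name, x in [("ChatGPT4", gpt4), ("Bard", bard), ("Bing Chat", bing)]:
    print(name, med_iqr(x))

# Omnibus test across the three chatbots, then one pairwise follow-up.
_, p_omni = stats.kruskal(gpt4, bard, bing)
_, p_pair = stats.mannwhitneyu(gpt4, bing, alternative="two-sided")
print(f"Kruskal-Wallis p={p_omni:.4f}; ChatGPT4 vs Bing Chat p={p_pair:.4f}")
```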
Affiliation(s)
- Teresa P. Nguyen
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Brendan Carvalho
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Hannah Sukhdeo
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Kareem Joudi
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Nan Guo
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Marianne Chen
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Jed T. Wolpaw
- Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jesse J. Kiefer
- Department of Anesthesiology and Critical Care Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
- Melissa Byrne
- Department of Anesthesiology, Perioperative and Pain Medicine, University of Michigan Ann Arbor School of Medicine, Ann Arbor, MI, USA
- Tatiana Jamroz
- Department of Anesthesiology, Perioperative and Pain Medicine, Cleveland Clinic Foundation and Hospitals, Cleveland, OH, USA
- Allison A. Mootz
- Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital, Harvard School of Medicine, Boston, MA, USA
- Sharon C. Reale
- Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital, Harvard School of Medicine, Boston, MA, USA
- James Zou
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Pervez Sultan
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
4
Ye F, Zhang H, Luo X, Wu T, Yang Q, Shi Z. Evaluating ChatGPT's Performance in Answering Questions About Allergic Rhinitis and Chronic Rhinosinusitis. Otolaryngol Head Neck Surg 2024. PMID: 38796735. DOI: 10.1002/ohn.832.
Abstract
OBJECTIVE This study aims to evaluate the accuracy of ChatGPT in answering allergic rhinitis (AR) and chronic rhinosinusitis (CRS) related questions. STUDY DESIGN This is a cross-sectional study. SETTING Each question was entered as a separate, independent prompt. METHODS Responses to AR (n = 189) and CRS (n = 242) related questions, generated by GPT-3.5 and GPT-4, were independently graded for accuracy by 2 senior rhinology professors, with disagreements adjudicated by a third reviewer. RESULTS Overall, ChatGPT demonstrated satisfactory performance, accurately answering over 80% of questions across all categories. Specifically, GPT-4.0's accuracy in responding to AR-related questions significantly exceeded that of GPT-3.5, a distinction not evident in CRS-related questions. Patient-originated questions were answered with significantly higher accuracy than doctor-originated questions when GPT-4.0 responded to AR-related questions; this discrepancy was not observed with GPT-3.5 or in the context of CRS-related questions. Across different types of content, ChatGPT excelled in covering basic knowledge, prevention, and emotion for AR and CRS. However, it experienced challenges when addressing questions about recent advancements, a trend consistent across both GPT-3.5 and GPT-4.0 iterations. Importantly, the accuracy of responses remained unaffected when questions were posed in Chinese. CONCLUSION Our findings suggest ChatGPT is capable of conveying accurate information to AR and CRS patients, and they offer insights into its performance across various domains, guiding its utilization and improvement.
Affiliation(s)
- Fan Ye
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- He Zhang
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Xin Luo
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Tong Wu
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Qintai Yang
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Naso-Orbital-Maxilla and Skull Base Center, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Airway Inflammatory Disease Research and Innovative Technology Translation, Guangzhou, China
- Zhaohui Shi
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Naso-Orbital-Maxilla and Skull Base Center, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Airway Inflammatory Disease Research and Innovative Technology Translation, Guangzhou, China
5
Shiraishi M, Tomioka Y, Miyakuni A, Moriwaki Y, Yang R, Oba J, Okazaki M. Generating Informed Consent Documents Related to Blepharoplasty Using ChatGPT. Ophthalmic Plast Reconstr Surg 2024; 40:316-320. PMID: 38133626. DOI: 10.1097/iop.0000000000002574.
Abstract
PURPOSE This study aimed to demonstrate the performance of the popular artificial intelligence (AI) language model, Chat Generative Pre-trained Transformer (ChatGPT) (OpenAI, San Francisco, CA, U.S.A.), in generating the informed consent (IC) document for blepharoplasty. METHODS A total of 2 prompts were provided to ChatGPT to generate IC documents. Four board-certified plastic surgeons and 4 nonmedical staff members evaluated the AI-generated IC documents and the original IC document currently used in the clinical setting, assessing these documents in terms of accuracy, informativeness, and accessibility. RESULTS Among board-certified plastic surgeons, the initial AI-generated IC document scored significantly lower than the original IC document in accuracy (p < 0.001), informativeness (p = 0.005), and accessibility (p = 0.021), while the revised AI-generated IC document scored lower than the original document in accuracy (p = 0.03) and accessibility (p = 0.021). Among nonmedical staff members, no statistically significant difference between either AI-generated IC document and the original document was observed in terms of accuracy, informativeness, or accessibility. CONCLUSIONS The results showed that ChatGPT in its current form cannot be used as a standalone patient education resource. However, it has the potential to produce better IC documents as its handling of professional terminology improves. This AI technology will eventually transform ophthalmic plastic surgery healthcare systems by enhancing patient education and decision-making via IC documents.
Affiliation(s)
- Makoto Shiraishi
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
6
Kedia N, Sanjeev S, Ong J, Chhablani J. ChatGPT and Beyond: An overview of the growing field of large language models and their use in ophthalmology. Eye (Lond) 2024; 38:1252-1261. PMID: 38172581. PMCID: PMC11076576. DOI: 10.1038/s41433-023-02915-z.
Abstract
ChatGPT, an artificial intelligence (AI) chatbot built on large language models (LLMs), has rapidly gained popularity. The benefits and limitations of this transformative technology have been discussed across various fields, including medicine. The widespread availability of ChatGPT has enabled clinicians to study how these tools could be used for a variety of tasks such as generating differential diagnosis lists, organizing patient notes, and synthesizing literature for scientific research. LLMs have shown promising capabilities in ophthalmology by performing well on the Ophthalmic Knowledge Assessment Program, providing fairly accurate responses to questions about retinal diseases, and generating differential diagnosis lists. There are current limitations to this technology, including the propensity of LLMs to "hallucinate", or confidently generate false information; their potential role in perpetuating biases in medicine; and the challenges of incorporating LLMs into research without allowing "AI-plagiarism" or publication of false information. In this paper, we provide a balanced overview of what LLMs are and introduce some of the LLMs that have been released in the past few years. We discuss recent literature evaluating the role of these language models in medicine with a focus on ChatGPT. The field of AI is fast-paced, and new applications based on LLMs are being generated rapidly; therefore, it is important for ophthalmologists to be aware of how this technology works and how it may impact patient care. Here, we discuss the benefits, limitations, and future advancements of LLMs in patient care and research.
Affiliation(s)
- Nikita Kedia
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Joshua Ong
- Department of Ophthalmology and Visual Sciences, University of Michigan Kellogg Eye Center, Ann Arbor, MI, USA
- Jay Chhablani
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
7
Momenaei B, Mansour HA, Kuriyan AE, Xu D, Sridhar J, Ting DSW, Yonekawa Y. ChatGPT enters the room: what it means for patient counseling, physician education, academics, and disease management. Curr Opin Ophthalmol 2024; 35:205-209. PMID: 38334288. DOI: 10.1097/icu.0000000000001036.
Abstract
PURPOSE OF REVIEW This review seeks to provide a summary of the most recent research findings regarding the utilization of ChatGPT, an artificial intelligence (AI)-powered chatbot, in the field of ophthalmology in addition to exploring the limitations and ethical considerations associated with its application. RECENT FINDINGS ChatGPT has gained widespread recognition and demonstrated potential in enhancing patient and physician education, boosting research productivity, and streamlining administrative tasks. In various studies examining its utility in ophthalmology, ChatGPT has exhibited fair to good accuracy, with its most recent iteration showcasing superior performance in providing ophthalmic recommendations across various ophthalmic disorders such as corneal diseases, orbital disorders, vitreoretinal diseases, uveitis, neuro-ophthalmology, and glaucoma. This proves beneficial for patients in accessing information and aids physicians in triaging as well as formulating differential diagnoses. Despite such benefits, ChatGPT has limitations that require acknowledgment including the potential risk of offering inaccurate or harmful information, dependence on outdated data, the necessity for a high level of education for data comprehension, and concerns regarding patient privacy and ethical considerations within the research domain. SUMMARY ChatGPT is a promising new tool that could contribute to ophthalmic healthcare education and research, potentially reducing work burdens. However, its current limitations necessitate a complementary role with human expert oversight.
Affiliation(s)
- Bita Momenaei
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Hana A Mansour
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Ajay E Kuriyan
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- David Xu
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Jayanth Sridhar
- University of California Los Angeles, Los Angeles, California, USA
- Yoshihiro Yonekawa
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
8
Mihalache A, Huang RS, Popovic MM, Muni RH. Artificial intelligence chatbot and Academy Preferred Practice Pattern® Guidelines on cataract and glaucoma. J Cataract Refract Surg 2024; 50:534-535. PMID: 38468154. DOI: 10.1097/j.jcrs.0000000000001317.
Affiliation(s)
- Andrew Mihalache
- From the Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada (Mihalache, Huang); Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada (Popovic, Muni); Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, Ontario, Canada (Muni)
9
Biswas S, Davies LN, Sheppard AL, Logan NS, Wolffsohn JS. Utility of artificial intelligence-based large language models in ophthalmic care. Ophthalmic Physiol Opt 2024; 44:641-671. PMID: 38404172. DOI: 10.1111/opo.13284.
Abstract
PURPOSE With the introduction of ChatGPT, artificial intelligence (AI)-based large language models (LLMs) are rapidly becoming popular within the scientific community. They use natural language processing to generate human-like responses to queries. However, the application of LLMs and comparisons of the abilities of different LLMs with their human counterparts in ophthalmic care remain under-reported. RECENT FINDINGS Hitherto, studies in eye care have demonstrated the utility of ChatGPT in generating patient information, making clinical diagnoses and passing ophthalmology question-based examinations, among others. LLMs' performance (median accuracy, %) is influenced by factors such as the iteration, prompts utilised and the domain. Human experts (86%) demonstrated the highest proficiency in disease diagnosis, while ChatGPT-4 outperformed others in ophthalmology examinations (75.9%), symptom triaging (98%) and providing information and answering questions (84.6%). LLMs exhibited superior performance in general ophthalmology but reduced accuracy in ophthalmic subspecialties. Although AI-based LLMs like ChatGPT are deemed more efficient than their human counterparts, these AIs are constrained by nonspecific and outdated training, lack of access to current knowledge, generation of plausible-sounding 'fake' responses or hallucinations, inability to process images, lack of critical literature analysis, and ethical and copyright issues. A comprehensive evaluation of recently published studies is crucial to deepen understanding of LLMs and the potential of these AI-based LLMs. SUMMARY Ophthalmic care professionals should take a conservative approach when using AI, as human judgement remains essential for clinical decision-making and monitoring the accuracy of information. This review identified the ophthalmic applications and potential uses that need further exploration. With the advancement of LLMs, setting standards for benchmarking and promoting best practices is crucial. Potential clinical deployment requires evaluation of these LLMs beyond artificial settings, through clinical trials that determine their usefulness in the real world.
Affiliation(s)
- Sayantan Biswas
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- Leon N Davies
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- Amy L Sheppard
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- Nicola S Logan
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- James S Wolffsohn
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
10
Knebel D, Priglinger S, Scherer N, Klaas J, Siedlecki J, Schworm B. Assessment of ChatGPT in the Prehospital Management of Ophthalmological Emergencies - An Analysis of 10 Fictional Case Vignettes. Klin Monbl Augenheilkd 2024; 241:675-681. PMID: 37890504. DOI: 10.1055/a-2149-0447.
Abstract
BACKGROUND The artificial intelligence (AI)-based platform ChatGPT (Chat Generative Pre-Trained Transformer, OpenAI LP, San Francisco, CA, USA) has gained impressive popularity in recent months. Its performance on case vignettes of general medical (non-ophthalmological) emergencies has been assessed, with very encouraging results. The purpose of this study was to assess the performance of ChatGPT on ophthalmological emergency case vignettes in terms of the main outcome measures: triage accuracy, appropriateness of recommended prehospital measures, and overall potential to inflict harm on the user/patient. METHODS We wrote ten short, fictional case vignettes describing different acute ophthalmological symptoms. Each vignette was entered into ChatGPT five times with the same wording and following a standardized interaction pathway. The answers were analyzed following a systematic approach. RESULTS We observed a triage accuracy of 93.6%. Most answers contained only appropriate recommendations for prehospital measures. However, an overall potential to inflict harm on users/patients was present in 32% of answers. CONCLUSION ChatGPT should presently not be used as a stand-alone primary source of information about acute ophthalmological symptoms. As AI continues to evolve, its safety and efficacy in the prehospital management of ophthalmological emergencies have to be reassessed regularly.
Affiliation(s)
- Dominik Knebel
- Department of Ophthalmology, University Hospital, Ludwigs-Maximilians-Universität München, München, Germany
- Siegfried Priglinger
- Department of Ophthalmology, University Hospital, Ludwig-Maximilians-Universität München, München, Germany
- Nicolas Scherer
- Department of Ophthalmology, University Hospital, Ludwig-Maximilians-Universität München, München, Germany
- Julian Klaas
- Department of Ophthalmology, University Hospital, Ludwig-Maximilians-Universität München, München, Germany
- Jakob Siedlecki
- Department of Ophthalmology, University Hospital, Ludwig-Maximilians-Universität München, München, Germany
- Benedikt Schworm
- Department of Ophthalmology, University Hospital, Ludwig-Maximilians-Universität München, München, Germany
11
Rosselló-Jiménez D, Docampo S, Collado Y, Cuadra-Llopart L, Riba F, Llonch-Masriera M. Geriatrics and artificial intelligence in Spain (Ger-IA project): talking to ChatGPT, a nationwide survey. Eur Geriatr Med 2024. PMID: 38615289. DOI: 10.1007/s41999-024-00970-7.
Abstract
PURPOSE The purposes of the study were to describe the degree of agreement between geriatricians and the answers given by an AI tool (ChatGPT) in response to questions related to different areas in geriatrics, to study the differences between specialists and residents in geriatrics in terms of the degree of agreement with ChatGPT, and to analyse the mean scores obtained by areas of knowledge/domains. METHODS An observational study was conducted involving 126 doctors from 41 geriatric medicine departments in Spain. Ten questions about geriatric medicine were posed to ChatGPT, and doctors evaluated the AI's answers using a Likert scale. Sociodemographic variables were included. Questions were categorized into five knowledge domains, and means and standard deviations were calculated for each. RESULTS 130 doctors answered the questionnaire, and 126 doctors (69.8% women, mean age 41.4 [9.8]) were included in the final analysis. The mean score obtained by ChatGPT was 3.1/5 [0.67]. Specialists rated ChatGPT lower than residents (3.0/5 vs. 3.3/5 points, respectively, P < 0.05). By domain, ChatGPT scored better (M: 3.96; SD: 0.71) on general/theoretical questions than on complex decisions/end-of-life situations (M: 2.50; SD: 0.76), and answers related to diagnosis/performance of complementary tests obtained the lowest scores (M: 2.48; SD: 0.77). CONCLUSION Scores showed considerable variability depending on the area of knowledge. Questions related to theoretical aspects of challenges and the future in geriatrics obtained better scores. For complex decision-making, the appropriateness of therapeutic effort or decisions about diagnostic tests, professionals indicated poorer performance. AI is likely to be incorporated into some areas of medicine, but it still presents important limitations, mainly in complex medical decision-making.
Affiliation(s)
- Daniel Rosselló-Jiménez
- Geriatric Medicine Department, Hospital Universitari de Terrassa, Consorci Sanitari de Terrassa, Carr. Torrebonica, s/n, Terrassa, 08227, Barcelona, Spain.
- S Docampo
- Geriatric Medicine Department, Hospital Santa Creu, Tortosa, Tarragona, Spain
- Y Collado
- Geriatric Medicine Department, Hospital Universitari de Terrassa, Consorci Sanitari de Terrassa, Carr. Torrebonica, s/n, Terrassa, 08227, Barcelona, Spain
- L Cuadra-Llopart
- Geriatric Medicine Department, Hospital Universitari de Terrassa, Consorci Sanitari de Terrassa, Carr. Torrebonica, s/n, Terrassa, 08227, Barcelona, Spain
- Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya (UIC), Barcelona, Spain
- ACTIUM Functional Anatomy Group, Universitat Internacional de Catalunya (UIC), Barcelona, Spain
- F Riba
- Geriatric Medicine Department, Hospital Santa Creu, Tortosa, Tarragona, Spain
- M Llonch-Masriera
- Geriatric Medicine Department, Hospital Universitari de Terrassa, Consorci Sanitari de Terrassa, Carr. Torrebonica, s/n, Terrassa, 08227, Barcelona, Spain
- Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya (UIC), Barcelona, Spain
12
Kaftan AN, Hussain MK, Naser FH. Response accuracy of ChatGPT 3.5, Copilot and Gemini in interpreting biochemical laboratory data: a pilot study. Sci Rep 2024; 14:8233. PMID: 38589613. PMCID: PMC11002004. DOI: 10.1038/s41598-024-58964-1.
Abstract
With the release of ChatGPT at the end of 2022, a new era of thinking and technology use has begun. Artificial intelligence models (AIs) like Gemini (Bard), Copilot (Bing), and ChatGPT-3.5 have the potential to impact every aspect of our lives, including laboratory data interpretation. This study aimed to assess the accuracy of ChatGPT-3.5, Copilot, and Gemini responses in evaluating biochemical data. Ten simulated patients' biochemical laboratory data, including serum urea, creatinine, glucose, cholesterol, triglycerides, low-density lipoprotein (LDL-c), and high-density lipoprotein (HDL-c), in addition to HbA1c, were interpreted by three AIs: Copilot, Gemini, and ChatGPT-3.5, followed by evaluation by three raters. The study was carried out using two approaches. The first encompassed all biochemical data; the second contained only kidney function data. The first approach indicated Copilot to have the highest level of accuracy, followed by Gemini and ChatGPT-3.5. The Friedman test with Dunn's post-hoc analysis revealed that Copilot had the highest mean rank; pairwise comparisons revealed significant differences for Copilot vs. ChatGPT-3.5 (P = 0.002) and vs. Gemini (P = 0.008). The second approach likewise showed Copilot to have the highest accuracy, with the Friedman test and Dunn's post-hoc analysis again giving Copilot the highest mean rank. The Wilcoxon signed-rank test demonstrated indistinguishable Copilot responses (P = 0.5) whether all laboratory data or only kidney function data were provided. Copilot is more accurate in interpreting biochemical data than Gemini and ChatGPT-3.5, and its consistent responses across different data subsets highlight its reliability in this context.
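The pipeline described (Friedman test, Dunn's post-hoc, Wilcoxon signed-rank) can be prototyped with SciPy. Dunn's test itself is not in SciPy (the scikit-posthocs package provides it), so the sketch below substitutes Bonferroni-corrected pairwise Wilcoxon signed-rank tests on invented ratings.

```python
import numpy as np
from scipy import stats

# Hypothetical accuracy ratings for 10 interpretations, one column per model
# (Copilot, Gemini, ChatGPT-3.5); illustrative values only.
scores = np.array([
    [5, 4, 3], [4, 4, 3], [5, 3, 4], [4, 3, 3], [5, 4, 4],
    [4, 4, 2], [5, 3, 3], [4, 2, 3], [5, 4, 3], [4, 3, 2],
])

# Friedman test: repeated measures across the three models.
stat, p = stats.friedmanchisquare(scores[:, 0], scores[:, 1], scores[:, 2])
print(f"Friedman chi2={stat:.2f}, p={p:.4f}")

# Pairwise Wilcoxon signed-rank tests with Bonferroni correction
# (substituting for the Dunn's post-hoc test used in the study).
pairs = [(0, 1, "Copilot vs Gemini"), (0, 2, "Copilot vs ChatGPT-3.5"),
         (1, 2, "Gemini vs ChatGPT-3.5")]
for i, j, label in pairs:
    _, p_raw = stats.wilcoxon(scores[:, i], scores[:, j])
    print(f"{label}: Bonferroni-adjusted p={min(1.0, 3 * p_raw):.4f}")
```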
13
Braun EM, Juhasz-Böss I, Solomayer EF, Truhn D, Keller C, Heinrich V, Braun BJ. Will I soon be out of my job? Quality and guideline conformity of ChatGPT therapy suggestions to patient inquiries with gynecologic symptoms in a palliative setting. Arch Gynecol Obstet 2024; 309:1543-1549. PMID: 37975899. DOI: 10.1007/s00404-023-07272-6.
Abstract
PURPOSE The market and application possibilities for artificial intelligence are currently growing at high speed and are increasingly finding their way into gynecology. While the medical side is well represented in the current literature, the patient's perspective is still lagging behind. Therefore, the aim of this study was to have experts evaluate the recommendations of ChatGPT in response to patient inquiries about possible therapy for leading gynecological symptoms in a palliative setting. METHODS Case vignettes were constructed for 10 common concomitant symptoms of gynecologic oncology tumors in a palliative setting, and patient queries regarding therapy of these symptoms were generated as prompts for ChatGPT. Five experts in palliative care and gynecologic oncology evaluated the responses with respect to guideline adherence and applicability and identified advantages and disadvantages. RESULTS The overall rating of ChatGPT responses averaged 4.1 (5 = strongly agree; 1 = strongly disagree). The experts saw an average guideline conformity of the therapy recommendations of 4.0. ChatGPT sometimes omits relevant therapies and does not provide an individual assessment of the suggested therapies, but it does indicate that a physician consultation is additionally necessary. CONCLUSIONS Language models such as ChatGPT can provide valid and largely guideline-compliant therapy recommendations in their freely available, and thus in principle accessible, version. For a complete therapy recommendation, including an evaluation of the suggested therapies, their individual adjustment and the filtering of possible wrong recommendations, a medical expert's opinion remains indispensable.
Affiliation(s)
- Eva-Marie Braun
- Center for Integrative Oncology, Die Filderklinik, Im Haberschlai 7, 70794, Filderstadt-Bonlanden, Germany.
- Ingolf Juhasz-Böss
- Department of Gynecology, University Medical Center Freiburg, Hugstetter Straße 55, 79106, Freiburg, Germany
- Erich-Franz Solomayer
- Department of Gynecology, Obstetrics and Reproductive Medicine, Saarland University Hospital, Kirrberger Straße, Building 9, 66421, Homburg, Germany
- Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Pauwelsstraße 30, 52074, Aachen, Germany
- Christiane Keller
- Center for Palliative Medicine and Pediatric Pain Therapy, Saarland University Hospital, Kirrberger Straße, Building 69, 66421, Homburg, Germany
- Vanessa Heinrich
- Department of Radiation Oncology, University Hospital Tübingen, Crona Kliniken, Hoppe-Seyler-Str. 3, 72076, Tübingen, Germany
- Benedikt Johannes Braun
- Department of Trauma and Reconstructive Surgery at the Eberhard Karls University Tübingen, BG Unfallklinik Tübingen, Schnarrenbergstrasse 95, 72076, Tübingen, Germany
14
Teixeira-Marques F, Medeiros N, Nazaré F, Alves S, Lima N, Ribeiro L, Gama R, Oliveira P. Exploring the role of ChatGPT in clinical decision-making in otorhinolaryngology: a ChatGPT designed study. Eur Arch Otorhinolaryngol 2024; 281:2023-2030. PMID: 38345613. DOI: 10.1007/s00405-024-08498-z.
Abstract
PURPOSE Since the beginning of 2023, ChatGPT has emerged as a hot topic in healthcare research. Its potential to be a valuable tool in clinical practice is compelling, particularly in improving clinical decision support by helping physicians make clinical decisions based on the best medical knowledge available. We aim to investigate ChatGPT's ability to identify, diagnose and manage patients with otorhinolaryngology-related symptoms. METHODS A prospective, cross-sectional study was designed, based on an idea suggested by ChatGPT, to assess the level of agreement between ChatGPT and five otorhinolaryngologists (ENTs) in 20 reality-inspired clinical cases. The clinical cases were presented to the chatbot on two different occasions (ChatGPT-1 and ChatGPT-2) to assess its temporal stability. RESULTS The mean score of ChatGPT-1 was 4.4 (SD 1.2; min 1, max 5) and of ChatGPT-2 was 4.15 (SD 1.3; min 1, max 5), while the ENTs' mean score was 4.91 (SD 0.3; min 3, max 5). The Mann-Whitney U test revealed a statistically significant difference (p < 0.001) between both ChatGPT scores and the ENTs' scores. ChatGPT-1 and ChatGPT-2 gave different answers on five occasions. CONCLUSIONS Artificial intelligence will be an important instrument in clinical decision-making in the near future, and ChatGPT is the most promising chatbot so far. Although it needs further development to be used safely, there is room for improvement and potential to aid otorhinolaryngology residents and specialists in making the most correct decision for the patient.
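As a rough illustration of the comparison described, the snippet below runs a Mann-Whitney U test between chatbot and specialist ratings and counts the cases on which the two ChatGPT runs disagreed; all ratings are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical 1-5 ratings over 20 clinical cases (illustrative only).
ents     = np.array([5]*16 + [4]*3 + [3])            # specialist panel
chatgpt1 = np.array([5]*12 + [4]*4 + [3]*2 + [1]*2)  # first ChatGPT run
chatgpt2 = np.array([5]*11 + [4]*5 + [3]*2 + [1]*2)  # second ChatGPT run

_, p = stats.mannwhitneyu(chatgpt1, ents, alternative="two-sided")
print(f"ChatGPT-1 vs ENTs: p={p:.4f}")

# Temporal stability: on how many cases did the two runs differ?
n_diff = int(np.sum(chatgpt1 != chatgpt2))
print(f"Runs differ on {n_diff} of {len(chatgpt1)} cases")
```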
Affiliation(s)
- Francisco Teixeira-Marques
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal.
- Nuno Medeiros
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Francisco Nazaré
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Sandra Alves
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Nuno Lima
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Leandro Ribeiro
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Rita Gama
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Pedro Oliveira
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
15
Mark J, Subhi Y. Blinded by Stress: A Patient and Physician Perspective on Central Serous Chorioretinopathy. Ophthalmol Ther 2024; 13:861-866. PMID: 38386185. PMCID: PMC10912400. DOI: 10.1007/s40123-024-00907-0.
Abstract
This commentary is co-authored by a patient with central serous chorioretinopathy (CSC), which is the fourth most common exudative maculopathy. The patient, a young and high-profile member of the Danish Parliament, kindly shares his experience of living with stress, the onset of symptoms, and being diagnosed with CSC and receiving photodynamic treatment. The experiences of the patient are put into perspective by an ophthalmologist.
Affiliation(s)
- Jacob Mark
- Patient Author, Christiansborg, Copenhagen, Denmark
- Yousif Subhi
- Department of Clinical Research, University of Southern Denmark, J.B. Winsløws Vej 19.3, 5000, Odense C, Denmark.
- Department of Ophthalmology, Zealand University Hospital, Roskilde, Denmark.
- Department of Ophthalmology, Rigshospitalet, Glostrup, Denmark.
16
Cohen SA, Brant A, Fisher AC, Pershing S, Do D, Pan C. Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery. Semin Ophthalmol 2024:1-8. PMID: 38516983. DOI: 10.1080/08820538.2024.2326058.
Abstract
PURPOSE Patients are using online search modalities to learn about their eye health. While Google remains the most popular search engine, the use of large language models (LLMs) like ChatGPT has increased. Cataract surgery is the most common surgical procedure in the US, and there is limited data on the quality of online information that populates after searches related to cataract surgery on search engines such as Google and LLM platforms such as ChatGPT. We identified the most common patient frequently asked questions (FAQs) about cataracts and cataract surgery and evaluated the accuracy, safety, and readability of the answers to these questions provided by both Google and ChatGPT. We also demonstrated the utility of ChatGPT in writing notes and creating patient education materials. METHODS The top 20 FAQs related to cataracts and cataract surgery were recorded from Google. Responses to the questions provided by Google and ChatGPT were evaluated by a panel of ophthalmologists for accuracy and safety. Evaluators were also asked to distinguish between Google and LLM chatbot answers. Five validated readability indices were used to assess the readability of responses. ChatGPT was instructed to generate operative notes, post-operative instructions, and customizable patient education materials according to specific readability criteria. RESULTS Responses to 20 patient FAQs generated by ChatGPT were significantly longer and written at a higher reading level than responses provided by Google (p < .001), with an average grade level of 14.8 (college level). Expert reviewers were able to correctly distinguish between a human-reviewed and a chatbot-generated response an average of 31% of the time. Google answers contained incorrect or inappropriate material 27% of the time, compared with 6% of LLM-generated answers (p < .001). When expert reviewers were asked to compare the responses directly, chatbot responses were favored (66%). CONCLUSIONS When comparing the responses to patients' cataract FAQs provided by ChatGPT and Google, practicing ophthalmologists overwhelmingly preferred ChatGPT responses. LLM chatbot responses were less likely to contain inaccurate information. ChatGPT represents a viable information source on eye health for patients with higher health literacy. ChatGPT may also be used by ophthalmologists to create customizable patient education materials for patients with varying health literacy.
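One of the readability indices typically used in such studies is the Flesch-Kincaid Grade Level, FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59. The sketch below implements it with a crude vowel-group syllable counter; production readability tools use dictionaries or more careful heuristics.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: each run of vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    # Flesch-Kincaid Grade Level:
    # 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59

sample = ("Cataract surgery replaces the cloudy lens of the eye with a "
          "clear artificial lens. It is a common and safe procedure.")
print(f"FKGL: {fkgl(sample):.1f}")  # around an 8th-9th grade level here
```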
Affiliation(s)
- Samuel A Cohen
- Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Arthur Brant
- Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Ann Caroline Fisher
- Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Suzann Pershing
- Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Diana Do
- Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Carolyn Pan
- Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
17
Mihalache A, Huang RS, Patil NS, Popovic MM, Lee WW, Yan P, Cruz-Pimentel M, Muni RH. Chatbot and Academy Preferred Practice Pattern Guidelines on Retinal Diseases. Ophthalmol Retina 2024:S2468-6530(24)00117-9. PMID: 38499086. DOI: 10.1016/j.oret.2024.03.013.
Affiliation(s)
- Andrew Mihalache
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Ryan S Huang
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Nikhil S Patil
- Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada
- Marko M Popovic
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Wei Wei Lee
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Peng Yan
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada; Toronto Western Hospital, University Health Network, University of Toronto, Toronto, Ontario, Canada; Department of Ophthalmology, Kensington Vision and Research Center, Toronto, Ontario, Canada
- Miguel Cruz-Pimentel
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Rajeev H Muni
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada; Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, Ontario, Canada.
18
Gurnani B, Kaur K. Leveraging ChatGPT for ophthalmic education: A critical appraisal. Eur J Ophthalmol 2024; 34:323-327. PMID: 37974429. DOI: 10.1177/11206721231215862.
Abstract
In recent years, the advent of artificial intelligence (AI) has transformed many sectors, including medical education. This editorial critically appraises the integration of ChatGPT, a state-of-the-art AI language model, into ophthalmic education, focusing on its potential, limitations, and ethical considerations. The application of ChatGPT in teaching and training ophthalmologists presents an innovative method to offer real-time, customized learning experiences. Through a systematic analysis of both experimental and clinical data, this editorial examines how ChatGPT enhances engagement, understanding, and retention of complex ophthalmological concepts. The study also evaluates the efficacy of ChatGPT in simulating patient interactions and clinical scenarios, which can foster improved diagnostic and interpersonal skills. Despite the promising advantages, concerns regarding reliability, lack of personal touch, and potential biases in the AI-generated content are scrutinized. Ethical considerations concerning data privacy and potential misuse are also explored. The findings underline the need for carefully designed integration, continuous evaluation, and adherence to ethical guidelines to maximize benefits while mitigating risks. By shedding light on these multifaceted aspects, this paper contributes to the ongoing discourse on the incorporation of AI in medical education, offering valuable insights and guidance for educators, practitioners, and policymakers aiming to leverage modern technology for enhancing ophthalmic education.
Affiliation(s)
- Bharat Gurnani
- Cataract, Cornea, Trauma, External Diseases, Ocular Surface and Refractive Services, ASG Eye Hospital, Jodhpur, Rajasthan, India
- Sadguru Netra Chikitsalya, Shri Sadguru Seva Sangh Trust, Chitrakoot, Madhya Pradesh, India
- Kirandeep Kaur
- Cataract, Pediatric Ophthalmology and Strabismus, ASG Eye Hospital, Jodhpur, Rajasthan, India
- Children Eye Care Centre, Sadguru Netra Chikitsalya, Shri Sadguru Seva Sangh Trust, Chitrakoot, Madhya Pradesh, India
19
Wei Q, Yao Z, Cui Y, Wei B, Jin Z, Xu X. Evaluation of ChatGPT-generated medical responses: A systematic review and meta-analysis. J Biomed Inform 2024; 151:104620. PMID: 38462064. DOI: 10.1016/j.jbi.2024.104620.
Abstract
OBJECTIVE Large language models (LLMs) such as ChatGPT are increasingly explored in medical domains. However, the absence of standard guidelines for performance evaluation has led to methodological inconsistencies. This study aims to summarize the available evidence on evaluating ChatGPT's performance in answering medical questions and provide direction for future research. METHODS An extensive literature search was conducted on June 15, 2023, across ten medical databases. The keyword used was "ChatGPT," without restrictions on publication type, language, or date. Studies evaluating ChatGPT's performance in answering medical questions were included. Exclusions comprised review articles, comments, patents, non-medical evaluations of ChatGPT, and preprint studies. Data were extracted on general study characteristics, question sources, conversation processes, assessment metrics, and performance of ChatGPT. An evaluation framework for LLMs in medical inquiries was proposed by integrating insights from the selected literature. This study is registered with PROSPERO, CRD42023456327. RESULTS A total of 3520 articles were identified, of which 60 were reviewed and summarized in this paper and 17 were included in the meta-analysis. ChatGPT displayed an overall integrated accuracy of 56% (95% CI: 51%-60%, I² = 87%) in addressing medical queries. However, the studies varied in question resource, question-asking process, and evaluation metrics. As per our proposed evaluation framework, many studies failed to report methodological details, such as the date of inquiry, version of ChatGPT, and inter-rater consistency. CONCLUSION This review reveals ChatGPT's potential in addressing medical inquiries, but the heterogeneity of the study designs and insufficient reporting might affect the reliability of the results. Our proposed evaluation framework provides insights for future study design and transparent reporting of LLMs in responding to medical questions.
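The pooled accuracy with 95% CI and I² reported above is standard random-effects meta-analysis output. The sketch below shows a DerSimonian-Laird computation on invented per-study counts; published meta-analyses of proportions usually apply a logit or arcsine transform first, which is omitted here for brevity.

```python
import numpy as np

# Hypothetical per-study results: (correct answers, total questions).
studies = [(45, 80), (30, 60), (70, 110), (25, 50), (55, 90)]

p = np.array([k / n for k, n in studies])
var = np.array([pi * (1 - pi) / n for pi, (_, n) in zip(p, studies)])
w = 1 / var

# Fixed-effect pooled estimate, Cochran's Q, and I-squared.
p_fixed = np.sum(w * p) / np.sum(w)
Q = np.sum(w * (p - p_fixed) ** 2)
df = len(studies) - 1
I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0

# DerSimonian-Laird random-effects estimate with 95% CI.
C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - df) / C)
w_re = 1 / (var + tau2)
p_re = np.sum(w_re * p) / np.sum(w_re)
se = np.sqrt(1 / np.sum(w_re))
print(f"Pooled accuracy {p_re:.1%} "
      f"(95% CI {p_re - 1.96 * se:.1%}-{p_re + 1.96 * se:.1%}), I2={I2:.0f}%")
```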
Affiliation(s)
- Qiuhong Wei
- Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China; Children Nutrition Research Center, Children's Hospital of Chongqing Medical University, Chongqing, China; National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, Chongqing Key Laboratory of Child Neurodevelopment and Cognitive Disorders, Chongqing, China
- Zhengxiong Yao
- Department of Neurology, Children's Hospital of Chongqing Medical University, Chongqing, China
- Ying Cui
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Bo Wei
- Department of Global Statistics and Data Science, BeiGene USA Inc., San Mateo, CA, USA
- Zhezhen Jin
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Ximing Xu
- Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China
20
Nikdel M, Ghadimi H, Tavakoli M, Suh DW. Assessment of the Responses of the Artificial Intelligence-based Chatbot ChatGPT-4 to Frequently Asked Questions About Amblyopia and Childhood Myopia. J Pediatr Ophthalmol Strabismus 2024; 61:86-89. PMID: 37882183. DOI: 10.3928/01913913-20231005-02.
Abstract
PURPOSE To assess the responses of ChatGPT-4, the forerunner artificial intelligence-based chatbot, to frequently asked questions regarding two common pediatric ophthalmologic disorders, amblyopia and childhood myopia. METHODS Twenty-seven questions about amblyopia and 28 questions about childhood myopia were asked of ChatGPT twice (110 questions in total). The responses were evaluated by two pediatric ophthalmologists as acceptable, incomplete, or unacceptable. RESULTS There was remarkable agreement (96.4%) between the two pediatric ophthalmologists in their assessment of the responses. Acceptable responses were provided by ChatGPT to 93 of 110 (84.6%) questions in total (44 of 54 [81.5%] for amblyopia and 49 of 56 [87.5%] for childhood myopia). Seven of 54 (12.9%) responses to questions on amblyopia were graded as incomplete, compared to 4 of 56 (7.1%) on childhood myopia. ChatGPT gave inappropriate responses to three questions each about amblyopia (5.6%) and childhood myopia (5.4%). The most noticeable inappropriate responses related to the definition of reverse amblyopia and the threshold of refractive error for prescribing spectacles to children with myopia. CONCLUSIONS ChatGPT has the potential to serve as an adjunct informational tool for pediatric ophthalmology patients and their caregivers, demonstrating relatively good performance in answering 84.6% of the most frequently asked questions about amblyopia and childhood myopia. [J Pediatr Ophthalmol Strabismus. 2024;61(2):86-89.]
21
Yalla GR, Hyman N, Hock LE, Zhang Q, Shukla AG, Kolomeyer NN. Performance of Artificial Intelligence Chatbots on Glaucoma Questions Adapted From Patient Brochures. Cureus 2024; 16:e56766. PMID: 38650824. PMCID: PMC11034394. DOI: 10.7759/cureus.56766.
Abstract
Introduction With the potential for artificial intelligence (AI) chatbots to serve as a primary source of glaucoma information for patients, it is essential to characterize the information that chatbots provide so that providers can tailor discussions, anticipate patient concerns, and identify misleading information. Therefore, the purpose of this study was to evaluate glaucoma information from AI chatbots, including ChatGPT-4, Bard, and Bing, by analyzing response accuracy, comprehensiveness, readability, word count, and character count in comparison to each other and to glaucoma-related American Academy of Ophthalmology (AAO) patient materials. Methods Section headers from AAO glaucoma-related patient education brochures were adapted into question form and asked five times of each AI chatbot (ChatGPT-4, Bard, and Bing). Two sets of responses from each chatbot were used to evaluate the accuracy of the chatbot responses and of the AAO brochure information, as well as the comprehensiveness of the chatbot responses compared with the AAO brochure information; each was scored 1-5 by three independent glaucoma-trained ophthalmologists. Readability (assessed with the Flesch-Kincaid Grade Level (FKGL), which corresponds to United States school grade levels), word count, and character count were determined for all chatbot responses and AAO brochure sections. Results Accuracy scores for AAO, ChatGPT, Bing, and Bard were 4.84, 4.26, 4.53, and 3.53, respectively. On direct comparison, AAO was more accurate than ChatGPT (p=0.002), and Bard was the least accurate (Bard versus AAO, p<0.001; Bard versus ChatGPT, p<0.002; Bard versus Bing, p=0.001). ChatGPT had the most comprehensive responses (ChatGPT versus Bing, p<0.001; ChatGPT versus Bard, p=0.008), with comprehensiveness scores for ChatGPT, Bing, and Bard of 3.32, 2.16, and 2.79, respectively. AAO information and Bard responses were written at the most accessible readability levels (AAO versus ChatGPT, AAO versus Bing, Bard versus ChatGPT, Bard versus Bing, all p<0.0001), with readability levels for AAO, ChatGPT, Bing, and Bard at 8.11, 13.01, 11.73, and 7.90, respectively. Bing responses had the lowest word and character counts. Conclusion AI chatbot responses varied in accuracy, comprehensiveness, and readability. With accuracy and comprehensiveness scores below those of AAO brochures and elevated readability levels, AI chatbots require improvement to become a more useful supplementary source of glaucoma information for patients. Physicians must be aware of these limitations so that they can ask patients about their existing knowledge and questions and then provide clarifying, comprehensive information.
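The Flesch-Kincaid Grade Level used above maps word and sentence statistics onto United States school grades. A minimal sketch follows; the syllable counter is a crude vowel-group heuristic (production readability tools use pronunciation dictionaries), so its output is only approximate:

    import re

    def count_syllables(word: str) -> int:
        # Crude approximation: count groups of consecutive vowels.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def fkgl(text: str) -> float:
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        # Standard Flesch-Kincaid Grade Level coefficients.
        return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

    print(round(fkgl("Glaucoma damages the optic nerve. Early treatment preserves vision."), 2))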
Collapse
Affiliation(s)
- Goutham R Yalla
- Department of Ophthalmology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, USA
- Glaucoma Research Center, Wills Eye Hospital, Philadelphia, USA
| | - Nicholas Hyman
- Department of Ophthalmology, Vagelos College of Physicians and Surgeons, Columbia University, New York, USA
- Department of Ophthalmology, Glaucoma Division, Columbia University Irving Medical Center, New York, USA
| | - Lauren E Hock
- Glaucoma Research Center, Wills Eye Hospital, Philadelphia, USA
| | - Qiang Zhang
- Glaucoma Research Center, Wills Eye Hospital, Philadelphia, USA
- Biostatistics Consulting Core, Vickie and Jack Farber Vision Research Center, Wills Eye Hospital, Philadelphia, USA
| | - Aakriti G Shukla
- Department of Ophthalmology, Glaucoma Division, Columbia University Irving Medical Center, New York, USA
| | | |
Collapse
|
22
|
Høj S, Thomsen SF, Meteran H, Sigsgaard T, Meteran H. Artificial intelligence and allergic rhinitis: does ChatGPT increase or impair the knowledge? J Public Health (Oxf) 2024; 46:123-126. [PMID: 37968109 DOI: 10.1093/pubmed/fdad219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2023] [Revised: 09/14/2023] [Accepted: 10/06/2023] [Indexed: 11/17/2023] Open
Abstract
BACKGROUND Optimal management of allergic rhinitis requires patient education with easy access to accurate information. However, previous online platforms have provided misleading information. The demand for online medical information continues to grow, especially with the introduction of advanced chatbots like ChatGPT. METHODS This study aimed to evaluate the quality of information provided by ChatGPT regarding allergic rhinitis. A five-point Likert scale was used to assess the accuracy of the responses. Four authors independently rated the responses from a healthcare professional's perspective. RESULTS A total of 20 questions covering various aspects of allergic rhinitis were asked. Among the answers, eight received a score of 5 (no inaccuracies), five received a score of 4 (minor non-harmful inaccuracies), six received a score of 3 (potentially misinterpretable inaccuracies) and one answer received a score of 2 (minor potentially harmful inaccuracies). CONCLUSIONS The variability in accuracy scores highlights the need for caution when relying solely on chatbots like ChatGPT for medical advice. Patients should consult qualified healthcare professionals and use online sources only as a supplement. While ChatGPT has advantages in medical information delivery, its use should be approached with caution. ChatGPT can be useful for patient education but cannot replace healthcare professionals.
Collapse
Affiliation(s)
- Simon Høj
- Steno Diabetes Center Copenhagen, Copenhagen University Hospital, Herlev 2730, Denmark
- Department of Dermatology, Venereology, and Wound Healing Centre, Copenhagen University Hospital-Bispebjerg, Copenhagen 2400, Denmark
- Department of Public Health, Environment, Occupation, and Health, Aarhus University, Aarhus 8000, Denmark
| | - Simon F Thomsen
- Department of Dermatology, Venereology, and Wound Healing Centre, Copenhagen University Hospital-Bispebjerg, Copenhagen 2400, Denmark
- Department of Biomedical Sciences, University of Copenhagen, Copenhagen 2200, Denmark
| | - Hanieh Meteran
- Department of Internal Medicine, Section of Endocrinology, Copenhagen University Hospital-Hvidovre, Hvidovre 2650, Denmark
| | - Torben Sigsgaard
- Department of Public Health, Environment, Occupation, and Health, Aarhus University, Aarhus 8000, Denmark
| | - Howraman Meteran
- Department of Public Health, Environment, Occupation, and Health, Aarhus University, Aarhus 8000, Denmark
- Department of Internal Medicine, Respiratory Medicine Section, Copenhagen University Hospital-Hvidovre, Hvidovre 2650, Denmark
- Department of Respiratory Medicine, Zealand University Hospital Roskilde-Næstved, Næstved 4700, Denmark
| |
Collapse
|
23
|
Marshall RF, Mallem K, Xu H, Thorne J, Burkholder B, Chaon B, Liberman P, Berkenstock M. Investigating the Accuracy and Completeness of an Artificial Intelligence Large Language Model About Uveitis: An Evaluation of ChatGPT. Ocul Immunol Inflamm 2024:1-4. [PMID: 38394625 DOI: 10.1080/09273948.2024.2317417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Accepted: 02/06/2024] [Indexed: 02/25/2024]
Abstract
PURPOSE To assess the accuracy and completeness of ChatGPT-generated answers regarding uveitis description, prevention, treatment, and prognosis. METHODS Thirty-two uveitis-related questions were generated by a uveitis specialist and entered into ChatGPT 3.5. Answers were compiled into a survey and reviewed by five uveitis specialists using standardized Likert scales of accuracy and completeness. RESULTS Overall, the median accuracy score for the uveitis questions (n = 32) was 4.00 (between "more correct than incorrect" and "nearly all correct"), and the median completeness score was 2.00 ("adequate; addresses all aspects of the question and provides the minimum amount of information required to be considered complete"). Interrater agreement was low, with total kappa values of 0.0278 for accuracy and 0.0847 for completeness. CONCLUSION ChatGPT can provide relatively high-accuracy responses to various questions related to uveitis; however, the answers it provides are incomplete, with some inaccuracies. Its utility in providing medical information requires further validation and development before it can serve as a source of uveitis information for patients.
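With five raters, kappa values like those above are typically computed as Fleiss' kappa. Below is a minimal sketch under the assumption that the Likert scores are treated as categories; the ratings matrix is invented for illustration and is not the study's data:

    import numpy as np

    def fleiss_kappa(counts: np.ndarray) -> float:
        # counts[i, j] = number of raters assigning subject i to category j.
        n = counts.sum(axis=1)[0]                    # raters per subject (constant)
        p_j = counts.sum(axis=0) / counts.sum()      # overall category proportions
        p_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))  # per-subject agreement
        p_bar, p_e = p_i.mean(), np.square(p_j).sum()
        return (p_bar - p_e) / (1 - p_e)

    # Invented example: 4 questions, 5 raters, 5 Likert categories (columns 1-5).
    counts = np.array([
        [0, 0, 1, 2, 2],
        [0, 1, 2, 2, 0],
        [0, 0, 0, 3, 2],
        [1, 1, 1, 1, 1],
    ])
    print(round(fleiss_kappa(counts), 4))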
Collapse
Affiliation(s)
- Rayna F Marshall
- The Drexel University College of Medicine, Philadelphia, Pennsylvania, USA
| | - Krishna Mallem
- The Drexel University College of Medicine, Philadelphia, Pennsylvania, USA
| | - Hannah Xu
- University of California San Diego, San Diego, California, USA
| | - Jennifer Thorne
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Bryn Burkholder
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Benjamin Chaon
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Paulina Liberman
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Meghan Berkenstock
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| |
Collapse
|
24
|
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024; 44:329-343. [PMID: 37562022 DOI: 10.1093/asj/sjad260] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 08/02/2023] [Accepted: 08/04/2023] [Indexed: 08/12/2023] Open
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of current demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24.0%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
Collapse
|
25
|
Hatia A, Doldo T, Parrini S, Chisci E, Cipriani L, Montagna L, Lagana G, Guenza G, Agosta E, Vinjolli F, Hoxha M, D’Amelio C, Favaretto N, Chisci G. Accuracy and Completeness of ChatGPT-Generated Information on Interceptive Orthodontics: A Multicenter Collaborative Study. J Clin Med 2024; 13:735. [PMID: 38337430 PMCID: PMC10856539 DOI: 10.3390/jcm13030735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Revised: 01/21/2024] [Accepted: 01/25/2024] [Indexed: 02/12/2024] Open
Abstract
Background: This study aims to investigate the accuracy and completeness of ChatGPT in answering questions and solving clinical scenarios of interceptive orthodontics. Materials and Methods: Ten specialized orthodontists from ten Italian postgraduate orthodontics schools developed 21 clinical open-ended questions encompassing all of the subspecialities of interceptive orthodontics, together with 7 comprehensive clinical cases. Questions and scenarios were entered into ChatGPT-4, and the resulting answers were evaluated by the researchers using predefined accuracy (range 1-6) and completeness (range 1-3) Likert scales. Results: For the open-ended questions, the overall median score was 4.9/6 for accuracy and 2.4/3 for completeness. In addition, the reviewers rated the accuracy of open-ended answers as entirely correct (score 6 on the Likert scale) in 40.5% of cases and the completeness as entirely correct (score 3 on the Likert scale) in 50.5% of cases. As for the clinical cases, the overall median score was 4.9/6 for accuracy and 2.5/3 for completeness. Overall, the reviewers rated the accuracy of clinical case answers as entirely correct in 46% of cases and the completeness of clinical case answers as entirely correct in 54.3% of cases. Conclusions: The results showed a high level of accuracy and completeness in the AI responses and a great ability to solve difficult clinical cases, but the answers were not 100% accurate and complete. ChatGPT is not yet sophisticated enough to replace the intellectual work of human beings.
Collapse
Affiliation(s)
- Arjeta Hatia
- Orthodontics Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy; (T.D.); (L.C.)
| | - Tiziana Doldo
- Orthodontics Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy; (T.D.); (L.C.)
| | - Stefano Parrini
- Oral Surgery Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy;
| | - Elettra Chisci
- Orthodontics Postgraduate School, University of Ferrara, 44121 Ferrara, Italy
| | - Linda Cipriani
- Orthodontics Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy; (T.D.); (L.C.)
| | - Livia Montagna
- Orthodontics Postgraduate School, University of Cagliari, 09121 Cagliari, Italy;
| | - Giuseppina Lagana
- Orthodontics Postgraduate School, “Sapienza” University of Rome, 00185 Rome, Italy;
| | - Guia Guenza
- Orthodontics Postgraduate School, University of Milano, 20019 Milan, Italy
| | - Edoardo Agosta
- Orthodontics Postgraduate School, University of Torino, 10024 Turin, Italy
| | - Franceska Vinjolli
- Orthodontics Postgraduate School, University of Roma Tor Vergata, 00133 Rome, Italy;
| | - Meladiona Hoxha
- Orthodontics Postgraduate School, “Cattolica” University of Rome, 00168 Rome, Italy;
| | - Claudio D’Amelio
- Orthodontics Postgraduate School, University of Chieti, 66100 Chieti, Italy;
| | - Nicolò Favaretto
- Orthodontics Postgraduate School, University of Trieste, 34100 Trieste, Italy
| | - Glauco Chisci
- Oral Surgery Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy;
| |
Collapse
|
26
|
Gravina AG, Pellegrino R, Cipullo M, Palladino G, Imperio G, Ventura A, Auletta S, Ciamarra P, Federico A. May ChatGPT be a tool producing medical information for common inflammatory bowel disease patients' questions? An evidence-controlled analysis. World J Gastroenterol 2024; 30:17-33. [PMID: 38293321 PMCID: PMC10823903 DOI: 10.3748/wjg.v30.i1.17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/04/2023] [Revised: 12/07/2023] [Accepted: 12/28/2023] [Indexed: 01/06/2024] Open
Abstract
Artificial intelligence is increasingly entering everyday healthcare. Large language model (LLM) systems such as Chat Generative Pre-trained Transformer (ChatGPT) have become potentially accessible to everyone, including patients with inflammatory bowel diseases (IBD). However, significant ethical issues and pitfalls exist in innovative LLM tools. The hype generated by such systems may lead patients to place unwarranted trust in them. Therefore, it is necessary to understand whether LLMs (including popular ones, such as ChatGPT) can produce plausible medical information (MI) for patients. This review examined ChatGPT's potential to provide MI regarding questions commonly posed by patients with IBD to their gastroenterologists. Review of the outputs provided by ChatGPT showed that the tool has some attractive potential but also significant limitations in the currency and detail of its information, and it provided inaccurate information in some cases. Further studies and refinement of ChatGPT, possibly aligning its outputs with the leading medical evidence from reliable databases, are needed.
Collapse
Affiliation(s)
- Antonietta Gerarda Gravina
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Raffaele Pellegrino
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Marina Cipullo
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Giovanna Palladino
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Giuseppe Imperio
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Andrea Ventura
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Salvatore Auletta
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Paola Ciamarra
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Alessandro Federico
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| |
Collapse
|
27
|
Shemer A, Cohen M, Altarescu A, Atar-Vardi M, Hecht I, Dubinsky-Pertzov B, Shoshany N, Zmujack S, Or L, Einan-Lifshitz A, Pras E. Diagnostic capabilities of ChatGPT in ophthalmology. Graefes Arch Clin Exp Ophthalmol 2024:10.1007/s00417-023-06363-z. [PMID: 38183467 DOI: 10.1007/s00417-023-06363-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Revised: 12/04/2023] [Accepted: 12/23/2023] [Indexed: 01/08/2024] Open
Abstract
PURPOSE The purpose of this study is to assess the diagnostic accuracy of ChatGPT in the field of ophthalmology. METHODS This is a retrospective cohort study conducted in one academic tertiary medical center. We reviewed data of patients admitted to the ophthalmology department from 06/2022 to 01/2023. We then created two clinical cases for each patient: the first based on the medical history alone (Hx), and the second adding the clinical examination findings (Hx and Ex). For each case, we asked ChatGPT, residents, and attendings for the three most likely diagnoses. We then compared the accuracy rates (at least one correct diagnosis) of all groups, and we also evaluated the total time each group needed to complete the assignment. RESULTS ChatGPT, residents, and attendings evaluated 126 cases from 63 patients (history only, or history and exam findings, for each patient). ChatGPT achieved a significantly lower diagnostic accuracy rate (54%) on the Hx cases than the residents (75%; p < 0.01) and attendings (71%; p < 0.01). After adding the clinical examination findings, ChatGPT's accuracy rate was 68%, whereas for the residents and attendings it increased to 94% (p < 0.01) and 86% (p < 0.01), respectively. ChatGPT was 4 to 5 times faster than the attendings and residents. CONCLUSIONS AND RELEVANCE ChatGPT showed lower diagnostic accuracy in ophthalmology cases than residents and attendings, whether based on patient history alone or with additional clinical examination findings. However, ChatGPT completed the task faster than the physicians.
Collapse
Affiliation(s)
- Asaf Shemer
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel.
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
| | - Michal Cohen
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Health Science, Ben-Gurion University of the Negev, South District, Beer-Sheva, Israel
| | - Aya Altarescu
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Maya Atar-Vardi
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Idan Hecht
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Biana Dubinsky-Pertzov
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Nadav Shoshany
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Sigal Zmujack
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Lior Or
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Adi Einan-Lifshitz
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Eran Pras
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- The Matlow's Ophthalmo-Genetics Laboratory, Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
| |
Collapse
|
28
|
Morales-Ramirez P, Mishek H, Dasgupta A. The Genie Is Out of the Bottle: What ChatGPT Can and Cannot Do for Medical Professionals. Obstet Gynecol 2024; 143:e1-e6. [PMID: 37944140 DOI: 10.1097/aog.0000000000005446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Accepted: 10/12/2023] [Indexed: 11/12/2023]
Abstract
ChatGPT is a cutting-edge artificial intelligence technology that was released for public use in November 2022. Its rapid adoption has raised questions about capabilities, limitations, and risks. This article presents an overview of ChatGPT, and it highlights the current state of this technology for the medical field. The article seeks to provide a balanced perspective on what the model can and cannot do in three specific domains: clinical practice, research, and medical education. It also provides suggestions on how to optimize the use of this tool.
Collapse
|
29
|
Biswas S, Logan NS, Davies LN, Sheppard AL, Wolffsohn JS. Authors' Reply: Assessing the utility of ChatGPT as an artificial intelligence-based large language model for information to answer questions on myopia. Ophthalmic Physiol Opt 2024; 44:233-234. [PMID: 37635297 DOI: 10.1111/opo.13227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Accepted: 08/17/2023] [Indexed: 08/29/2023]
Affiliation(s)
- Sayantan Biswas
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Nicola S Logan
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Leon N Davies
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Amy L Sheppard
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - James S Wolffsohn
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| |
Collapse
|
30
|
Wong M, Lim ZW, Pushpanathan K, Cheung CY, Wang YX, Chen D, Tham YC. Review of emerging trends and projection of future developments in large language models research in ophthalmology. Br J Ophthalmol 2023:bjo-2023-324734. [PMID: 38164563 DOI: 10.1136/bjo-2023-324734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2023] [Accepted: 11/14/2023] [Indexed: 01/03/2024]
Abstract
BACKGROUND Large language models (LLMs) are fast emerging as potent tools in healthcare, including ophthalmology. This systematic review offers a twofold contribution: it summarises current trends in ophthalmology-related LLM research and projects future directions for this burgeoning field. METHODS We systematically searched across various databases (PubMed, Europe PMC, Scopus and Web of Science) for articles related to LLM use in ophthalmology, published between 1 January 2022 and 31 July 2023. Selected articles were summarised, and categorised by type (editorial, commentary, original research, etc) and their research focus (eg, evaluating ChatGPT's performance in ophthalmology examinations or clinical tasks). FINDINGS We identified 32 articles meeting our criteria, published between January and July 2023, with a peak in June (n=12). Most were original research evaluating LLMs' proficiency in clinically related tasks (n=9). Studies demonstrated that ChatGPT-4.0 outperformed its predecessor, ChatGPT-3.5, in ophthalmology exams. Furthermore, ChatGPT excelled in constructing discharge notes (n=2), evaluating diagnoses (n=2) and answering general medical queries (n=6). However, it struggled with generating scientific articles or abstracts (n=3) and answering specific subdomain questions, especially those regarding specific treatment options (n=2). ChatGPT's performance relative to other LLMs (Google's Bard, Microsoft's Bing) varied by study design. Ethical concerns such as data hallucination (n=27), authorship (n=5) and data privacy (n=2) were frequently cited. INTERPRETATION While LLMs hold transformative potential for healthcare and ophthalmology, concerns over accountability, accuracy and data security remain. Future research should focus on application programming interface integration, comparative assessments of popular LLMs, their ability to interpret image-based data and the establishment of standardised evaluation frameworks.
Collapse
Affiliation(s)
| | - Zhi Wei Lim
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Krithi Pushpanathan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Carol Y Cheung
- Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong, Hong Kong
| | - Ya Xing Wang
- Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Capital University of Medical Science, Beijing, China
| | - David Chen
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Department of Ophthalmology, National University Hospital, Singapore
| | - Yih Chung Tham
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore
| |
Collapse
|
31
|
Ittarat M, Cheungpasitporn W, Chansangpetch S. Personalized Care in Eye Health: Exploring Opportunities, Challenges, and the Road Ahead for Chatbots. J Pers Med 2023; 13:1679. [PMID: 38138906 PMCID: PMC10744965 DOI: 10.3390/jpm13121679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Revised: 11/29/2023] [Accepted: 11/30/2023] [Indexed: 12/24/2023] Open
Abstract
In modern eye care, the adoption of ophthalmology chatbots stands out as a pivotal technological progression. These digital assistants present numerous benefits, such as better access to vital information, heightened patient interaction, and streamlined triaging. Recent evaluations have highlighted their performance in both the triage of ophthalmology conditions and ophthalmology knowledge assessment, underscoring their potential and areas for improvement. However, assimilating these chatbots into the prevailing healthcare infrastructures brings challenges. These encompass ethical dilemmas, legal compliance, seamless integration with electronic health records (EHR), and fostering effective dialogue with medical professionals. Addressing these challenges necessitates the creation of bespoke standards and protocols for ophthalmology chatbots. The horizon for these chatbots is illuminated by advancements and anticipated innovations, poised to redefine the delivery of eye care. The synergy of artificial intelligence (AI) and machine learning (ML) with chatbots amplifies their diagnostic prowess. Additionally, their capability to adapt linguistically and culturally ensures they can cater to a global patient demographic. In this article, we explore in detail the utilization of chatbots in ophthalmology, examining their accuracy, reliability, data protection, security, transparency, potential algorithmic biases, and ethical considerations. We provide a comprehensive review of their roles in the triage of ophthalmology conditions and knowledge assessment, emphasizing their significance and future potential in the field.
Collapse
Affiliation(s)
- Mantapond Ittarat
- Surin Hospital and Surin Medical Education Center, Suranaree University of Technology, Surin 32000, Thailand;
| | | | - Sunee Chansangpetch
- Center of Excellence in Glaucoma, Chulalongkorn University, Bangkok 10330, Thailand;
- Department of Ophthalmology, Faculty of Medicine, Chulalongkorn University and King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok 10330, Thailand
| |
Collapse
|
32
|
Kunze KN. Editorial Commentary: Recognizing and Avoiding Medical Misinformation Across Digital Platforms: Smoke, Mirrors (and Streaming). Arthroscopy 2023; 39:2454-2455. [PMID: 37981387 DOI: 10.1016/j.arthro.2023.06.054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/18/2023] [Revised: 06/27/2023] [Accepted: 06/30/2023] [Indexed: 11/21/2023]
Abstract
The evolution of social media and related online sources has substantially increased the ability of patients to query and access publicly available information that may have relevance to a potential musculoskeletal condition of interest. Although increased accessibility to information has several purported benefits, including encouragement of patients to become more invested in their care through self-teaching, a downside to the existence of a vast number of unregulated resources remains the risk of misinformation. As health care providers, we have a moral and ethical obligation to mitigate this risk by directing patients to high-quality resources for medical information and to be aware of resources that are unreliable. To this end, a growing body of evidence has suggested that YouTube lacks reliability and quality in terms of medical information concerning a variety of musculoskeletal conditions.
Collapse
|
33
|
Ferro Desideri L, Roth J, Zinkernagel M, Anguita R. "Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration". Int J Retina Vitreous 2023; 9:71. [PMID: 37980501 PMCID: PMC10657493 DOI: 10.1186/s40942-023-00511-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2023] [Accepted: 11/11/2023] [Indexed: 11/20/2023] Open
Abstract
INTRODUCTION Age-related macular degeneration (AMD) affects millions of people globally, leading to a surge in online research of putative diagnoses and causing potential misinformation and anxiety in patients and their caregivers. This study explores the efficacy of artificial intelligence-derived large language models (LLMs), such as ChatGPT, in addressing AMD patients' questions. METHODS ChatGPT 3.5 (2023), Bing AI (2023), and Google Bard (2023) were adopted as the LLMs. Patients' questions were subdivided into two categories, (a) general medical advice and (b) pre- and post-intravitreal injection advice, and the responses were classified as (1) accurate and sufficient, (2) partially accurate but sufficient, or (3) inaccurate and not sufficient. A non-parametric test was performed to compare mean scores among the three LLMs, and analysis of variance and reliability tests were also performed across the three groups. RESULTS In question category (a), the average score was 1.20 (± 0.41) with ChatGPT 3.5, 1.60 (± 0.63) with Bing AI and 1.60 (± 0.73) with Google Bard, showing no significant differences among the three groups (p = 0.129). The average score in category (b) was 1.07 (± 0.27) with ChatGPT 3.5, 1.69 (± 0.63) with Bing AI and 1.38 (± 0.63) with Google Bard, showing a significant difference among the three groups (p = 0.0042). Reliability statistics showed a Cronbach's α of 0.237 (range 0.448, 0.096-0.544). CONCLUSION ChatGPT 3.5 consistently offered the most accurate and satisfactory responses, particularly to technical queries. LLMs displayed promise in providing precise information about AMD; however, further improvements are needed, especially for more technical questions.
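The non-parametric comparison across the three LLMs' scores described above is commonly run as a Kruskal-Wallis test. A minimal sketch with invented 1-3 ratings (not the study's data):

    from scipy.stats import kruskal

    # Invented 1-3 accuracy ratings for three chatbots (1 = accurate and sufficient).
    chatgpt = [1, 1, 1, 2, 1, 1, 1, 1, 2, 1]
    bing = [2, 1, 2, 2, 1, 2, 3, 1, 2, 1]
    bard = [1, 2, 1, 2, 1, 1, 3, 1, 2, 1]

    h_stat, p_value = kruskal(chatgpt, bing, bard)
    print(f"H={h_stat:.3f}, p={p_value:.4f}")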
Collapse
Affiliation(s)
- Lorenzo Ferro Desideri
- Department of Ophthalmology, Inselspital, University Hospital of Bern, Bern, Switzerland.
- Bern Photographic Reading Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
| | - Janice Roth
- Department of Ophthalmology, Inselspital, University Hospital of Bern, Bern, Switzerland
| | - Martin Zinkernagel
- Department of Ophthalmology, Inselspital, University Hospital of Bern, Bern, Switzerland
- Bern Photographic Reading Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
| | - Rodrigo Anguita
- Department of Ophthalmology, Inselspital, University Hospital of Bern, Bern, Switzerland
- Moorfields Eye Hospital NHS Foundation Trust, City Road, London, EC1V 2PD, UK
| |
Collapse
|
34
|
Biswas S, Logan NS, Davies LN, Sheppard AL, Wolffsohn JS. Assessing the utility of ChatGPT as an artificial intelligence-based large language model for information to answer questions on myopia. Ophthalmic Physiol Opt 2023; 43:1562-1570. [PMID: 37476960 DOI: 10.1111/opo.13207] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Revised: 06/04/2023] [Accepted: 07/11/2023] [Indexed: 07/22/2023]
Abstract
PURPOSE ChatGPT is an artificial intelligence language model that uses natural language processing to simulate human conversation. It has seen a wide range of applications, including healthcare education, research and clinical practice. This study evaluated the ability of ChatGPT to provide accurate, good-quality information in answer to questions on myopia. METHODS A series of 11 questions (covering nine categories: general summary, cause, symptom, onset, prevention, complication, natural history, treatment and prognosis) were generated for this cross-sectional study. Each question was entered five times into fresh ChatGPT sessions (free from the influence of prior questions). The responses were evaluated by a five-member team of optometry teaching and research staff. The evaluators individually rated the accuracy and quality of responses on a Likert scale, where a higher score indicated greater quality of information (1: very poor; 2: poor; 3: acceptable; 4: good; 5: very good). Median scores for each question were estimated and compared between evaluators. Agreement between the five evaluators and the reliability statistics of the questions were estimated. RESULTS For 10 of the 11 questions on myopia, ChatGPT provided good-quality information (median score: 4.0), and for one question the response was acceptable (median score: 3.0). Of 275 responses in total, 66 (24%) were rated very good, 134 (49%) were rated good, 60 (22%) were rated acceptable, 10 (3.6%) were rated poor and 5 (1.8%) were rated very poor. A Cronbach's α of 0.807 indicated a good level of internal consistency across test items. Evaluators' ratings demonstrated 'slight agreement' (Fleiss' κ = 0.005), with a significant difference in scoring among the evaluators (Kruskal-Wallis test, p < 0.001). CONCLUSION Overall, ChatGPT generated good-quality information to answer questions on myopia. Although ChatGPT shows great potential in rapidly providing information on myopia, the presence of inaccurate responses demonstrates that further evaluation and awareness of its limitations are crucial to avoid potential misinterpretation.
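For reference, the Cronbach's α reported above is the standard internal-consistency statistic; for k test items with item variances \sigma_i^2 and total-score variance \sigma_t^2 it is

    \alpha = \frac{k}{k-1} \left( 1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_t^2} \right)

Values near 0.8, as here, are conventionally read as good internal consistency.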
Collapse
Affiliation(s)
- Sayantan Biswas
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Nicola S Logan
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Leon N Davies
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Amy L Sheppard
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - James S Wolffsohn
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| |
Collapse
|
35
|
Mese I, Taslicay CA, Sivrioglu AK. Improving radiology workflow using ChatGPT and artificial intelligence. Clin Imaging 2023; 103:109993. [PMID: 37812965 DOI: 10.1016/j.clinimag.2023.109993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Revised: 08/19/2023] [Accepted: 09/28/2023] [Indexed: 10/11/2023]
Abstract
Artificial intelligence is a branch of computer science that aims to create intelligent machines capable of performing tasks that typically require human intelligence. One of its branches is natural language processing, which studies the interaction between computers and human language. ChatGPT is a sophisticated natural language processing tool that can understand and respond to complex questions and commands in natural language. Radiology is a vital aspect of modern medicine that involves the use of imaging technologies to diagnose and treat medical conditions. Artificial intelligence, including ChatGPT, can be integrated into radiology workflows to improve efficiency, accuracy, and patient care. ChatGPT can streamline various radiology workflow steps, including patient registration, scheduling, patient check-in, image acquisition, interpretation, and reporting. While ChatGPT has the potential to transform radiology workflows, the technology has limitations that must be addressed, such as the potential for bias in artificial intelligence algorithms and ethical concerns. As the technology continues to advance, ChatGPT is likely to become an increasingly important tool in the field of radiology, and in healthcare more broadly.
Collapse
Affiliation(s)
- Ismail Mese
- Department of Radiology, Health Sciences University, Erenkoy Mental Health and Neurology Training and Research Hospital, 19 Mayıs, Sinan Ercan Cd. No: 23, Kadıköy/Istanbul 34736, Turkey.
| | | | - Ali Kemal Sivrioglu
- Department of Radiology, Liv Hospital Vadistanbul, Ayazağa Mahallesi, Kemerburgaz Caddesi, Vadistanbul Park Etabı, 7F Blok, 34396 Sarıyer/İstanbul, Turkey
| |
Collapse
|
36
|
Taloni A, Borselli M, Scarsi V, Rossi C, Coco G, Scorcia V, Giannaccare G. Comparative performance of humans versus GPT-4.0 and GPT-3.5 in the self-assessment program of American Academy of Ophthalmology. Sci Rep 2023; 13:18562. [PMID: 37899405 PMCID: PMC10613606 DOI: 10.1038/s41598-023-45837-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2023] [Accepted: 10/24/2023] [Indexed: 10/31/2023] Open
Abstract
To compare the performance of humans, GPT-4.0 and GPT-3.5 in answering multiple-choice questions from the American Academy of Ophthalmology (AAO) Basic and Clinical Science Course (BCSC) self-assessment program, available at https://www.aao.org/education/self-assessments . In June 2023, text-based multiple-choice questions were submitted to GPT-4.0 and GPT-3.5. The AAO provides the percentage of humans who selected the correct answer, which was analyzed for comparison. All questions were classified by 10 subspecialties and 3 practice areas (diagnostics/clinics, medical treatment, surgery). Out of 1023 questions, GPT-4.0 achieved the best score (82.4%), followed by humans (75.7%) and GPT-3.5 (65.9%), with significant differences in accuracy rates (all P < 0.0001). Both GPT-4.0 and GPT-3.5 showed the worst results on surgery-related questions (74.6% and 57.0%, respectively). For difficult questions (answered incorrectly by > 50% of humans), both GPT models compared favorably with humans, without reaching statistical significance. The word count of answers provided by GPT-4.0 was significantly lower than that of GPT-3.5 (160 ± 56 and 206 ± 77 words, respectively; P < 0.0001); however, incorrect responses were longer (P < 0.02). GPT-4.0 represented a substantial improvement over GPT-3.5, achieving better performance than humans on an AAO BCSC self-assessment test. However, ChatGPT is still limited by inconsistency across practice areas, especially when it comes to surgery.
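Accuracy-rate comparisons of this kind reduce to a Pearson chi-squared test on a 2x2 contingency table. An illustrative sketch follows; the counts are reconstructed from the reported percentages of 1023 questions, not taken from the study's own tables:

    from scipy.stats import chi2_contingency

    # Approximate correct/incorrect counts out of 1023 questions,
    # reconstructed from the reported percentages (illustrative only).
    gpt4 = [843, 180]    # ~82.4% correct
    gpt35 = [674, 349]   # ~65.9% correct

    chi2, p, dof, expected = chi2_contingency([gpt4, gpt35])
    print(f"chi2={chi2:.1f}, p={p:.2e}")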
Collapse
Affiliation(s)
- Andrea Taloni
- Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
| | - Massimiliano Borselli
- Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
| | - Valentina Scarsi
- Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
| | - Costanza Rossi
- Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
| | - Giulia Coco
- Department of Clinical Sciences and Translational Medicine, University of Rome Tor Vergata, Rome, Italy
| | - Vincenzo Scorcia
- Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
| | - Giuseppe Giannaccare
- Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy.
- Department of Surgical Sciences, Eye Clinic, University of Cagliari, Via Università 40, 09124, Cagliari, Italy.
| |
Collapse
|
37
|
Ong J, Hariprasad SM, Chhablani J. ChatGPT and GPT-4 in Ophthalmology: Applications of Large Language Model Artificial Intelligence in Retina. Ophthalmic Surg Lasers Imaging Retina 2023; 54:557-562. [PMID: 37847163 DOI: 10.3928/23258160-20230926-01] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2023]
|
38
|
Rojas-Carabali W, Cifuentes-González C, Wei X, Putera I, Sen A, Thng ZX, Agrawal R, Elze T, Sobrin L, Kempen JH, Lee B, Biswas J, Nguyen QD, Gupta V, de-la-Torre A, Agrawal R. Evaluating the Diagnostic Accuracy and Management Recommendations of ChatGPT in Uveitis. Ocul Immunol Inflamm 2023:1-6. [PMID: 37722842 DOI: 10.1080/09273948.2023.2253471] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2023] [Accepted: 08/25/2023] [Indexed: 09/20/2023]
Abstract
INTRODUCTION Accurate diagnosis and timely management are vital for favorable uveitis outcomes. Artificial intelligence (AI) holds promise in medical decision-making, particularly in ophthalmology. Yet the diagnostic precision and management advice of AI-based uveitis chatbots lack assessment. METHODS We appraised the diagnostic accuracy and management suggestions of an AI-based chatbot, ChatGPT, versus five uveitis-trained ophthalmologists, using 25 standard cases aligned with the new Uveitis Nomenclature guidelines. Participants predicted the most likely diagnosis, two differentials, and the next management steps. Comparative success rates were computed. RESULTS Ophthalmologists excelled in identifying the likely diagnosis (60-92%), exceeding the AI (60%). Considering fully and partially accurate diagnoses, ophthalmologists achieved 76-100% success, while the AI attained 72%. Despite an 8% AI improvement, its overall performance lagged. Ophthalmologists and the AI agreed on the diagnosis in 48% of cases, with 91.6% concurrence in management plans among those cases. CONCLUSIONS The study underscores AI chatbots' potential in uveitis diagnosis and management, indicating their value in reducing diagnostic errors. Further research is essential to enhance the precision of AI chatbots' diagnoses and recommendations.
Collapse
Affiliation(s)
- William Rojas-Carabali
- National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
- Department of Bioinformatics, Lee Kong Chiang School of Medicine, Nanyang Technological University, Singapore, Singapore
| | - Carlos Cifuentes-González
- Neuroscience Research Group (NEUROS), Neurovitae Center for Neuroscience, Institute of Translational Medicine (IMT), Escuela de Medicina y Ciencias de la Salud, Universidad del Rosario, Bogotá, Colombia
| | - Xin Wei
- National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
| | - Ikhwanuliman Putera
- Department of Ophthalmology, Faculty of Medicine Universitas Indonesia - CiptoMangunkusmoKirana Eye Hospital, Jakarta, Indonesia
- Laboratory Medical Immunology, Department of Immunology, ErasmusMC, University Medical Centre, Rotterdam, the Netherlands
- Department of Internal Medicine, Division of Clinical Immunology, Erasmus MC, University Medical Center, Rotterdam, The Netherlands
- Department of Ophthalmology, Erasmus MC, University Medical Center, Rotterdam, The Netherlands
| | - Alok Sen
- Department of Vitreoretina and Uveitis, Sadguru Netra Chikatsalya, Chitrakoot, India
| | - Zheng Xian Thng
- National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
| | - Rajdeep Agrawal
- Department of Bioinformatics, Lee Kong Chiang School of Medicine, Nanyang Technological University, Singapore, Singapore
| | - Tobias Elze
- Department of Ophthalmology, Massachusetts Eye and Ear/Harvard Medical School, and Schepens Eye Research Institute, Boston, Massachusetts, USA
| | - Lucia Sobrin
- Department of Ophthalmology, Massachusetts Eye and Ear/Harvard Medical School, and Schepens Eye Research Institute, Boston, Massachusetts, USA
| | - John H Kempen
- Department of Ophthalmology, Massachusetts Eye and Ear/Harvard Medical School, and Schepens Eye Research Institute, Boston, Massachusetts, USA
- Community Ophthalmology, Sight for Souls, Bellevue, Washington, USA
- Department of Ophthalmology, Addis Ababa University, Addis Ababa, Ethiopia
- MyungSung Christian Medical Center (MCM) Eye Unit, MCM Comprehensive Specialized Hospital, and MyungSung Medical School, Addis Ababa, Ethiopia
| | - Bernett Lee
- National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
| | - Jyotirmay Biswas
- Department of Ocular Pathology and Uveitis, Medical Research Foundation, Sankara Netralaya, Chennai, India
| | - Quan Dong Nguyen
- Byers Eye Institute, Stanford University, Palo Alto, California, USA
| | - Vishali Gupta
- Post Graduate Institute of Medical Education and Research (PGIMER), Advance Eye Centre, Chandigarh, India
| | - Alejandra de-la-Torre
- National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
| | - Rupesh Agrawal
- MyungSung Christian Medical Center (MCM) Eye Unit, MCM Comprehensive Specialized Hospital, and MyungSung Medical School, Addis Ababa, Ethiopia
- Department of Ophthalmology and Visual Sciences, Academic Clinical Program, Duke-NUS Medical School, Singapore, Singapore
- Moorfields Eye Hospital, NHS Foundation Trust, London, UK
- Singapore Eye Research Institute, The Academia, Singapore, Singapore
| |
Collapse
|
39
|
Lim ZW, Pushpanathan K, Yew SME, Lai Y, Sun CH, Lam JSH, Chen DZ, Goh JHL, Tan MCJ, Sheng B, Cheng CY, Koh VTC, Tham YC. Benchmarking large language models' performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard. EBioMedicine 2023; 95:104770. [PMID: 37625267 PMCID: PMC10470220 DOI: 10.1016/j.ebiom.2023.104770] [Citation(s) in RCA: 31] [Impact Index Per Article: 31.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Revised: 07/21/2023] [Accepted: 08/08/2023] [Indexed: 08/27/2023] Open
Abstract
BACKGROUND Large language models (LLMs) are garnering wide interest due to their human-like and contextually relevant responses. However, LLMs' accuracy across specific medical domains has not yet been thoroughly evaluated. Myopia is a frequent topic on which patients and parents commonly seek information online. Our study evaluated the performance of three LLMs, namely ChatGPT-3.5, ChatGPT-4.0, and Google Bard, in delivering accurate responses to common myopia-related queries. METHODS We curated thirty-one commonly asked myopia care-related questions, which were categorised into six domains-pathogenesis, risk factors, clinical presentation, diagnosis, treatment and prevention, and prognosis. Each question was posed to the LLMs, and their responses were independently graded by three consultant-level paediatric ophthalmologists on a three-point accuracy scale (poor, borderline, good). A majority consensus approach was used to determine the final rating for each response. 'Good' rated responses were further evaluated for comprehensiveness on a five-point scale. Conversely, 'poor' rated responses were further prompted for self-correction and then re-evaluated for accuracy. FINDINGS ChatGPT-4.0 demonstrated superior accuracy, with 80.6% of responses rated as 'good', compared to 61.3% for ChatGPT-3.5 and 54.8% for Google Bard (Pearson's chi-squared test, all p ≤ 0.009). All three LLM-chatbots showed high mean comprehensiveness scores (Google Bard: 4.35; ChatGPT-4.0: 4.23; ChatGPT-3.5: 4.11, out of a maximum score of 5). All LLM-chatbots also demonstrated substantial self-correction capabilities: 66.7% (2 in 3) of ChatGPT-4.0's, 40% (2 in 5) of ChatGPT-3.5's, and 60% (3 in 5) of Google Bard's responses improved after self-correction. The LLM-chatbots performed consistently across domains, except for 'treatment and prevention'. However, ChatGPT-4.0 still performed superiorly in this domain, receiving 70% 'good' ratings, compared to 40% for ChatGPT-3.5 and 45% for Google Bard (Pearson's chi-squared test, all p ≤ 0.001). INTERPRETATION Our findings underscore the potential of LLMs, particularly ChatGPT-4.0, for delivering accurate and comprehensive responses to myopia-related queries. Continuous strategies and evaluations to improve LLMs' accuracy remain crucial. FUNDING Dr Yih-Chung Tham was supported by the National Medical Research Council of Singapore (NMRC/MOH/HCSAINV21nov-0001).
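The majority-consensus grading described above (three graders, a three-point scale) can be expressed in a few lines; the tie-breaking rule below is an assumption for illustration, since the paper's handling of three-way splits is not restated here:

    from collections import Counter

    def consensus(grades: list[str]) -> str:
        # Majority vote among three graders; a three-way split falls back to
        # the middle rating ("borderline") -- an assumed rule for illustration.
        top, count = Counter(grades).most_common(1)[0]
        return top if count >= 2 else "borderline"

    print(consensus(["good", "good", "borderline"]))  # -> good
    print(consensus(["poor", "borderline", "good"]))  # -> borderline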
Collapse
Affiliation(s)
- Zhi Wei Lim
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Krithi Pushpanathan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore
| | - Samantha Min Er Yew
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore
| | - Yien Lai
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - Chen-Hsin Sun
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - Janice Sing Harn Lam
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - David Ziyou Chen
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | | | - Marcus Chun Jin Tan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - Bin Sheng
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; Department of Endocrinology and Metabolism, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China; MoE Key Lab of Artificial Intelligence, Artificial Intelligence Institute, Shanghai Jiao Tong University, Shanghai, China
| | - Ching-Yu Cheng
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Eye Academic Clinical Program (Eye ACP), Duke NUS Medical School, Singapore
| | - Victor Teck Chang Koh
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
- Yih-Chung Tham
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Eye Academic Clinical Program (Eye ACP), Duke NUS Medical School, Singapore.
40
Singh S, Watson S. ChatGPT as a tool for conducting literature review for dry eye disease. Clin Exp Ophthalmol 2023;51:731-732. PMID: 37321598. DOI: 10.1111/ceo.14268.
Affiliation(s)
- Swati Singh
- Ophthalmic Plastic Surgery Services, L V Prasad Eye Institute, Hyderabad, India
- Stephanie Watson
- Discipline of Ophthalmology, Sydney Medical School, The University of Sydney, Save Sight Institute, Sydney, New South Wales, Australia
41
Nielsen JPS, von Buchwald C, Grønhøj C. Validity of the large language model ChatGPT (GPT4) as a patient information source in otolaryngology by a variety of doctors in a tertiary otorhinolaryngology department. Acta Otolaryngol 2023;143:779-782. PMID: 37694729. DOI: 10.1080/00016489.2023.2254809.
Abstract
BACKGROUND A high number of patients seek health information online, and large language models (LLMs) may produce a rising amount of it. AIM This study evaluates the quality of the health information provided by ChatGPT, an LLM developed by OpenAI, focusing on its utility as a source of otolaryngology-related patient information. MATERIAL AND METHOD Doctors from a tertiary otorhinolaryngology department used a Likert scale to assess the chatbot's responses for accuracy, relevance, and depth; the responses were also evaluated by ChatGPT itself. RESULTS When rated by the respondents, the composite mean across the three categories was 3.41, with the highest performance in the relevance category (mean = 3.71); the accuracy and depth categories yielded mean scores of 3.51 and 3.00, respectively. ChatGPT rated all categories as 5. CONCLUSION AND SIGNIFICANCE Despite its potential to provide relevant and accurate medical information, the chatbot's responses lacked depth and may perpetuate biases stemming from its training on publicly available text. While LLMs show promise in healthcare, further refinement is necessary to enhance response depth and mitigate potential biases.
Affiliation(s)
- Jacob P S Nielsen
- Department of Otorhinolaryngology-Head and Neck Surgery and Audiology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Christian von Buchwald
- Department of Otorhinolaryngology-Head and Neck Surgery and Audiology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Christian Grønhøj
- Department of Otorhinolaryngology-Head and Neck Surgery and Audiology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
42
Khanna RK, Ducloyer JB, Hage A, Rezkallah A, Durbant E, Bigoteau M, Mouchel R, Guillon-Rolf R, Le L, Tahiri R, Chammas J, Baudouin C. Evaluating the potential of ChatGPT-4 in ophthalmology: The good, the bad and the ugly. J Fr Ophtalmol 2023;46:697-705. PMID: 37573231. DOI: 10.1016/j.jfo.2023.07.001.
Abstract
There is growing interest in artificial intelligence (AI) across all medical fields. Beyond the direct application of AI to medical data, generative AI such as generative pre-trained transformers (GPT) could significantly change the ophthalmology landscape, opening up new avenues for enhancing precision, productivity, and patient outcomes. To date, ChatGPT-4 has been investigated in various ways in ophthalmology for research, medical education, and clinical decision support. This article demonstrates the application of ChatGPT-4 within the field of ophthalmology through a 'mise en abyme' approach. While we explore its potential to enhance the future of ophthalmology care, we also carefully outline its current limitations and potential risks.
Affiliation(s)
- R K Khanna
- Service d'ophtalmologie, hôpital universitaire Bretonneau, Inserm 1253 iBrain, Tours, France.
- J-B Ducloyer
- Service d'ophtalmologie, Nantes université, CHU de Nantes, 44000 Nantes, France
- A Hage
- Service d'ophtalmologie, CHNO 15-20, Paris, France
- A Rezkallah
- Service d'ophtalmologie, hôpital de la Croix Rousse, Lyon, France
- E Durbant
- Cabinet d'ophtalmologie, Paris, France
- M Bigoteau
- Service d'ophtalmologie, hôpital Jacques-Cœur, Bourges, France
- R Mouchel
- Centre ophtalmologique Kléber, Clinique du parc, Lyon, France; Centre ophtalmologique du Grand Lac, Aix-les-Bains, France
- R Guillon-Rolf
- Service d'ophtalmologie, Fondation Adolphe-de-Rothschild, Paris, France
- L Le
- Cabinet d'ophtalmologie, Chartres, France
- R Tahiri
- Service de chirurgie ambulatoire, centre hospitalier d'Avranches-Granville, 849, rue des Menneries, 50400 Granville, France
- J Chammas
- Service d'ophtalmologie, CHU Robert-Debré, Reims, France; Centre ophtalmologique du Rhin, Strasbourg, France
- C Baudouin
- Service d'ophtalmologie, CHNO 15-20, Sorbonne Université, Inserm, CNRS, Institut de la Vision, IHU FOReSIGHT, Paris, France
43
Watters C, Lemanski MK. Universal skepticism of ChatGPT: a review of early literature on chat generative pre-trained transformer. Front Big Data 2023;6:1224976. PMID: 37680954. PMCID: PMC10482048. DOI: 10.3389/fdata.2023.1224976.
Abstract
ChatGPT, a new language model developed by OpenAI, has garnered significant attention in various fields since its release. This literature review provides an overview of early ChatGPT literature across multiple disciplines, exploring its applications, limitations, and ethical considerations. The review encompasses Scopus-indexed publications from November 2022 to April 2023 and includes 156 articles related to ChatGPT. The findings reveal a predominance of negative sentiment across disciplines, though subject-specific attitudes must be considered. The review highlights the implications of ChatGPT in many fields including healthcare, raising concerns about employment opportunities and ethical considerations. While ChatGPT holds promise for improved communication, further research is needed to address its capabilities and limitations. This literature review provides insights into early research on ChatGPT, informing future investigations and practical applications of chatbot technology, as well as development and usage of generative AI.
Affiliation(s)
- Casey Watters
- Faculty of Law, Bond University, Gold Coast, QLD, Australia
44
Bernstein IA, Zhang Y(V), Govil D, Majid I, Chang RT, Sun Y, Shue A, Chou JC, Schehlein E, Christopher KL, Groth SL, Ludwig C, Wang SY. Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions. JAMA Netw Open 2023;6:e2330320. PMID: 37606922. PMCID: PMC10445188. DOI: 10.1001/jamanetworkopen.2023.30320.
Abstract
Importance Large language models (LLMs) like ChatGPT appear capable of performing a variety of tasks, including answering patient eye care questions, but have not yet been evaluated in direct comparison with ophthalmologists. It remains unclear whether LLM-generated advice is accurate, appropriate, and safe for eye patients. Objective To evaluate the quality of ophthalmology advice generated by an LLM chatbot in comparison with ophthalmologist-written advice. Design, Setting, and Participants This cross-sectional study used deidentified data from an online medical forum, in which patient questions received responses written by American Academy of Ophthalmology (AAO)-affiliated ophthalmologists. A masked panel of 8 board-certified ophthalmologists was asked to distinguish between answers generated by the ChatGPT chatbot and human answers. Posts were dated between 2007 and 2016; data were accessed in January 2023, and analysis was performed between March and May 2023. Main Outcomes and Measures Identification of chatbot and human answers on a 4-point scale (likely or definitely artificial intelligence [AI] vs likely or definitely human) and evaluation of responses for presence of incorrect information, alignment with perceived consensus in the medical community, likelihood to cause harm, and extent of harm. Results A total of 200 pairs of user questions and answers by AAO-affiliated ophthalmologists were evaluated. The mean (SD) accuracy for distinguishing between AI and human responses was 61.3% (9.7%). Of 800 evaluations of chatbot-written answers, 168 answers (21.0%) were marked as human-written, while 517 of 800 human-written answers (64.6%) were marked as AI-written. Compared with human answers, chatbot answers were more frequently rated as probably or definitely written by AI (prevalence ratio [PR], 1.72; 95% CI, 1.52-1.93). The likelihood of chatbot answers containing incorrect or inappropriate material was comparable with that of human answers (PR, 0.92; 95% CI, 0.77-1.10), and chatbot answers did not differ from human answers in likelihood of harm (PR, 0.84; 95% CI, 0.67-1.07) or extent of harm (PR, 0.99; 95% CI, 0.80-1.22). Conclusions and Relevance In this cross-sectional study of human-written and AI-generated responses to 200 eye care questions from an online advice forum, a chatbot appeared capable of responding to long user-written eye health posts and largely generated appropriate responses that did not differ significantly from ophthalmologist-written responses in terms of incorrect information, likelihood of harm, extent of harm, or deviation from ophthalmologist community standards. Additional research is needed to assess patient attitudes toward LLM-augmented ophthalmologists vs fully autonomous AI content generation, to evaluate the clarity and acceptability of LLM-generated answers from the patient perspective, to test the performance of LLMs in a greater variety of clinical contexts, and to determine an optimal manner of utilizing LLMs that is ethical and minimizes harm.
Affiliation(s)
- Isaac A. Bernstein
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Youchen (Victor) Zhang
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Devendra Govil
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Iyad Majid
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Robert T. Chang
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Yang Sun
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Ann Shue
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Jonathan C. Chou
- Department of Ophthalmology, Kaiser Permanente San Francisco, San Francisco, California
- Sylvia L. Groth
- Department of Ophthalmology and Visual Sciences, Vanderbilt Eye Institute, Nashville, Tennessee
- Cassie Ludwig
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Sophia Y. Wang
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
45
Liu J, Wang C, Liu S. Utility of ChatGPT in Clinical Practice. J Med Internet Res 2023;25:e48568. PMID: 37379067. PMCID: PMC10365580. DOI: 10.2196/48568.
Abstract
ChatGPT is receiving increasing attention and has a variety of application scenarios in clinical practice. In clinical decision support, ChatGPT has been used to generate accurate differential diagnosis lists, support clinical decision-making, optimize clinical decision support, and provide insights for cancer screening decisions. In addition, ChatGPT has been used for intelligent question-answering to provide reliable information about diseases and medical queries. In terms of medical documentation, ChatGPT has proven effective in generating patient clinical letters, radiology reports, medical notes, and discharge summaries, improving efficiency and accuracy for health care providers. Future research directions include real-time monitoring and predictive analytics, precision medicine and personalized treatment, the role of ChatGPT in telemedicine and remote health care, and integration with existing health care systems. Overall, ChatGPT is a valuable tool that complements the expertise of health care providers and improves clinical decision-making and patient care. However, ChatGPT is a double-edged sword. We need to carefully consider and study the benefits and potential dangers of ChatGPT. In this viewpoint, we discuss recent advances in ChatGPT research in clinical practice and suggest possible risks and challenges of using ChatGPT in clinical practice. It will help guide and support future artificial intelligence research similar to ChatGPT in health.
Affiliation(s)
- Jialin Liu
- Information Center, West China Hospital, Sichuan University, Chengdu, China
- Department of Medical Informatics, West China Medical School, Chengdu, China
- Department of Otolaryngology-Head and Neck Surgery, West China Hospital, Sichuan University, Chengdu, China
- Changyu Wang
- Information Center, West China Hospital, Sichuan University, Chengdu, China
- West China College of Stomatology, Sichuan University, Chengdu, China
- Siru Liu
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
46
Honavar SG. Eye of the AI storm: Exploring the impact of AI tools in ophthalmology. Indian J Ophthalmol 2023;71:2328-2340. PMID: 37322638. PMCID: PMC10418018. DOI: 10.4103/ijo.ijo_1478_23.
Affiliation(s)
- Santosh G Honavar
- Editor, Indian Journal of Ophthalmology, Centre for Sight, Road No. 2, Banjara Hills, Hyderabad, Telangana, India
47
Abstract
Supplemental material is available for this article.
Affiliation(s)
- Rajesh Bhayana
- From University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Mount Sinai Hospital and Women's College Hospital, University of Toronto, 200 Elizabeth St, Peter Munk Building, 1st Fl, Toronto, ON, Canada M5G 2C4
- Robert R Bleakney
- From University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Mount Sinai Hospital and Women's College Hospital, University of Toronto, 200 Elizabeth St, Peter Munk Building, 1st Fl, Toronto, ON, Canada M5G 2C4
- Satheesh Krishna
- From University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Mount Sinai Hospital and Women's College Hospital, University of Toronto, 200 Elizabeth St, Peter Munk Building, 1st Fl, Toronto, ON, Canada M5G 2C4
48
Singh S, Djalilian A, Ali MJ. ChatGPT and Ophthalmology: Exploring Its Potential with Discharge Summaries and Operative Notes. Semin Ophthalmol 2023:1-5. PMID: 37133418. DOI: 10.1080/08820538.2023.2209166.
Abstract
PURPOSE This study aimed to report the abilities of the large language model ChatGPT (OpenAI, San Francisco, USA) in constructing ophthalmic discharge summaries and operative notes. METHODS A set of prompts was constructed from statements incorporating common ophthalmic surgeries across the subspecialties of cornea, retina, glaucoma, paediatric ophthalmology, neuro-ophthalmology, and ophthalmic plastic surgery. Three surgeons carefully assessed the responses of ChatGPT, analyzing them for evidence-based content, specificity, presence of generic text, disclaimers, factual inaccuracies, and the model's ability to admit mistakes and challenge incorrect premises. RESULTS A total of 24 prompts were presented to ChatGPT: twelve assessed its ability to construct discharge summaries, and an equal number explored its potential for preparing operative notes. Responses were provided in a matter of seconds and were tailored to the quality of the inputs given. The ophthalmic discharge summaries were valid but contained significant generic text. ChatGPT could incorporate specific medications, follow-up instructions, consultation time, and location within the discharge summaries when prompted appropriately. While the operative notes were detailed, they required significant tuning. ChatGPT routinely admitted its mistakes and corrected itself immediately when confronted with factual inaccuracies, and the mistakes were avoided in subsequent reports given similar prompts. CONCLUSION The performance of ChatGPT in the context of ophthalmic discharge summaries and operative notes was encouraging, and its outputs were constructed in a matter of seconds. Focused training of ChatGPT on these tasks, with the inclusion of a human verification step, has enormous potential to impact healthcare positively.
Affiliation(s)
- Swati Singh
- Ophthalmic Plastic Surgery Service, L.V. Prasad Eye Institute, Hyderabad, India
- Ali Djalilian
- Department of Ophthalmology, University of Illinois, Chicago, Illinois, USA
- Mohammad Javed Ali
- Govindram Seksaria Institute of Dacryology, L.V. Prasad Eye Institute, Hyderabad, India
49
Dutton JJ. Artificial Intelligence and the Future of Computer-Assisted Medical Research and Writing. Ophthalmic Plast Reconstr Surg 2023;39:203-205. PMID: 37166288. DOI: 10.1097/iop.0000000000002420.
Affiliation(s)
- Jonathan J Dutton
- Department of Ophthalmology, University of North Carolina, Chapel Hill, North Carolina, U.S.A
50
Ali MJ. ChatGPT and Lacrimal Drainage Disorders: Performance and Scope of Improvement. Ophthalmic Plast Reconstr Surg 2023;39:221-225. PMID: 37166289. PMCID: PMC10171282. DOI: 10.1097/iop.0000000000002418.
Abstract
PURPOSE This study aimed to report the performance of the large language model ChatGPT (OpenAI, San Francisco, CA, U.S.A.) in the context of lacrimal drainage disorders. METHODS A set of prompts was constructed through questions and statements spanning common and uncommon aspects of lacrimal drainage disorders. Care was taken to avoid constructing prompts that required significant or new knowledge beyond the year 2020. Each prompt was presented to ChatGPT three times. The questions covered common disorders such as primary acquired nasolacrimal duct obstruction and congenital nasolacrimal duct obstruction and their cause and management. The prompts also tested ChatGPT on specifics such as the history of dacryocystorhinostomy (DCR) surgery, lacrimal pump anatomy, and human canalicular surfactants. ChatGPT was also quizzed on controversial topics such as silicone intubation and the use of mitomycin C in DCR surgery. The responses of ChatGPT were carefully analyzed for evidence-based content, specificity, presence of generic text, disclaimers, factual inaccuracies, and the model's ability to admit mistakes and challenge incorrect premises. Three lacrimal surgeons graded the responses into three categories: correct, partially correct, and factually incorrect. RESULTS A total of 21 prompts were presented to ChatGPT. The responses were detailed and structured according to the prompts. In response to most questions, ChatGPT provided a generic disclaimer that it could not give medical advice or a professional opinion but then answered the question in detail. Specific prompts such as "how can I perform an external DCR?" were answered with a sequential listing of all the surgical steps. However, several factual inaccuracies were noted across many ChatGPT replies. Several responses on controversial topics such as silicone intubation and mitomycin C were generic and not precisely evidence-based. ChatGPT's responses to specific questions on canalicular surfactants and idiopathic canalicular inflammatory disease were poor. Presenting varied prompts on a single topic led to responses that repeated or recycled phrases. Citations were uniformly missing across all responses. Agreement among the three observers was high (95%) in grading the responses. The responses of ChatGPT were graded as correct for only 40% of the prompts, partially correct for 35%, and outright factually incorrect for 25%; hence, some degree of factual inaccuracy was present in 60% of the responses if the partially correct responses are included. Encouragingly, ChatGPT was able to admit mistakes and correct them when presented with counterarguments, and it was capable of challenging incorrect prompts and premises. CONCLUSION The performance of ChatGPT in the context of lacrimal drainage disorders can, at best, be termed average. However, the potential of this AI chatbot to influence medicine is enormous, and it needs to be specifically trained and retrained for individual medical subspecialties.
Affiliation(s)
- Mohammad Javed Ali
- Govindram Seksaria Institute of Dacryology, L.V. Prasad Eye Institute, Hyderabad, India