1. Maywood MJ, Parikh R, Deobhakta A, Begaj T. Performance assessment of an artificial intelligence chatbot in clinical vitreoretinal scenarios. Retina 2024; 44:954-964. [PMID: 38271674] [DOI: 10.1097/iae.0000000000004053]
Abstract
PURPOSE To determine how often ChatGPT is able to provide accurate and comprehensive information regarding clinical vitreoretinal scenarios. To assess the types of sources ChatGPT primarily uses and to determine whether they are hallucinated. METHODS This was a retrospective cross-sectional study. The authors designed 40 open-ended clinical scenarios across four main topics in vitreoretinal disease. Responses were graded on correctness and comprehensiveness by three blinded retina specialists. The primary outcome was the number of clinical scenarios that ChatGPT answered correctly and comprehensively. Secondary outcomes included theoretical harm to patients, the distribution of the type of references used by the chatbot, and the frequency of hallucinated references. RESULTS In June 2023, ChatGPT answered 83% of clinical scenarios (33/40) correctly but provided a comprehensive answer in only 52.5% of cases (21/40). Subgroup analysis demonstrated an average correct score of 86.7% in neovascular age-related macular degeneration, 100% in diabetic retinopathy, 76.7% in retinal vascular disease, and 70% in the surgical domain. There were six incorrect responses with one case (16.7%) of no harm, three cases (50%) of possible harm, and two cases (33.3%) of definitive harm. CONCLUSION ChatGPT correctly answered more than 80% of complex open-ended vitreoretinal clinical scenarios, with a reduced capability to provide a comprehensive response.
Affiliation(s)
- Michael J Maywood
- Department of Ophthalmology, Corewell Health William Beaumont University Hospital, Royal Oak, Michigan
- Ravi Parikh
- Manhattan Retina and Eye Consultants, New York, New York
- Department of Ophthalmology, New York University School of Medicine, New York, New York
- Tedi Begaj
- Department of Ophthalmology, Corewell Health William Beaumont University Hospital, Royal Oak, Michigan
- Associated Retinal Consultants, Royal Oak, Michigan
2. Bressler NM. JAMA Ophthalmology-The Year in Review, 2023. JAMA Ophthalmol 2024; 142:405-406. [PMID: 38512250] [DOI: 10.1001/jamaophthalmol.2024.0435]
3. Mihalache A, Huang RS, Popovic MM, Muni RH. Artificial intelligence chatbot and Academy Preferred Practice Pattern® Guidelines on cataract and glaucoma. J Cataract Refract Surg 2024; 50:534-535. [PMID: 38468154] [DOI: 10.1097/j.jcrs.0000000000001317]
Affiliation(s)
- Andrew Mihalache
- From the Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada (Mihalache, Huang); Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada (Popovic, Muni); Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, Ontario, Canada (Muni)
4. Biswas S, Davies LN, Sheppard AL, Logan NS, Wolffsohn JS. Utility of artificial intelligence-based large language models in ophthalmic care. Ophthalmic Physiol Opt 2024; 44:641-671. [PMID: 38404172] [DOI: 10.1111/opo.13284]
Abstract
PURPOSE With the introduction of ChatGPT, artificial intelligence (AI)-based large language models (LLMs) are rapidly becoming popular within the scientific community. They use natural language processing to generate human-like responses to queries. However, the application of LLMs and comparison of the abilities among different LLMs with their human counterparts in ophthalmic care remain under-reported. RECENT FINDINGS Hitherto, studies in eye care have demonstrated the utility of ChatGPT in generating patient information, clinical diagnosis and passing ophthalmology question-based examinations, among others. LLMs' performance (median accuracy, %) is influenced by factors such as the iteration, prompts utilised and the domain. Human expert (86%) demonstrated the highest proficiency in disease diagnosis, while ChatGPT-4 outperformed others in ophthalmology examinations (75.9%), symptom triaging (98%) and providing information and answering questions (84.6%). LLMs exhibited superior performance in general ophthalmology but reduced accuracy in ophthalmic subspecialties. Although AI-based LLMs like ChatGPT are deemed more efficient than their human counterparts, these AIs are constrained by their nonspecific and outdated training, no access to current knowledge, generation of plausible-sounding 'fake' responses or hallucinations, inability to process images, lack of critical literature analysis and ethical and copyright issues. A comprehensive evaluation of recently published studies is crucial to deepen understanding of LLMs and the potential of these AI-based LLMs. SUMMARY Ophthalmic care professionals should undertake a conservative approach when using AI, as human judgement remains essential for clinical decision-making and monitoring the accuracy of information. This review identified the ophthalmic applications and potential usages which need further exploration. With the advancement of LLMs, setting standards for benchmarking and promoting best practices is crucial. Potential clinical deployment requires the evaluation of these LLMs to move away from artificial settings, delve into clinical trials and determine their usefulness in the real world.
Affiliation(s)
- Sayantan Biswas
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- Leon N Davies
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- Amy L Sheppard
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- Nicola S Logan
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- James S Wolffsohn
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
5. Mihalache A, Huang RS, Cruz-Pimentel M, Patil NS, Popovic MM, Pandya BU, Shor R, Pereira A, Muni RH. Artificial intelligence chatbot interpretation of ophthalmic multimodal imaging cases. Eye (Lond) 2024. [PMID: 38649474] [DOI: 10.1038/s41433-024-03074-5]
Affiliation(s)
- Andrew Mihalache
- Temerty School of Medicine, University of Toronto, Toronto, ON, Canada
- Ryan S Huang
- Temerty School of Medicine, University of Toronto, Toronto, ON, Canada
- Miguel Cruz-Pimentel
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada
- Nikhil S Patil
- Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada
- Marko M Popovic
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada
- Bhadra U Pandya
- Temerty School of Medicine, University of Toronto, Toronto, ON, Canada
- Reut Shor
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada
- Austin Pereira
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada
- Rajeev H Muni
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada.
- Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, ON, Canada.
6. Mihalache A, Grad J, Patil NS, Huang RS, Popovic MM, Mallipatna A, Kertes PJ, Muni RH. Google Gemini and Bard artificial intelligence chatbot performance in ophthalmology knowledge assessment. Eye (Lond) 2024. [PMID: 38615098] [DOI: 10.1038/s41433-024-03067-4]
Abstract
PURPOSE With the popularization of ChatGPT (Open AI, San Francisco, California, United States) in recent months, understanding the potential of artificial intelligence (AI) chatbots in a medical context is important. Our study aims to evaluate Google Gemini and Bard's (Google, Mountain View, California, United States) knowledge in ophthalmology. METHODS In this study, we evaluated Google Gemini and Bard's performance on EyeQuiz, a platform containing ophthalmology board certification examination practice questions, when used from the United States (US). Accuracy, response length, response time, and provision of explanations were evaluated. Subspecialty-specific performance was noted. A secondary analysis was conducted using Bard from Vietnam, and Gemini from Vietnam, Brazil, and the Netherlands. RESULTS Overall, Google Gemini and Bard both had accuracies of 71% across 150 text-based multiple-choice questions. The secondary analysis revealed an accuracy of 67% using Bard from Vietnam, with 32 questions (21%) answered differently than when using Bard from the US. Moreover, the Vietnam version of Gemini achieved an accuracy of 74%, with 23 (15%) answered differently than the US version of Gemini. While the Brazil (68%) and Netherlands (65%) versions of Gemini performed slightly worse than the US version, differences in performance across the various country-specific versions of Bard and Gemini were not statistically significant. CONCLUSION Google Gemini and Bard had an acceptable performance in responding to ophthalmology board examination practice questions. Subtle variability was noted in the performance of the chatbots across different countries. The chatbots also tended to provide a confident explanation even when providing an incorrect answer.
Affiliation(s)
- Andrew Mihalache
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Justin Grad
- Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada
- Nikhil S Patil
- Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada
- Ryan S Huang
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Marko M Popovic
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada
- Ashwin Mallipatna
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada
- Department of Ophthalmology, Hospital for Sick Children, University of Toronto, Toronto, ON, Canada
- Peter J Kertes
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada
- John and Liz Tory Eye Centre, Sunnybrook Health Sciences Centre, Toronto, ON, Canada
- Rajeev H Muni
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada.
- Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, ON, Canada.
7. Mihalache A, Huang RS, Popovic MM, Patil NS, Pandya BU, Shor R, Pereira A, Kwok JM, Yan P, Wong DT, Kertes PJ, Muni RH. Accuracy of an Artificial Intelligence Chatbot's Interpretation of Clinical Ophthalmic Images. JAMA Ophthalmol 2024; 142:321-326. [PMID: 38421670] [PMCID: PMC10905373] [DOI: 10.1001/jamaophthalmol.2024.0017]
Abstract
Importance Ophthalmology is reliant on effective interpretation of multimodal imaging to ensure diagnostic accuracy. The new ability of ChatGPT-4 (OpenAI) to interpret ophthalmic images has not yet been explored. Objective To evaluate the performance of the novel release of an artificial intelligence chatbot that is capable of processing imaging data. Design, Setting, and Participants This cross-sectional study used a publicly available dataset of ophthalmic cases from OCTCases, a medical education platform based out of the Department of Ophthalmology and Vision Sciences at the University of Toronto, with accompanying clinical multimodal imaging and multiple-choice questions. Across 137 available cases, 136 contained multiple-choice questions (99%). Exposures The chatbot answered questions requiring multimodal input from October 16 to October 23, 2023. Main Outcomes and Measures The primary outcome was the accuracy of the chatbot in answering multiple-choice questions pertaining to image recognition in ophthalmic cases, measured as the proportion of correct responses. χ² tests were conducted to compare the proportion of correct responses across different ophthalmic subspecialties. Results A total of 429 multiple-choice questions from 136 ophthalmic cases and 448 images were included in the analysis. The chatbot answered 299 of the 429 multiple-choice questions correctly across all cases (70%). The chatbot's performance was better on retina questions than neuro-ophthalmology questions (77% vs 58%; difference = 18%; 95% CI, 7.5%-29.4%; χ²₁ = 11.4; P < .001). The chatbot achieved a better performance on non-image-based questions compared with image-based questions (82% vs 65%; difference = 17%; 95% CI, 7.8%-25.1%; χ²₁ = 12.2; P < .001). The chatbot performed best on questions in the retina category (77% correct) and poorest in the neuro-ophthalmology category (58% correct). The chatbot demonstrated intermediate performance on questions from the ocular oncology (72% correct), pediatric ophthalmology (68% correct), uveitis (67% correct), and glaucoma (61% correct) categories. Conclusions and Relevance In this study, the recent version of the chatbot accurately responded to approximately two-thirds of multiple-choice questions pertaining to ophthalmic cases based on imaging interpretation. The multimodal chatbot performed better on questions that did not rely on the interpretation of imaging modalities. As the use of multimodal chatbots becomes increasingly widespread, it is imperative to stress their appropriate integration within medical contexts.
Affiliation(s)
- Andrew Mihalache
- Temerty School of Medicine, University of Toronto, Toronto, Ontario, Canada
- Ryan S. Huang
- Temerty School of Medicine, University of Toronto, Toronto, Ontario, Canada
- Marko M. Popovic
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Nikhil S. Patil
- Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada
- Bhadra U. Pandya
- Temerty School of Medicine, University of Toronto, Toronto, Ontario, Canada
- Reut Shor
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Austin Pereira
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Jason M. Kwok
- Temerty School of Medicine, University of Toronto, Toronto, Ontario, Canada
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Peng Yan
- Temerty School of Medicine, University of Toronto, Toronto, Ontario, Canada
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- David T. Wong
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Ophthalmology, St Michael’s Hospital/Unity Health Toronto, Toronto, Ontario, Canada
- Peter J. Kertes
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- John and Liz Tory Eye Centre, Sunnybrook Health Science Centre, Toronto, Ontario, Canada
- Rajeev H. Muni
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Ophthalmology, St Michael’s Hospital/Unity Health Toronto, Toronto, Ontario, Canada
8. Tao BKL, Hua N, Milkovich J, Micieli JA. ChatGPT-3.5 and Bing Chat in ophthalmology: an updated evaluation of performance, readability, and informative sources. Eye (Lond) 2024. [PMID: 38509182] [DOI: 10.1038/s41433-024-03037-w]
Abstract
BACKGROUND/OBJECTIVES Experimental investigation. Bing Chat (Microsoft) integration with ChatGPT-4 (OpenAI) has conferred the capability of accessing online data past 2021. We investigate its performance against ChatGPT-3.5 on a multiple-choice question ophthalmology exam. SUBJECTS/METHODS In August 2023, ChatGPT-3.5 and Bing Chat were evaluated against 913 questions derived from the Academy's Basic and Clinical Science Course (BCSC) collection. For each response, the sub-topic, performance, Simple Measure of Gobbledygook readability score (measuring years of required education to understand a given passage), and cited resources were collected. The primary outcomes were the comparative scores between models, and qualitatively, the resources referenced by Bing Chat. Secondary outcomes included performance stratified by response readability, question type (explicit or situational), and BCSC sub-topic. RESULTS Across 913 questions, ChatGPT-3.5 scored 59.69% [95% CI 56.45,62.94] while Bing Chat scored 73.60% [95% CI 70.69,76.52]. Both models performed significantly better in explicit than clinical reasoning questions. Both models performed better on general medicine questions than on ophthalmology subsections. Bing Chat referenced 927 online entities and provided at least one citation for 836 of the 913 questions. The use of more reliable (peer-reviewed) sources was associated with a higher likelihood of a correct response. The most-cited resources were eyewiki.aao.org, aao.org, wikipedia.org, and ncbi.nlm.nih.gov. Bing Chat showed significantly better readability than ChatGPT-3.5, averaging a reading level of grade 11.4 [95% CI 7.14, 15.7] versus 12.4 [95% CI 8.77, 16.1], respectively (p-value < 0.0001, ρ = 0.25). CONCLUSIONS The online access, improved readability, and citation feature of Bing Chat confer additional utility for ophthalmology learners. We recommend critical appraisal of cited sources during response interpretation.
Affiliation(s)
- Brendan Ka-Lok Tao
- Faculty of Medicine, The University of British Columbia, 317-2194 Health Sciences Mall, Vancouver, BC, V6T 1Z3, Canada
- Nicholas Hua
- Temerty Faculty of Medicine, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
- John Milkovich
- Temerty Faculty of Medicine, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
- Jonathan Andrew Micieli
- Temerty Faculty of Medicine, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada.
- Department of Ophthalmology and Vision Sciences, University of Toronto, 340 College Street, Toronto, ON, M5T 3A9, Canada.
- Division of Neurology, Department of Medicine, University of Toronto, 6 Queen's Park Crescent West, Toronto, ON, M5S 3H2, Canada.
- Kensington Vision and Research Center, 340 College Street, Toronto, ON, M5T 3A9, Canada.
- St. Michael's Hospital, 36 Queen Street East, Toronto, ON, M5B 1W8, Canada.
- Toronto Western Hospital, 399 Bathurst Street, Toronto, ON, M5T 2S8, Canada.
- University Health Network, 190 Elizabeth Street, Toronto, ON, M5G 2C4, Canada.
9. Mihalache A, Huang RS, Patil NS, Popovic MM, Lee WW, Yan P, Cruz-Pimentel M, Muni RH. Chatbot and Academy Preferred Practice Pattern Guidelines on Retinal Diseases. Ophthalmol Retina 2024. [PMID: 38499086] [DOI: 10.1016/j.oret.2024.03.013]
Affiliation(s)
- Andrew Mihalache
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Ryan S Huang
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Nikhil S Patil
- Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada
- Marko M Popovic
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Wei Wei Lee
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Peng Yan
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada; Toronto Western Hospital, University Health Network, University of Toronto, Toronto, Ontario, Canada; Department of Ophthalmology, Kensington Vision and Research Center, Toronto, Ontario, Canada
- Miguel Cruz-Pimentel
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Rajeev H Muni
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada; Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, Ontario, Canada.
10. Mihalache A, Huang RS, Popovic MM, Muni RH. ChatGPT-4: An assessment of an upgraded artificial intelligence chatbot in the United States Medical Licensing Examination. Med Teach 2024; 46:366-372. [PMID: 37839017] [DOI: 10.1080/0142159x.2023.2249588]
Abstract
PURPOSE ChatGPT-4 is an upgraded version of an artificial intelligence chatbot. The performance of ChatGPT-4 on the United States Medical Licensing Examination (USMLE) has not been independently characterized. We aimed to assess the performance of ChatGPT-4 at responding to USMLE Step 1, Step 2CK, and Step 3 practice questions. METHOD Practice multiple-choice questions for the USMLE Step 1, Step 2CK, and Step 3 were compiled. Of 376 available questions, 319 (85%) were analyzed by ChatGPT-4 on March 21st, 2023. Our primary outcome was the performance of ChatGPT-4 for the practice USMLE Step 1, Step 2CK, and Step 3 examinations, measured as the proportion of multiple-choice questions answered correctly. Our secondary outcomes were the mean length of questions and responses provided by ChatGPT-4. RESULTS ChatGPT-4 responded to 319 text-based multiple-choice questions from USMLE practice test material. ChatGPT-4 answered 82 of 93 (88%) questions correctly on USMLE Step 1, 91 of 106 (86%) on Step 2CK, and 108 of 120 (90%) on Step 3. ChatGPT-4 provided explanations for all questions. ChatGPT-4 spent 30.8 ± 11.8 s on average responding to practice questions for USMLE Step 1, 23.0 ± 9.4 s per question for Step 2CK, and 23.1 ± 8.3 s per question for Step 3. The mean length of practice USMLE multiple-choice questions that were answered correctly and incorrectly by ChatGPT-4 was similar (difference = 17.48 characters, SE = 59.75, 95%CI = [-100.09,135.04], t = 0.29, p = 0.77). The mean length of ChatGPT-4's correct responses to practice questions was significantly shorter than the mean length of incorrect responses (difference = 79.58 characters, SE = 35.42, 95%CI = [9.89,149.28], t = 2.25, p = 0.03). CONCLUSIONS ChatGPT-4 answered a remarkably high proportion of practice questions correctly for USMLE examinations. ChatGPT-4 performed substantially better at USMLE practice questions than previous models of the same AI chatbot.
Affiliation(s)
- Andrew Mihalache
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Ryan S Huang
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Marko M Popovic
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Rajeev H Muni
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, Ontario, Canada
11. Marshall RF, Mallem K, Xu H, Thorne J, Burkholder B, Chaon B, Liberman P, Berkenstock M. Investigating the Accuracy and Completeness of an Artificial Intelligence Large Language Model About Uveitis: An Evaluation of ChatGPT. Ocul Immunol Inflamm 2024:1-4. [PMID: 38394625] [DOI: 10.1080/09273948.2024.2317417]
Abstract
PURPOSE To assess the accuracy and completeness of ChatGPT-generated answers regarding uveitis description, prevention, treatment, and prognosis. METHODS Thirty-two uveitis-related questions were generated by a uveitis specialist and inputted into ChatGPT 3.5. Answers were compiled into a survey and were reviewed by five uveitis specialists using standardized Likert scales of accuracy and completeness. RESULTS In total, the median accuracy score for all the uveitis questions (n = 32) was 4.00 (between "more correct than incorrect" and "nearly all correct"), and the median completeness score was 2.00 ("adequate, addresses all aspects of the question and provides the minimum amount of information required to be considered complete"). The interrater variability assessment had a total kappa value of 0.0278 for accuracy and 0.0847 for completeness. CONCLUSION ChatGPT can provide relatively high accuracy responses for various questions related to uveitis; however, the answers it provides are incomplete, with some inaccuracies. Its utility in providing medical information requires further validation and development prior to serving as a source of uveitis information for patients.
Affiliation(s)
- Rayna F Marshall
- The Drexel University College of Medicine, Philadelphia, Pennsylvania, USA
- Krishna Mallem
- The Drexel University College of Medicine, Philadelphia, Pennsylvania, USA
- Hannah Xu
- University of California San Diego, San Diego, California, USA
- Jennifer Thorne
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Bryn Burkholder
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Benjamin Chaon
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Paulina Liberman
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Meghan Berkenstock
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
12. Olis M, Dyjak P, Weppelmann TA. Performance of three artificial intelligence chatbots on Ophthalmic Knowledge Assessment Program materials. Can J Ophthalmol 2024. [PMID: 38408734] [DOI: 10.1016/j.jcjo.2024.01.011]
Affiliation(s)
- Mathew Olis
- Nova Southeastern Kiran C. Patel College of Osteopathic Medicine, Tampa, FL
- Patrick Dyjak
- Kansas City University College of Osteopathic Medicine, Kansas City, MO
- Thomas A Weppelmann
- Department of Ophthalmology, Morsani College of Medicine, University of South Florida, Tampa, FL; James A Haley Veterans Hospital Eye Clinic, Department of Veterans Affairs, Tampa, FL.
13. Gritti MN, AlTurki H, Farid P, Morgan CT. Progression of an Artificial Intelligence Chatbot (ChatGPT) for Pediatric Cardiology Educational Knowledge Assessment. Pediatr Cardiol 2024; 45:309-313. [PMID: 38170274] [DOI: 10.1007/s00246-023-03385-6]
Abstract
Artificial intelligence chatbots, like ChatGPT, have become powerful tools that are disrupting how humans interact with technology. The potential uses within medicine are vast. In medical education, these chatbots have shown improvements, in a short time span, in generalized medical examinations. We evaluated the overall performance and improvement between ChatGPT 3.5 and 4.0 in a test of pediatric cardiology knowledge. ChatGPT 3.5 and ChatGPT 4.0 were used to answer text-based multiple-choice questions derived from a Pediatric Cardiology Board Review textbook. Each chatbot was given an 88 question test, subcategorized into 11 topics. We excluded questions with modalities other than text (sound clips or images). Statistical analysis was done using an unpaired two-tailed t-test. Of the same 88 questions, ChatGPT 4.0 answered 66% of the questions correctly (n = 58/88) which was significantly greater (p < 0.0001) than ChatGPT 3.5, which only answered 38% (33/88). The ChatGPT 4.0 version also did better on each subspeciality topic as compared to ChatGPT 3.5. While acknowledging that ChatGPT does not yet offer subspecialty level knowledge in pediatric cardiology, the performance in pediatric cardiology educational assessments showed a considerable improvement in a short period of time between ChatGPT 3.5 and 4.0.
Affiliation(s)
- Michael N Gritti
- Division of Cardiology, The Labatt Family Heart Centre, The Hospital for Sick Children, 555 University Ave, Toronto, ON, M5G 1X8, Canada.
- Department of Pediatrics, University of Toronto, Toronto, ON, Canada.
- Hussain AlTurki
- Department of Pediatrics, University of Toronto, Toronto, ON, Canada
- Department of Pediatrics, The Hospital for Sick Children, Toronto, ON, Canada
- Pedrom Farid
- Division of Cardiology, The Labatt Family Heart Centre, The Hospital for Sick Children, 555 University Ave, Toronto, ON, M5G 1X8, Canada
- Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada
- Conall T Morgan
- Division of Cardiology, The Labatt Family Heart Centre, The Hospital for Sick Children, 555 University Ave, Toronto, ON, M5G 1X8, Canada
- Department of Pediatrics, University of Toronto, Toronto, ON, Canada
14. Danesh A, Pazouki H, Danesh F, Danesh A, Vardar-Sengul S. Artificial intelligence in dental education: ChatGPT's performance on the periodontic in-service examination. J Periodontol 2024. [PMID: 38197146] [DOI: 10.1002/jper.23-0514]
Abstract
BACKGROUND ChatGPT's (Chat Generative Pre-Trained Transformer) remarkable capacity to generate human-like output makes it an appealing learning tool for healthcare students worldwide. Nevertheless, the chatbot's responses may be subject to inaccuracies, putting forth an intense risk of misinformation. ChatGPT's capabilities should be examined in every corner of healthcare education, including dentistry and its specialties, to understand the potential of misinformation associated with the chatbot's use as a learning tool. Our investigation aims to explore ChatGPT's foundation of knowledge in the field of periodontology by evaluating the chatbot's performance on questions obtained from an in-service examination administered by the American Academy of Periodontology (AAP). METHODS ChatGPT3.5 and ChatGPT4 were evaluated on 311 multiple-choice questions obtained from the 2023 in-service examination administered by the AAP. The dataset of in-service examination questions was accessed through Nova Southeastern University's Department of Periodontology. Our study excluded questions containing an image as ChatGPT does not accept image inputs. RESULTS ChatGPT3.5 and ChatGPT4 answered 57.9% and 73.6% of in-service questions correctly on the 2023 Periodontics In-Service Written Examination, respectively. A two-tailed t test was incorporated to compare independent sample means, and sample proportions were compared using a two-tailed χ2 test. A p value below the threshold of 0.05 was deemed statistically significant. CONCLUSION While ChatGPT4 showed a higher proficiency compared to ChatGPT3.5, both chatbot models leave considerable room for misinformation with their responses relating to periodontology. The findings of the study encourage residents to scrutinize the periodontic information generated by ChatGPT to account for the chatbot's current limitations.
Affiliation(s)
- Arman Danesh
- Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Hirad Pazouki
- Faculty of Science, Western University, London, Ontario, Canada
- Farzad Danesh
- Elgin Mills Endodontic Specialists, Richmond Hill, Ontario, Canada
- Arsalan Danesh
- Department of Periodontology, College of Dental Medicine, Nova Southeastern University, Davie, Florida, USA
- Saynur Vardar-Sengul
- Department of Periodontology, College of Dental Medicine, Nova Southeastern University, Davie, Florida, USA
15. Wong M, Lim ZW, Pushpanathan K, Cheung CY, Wang YX, Chen D, Tham YC. Review of emerging trends and projection of future developments in large language models research in ophthalmology. Br J Ophthalmol 2023. [PMID: 38164563] [DOI: 10.1136/bjo-2023-324734]
Abstract
BACKGROUND Large language models (LLMs) are fast emerging as potent tools in healthcare, including ophthalmology. This systematic review offers a twofold contribution: it summarises current trends in ophthalmology-related LLM research and projects future directions for this burgeoning field. METHODS We systematically searched across various databases (PubMed, Europe PMC, Scopus and Web of Science) for articles related to LLM use in ophthalmology, published between 1 January 2022 and 31 July 2023. Selected articles were summarised, and categorised by type (editorial, commentary, original research, etc) and their research focus (eg, evaluating ChatGPT's performance in ophthalmology examinations or clinical tasks). FINDINGS We identified 32 articles meeting our criteria, published between January and July 2023, with a peak in June (n=12). Most were original research evaluating LLMs' proficiency in clinically related tasks (n=9). Studies demonstrated that ChatGPT-4.0 outperformed its predecessor, ChatGPT-3.5, in ophthalmology exams. Furthermore, ChatGPT excelled in constructing discharge notes (n=2), evaluating diagnoses (n=2) and answering general medical queries (n=6). However, it struggled with generating scientific articles or abstracts (n=3) and answering specific subdomain questions, especially those regarding specific treatment options (n=2). ChatGPT's performance relative to other LLMs (Google's Bard, Microsoft's Bing) varied by study design. Ethical concerns such as data hallucination (n=27), authorship (n=5) and data privacy (n=2) were frequently cited. INTERPRETATION While LLMs hold transformative potential for healthcare and ophthalmology, concerns over accountability, accuracy and data security remain. Future research should focus on application programming interface integration, comparative assessments of popular LLMs, their ability to interpret image-based data and the establishment of standardised evaluation frameworks.
Affiliation(s)
- Zhi Wei Lim
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Krithi Pushpanathan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Carol Y Cheung
- Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong, Hong Kong
- Ya Xing Wang
- Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Capital University of Medical Science, Beijing, China
- David Chen
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Department of Ophthalmology, National University Hospital, Singapore
- Yih Chung Tham
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore
16. Ittarat M, Cheungpasitporn W, Chansangpetch S. Personalized Care in Eye Health: Exploring Opportunities, Challenges, and the Road Ahead for Chatbots. J Pers Med 2023; 13:1679. [PMID: 38138906] [PMCID: PMC10744965] [DOI: 10.3390/jpm13121679]
Abstract
In modern eye care, the adoption of ophthalmology chatbots stands out as a pivotal technological progression. These digital assistants present numerous benefits, such as better access to vital information, heightened patient interaction, and streamlined triaging. Recent evaluations have highlighted their performance in both the triage of ophthalmology conditions and ophthalmology knowledge assessment, underscoring their potential and areas for improvement. However, assimilating these chatbots into the prevailing healthcare infrastructures brings challenges. These encompass ethical dilemmas, legal compliance, seamless integration with electronic health records (EHR), and fostering effective dialogue with medical professionals. Addressing these challenges necessitates the creation of bespoke standards and protocols for ophthalmology chatbots. The horizon for these chatbots is illuminated by advancements and anticipated innovations, poised to redefine the delivery of eye care. The synergy of artificial intelligence (AI) and machine learning (ML) with chatbots amplifies their diagnostic prowess. Additionally, their capability to adapt linguistically and culturally ensures they can cater to a global patient demographic. In this article, we explore in detail the utilization of chatbots in ophthalmology, examining their accuracy, reliability, data protection, security, transparency, potential algorithmic biases, and ethical considerations. We provide a comprehensive review of their roles in the triage of ophthalmology conditions and knowledge assessment, emphasizing their significance and future potential in the field.
Affiliation(s)
- Mantapond Ittarat
- Surin Hospital and Surin Medical Education Center, Suranaree University of Technology, Surin 32000, Thailand;
- Sunee Chansangpetch
- Center of Excellence in Glaucoma, Chulalongkorn University, Bangkok 10330, Thailand;
- Department of Ophthalmology, Faculty of Medicine, Chulalongkorn University and King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok 10330, Thailand
17. Cohen A, Alter R, Lessans N, Meyer R, Brezinov Y, Levin G. Performance of ChatGPT in Israeli Hebrew OBGYN national residency examinations. Arch Gynecol Obstet 2023; 308:1797-1802. [PMID: 37668790] [DOI: 10.1007/s00404-023-07185-4]
Abstract
PURPOSE Previous studies of ChatGPT performance in the field of medical examinations have reached contradictory results. Moreover, the performance of ChatGPT in languages other than English is yet to be explored. We aim to study the performance of ChatGPT in the Hebrew OBGYN-'Shlav-Alef' (Phase 1) examination. METHODS A performance study was conducted using a consecutive sample of text-based multiple choice questions originating from authentic Hebrew OBGYN-'Shlav-Alef' examinations in 2021-2022. We constructed 150 multiple choice questions from consecutive text-based-only original questions. We compared the performance of ChatGPT to the real-life actual performance of OBGYN residents who completed the tests in 2021-2022. We also compared ChatGPT's Hebrew performance vs. previously published English medical tests. RESULTS In 2021-2022, 27.8% of OBGYN residents failed the 'Shlav-Alef' examination and the mean score of the residents was 68.4. Overall, 150 authentic questions were evaluated (one examination). ChatGPT correctly answered 58 questions (38.7%) and reached a failing score. The performance of Hebrew ChatGPT was lower when compared to the actual performance of residents: 38.7% vs. 68.4%, p < .001. In a comparison to ChatGPT performance on 9,091 English language questions in the field of medicine, the performance of Hebrew ChatGPT was lower (38.7% in Hebrew vs. 60.7% in English, p < .001). CONCLUSIONS ChatGPT correctly answered less than 40% of Hebrew OBGYN resident examination questions. Residents cannot rely on ChatGPT for the preparation of this examination. Efforts should be made to improve ChatGPT performance in other languages besides English.
Affiliation(s)
- Adiel Cohen
- Department of Obstetrics and Gynecology, Hadassah Medical Organization and Faculty of Medicine, Hebrew University of Jerusalem, Ein Kerem, P.O.B. 12000, 91120, Jerusalem, Israel.
- Roie Alter
- Department of Obstetrics and Gynecology, Hadassah Medical Organization and Faculty of Medicine, Hebrew University of Jerusalem, Ein Kerem, P.O.B. 12000, 91120, Jerusalem, Israel
- Naama Lessans
- Department of Obstetrics and Gynecology, Hadassah Medical Organization and Faculty of Medicine, Hebrew University of Jerusalem, Ein Kerem, P.O.B. 12000, 91120, Jerusalem, Israel
- Raanan Meyer
- Department of Obstetrics and Gynecology, Chaim Sheba Medical Center, Ramat-Gan, Israel
- Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel
- Cedar-Sinai Medical Center, Los Angeles, USA
- Yoav Brezinov
- Lady Davis Institute for Cancer Research, Jewish General Hospital, McGill University, Montreal, Canada
- Gabriel Levin
- Lady Davis Institute for Cancer Research, Jewish General Hospital, McGill University, Montreal, Canada
- The Department of Gynecologic Oncology, Hadassah Medical Center, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
18. Hu X, Ran AR, Nguyen TX, Szeto S, Yam JC, Chan CKM, Cheung CY. What can GPT-4 do for Diagnosing Rare Eye Diseases? A Pilot Study. Ophthalmol Ther 2023; 12:3395-3402. [PMID: 37656399] [PMCID: PMC10640532] [DOI: 10.1007/s40123-023-00789-8]
Abstract
INTRODUCTION Generative pretrained transformer-4 (GPT-4) has gained widespread attention from society, and its potential has been extensively evaluated in many areas. However, investigation of GPT-4's use in medicine, especially in the ophthalmology field, is still limited. This study aims to evaluate GPT-4's capability to identify rare ophthalmic diseases in three simulated scenarios for different end-users, including patients, family physicians, and junior ophthalmologists. METHODS We selected ten treatable rare ophthalmic disease cases from the publicly available EyeRounds service. We gradually increased the amount of information fed into GPT-4 to simulate the scenarios of patient, family physician, and junior ophthalmologist using GPT-4. GPT-4's responses were evaluated from two aspects: suitability (appropriate or inappropriate) and accuracy (right or wrong) by senior ophthalmologists (> 10 years' experiences). RESULTS Among the 30 responses, 83.3% were considered "appropriate" by senior ophthalmologists. In the scenarios of simulated patient, family physician, and junior ophthalmologist, seven (70%), ten (100%), and eight (80%) responses were graded as "appropriate" by senior ophthalmologists. However, compared to the ground truth, GPT-4 could only output several possible diseases generally without "right" responses in the simulated patient scenarios. In contrast, in the simulated family physician scenario, 50% of GPT-4's responses were "right," and in the simulated junior ophthalmologist scenario, the model achieved a higher "right" rate of 90%. CONCLUSION To our knowledge, this is the first proof-of-concept study that evaluates GPT-4's capacity to identify rare eye diseases in simulated scenarios involving patients, family physicians, and junior ophthalmologists. The results indicate that GPT-4 has the potential to serve as a consultation assisting tool for patients and family physicians to receive referral suggestions and an assisting tool for junior ophthalmologists to diagnose rare eye diseases. However, it is important to approach GPT-4 with caution and acknowledge the need for verification and careful referrals in clinical settings.
Affiliation(s)
- Xiaoyan Hu
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- An Ran Ran
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- Truong X Nguyen
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- Simon Szeto
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- Jason C Yam
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- Carol Y Cheung
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China.
19. Fowler T, Pullen S, Birkett L. Performance of ChatGPT and Bard on the official part 1 FRCOphth practice questions. Br J Ophthalmol 2023. [PMID: 37932006] [DOI: 10.1136/bjo-2023-324091]
Abstract
BACKGROUND Chat Generative Pre-trained Transformer (ChatGPT), a large language model by OpenAI, and Bard, Google's artificial intelligence (AI) chatbot, have been evaluated in various contexts. This study aims to assess these models' proficiency in the part 1 Fellowship of the Royal College of Ophthalmologists (FRCOphth) Multiple Choice Question (MCQ) examination, highlighting their potential in medical education. METHODS Both models were tested on a sample question bank for the part 1 FRCOphth MCQ exam. Their performances were compared with historical human performance on the exam, focusing on the ability to comprehend, retain and apply information related to ophthalmology. We also tested them on the book 'MCQs for FRCOphth part 1', and assessed their performance across subjects. RESULTS ChatGPT demonstrated a strong performance, surpassing historical human pass marks and examination performance, while Bard underperformed. The comparison indicates the potential of certain AI models to match, and even exceed, human standards in such tasks. CONCLUSION The results demonstrate the potential of AI models, such as ChatGPT, in processing and applying medical knowledge at a postgraduate level. However, performance varied among different models, highlighting the importance of appropriate AI selection. The study underlines the potential for AI applications in medical education and the necessity for further investigation into their strengths and limitations.
Affiliation(s)
- Thomas Fowler
- Department of Medicine, Barking Havering and Redbridge University Hospitals NHS Trust, London, UK
- Simon Pullen
- Department of Anaesthetics, Princess Alexandra Hospital, Harlow, UK
- Liam Birkett
- Emergency Medicine, Royal Free Hospital, London, UK
20. Antaki F, Milad D, Chia MA, Giguère CÉ, Touma S, El-Khoury J, Keane PA, Duval R. Capabilities of GPT-4 in ophthalmology: an analysis of model entropy and progress towards human-level medical question answering. Br J Ophthalmol 2023. [PMID: 37923374] [DOI: 10.1136/bjo-2023-324438]
Abstract
BACKGROUND Evidence on the performance of Generative Pre-trained Transformer 4 (GPT-4), a large language model (LLM), in the ophthalmology question-answering domain is needed. METHODS We tested GPT-4 on two 260-question multiple choice question sets from the Basic and Clinical Science Course (BCSC) Self-Assessment Program and the OphthoQuestions question banks. We compared the accuracy of GPT-4 models with varying temperatures (creativity setting) and evaluated their responses in a subset of questions. We also compared the best-performing GPT-4 model to GPT-3.5 and to historical human performance. RESULTS GPT-4-0.3 (GPT-4 with a temperature of 0.3) achieved the highest accuracy among GPT-4 models, with 75.8% on the BCSC set and 70.0% on the OphthoQuestions set. The combined accuracy was 72.9%, which represents an 18.3% raw improvement in accuracy compared with GPT-3.5 (p<0.001). Human graders preferred responses from models with a temperature higher than 0 (more creative). Exam section, question difficulty and cognitive level were all predictive of GPT-4-0.3 answer accuracy. GPT-4-0.3's performance was numerically superior to human performance on the BCSC (75.8% vs 73.3%) and OphthoQuestions (70.0% vs 63.0%), but the difference was not statistically significant (p=0.55 and p=0.09). CONCLUSION GPT-4, an LLM trained on non-ophthalmology-specific data, performs significantly better than its predecessor on simulated ophthalmology board-style exams. Remarkably, its performance tended to be superior to historical human performance, but that difference was not statistically significant in our study.
Affiliation(s)
- Fares Antaki
- Moorfields Eye Hospital NHS Foundation Trust, London, UK
- Institute of Ophthalmology, UCL, London, UK
- The CHUM School of Artificial Intelligence in Healthcare, Montreal, Quebec, Canada
- Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada
- Department of Ophthalmology, Centre Hospitalier de l'Universite de Montreal (CHUM), Montreal, Quebec, Canada
- Daniel Milad
- Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada
- Department of Ophthalmology, Centre Hospitalier de l'Universite de Montreal (CHUM), Montreal, Quebec, Canada
- Department of Ophthalmology, Hopital Maisonneuve-Rosemont, Montreal, Quebec, Canada
- Mark A Chia
- Moorfields Eye Hospital NHS Foundation Trust, London, UK
- Institute of Ophthalmology, UCL, London, UK
- Samir Touma
- Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada
- Department of Ophthalmology, Centre Hospitalier de l'Universite de Montreal (CHUM), Montreal, Quebec, Canada
- Department of Ophthalmology, Hopital Maisonneuve-Rosemont, Montreal, Quebec, Canada
- Jonathan El-Khoury
- Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada
- Department of Ophthalmology, Centre Hospitalier de l'Universite de Montreal (CHUM), Montreal, Quebec, Canada
- Department of Ophthalmology, Hopital Maisonneuve-Rosemont, Montreal, Quebec, Canada
- Pearse A Keane
- Moorfields Eye Hospital NHS Foundation Trust, London, UK
- Institute of Ophthalmology, UCL, London, UK
- NIHR Moorfields Biomedical Research Centre, London, UK
- Renaud Duval
- Department of Ophthalmology, University of Montreal, Montreal, Quebec, Canada
- Department of Ophthalmology, Hopital Maisonneuve-Rosemont, Montreal, Quebec, Canada
21. Mihalache A, Popovic MM, Muni RH. Advances in Artificial Intelligence Chatbot Technology in Ophthalmology-Reply. JAMA Ophthalmol 2023; 141:1088-1089. [PMID: 37856111] [DOI: 10.1001/jamaophthalmol.2023.4623]
Affiliation(s)
- Andrew Mihalache
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Marko M Popovic
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Rajeev H Muni
- Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, Ontario, Canada
22. Wiedemann P. Artificial intelligence in ophthalmology. Int J Ophthalmol 2023; 16:1357-1360. [PMID: 37724277] [PMCID: PMC10409517] [DOI: 10.18240/ijo.2023.09.01]
23. Liu J, Zheng J, Cai X, Wu D, Yin C. A descriptive study based on the comparison of ChatGPT and evidence-based neurosurgeons. iScience 2023; 26:107590. [PMID: 37705958] [PMCID: PMC10495632] [DOI: 10.1016/j.isci.2023.107590]
Abstract
ChatGPT is an artificial intelligence product developed by OpenAI. This study aims to investigate whether ChatGPT can respond in accordance with evidence-based medicine in neurosurgery. We generated 50 neurosurgical questions covering neurosurgical diseases. Each question was posed three times to GPT-3.5 and GPT-4.0. We also recruited three neurosurgeons with high, middle, and low seniority to respond to questions. The results were analyzed regarding ChatGPT's overall performance score, mean scores by the items' specialty classification, and question type. In conclusion, GPT-3.5's ability to respond in accordance with evidence-based medicine was comparable to that of neurosurgeons with low seniority, and GPT-4.0's ability was comparable to that of neurosurgeons with high seniority. Although ChatGPT is yet to be comparable to a neurosurgeon with high seniority, future upgrades could enhance its performance and abilities.
Affiliation(s)
- Jiayu Liu
- Department of Neurosurgery, the First Medical Centre, Chinese PLA General Hospital, Beijing 100853, China
- Jiqi Zheng
- School of Health Humanities, Peking University, Beijing 100191, China
- Xintian Cai
- Department of Graduate School, Xinjiang Medical University, Urumqi 830001, China
- Dongdong Wu
- Department of Information, Daping Hospital, Army Medical University, Chongqing 400042, China
- Chengliang Yin
- Faculty of Medicine, Macau University of Science and Technology, Macau 999078, China