1
Wang Y, Liu C, Zhou K, Zhu T, Han X. Towards regulatory generative AI in ophthalmology healthcare: a security and privacy perspective. Br J Ophthalmol 2024:bjo-2024-325167. PMID: 38834290. DOI: 10.1136/bjo-2024-325167.
Abstract
As the healthcare community increasingly harnesses the power of generative artificial intelligence (AI), critical issues of security, privacy and regulation take centre stage. In this paper, we explore the security and privacy risks of generative AI from model-level and data-level perspectives, and we elucidate the potential consequences through case studies within the domain of ophthalmology. Model-level risks include knowledge leakage from the model and model safety under AI-specific attacks, while data-level risks involve unauthorised data collection and data accuracy concerns. Within the healthcare context, these risks can bear severe consequences, encompassing potential breaches of sensitive information, violations of privacy rights and threats to patient safety. This paper not only highlights these challenges but also elucidates governance-driven solutions that adhere to AI and healthcare regulations. We advocate for preparedness against potential threats, call for transparency enhancements and underscore the necessity of clinical validation before real-world implementation. Improving the security and privacy of generative AI warrants an emphasis on the role of ophthalmologists and other healthcare providers, as well as the timely introduction of comprehensive regulations.
Affiliation(s)
- Yueye Wang
- Sun Yat-sen University Zhongshan Ophthalmic Center State Key Laboratory of Ophthalmology, Guangzhou, Guangdong, China
- Chi Liu
- Faculty of Data Science, City University of Macau, Macao SAR, China
- Keyao Zhou
- Department of Ophthalmology, Guangdong Provincial People's Hospital, Guangzhou, Guangdong, China
- Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, China
- Tianqing Zhu
- Faculty of Data Science, City University of Macau, Macao SAR, China
- Xiaotong Han
- Sun Yat-sen University Zhongshan Ophthalmic Center State Key Laboratory of Ophthalmology, Guangzhou, Guangdong, China
2
Roldan-Vasquez E, Mitri S, Bhasin S, Bharani T, Capasso K, Haslinger M, Sharma R, James TA. Reliability of artificial intelligence chatbot responses to frequently asked questions in breast surgical oncology. J Surg Oncol 2024. PMID: 38837375. DOI: 10.1002/jso.27715.
Abstract
INTRODUCTION Artificial intelligence (AI)-driven chatbots, capable of simulating human-like conversations, are becoming more prevalent in healthcare. While this technology offers potential benefits in patient engagement and information accessibility, it raises concerns about potential misuse, misinformation, inaccuracies, and ethical challenges. METHODS This study evaluated a publicly available AI chatbot, ChatGPT, in its responses to nine questions related to breast cancer surgery selected from the American Society of Breast Surgeons' frequently asked questions (FAQ) patient education website. Four breast surgical oncologists assessed the responses for accuracy and reliability using a five-point Likert scale and the Patient Education Materials Assessment Tool (PEMAT). RESULTS The average reliability score for ChatGPT in answering breast cancer surgery questions was 3.98 out of 5.00. Surgeons unanimously found the responses understandable and actionable per the PEMAT criteria. The consensus was that ChatGPT's overall performance was appropriate, with minor or no inaccuracies. CONCLUSION ChatGPT demonstrates good reliability in responding to breast cancer surgery queries, with minor, nonharmful inaccuracies. Its answers are accurate, clear, and easy to comprehend. Notably, ChatGPT acknowledged its informational role and did not attempt to replace medical advice or discourage users from seeking input from a healthcare professional.
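For readers who want to prototype this kind of rater aggregation, the sketch below (Python, with invented ratings rather than the study's data) shows how a mean five-point Likert reliability score and PEMAT-style agreement percentages might be computed.

```python
import numpy as np

# Hypothetical 5-point Likert ratings: 4 surgeons x 9 FAQ responses
# (illustrative values only, not the study's data).
likert = np.array([
    [4, 4, 5, 4, 3, 4, 4, 5, 4],
    [4, 5, 4, 4, 4, 3, 4, 4, 4],
    [5, 4, 4, 3, 4, 4, 5, 4, 4],
    [4, 4, 4, 4, 4, 4, 4, 4, 5],
])

print(f"Mean reliability: {likert.mean():.2f} / 5.00")   # overall average
print("Per-question means:", np.round(likert.mean(axis=0), 2))

# PEMAT-style binary judgements (1 = understandable/actionable):
# 4 raters x 9 responses; unanimity corresponds to 100% agreement.
pemat = np.ones((4, 9), dtype=int)
print(f"Understandability agreement: {pemat.mean():.0%}")
```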
Affiliation(s)
- Estefania Roldan-Vasquez
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Samir Mitri
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Shreya Bhasin
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- School of Medicine and Dentistry, University of Rochester, Rochester, New York, USA
- Tina Bharani
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Department of Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Kathryn Capasso
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Michelle Haslinger
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Ranjna Sharma
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Ted A James
- Department of Surgery, Breast Surgical Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
3
Nguyen TP, Carvalho B, Sukhdeo H, Joudi K, Guo N, Chen M, Wolpaw JT, Kiefer JJ, Byrne M, Jamroz T, Mootz AA, Reale SC, Zou J, Sultan P. Comparison of artificial intelligence large language model chatbots in answering frequently asked questions in anaesthesia. BJA Open 2024; 10:100280. PMID: 38764485. PMCID: PMC11099318. DOI: 10.1016/j.bjao.2024.100280.
Abstract
Background Patients are increasingly using artificial intelligence (AI) chatbots to seek answers to medical queries. Methods Ten frequently asked questions in anaesthesia were posed to three AI chatbots: ChatGPT4 (OpenAI), Bard (Google), and Bing Chat (Microsoft). Each chatbot's answers were evaluated in a randomised, blinded order by five residency programme directors from 15 medical institutions in the USA. Three medical content quality categories (accuracy, comprehensiveness, safety) and three communication quality categories (understandability, empathy/respect, and ethics) were scored between 1 and 5 (1 representing worst, 5 representing best). Results ChatGPT4 and Bard outperformed Bing Chat (median [inter-quartile range] scores: 4 [3-4], 4 [3-4], and 3 [2-4], respectively; P<0.001 with all metrics combined). All AI chatbots performed poorly in accuracy (score of ≥4 by 58%, 48%, and 36% of experts for ChatGPT4, Bard, and Bing Chat, respectively), comprehensiveness (score ≥4 by 42%, 30%, and 12% of experts for ChatGPT4, Bard, and Bing Chat, respectively), and safety (score ≥4 by 50%, 40%, and 28% of experts for ChatGPT4, Bard, and Bing Chat, respectively). Notably, answers from ChatGPT4, Bard, and Bing Chat differed statistically in comprehensiveness (ChatGPT4, 3 [2-4] vs Bing Chat, 2 [2-3], P<0.001; and Bard 3 [2-4] vs Bing Chat, 2 [2-3], P=0.002). All large language model chatbots performed well with no statistical difference for understandability (P=0.24), empathy (P=0.032), and ethics (P=0.465). Conclusions In answering patients' frequently asked questions about anaesthesia, the chatbots performed well on communication metrics but were suboptimal on medical content metrics. Overall, ChatGPT4 and Bard were comparable to each other, both outperforming Bing Chat.
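The abstract reports medians with inter-quartile ranges and pairwise P values for ordinal 1-5 ratings. The exact test sequence is not stated in the abstract, but a common non-parametric workflow for such data, sketched below on invented ratings, is a Kruskal-Wallis omnibus test followed by pairwise Mann-Whitney U tests.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical 1-5 expert ratings for three chatbots (illustrative only).
gpt4 = rng.integers(2, 6, 50)
bard = rng.integers(2, 6, 50)
bing = rng.integers(1, 5, 50)

def med_iqr(x):
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return f"{med:.0f} [{q1:.0f}-{q3:.0f}]"

for name, x in [("ChatGPT4", gpt4), ("Bard", bard), ("Bing Chat", bing)]:
    print(name, med_iqr(x))

# Omnibus test across the three chatbots, then one pairwise follow-up.
_, p_omni = stats.kruskal(gpt4, bard, bing)
_, p_pair = stats.mannwhitneyu(gpt4, bing, alternative="two-sided")
print(f"Kruskal-Wallis p={p_omni:.4f}; ChatGPT4 vs Bing Chat p={p_pair:.4f}")
```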
Affiliation(s)
- Teresa P. Nguyen
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Brendan Carvalho
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Hannah Sukhdeo
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Kareem Joudi
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Nan Guo
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Marianne Chen
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
- Jed T. Wolpaw
- Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jesse J. Kiefer
- Department of Anesthesiology and Critical Care Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
- Melissa Byrne
- Department of Anesthesiology, Perioperative and Pain Medicine, University of Michigan Ann Arbor School of Medicine, Ann Arbor, MI, USA
- Tatiana Jamroz
- Department of Anesthesiology, Perioperative and Pain Medicine, Cleveland Clinic Foundation and Hospitals, Cleveland, OH, USA
- Allison A. Mootz
- Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital, Harvard School of Medicine, Boston, MA, USA
- Sharon C. Reale
- Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital, Harvard School of Medicine, Boston, MA, USA
- James Zou
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Pervez Sultan
- Department of Anesthesiology, Perioperative and Pain Medicine, Stanford School of Medicine, Stanford, CA, USA
4
Ye F, Zhang H, Luo X, Wu T, Yang Q, Shi Z. Evaluating ChatGPT's Performance in Answering Questions About Allergic Rhinitis and Chronic Rhinosinusitis. Otolaryngol Head Neck Surg 2024. PMID: 38796735. DOI: 10.1002/ohn.832.
Abstract
OBJECTIVE This study aims to evaluate the accuracy of ChatGPT in answering allergic rhinitis (AR) and chronic rhinosinusitis (CRS) related questions. STUDY DESIGN This is a cross-sectional study. SETTING Each question was entered as a separate, independent prompt. METHODS Responses to AR (n = 189) and CRS (n = 242) related questions, generated by GPT-3.5 and GPT-4, were independently graded for accuracy by 2 senior rhinology professors, with disagreements adjudicated by a third reviewer. RESULTS Overall, ChatGPT demonstrated satisfactory performance, accurately answering over 80% of questions across all categories. Specifically, GPT-4.0's accuracy in responding to AR-related questions significantly exceeded that of GPT-3.5, a distinction not evident in CRS-related questions. Patient-originated questions were answered with significantly higher accuracy than doctor-originated questions when GPT-4.0 responded to AR-related questions; this discrepancy was not observed with GPT-3.5 or in the context of CRS-related questions. Across different types of content, ChatGPT excelled in covering basic knowledge, prevention, and emotion for AR and CRS. However, it experienced challenges when addressing questions about recent advancements, a trend consistent across both GPT-3.5 and GPT-4.0 iterations. Importantly, the accuracy of responses remained unaffected when questions were posed in Chinese. CONCLUSION Our findings suggest ChatGPT is capable of conveying accurate information to AR and CRS patients, and they offer insights into its performance across various domains, guiding its utilization and improvement.
Affiliation(s)
- Fan Ye
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- He Zhang
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Xin Luo
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Tong Wu
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Qintai Yang
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Naso-Orbital-Maxilla and Skull Base Center, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Airway Inflammatory Disease Research and Innovative Technology Translation, Guangzhou, China
- Zhaohui Shi
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Naso-Orbital-Maxilla and Skull Base Center, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Key Laboratory of Airway Inflammatory Disease Research and Innovative Technology Translation, Guangzhou, China
5
Shiraishi M, Tomioka Y, Miyakuni A, Moriwaki Y, Yang R, Oba J, Okazaki M. Generating Informed Consent Documents Related to Blepharoplasty Using ChatGPT. Ophthalmic Plast Reconstr Surg 2024; 40:316-320. PMID: 38133626. DOI: 10.1097/iop.0000000000002574.
Abstract
PURPOSE This study aimed to demonstrate the performance of the popular artificial intelligence (AI) language model, Chat Generative Pre-trained Transformer (ChatGPT) (OpenAI, San Francisco, CA, U.S.A.), in generating the informed consent (IC) document for blepharoplasty. METHODS A total of 2 prompts were provided to ChatGPT to generate IC documents. Four board-certified plastic surgeons and 4 nonmedical staff members evaluated the AI-generated IC documents and the original IC document currently used in the clinical setting, assessing these documents in terms of accuracy, informativeness, and accessibility. RESULTS Among board-certified plastic surgeons, the initial AI-generated IC document scored significantly lower than the original IC document in accuracy (p < 0.001), informativeness (p = 0.005), and accessibility (p = 0.021), while the revised AI-generated IC document scored lower than the original document in accuracy (p = 0.03) and accessibility (p = 0.021). Among nonmedical staff members, no statistically significant difference between either AI-generated IC document and the original document was observed in terms of accuracy, informativeness, or accessibility. CONCLUSIONS The results showed that ChatGPT in its current form cannot be used as a standalone patient education resource. However, it has the potential to produce better IC documents as its handling of professional terminology improves. This AI technology will eventually transform ophthalmic plastic surgery healthcare systems by enhancing patient education and decision-making via IC documents.
Affiliation(s)
- Makoto Shiraishi
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
6
Kedia N, Sanjeev S, Ong J, Chhablani J. ChatGPT and Beyond: An overview of the growing field of large language models and their use in ophthalmology. Eye (Lond) 2024; 38:1252-1261. PMID: 38172581. PMCID: PMC11076576. DOI: 10.1038/s41433-023-02915-z.
Abstract
ChatGPT, an artificial intelligence (AI) chatbot built on large language models (LLMs), has rapidly gained popularity. The benefits and limitations of this transformative technology have been discussed across various fields, including medicine. The widespread availability of ChatGPT has enabled clinicians to study how these tools could be used for a variety of tasks such as generating differential diagnosis lists, organizing patient notes, and synthesizing literature for scientific research. LLMs have shown promising capabilities in ophthalmology by performing well on the Ophthalmic Knowledge Assessment Program, providing fairly accurate responses to questions about retinal diseases, and generating differential diagnosis lists. There are current limitations to this technology, including the propensity of LLMs to "hallucinate", or confidently generate false information; their potential role in perpetuating biases in medicine; and the challenges of incorporating LLMs into research without allowing "AI-plagiarism" or publication of false information. In this paper, we provide a balanced overview of what LLMs are and introduce some of the LLMs that have been released in the past few years. We discuss recent literature evaluating the role of these language models in medicine with a focus on ChatGPT. The field of AI is fast-paced, and new applications based on LLMs are being generated rapidly; therefore, it is important for ophthalmologists to be aware of how this technology works and how it may impact patient care. Here, we discuss the benefits, limitations, and future advancements of LLMs in patient care and research.
Affiliation(s)
- Nikita Kedia
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Joshua Ong
- Department of Ophthalmology and Visual Sciences, University of Michigan Kellogg Eye Center, Ann Arbor, MI, USA
- Jay Chhablani
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
7
Momenaei B, Mansour HA, Kuriyan AE, Xu D, Sridhar J, Ting DSW, Yonekawa Y. ChatGPT enters the room: what it means for patient counseling, physician education, academics, and disease management. Curr Opin Ophthalmol 2024; 35:205-209. PMID: 38334288. DOI: 10.1097/icu.0000000000001036.
Abstract
PURPOSE OF REVIEW This review seeks to provide a summary of the most recent research findings regarding the utilization of ChatGPT, an artificial intelligence (AI)-powered chatbot, in the field of ophthalmology in addition to exploring the limitations and ethical considerations associated with its application. RECENT FINDINGS ChatGPT has gained widespread recognition and demonstrated potential in enhancing patient and physician education, boosting research productivity, and streamlining administrative tasks. In various studies examining its utility in ophthalmology, ChatGPT has exhibited fair to good accuracy, with its most recent iteration showcasing superior performance in providing ophthalmic recommendations across various ophthalmic disorders such as corneal diseases, orbital disorders, vitreoretinal diseases, uveitis, neuro-ophthalmology, and glaucoma. This proves beneficial for patients in accessing information and aids physicians in triaging as well as formulating differential diagnoses. Despite such benefits, ChatGPT has limitations that require acknowledgment including the potential risk of offering inaccurate or harmful information, dependence on outdated data, the necessity for a high level of education for data comprehension, and concerns regarding patient privacy and ethical considerations within the research domain. SUMMARY ChatGPT is a promising new tool that could contribute to ophthalmic healthcare education and research, potentially reducing work burdens. However, its current limitations necessitate a complementary role with human expert oversight.
Affiliation(s)
- Bita Momenaei
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Hana A Mansour
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Ajay E Kuriyan
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- David Xu
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Jayanth Sridhar
- University of California Los Angeles, Los Angeles, California, USA
- Yoshihiro Yonekawa
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
8
Mihalache A, Huang RS, Popovic MM, Muni RH. Artificial intelligence chatbot and Academy Preferred Practice Pattern® Guidelines on cataract and glaucoma. J Cataract Refract Surg 2024; 50:534-535. PMID: 38468154. DOI: 10.1097/j.jcrs.0000000000001317.
Affiliation(s)
- Andrew Mihalache
- From the Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada (Mihalache, Huang); Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada (Popovic, Muni); Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, Ontario, Canada (Muni)
9
Biswas S, Davies LN, Sheppard AL, Logan NS, Wolffsohn JS. Utility of artificial intelligence-based large language models in ophthalmic care. Ophthalmic Physiol Opt 2024; 44:641-671. PMID: 38404172. DOI: 10.1111/opo.13284.
Abstract
PURPOSE With the introduction of ChatGPT, artificial intelligence (AI)-based large language models (LLMs) are rapidly becoming popular within the scientific community. They use natural language processing to generate human-like responses to queries. However, the application of LLMs and comparisons of the abilities of different LLMs with their human counterparts in ophthalmic care remain under-reported. RECENT FINDINGS Hitherto, studies in eye care have demonstrated the utility of ChatGPT in generating patient information, making clinical diagnoses and passing ophthalmology question-based examinations, among others. LLMs' performance (median accuracy, %) is influenced by factors such as the iteration, prompts utilised and the domain. Human experts (86%) demonstrated the highest proficiency in disease diagnosis, while ChatGPT-4 outperformed others in ophthalmology examinations (75.9%), symptom triaging (98%) and providing information and answering questions (84.6%). LLMs exhibited superior performance in general ophthalmology but reduced accuracy in ophthalmic subspecialties. Although AI-based LLMs like ChatGPT are deemed more efficient than their human counterparts, these AIs are constrained by nonspecific and outdated training, lack of access to current knowledge, generation of plausible-sounding 'fake' responses or hallucinations, inability to process images, lack of critical literature analysis, and ethical and copyright issues. A comprehensive evaluation of recently published studies is crucial to deepen understanding of LLMs and the potential of these AI-based LLMs. SUMMARY Ophthalmic care professionals should take a conservative approach when using AI, as human judgement remains essential for clinical decision-making and monitoring the accuracy of information. This review identified the ophthalmic applications and potential uses that need further exploration. With the advancement of LLMs, setting standards for benchmarking and promoting best practices is crucial. Potential clinical deployment requires evaluation of these LLMs beyond artificial settings, through clinical trials that determine their usefulness in the real world.
Affiliation(s)
- Sayantan Biswas
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- Leon N Davies
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- Amy L Sheppard
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- Nicola S Logan
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
- James S Wolffsohn
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
10
Knebel D, Priglinger S, Scherer N, Klaas J, Siedlecki J, Schworm B. Assessment of ChatGPT in the Prehospital Management of Ophthalmological Emergencies - An Analysis of 10 Fictional Case Vignettes. Klin Monbl Augenheilkd 2024; 241:675-681. PMID: 37890504. DOI: 10.1055/a-2149-0447.
Abstract
BACKGROUND The artificial intelligence (AI)-based platform ChatGPT (Chat Generative Pre-Trained Transformer, OpenAI LP, San Francisco, CA, USA) has gained impressive popularity in recent months. Its performance on case vignettes of general medical (non-ophthalmological) emergencies has been assessed, with very encouraging results. The purpose of this study was to assess the performance of ChatGPT on ophthalmological emergency case vignettes in terms of the main outcome measures: triage accuracy, appropriateness of recommended prehospital measures, and overall potential to inflict harm on the user/patient. METHODS We wrote ten short, fictional case vignettes describing different acute ophthalmological symptoms. Each vignette was entered into ChatGPT five times with the same wording and following a standardized interaction pathway. The answers were analyzed following a systematic approach. RESULTS We observed a triage accuracy of 93.6%. Most answers contained only appropriate recommendations for prehospital measures. However, an overall potential to inflict harm on users/patients was present in 32% of answers. CONCLUSION ChatGPT should presently not be used as a stand-alone primary source of information about acute ophthalmological symptoms. As AI continues to evolve, its safety and efficacy in the prehospital management of ophthalmological emergencies have to be reassessed regularly.
Affiliation(s)
- Dominik Knebel
- Department of Ophthalmology, University Hospital, Ludwigs-Maximilians-Universität München, München, Germany
- Siegfried Priglinger
- Department of Ophthalmology, University Hospital, Ludwig-Maximilians-Universität München, München, Germany
- Nicolas Scherer
- Department of Ophthalmology, University Hospital, Ludwig-Maximilians-Universität München, München, Germany
- Julian Klaas
- Department of Ophthalmology, University Hospital, Ludwig-Maximilians-Universität München, München, Germany
- Jakob Siedlecki
- Department of Ophthalmology, University Hospital, Ludwig-Maximilians-Universität München, München, Germany
- Benedikt Schworm
- Department of Ophthalmology, University Hospital, Ludwig-Maximilians-Universität München, München, Germany
11
Rosselló-Jiménez D, Docampo S, Collado Y, Cuadra-Llopart L, Riba F, Llonch-Masriera M. Geriatrics and artificial intelligence in Spain (Ger-IA project): talking to ChatGPT, a nationwide survey. Eur Geriatr Med 2024. PMID: 38615289. DOI: 10.1007/s41999-024-00970-7.
Abstract
PURPOSE The purposes of the study were to describe the degree of agreement between geriatricians and the answers given by an AI tool (ChatGPT) in response to questions related to different areas in geriatrics, to study the differences between specialists and residents in geriatrics in terms of the degree of agreement with ChatGPT, and to analyse the mean scores obtained by areas of knowledge/domains. METHODS An observational study was conducted involving 126 doctors from 41 geriatric medicine departments in Spain. Ten questions about geriatric medicine were posed to ChatGPT, and doctors evaluated the AI's answers using a Likert scale. Sociodemographic variables were included. Questions were categorized into five knowledge domains, and means and standard deviations were calculated for each. RESULTS 130 doctors answered the questionnaire, and 126 doctors (69.8% women, mean age 41.4 [9.8]) were included in the final analysis. The mean score obtained by ChatGPT was 3.1/5 [0.67]. Specialists rated ChatGPT lower than residents (3.0/5 vs. 3.3/5 points, respectively, P < 0.05). By domain, ChatGPT scored better (M: 3.96; SD: 0.71) on general/theoretical questions than on complex decisions/end-of-life situations (M: 2.50; SD: 0.76), and answers related to diagnosis/performance of complementary tests obtained the lowest scores (M: 2.48; SD: 0.77). CONCLUSION Scores showed considerable variability depending on the area of knowledge. Questions related to theoretical aspects of challenges and the future in geriatrics obtained better scores. For complex decision-making, the appropriateness of therapeutic effort or decisions about diagnostic tests, professionals indicated poorer performance. AI is likely to be incorporated into some areas of medicine, but it still presents important limitations, mainly in complex medical decision-making.
Affiliation(s)
- Daniel Rosselló-Jiménez
- Geriatric Medicine Department, Hospital Universitari de Terrassa, Consorci Sanitari de Terrassa, Carr. Torrebonica, s/n, Terrassa, 08227, Barcelona, Spain.
- S Docampo
- Geriatric Medicine Department, Hospital Santa Creu, Tortosa, Tarragona, Spain
- Y Collado
- Geriatric Medicine Department, Hospital Universitari de Terrassa, Consorci Sanitari de Terrassa, Carr. Torrebonica, s/n, Terrassa, 08227, Barcelona, Spain
- L Cuadra-Llopart
- Geriatric Medicine Department, Hospital Universitari de Terrassa, Consorci Sanitari de Terrassa, Carr. Torrebonica, s/n, Terrassa, 08227, Barcelona, Spain
- Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya (UIC), Barcelona, Spain
- ACTIUM Functional Anatomy Group, Universitat Internacional de Catalunya (UIC), Barcelona, Spain
- F Riba
- Geriatric Medicine Department, Hospital Santa Creu, Tortosa, Tarragona, Spain
- M Llonch-Masriera
- Geriatric Medicine Department, Hospital Universitari de Terrassa, Consorci Sanitari de Terrassa, Carr. Torrebonica, s/n, Terrassa, 08227, Barcelona, Spain
- Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya (UIC), Barcelona, Spain
12
Kaftan AN, Hussain MK, Naser FH. Response accuracy of ChatGPT 3.5, Copilot and Gemini in interpreting biochemical laboratory data: a pilot study. Sci Rep 2024; 14:8233. PMID: 38589613. PMCID: PMC11002004. DOI: 10.1038/s41598-024-58964-1.
Abstract
With the release of ChatGPT at the end of 2022, a new era of thinking and technology use has begun. Artificial intelligence models (AIs) like Gemini (Bard), Copilot (Bing), and ChatGPT-3.5 have the potential to impact every aspect of our lives, including laboratory data interpretation. This study aimed to assess the accuracy of ChatGPT-3.5, Copilot, and Gemini responses in evaluating biochemical data. Ten simulated patients' biochemical laboratory data, including serum urea, creatinine, glucose, cholesterol, triglycerides, low-density lipoprotein (LDL-c), and high-density lipoprotein (HDL-c), in addition to HbA1c, were interpreted by three AIs: Copilot, Gemini, and ChatGPT-3.5, followed by evaluation by three raters. The study was carried out using two approaches. The first encompassed all biochemical data; the second contained only kidney function data. The first approach indicated Copilot to have the highest level of accuracy, followed by Gemini and ChatGPT-3.5. The Friedman test with Dunn's post-hoc analysis revealed that Copilot had the highest mean rank; pairwise comparisons revealed significant differences for Copilot vs. ChatGPT-3.5 (P = 0.002) and vs. Gemini (P = 0.008). The second approach likewise showed Copilot to have the highest accuracy, with the Friedman test and Dunn's post-hoc analysis again giving Copilot the highest mean rank. The Wilcoxon signed-rank test demonstrated indistinguishable Copilot responses (P = 0.5) whether all laboratory data or only kidney function data were provided. Copilot is more accurate in interpreting biochemical data than Gemini and ChatGPT-3.5, and its consistent responses across different data subsets highlight its reliability in this context.
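The pipeline described (Friedman test, Dunn's post-hoc, Wilcoxon signed-rank) can be prototyped with SciPy. Dunn's test itself is not in SciPy (the scikit-posthocs package provides it), so the sketch below substitutes Bonferroni-corrected pairwise Wilcoxon signed-rank tests on invented ratings.

```python
import numpy as np
from scipy import stats

# Hypothetical accuracy ratings for 10 interpretations, one column per model
# (Copilot, Gemini, ChatGPT-3.5); illustrative values only.
scores = np.array([
    [5, 4, 3], [4, 4, 3], [5, 3, 4], [4, 3, 3], [5, 4, 4],
    [4, 4, 2], [5, 3, 3], [4, 2, 3], [5, 4, 3], [4, 3, 2],
])

# Friedman test: repeated measures across the three models.
stat, p = stats.friedmanchisquare(scores[:, 0], scores[:, 1], scores[:, 2])
print(f"Friedman chi2={stat:.2f}, p={p:.4f}")

# Pairwise Wilcoxon signed-rank tests with Bonferroni correction
# (substituting for the Dunn's post-hoc test used in the study).
pairs = [(0, 1, "Copilot vs Gemini"), (0, 2, "Copilot vs ChatGPT-3.5"),
         (1, 2, "Gemini vs ChatGPT-3.5")]
for i, j, label in pairs:
    _, p_raw = stats.wilcoxon(scores[:, i], scores[:, j])
    print(f"{label}: Bonferroni-adjusted p={min(1.0, 3 * p_raw):.4f}")
```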
13
Braun EM, Juhasz-Böss I, Solomayer EF, Truhn D, Keller C, Heinrich V, Braun BJ. Will I soon be out of my job? Quality and guideline conformity of ChatGPT therapy suggestions to patient inquiries with gynecologic symptoms in a palliative setting. Arch Gynecol Obstet 2024; 309:1543-1549. PMID: 37975899. DOI: 10.1007/s00404-023-07272-6.
Abstract
PURPOSE The market and application possibilities for artificial intelligence are currently growing at high speed and are increasingly finding their way into gynecology. While the medical side is well represented in the current literature, the patient's perspective is still lagging behind. Therefore, the aim of this study was to have experts evaluate the recommendations of ChatGPT in response to patient inquiries about possible therapy for leading gynecological symptoms in a palliative setting. METHODS Case vignettes were constructed for 10 common concomitant symptoms of gynecologic oncology tumors in a palliative setting, and patient queries regarding therapy of these symptoms were generated as prompts for ChatGPT. Five experts in palliative care and gynecologic oncology evaluated the responses with respect to guideline adherence and applicability and identified advantages and disadvantages. RESULTS The overall rating of ChatGPT responses averaged 4.1 (5 = strongly agree; 1 = strongly disagree). The experts saw an average guideline conformity of the therapy recommendations of 4.0. ChatGPT sometimes omits relevant therapies and does not provide an individual assessment of the suggested therapies, but it does indicate that a physician consultation is additionally necessary. CONCLUSIONS Language models such as ChatGPT can provide valid and largely guideline-compliant therapy recommendations in their freely available, and thus in principle accessible, version. For a complete therapy recommendation, including an evaluation of the suggested therapies, their individual adjustment and the filtering of possible wrong recommendations, a medical expert's opinion remains indispensable.
Affiliation(s)
- Eva-Marie Braun
- Center for Integrative Oncology, Die Filderklinik, Im Haberschlai 7, 70794, Filderstadt-Bonlanden, Germany.
- Ingolf Juhasz-Böss
- Department of Gynecology, University Medical Center Freiburg, Hugstetter Straße 55, 79106, Freiburg, Germany
- Erich-Franz Solomayer
- Department of Gynecology, Obstetrics and Reproductive Medicine, Saarland University Hospital, Kirrberger Straße, Building 9, 66421, Homburg, Germany
- Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Pauwelsstraße 30, 52074, Aachen, Germany
- Christiane Keller
- Center for Palliative Medicine and Pediatric Pain Therapy, Saarland University Hospital, Kirrberger Straße, Building 69, 66421, Homburg, Germany
- Vanessa Heinrich
- Department of Radiation Oncology, University Hospital Tübingen, Crona Kliniken, Hoppe-Seyler-Str. 3, 72076, Tübingen, Germany
- Benedikt Johannes Braun
- Department of Trauma and Reconstructive Surgery at the Eberhard Karls University Tübingen, BG Unfallklinik Tübingen, Schnarrenbergstrasse 95, 72076, Tübingen, Germany
14
Teixeira-Marques F, Medeiros N, Nazaré F, Alves S, Lima N, Ribeiro L, Gama R, Oliveira P. Exploring the role of ChatGPT in clinical decision-making in otorhinolaryngology: a ChatGPT designed study. Eur Arch Otorhinolaryngol 2024; 281:2023-2030. PMID: 38345613. DOI: 10.1007/s00405-024-08498-z.
Abstract
PURPOSE Since the beginning of 2023, ChatGPT has emerged as a hot topic in healthcare research. Its potential to be a valuable tool in clinical practice is compelling, particularly in improving clinical decision support by helping physicians make clinical decisions based on the best medical knowledge available. We aim to investigate ChatGPT's ability to identify, diagnose and manage patients with otorhinolaryngology-related symptoms. METHODS A prospective, cross-sectional study was designed, based on an idea suggested by ChatGPT, to assess the level of agreement between ChatGPT and five otorhinolaryngologists (ENTs) in 20 reality-inspired clinical cases. The clinical cases were presented to the chatbot on two different occasions (ChatGPT-1 and ChatGPT-2) to assess its temporal stability. RESULTS The mean score of ChatGPT-1 was 4.4 (SD 1.2; min 1, max 5) and of ChatGPT-2 was 4.15 (SD 1.3; min 1, max 5), while the ENTs' mean score was 4.91 (SD 0.3; min 3, max 5). The Mann-Whitney U test revealed a statistically significant difference (p < 0.001) between both ChatGPT scores and the ENTs' scores. ChatGPT-1 and ChatGPT-2 gave different answers on five occasions. CONCLUSIONS Artificial intelligence will be an important instrument in clinical decision-making in the near future, and ChatGPT is the most promising chatbot so far. Although it needs further development to be used safely, there is room for improvement and potential to aid otorhinolaryngology residents and specialists in making the most correct decision for the patient.
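As a rough illustration of the comparison described, the snippet below runs a Mann-Whitney U test between chatbot and specialist ratings and counts the cases on which the two ChatGPT runs disagreed; all ratings are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical 1-5 ratings over 20 clinical cases (illustrative only).
ents     = np.array([5]*16 + [4]*3 + [3])            # specialist panel
chatgpt1 = np.array([5]*12 + [4]*4 + [3]*2 + [1]*2)  # first ChatGPT run
chatgpt2 = np.array([5]*11 + [4]*5 + [3]*2 + [1]*2)  # second ChatGPT run

_, p = stats.mannwhitneyu(chatgpt1, ents, alternative="two-sided")
print(f"ChatGPT-1 vs ENTs: p={p:.4f}")

# Temporal stability: on how many cases did the two runs differ?
n_diff = int(np.sum(chatgpt1 != chatgpt2))
print(f"Runs differ on {n_diff} of {len(chatgpt1)} cases")
```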
Affiliation(s)
- Francisco Teixeira-Marques
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal.
- Nuno Medeiros
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Francisco Nazaré
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Sandra Alves
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Nuno Lima
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Leandro Ribeiro
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Rita Gama
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Pedro Oliveira
- Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
15
Mark J, Subhi Y. Blinded by Stress: A Patient and Physician Perspective on Central Serous Chorioretinopathy. Ophthalmol Ther 2024; 13:861-866. PMID: 38386185. PMCID: PMC10912400. DOI: 10.1007/s40123-024-00907-0.
Abstract
This commentary is co-authored by a patient with central serous chorioretinopathy (CSC), which is the fourth most common exudative maculopathy. The patient, a young and high-profile member of the Danish Parliament, kindly shares his experience of living with stress, the onset of symptoms, and being diagnosed with CSC and receiving photodynamic treatment. The experiences of the patient are put into perspective by an ophthalmologist.
Affiliation(s)
- Jacob Mark
- Patient Author, Christiansborg, Copenhagen, Denmark
- Yousif Subhi
- Department of Clinical Research, University of Southern Denmark, J.B. Winsløws Vej 19.3, 5000, Odense C, Denmark.
- Department of Ophthalmology, Zealand University Hospital, Roskilde, Denmark.
- Department of Ophthalmology, Rigshospitalet, Glostrup, Denmark.
16
Cohen SA, Brant A, Fisher AC, Pershing S, Do D, Pan C. Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery. Semin Ophthalmol 2024:1-8. PMID: 38516983. DOI: 10.1080/08820538.2024.2326058.
Abstract
PURPOSE Patients are using online search modalities to learn about their eye health. While Google remains the most popular search engine, the use of large language models (LLMs) like ChatGPT has increased. Cataract surgery is the most common surgical procedure in the US, and there is limited data on the quality of online information that populates after searches related to cataract surgery on search engines such as Google and LLM platforms such as ChatGPT. We identified the most common patient frequently asked questions (FAQs) about cataracts and cataract surgery and evaluated the accuracy, safety, and readability of the answers to these questions provided by both Google and ChatGPT. We also demonstrated the utility of ChatGPT in writing notes and creating patient education materials. METHODS The top 20 FAQs related to cataracts and cataract surgery were recorded from Google. Responses to the questions provided by Google and ChatGPT were evaluated by a panel of ophthalmologists for accuracy and safety. Evaluators were also asked to distinguish between Google and LLM chatbot answers. Five validated readability indices were used to assess the readability of responses. ChatGPT was instructed to generate operative notes, post-operative instructions, and customizable patient education materials according to specific readability criteria. RESULTS Responses to 20 patient FAQs generated by ChatGPT were significantly longer and written at a higher reading level than responses provided by Google (p < .001), with an average grade level of 14.8 (college level). Expert reviewers were able to correctly distinguish between a human-reviewed and a chatbot-generated response an average of 31% of the time. Google answers contained incorrect or inappropriate material 27% of the time, compared with 6% of LLM-generated answers (p < .001). When expert reviewers were asked to compare the responses directly, chatbot responses were favored (66%). CONCLUSIONS When comparing the responses to patients' cataract FAQs provided by ChatGPT and Google, practicing ophthalmologists overwhelmingly preferred ChatGPT responses. LLM chatbot responses were less likely to contain inaccurate information. ChatGPT represents a viable information source on eye health for patients with higher health literacy. ChatGPT may also be used by ophthalmologists to create customizable patient education materials for patients with varying health literacy.
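One of the readability indices typically used in such studies is the Flesch-Kincaid Grade Level, FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59. The sketch below implements it with a crude vowel-group syllable counter; production readability tools use dictionaries or more careful heuristics.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: each run of vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    # Flesch-Kincaid Grade Level:
    # 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59

sample = ("Cataract surgery replaces the cloudy lens of the eye with a "
          "clear artificial lens. It is a common and safe procedure.")
print(f"FKGL: {fkgl(sample):.1f}")  # around an 8th-9th grade level here
```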
Affiliation(s)
- Samuel A Cohen
- Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Arthur Brant
- Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Ann Caroline Fisher
- Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Suzann Pershing
- Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Diana Do
- Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
- Carolyn Pan
- Byers Eye Institute, Stanford University School of Medicine, Stanford, CA, USA
17
Mihalache A, Huang RS, Patil NS, Popovic MM, Lee WW, Yan P, Cruz-Pimentel M, Muni RH. Chatbot and Academy Preferred Practice Pattern Guidelines on Retinal Diseases. Ophthalmol Retina 2024:S2468-6530(24)00117-9. PMID: 38499086. DOI: 10.1016/j.oret.2024.03.013.
Affiliation(s)
- Andrew Mihalache
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Ryan S Huang
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Nikhil S Patil
- Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada
- Marko M Popovic
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Wei Wei Lee
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Peng Yan
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada; Toronto Western Hospital, University Health Network, University of Toronto, Toronto, Ontario, Canada; Department of Ophthalmology, Kensington Vision and Research Center, Toronto, Ontario, Canada
- Miguel Cruz-Pimentel
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada
- Rajeev H Muni
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada; Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, Ontario, Canada.
18
Gurnani B, Kaur K. Leveraging ChatGPT for ophthalmic education: A critical appraisal. Eur J Ophthalmol 2024; 34:323-327. PMID: 37974429. DOI: 10.1177/11206721231215862.
Abstract
In recent years, the advent of artificial intelligence (AI) has transformed many sectors, including medical education. This editorial critically appraises the integration of ChatGPT, a state-of-the-art AI language model, into ophthalmic education, focusing on its potential, limitations, and ethical considerations. The application of ChatGPT in teaching and training ophthalmologists presents an innovative method to offer real-time, customized learning experiences. Through a systematic analysis of both experimental and clinical data, this editorial examines how ChatGPT enhances engagement, understanding, and retention of complex ophthalmological concepts. The study also evaluates the efficacy of ChatGPT in simulating patient interactions and clinical scenarios, which can foster improved diagnostic and interpersonal skills. Despite the promising advantages, concerns regarding reliability, lack of personal touch, and potential biases in the AI-generated content are scrutinized. Ethical considerations concerning data privacy and potential misuse are also explored. The findings underline the need for carefully designed integration, continuous evaluation, and adherence to ethical guidelines to maximize benefits while mitigating risks. By shedding light on these multifaceted aspects, this paper contributes to the ongoing discourse on the incorporation of AI in medical education, offering valuable insights and guidance for educators, practitioners, and policymakers aiming to leverage modern technology for enhancing ophthalmic education.
Affiliation(s)
- Bharat Gurnani
- Cataract, Cornea, Trauma, External Diseases, Ocular Surface and Refractive Services, ASG Eye Hospital, Jodhpur, Rajasthan, India
- Sadguru Netra Chikitsalya, Shri Sadguru Seva Sangh Trust, Chitrakoot, Madhya Pradesh, India
- Kirandeep Kaur
- Cataract, Pediatric Ophthalmology and Strabismus, ASG Eye Hospital, Jodhpur, Rajasthan, India
- Children Eye Care Centre, Sadguru Netra Chikitsalya, Shri Sadguru Seva Sangh Trust, Chitrakoot, Madhya Pradesh, India
19
Wei Q, Yao Z, Cui Y, Wei B, Jin Z, Xu X. Evaluation of ChatGPT-generated medical responses: A systematic review and meta-analysis. J Biomed Inform 2024; 151:104620. PMID: 38462064. DOI: 10.1016/j.jbi.2024.104620.
Abstract
OBJECTIVE Large language models (LLMs) such as ChatGPT are increasingly explored in medical domains. However, the absence of standard guidelines for performance evaluation has led to methodological inconsistencies. This study aims to summarize the available evidence on evaluating ChatGPT's performance in answering medical questions and provide direction for future research. METHODS An extensive literature search was conducted on June 15, 2023, across ten medical databases. The keyword used was "ChatGPT," without restrictions on publication type, language, or date. Studies evaluating ChatGPT's performance in answering medical questions were included. Exclusions comprised review articles, comments, patents, non-medical evaluations of ChatGPT, and preprint studies. Data were extracted on general study characteristics, question sources, conversation processes, assessment metrics, and performance of ChatGPT. An evaluation framework for LLMs in medical inquiries was proposed by integrating insights from the selected literature. This study is registered with PROSPERO, CRD42023456327. RESULTS A total of 3520 articles were identified, of which 60 were reviewed and summarized in this paper and 17 were included in the meta-analysis. ChatGPT displayed an overall integrated accuracy of 56% (95% CI: 51%-60%, I² = 87%) in addressing medical queries. However, the studies varied in question resource, question-asking process, and evaluation metrics. As per our proposed evaluation framework, many studies failed to report methodological details, such as the date of inquiry, version of ChatGPT, and inter-rater consistency. CONCLUSION This review reveals ChatGPT's potential in addressing medical inquiries, but the heterogeneity of the study designs and insufficient reporting might affect the reliability of the results. Our proposed evaluation framework provides insights for future study design and transparent reporting of LLMs in responding to medical questions.
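The pooled accuracy with 95% CI and I² reported above is standard random-effects meta-analysis output. The sketch below shows a DerSimonian-Laird computation on invented per-study counts; published meta-analyses of proportions usually apply a logit or arcsine transform first, which is omitted here for brevity.

```python
import numpy as np

# Hypothetical per-study results: (correct answers, total questions).
studies = [(45, 80), (30, 60), (70, 110), (25, 50), (55, 90)]

p = np.array([k / n for k, n in studies])
var = np.array([pi * (1 - pi) / n for pi, (_, n) in zip(p, studies)])
w = 1 / var

# Fixed-effect pooled estimate, Cochran's Q, and I-squared.
p_fixed = np.sum(w * p) / np.sum(w)
Q = np.sum(w * (p - p_fixed) ** 2)
df = len(studies) - 1
I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0

# DerSimonian-Laird random-effects estimate with 95% CI.
C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - df) / C)
w_re = 1 / (var + tau2)
p_re = np.sum(w_re * p) / np.sum(w_re)
se = np.sqrt(1 / np.sum(w_re))
print(f"Pooled accuracy {p_re:.1%} "
      f"(95% CI {p_re - 1.96 * se:.1%}-{p_re + 1.96 * se:.1%}), I2={I2:.0f}%")
```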
Affiliation(s)
- Qiuhong Wei
- Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China; Children Nutrition Research Center, Children's Hospital of Chongqing Medical University, Chongqing, China; National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, Chongqing Key Laboratory of Child Neurodevelopment and Cognitive Disorders, Chongqing, China
- Zhengxiong Yao
- Department of Neurology, Children's Hospital of Chongqing Medical University, Chongqing, China
- Ying Cui
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Bo Wei
- Department of Global Statistics and Data Science, BeiGene USA Inc., San Mateo, CA, USA
- Zhezhen Jin
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Ximing Xu
- Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China
20
Nikdel M, Ghadimi H, Tavakoli M, Suh DW. Assessment of the Responses of the Artificial Intelligence-based Chatbot ChatGPT-4 to Frequently Asked Questions About Amblyopia and Childhood Myopia. J Pediatr Ophthalmol Strabismus 2024; 61:86-89. PMID: 37882183. DOI: 10.3928/01913913-20231005-02.
Abstract
PURPOSE To assess the responses of ChatGPT-4, the forerunner artificial intelligence-based chatbot, to frequently asked questions regarding two common pediatric ophthalmologic disorders, amblyopia and childhood myopia. METHODS Twenty-seven questions about amblyopia and 28 questions about childhood myopia were asked of ChatGPT twice (110 questions in total). The responses were evaluated by two pediatric ophthalmologists as acceptable, incomplete, or unacceptable. RESULTS There was remarkable agreement (96.4%) between the two pediatric ophthalmologists in their assessment of the responses. Acceptable responses were provided by ChatGPT to 93 of 110 (84.6%) questions in total (44 of 54 [81.5%] for amblyopia and 49 of 56 [87.5%] for childhood myopia). Seven of 54 (12.9%) responses to questions on amblyopia were graded as incomplete, compared to 4 of 56 (7.1%) on childhood myopia. ChatGPT gave inappropriate responses to three questions each about amblyopia (5.6%) and childhood myopia (5.4%). The most noticeable inappropriate responses related to the definition of reverse amblyopia and the threshold of refractive error for prescribing spectacles to children with myopia. CONCLUSIONS ChatGPT has the potential to serve as an adjunct informational tool for pediatric ophthalmology patients and their caregivers, demonstrating relatively good performance in answering 84.6% of the most frequently asked questions about amblyopia and childhood myopia. [J Pediatr Ophthalmol Strabismus. 2024;61(2):86-89.]
21
Yalla GR, Hyman N, Hock LE, Zhang Q, Shukla AG, Kolomeyer NN. Performance of Artificial Intelligence Chatbots on Glaucoma Questions Adapted From Patient Brochures. Cureus 2024; 16:e56766. PMID: 38650824. PMCID: PMC11034394. DOI: 10.7759/cureus.56766.
Abstract
Introduction With the potential for artificial intelligence (AI) chatbots to serve as a primary source of glaucoma information for patients, it is essential to characterize the information that chatbots provide so that providers can tailor discussions, anticipate patient concerns, and identify misleading information. Therefore, the purpose of this study was to evaluate glaucoma information from AI chatbots, including ChatGPT-4, Bard, and Bing, by analyzing response accuracy, comprehensiveness, readability, word count, and character count in comparison to each other and to glaucoma-related American Academy of Ophthalmology (AAO) patient materials. Methods Section headers from AAO glaucoma-related patient education brochures were adapted into question form and asked five times of each AI chatbot (ChatGPT-4, Bard, and Bing). Two sets of responses from each chatbot were used to evaluate the accuracy of the chatbot responses and of the AAO brochure information, as well as the comprehensiveness of the chatbot responses compared with the AAO brochure information; each was scored 1-5 by three independent glaucoma-trained ophthalmologists. Readability (assessed with the Flesch-Kincaid Grade Level (FKGL), which corresponds to United States school grade levels), word count, and character count were determined for all chatbot responses and AAO brochure sections. Results Accuracy scores for AAO, ChatGPT, Bing, and Bard were 4.84, 4.26, 4.53, and 3.53, respectively. On direct comparison, AAO was more accurate than ChatGPT (p=0.002), and Bard was the least accurate (Bard versus AAO, p<0.001; Bard versus ChatGPT, p<0.002; Bard versus Bing, p=0.001). ChatGPT had the most comprehensive responses (ChatGPT versus Bing, p<0.001; ChatGPT versus Bard, p=0.008), with comprehensiveness scores for ChatGPT, Bing, and Bard of 3.32, 2.16, and 2.79, respectively. AAO information and Bard responses were written at the most accessible readability levels (AAO versus ChatGPT, AAO versus Bing, Bard versus ChatGPT, Bard versus Bing, all p<0.0001), with readability levels for AAO, ChatGPT, Bing, and Bard at 8.11, 13.01, 11.73, and 7.90, respectively. Bing responses had the lowest word and character counts. Conclusion AI chatbot responses varied in accuracy, comprehensiveness, and readability. With accuracy and comprehensiveness scores below those of AAO brochures and elevated readability levels, AI chatbots require improvement to become a more useful supplementary source of glaucoma information for patients. Physicians must be aware of these limitations so that they can ask patients about their existing knowledge and questions and then provide clarifying, comprehensive information.
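The Flesch-Kincaid Grade Level used above maps word and sentence statistics onto United States school grades. A minimal sketch follows; the syllable counter is a crude vowel-group heuristic (production readability tools use pronunciation dictionaries), so its output is only approximate:

    import re

    def count_syllables(word: str) -> int:
        # Crude approximation: count groups of consecutive vowels.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def fkgl(text: str) -> float:
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        # Standard Flesch-Kincaid Grade Level coefficients.
        return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

    print(round(fkgl("Glaucoma damages the optic nerve. Early treatment preserves vision."), 2))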
Collapse
Affiliation(s)
- Goutham R Yalla
- Department of Ophthalmology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, USA
- Glaucoma Research Center, Wills Eye Hospital, Philadelphia, USA
| | - Nicholas Hyman
- Department of Ophthalmology, Vagelos College of Physicians and Surgeons, Columbia University, New York, USA
- Department of Ophthalmology, Glaucoma Division, Columbia University Irving Medical Center, New York, USA
| | - Lauren E Hock
- Glaucoma Research Center, Wills Eye Hospital, Philadelphia, USA
| | - Qiang Zhang
- Glaucoma Research Center, Wills Eye Hospital, Philadelphia, USA
- Biostatistics Consulting Core, Vickie and Jack Farber Vision Research Center, Wills Eye Hospital, Philadelphia, USA
| | - Aakriti G Shukla
- Department of Ophthalmology, Glaucoma Division, Columbia University Irving Medical Center, New York, USA
| | | |
Collapse
|
22
|
Høj S, Thomsen SF, Meteran H, Sigsgaard T, Meteran H. Artificial intelligence and allergic rhinitis: does ChatGPT increase or impair the knowledge? J Public Health (Oxf) 2024; 46:123-126. [PMID: 37968109 DOI: 10.1093/pubmed/fdad219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2023] [Revised: 09/14/2023] [Accepted: 10/06/2023] [Indexed: 11/17/2023] Open
Abstract
BACKGROUND Optimal management of allergic rhinitis requires patient education with easy access to accurate information. However, previous online platforms have provided misleading information. The demand for online medical information continues to grow, especially with the introduction of advanced chatbots like ChatGPT. METHODS This study aimed to evaluate the quality of information provided by ChatGPT regarding allergic rhinitis. A five-point Likert scale was used to assess the accuracy of the responses. Four authors independently rated the responses from a healthcare professional's perspective. RESULTS A total of 20 questions covering various aspects of allergic rhinitis were asked. Among the answers, eight received a score of 5 (no inaccuracies), five received a score of 4 (minor non-harmful inaccuracies), six received a score of 3 (potentially misinterpretable inaccuracies) and one answer received a score of 2 (minor potentially harmful inaccuracies). CONCLUSIONS The variability in accuracy scores highlights the need for caution when relying solely on chatbots like ChatGPT for medical advice. Patients should consult qualified healthcare professionals and use online sources only as a supplement. While ChatGPT has advantages in medical information delivery, its use should be approached with caution. ChatGPT can be useful for patient education but cannot replace healthcare professionals.
Collapse
Affiliation(s)
- Simon Høj
- Steno Diabetes Center Copenhagen, Copenhagen University Hospital, Herlev 2730, Denmark
- Department of Dermatology, Venereology, and Wound Healing Centre, Copenhagen University Hospital-Bispebjerg, Copenhagen 2400, Denmark
- Department of Public Health, Environment, Occupation, and Health, Aarhus University, Aarhus 8000, Denmark
| | - Simon F Thomsen
- Department of Dermatology, Venereology, and Wound Healing Centre, Copenhagen University Hospital-Bispebjerg, Copenhagen 2400, Denmark
- Department of Biomedical Sciences, University of Copenhagen, Copenhagen 2200, Denmark
| | - Hanieh Meteran
- Department of Internal Medicine, Section of Endocrinology, Copenhagen University Hospital-Hvidovre, Hvidovre 2650, Denmark
| | - Torben Sigsgaard
- Department of Public Health, Environment, Occupation, and Health, Aarhus University, Aarhus 8000, Denmark
| | - Howraman Meteran
- Department of Public Health, Environment, Occupation, and Health, Aarhus University, Aarhus 8000, Denmark
- Department of Internal Medicine, Respiratory Medicine Section, Copenhagen University Hospital-Hvidovre, Hvidovre 2650, Denmark
- Department of Respiratory Medicine, Zealand University Hospital Roskilde-Næstved, Næstved 4700, Denmark
| |
Collapse
|
23
|
Marshall RF, Mallem K, Xu H, Thorne J, Burkholder B, Chaon B, Liberman P, Berkenstock M. Investigating the Accuracy and Completeness of an Artificial Intelligence Large Language Model About Uveitis: An Evaluation of ChatGPT. Ocul Immunol Inflamm 2024:1-4. [PMID: 38394625 DOI: 10.1080/09273948.2024.2317417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Accepted: 02/06/2024] [Indexed: 02/25/2024]
Abstract
PURPOSE To assess the accuracy and completeness of ChatGPT-generated answers regarding uveitis description, prevention, treatment, and prognosis. METHODS Thirty-two uveitis-related questions were generated by a uveitis specialist and entered into ChatGPT 3.5. Answers were compiled into a survey and reviewed by five uveitis specialists using standardized Likert scales of accuracy and completeness. RESULTS Overall, the median accuracy score for the uveitis questions (n = 32) was 4.00 (between "more correct than incorrect" and "nearly all correct"), and the median completeness score was 2.00 ("adequate; addresses all aspects of the question and provides the minimum amount of information required to be considered complete"). Interrater agreement was low, with total kappa values of 0.0278 for accuracy and 0.0847 for completeness. CONCLUSION ChatGPT can provide relatively high-accuracy responses to various questions related to uveitis; however, the answers it provides are incomplete, with some inaccuracies. Its utility in providing medical information requires further validation and development before it can serve as a source of uveitis information for patients.
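With five raters, kappa values like those above are typically computed as Fleiss' kappa. Below is a minimal sketch under the assumption that the Likert scores are treated as categories; the ratings matrix is invented for illustration and is not the study's data:

    import numpy as np

    def fleiss_kappa(counts: np.ndarray) -> float:
        # counts[i, j] = number of raters assigning subject i to category j.
        n = counts.sum(axis=1)[0]                    # raters per subject (constant)
        p_j = counts.sum(axis=0) / counts.sum()      # overall category proportions
        p_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))  # per-subject agreement
        p_bar, p_e = p_i.mean(), np.square(p_j).sum()
        return (p_bar - p_e) / (1 - p_e)

    # Invented example: 4 questions, 5 raters, 5 Likert categories (columns 1-5).
    counts = np.array([
        [0, 0, 1, 2, 2],
        [0, 1, 2, 2, 0],
        [0, 0, 0, 3, 2],
        [1, 1, 1, 1, 1],
    ])
    print(round(fleiss_kappa(counts), 4))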
Collapse
Affiliation(s)
- Rayna F Marshall
- The Drexel University College of Medicine, Philadelphia, Pennsylvania, USA
| | - Krishna Mallem
- The Drexel University College of Medicine, Philadelphia, Pennsylvania, USA
| | - Hannah Xu
- University of California San Diego, San Diego, California, USA
| | - Jennifer Thorne
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Bryn Burkholder
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Benjamin Chaon
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Paulina Liberman
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Meghan Berkenstock
- The Wilmer Eye Institute, Division of Ocular Immunology, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| |
Collapse
|
24
|
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024; 44:329-343. [PMID: 37562022 DOI: 10.1093/asj/sjad260] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 08/02/2023] [Accepted: 08/04/2023] [Indexed: 08/12/2023] Open
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of current demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24.0%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
Collapse
|
25
|
Hatia A, Doldo T, Parrini S, Chisci E, Cipriani L, Montagna L, Lagana G, Guenza G, Agosta E, Vinjolli F, Hoxha M, D’Amelio C, Favaretto N, Chisci G. Accuracy and Completeness of ChatGPT-Generated Information on Interceptive Orthodontics: A Multicenter Collaborative Study. J Clin Med 2024; 13:735. [PMID: 38337430 PMCID: PMC10856539 DOI: 10.3390/jcm13030735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Revised: 01/21/2024] [Accepted: 01/25/2024] [Indexed: 02/12/2024] Open
Abstract
Background: This study aims to investigate the accuracy and completeness of ChatGPT in answering questions and solving clinical scenarios of interceptive orthodontics. Materials and Methods: Ten specialized orthodontists from ten Italian postgraduate orthodontics schools developed 21 clinical open-ended questions encompassing all of the subspecialities of interceptive orthodontics, together with 7 comprehensive clinical cases. Questions and scenarios were entered into ChatGPT-4, and the resulting answers were evaluated by the researchers using predefined accuracy (range 1-6) and completeness (range 1-3) Likert scales. Results: For the open-ended questions, the overall median score was 4.9/6 for accuracy and 2.4/3 for completeness. In addition, the reviewers rated the accuracy of open-ended answers as entirely correct (score 6 on the Likert scale) in 40.5% of cases and the completeness as entirely correct (score 3 on the Likert scale) in 50.5% of cases. As for the clinical cases, the overall median score was 4.9/6 for accuracy and 2.5/3 for completeness. Overall, the reviewers rated the accuracy of clinical case answers as entirely correct in 46% of cases and the completeness of clinical case answers as entirely correct in 54.3% of cases. Conclusions: The results showed a high level of accuracy and completeness in the AI responses and a great ability to solve difficult clinical cases, but the answers were not 100% accurate and complete. ChatGPT is not yet sophisticated enough to replace the intellectual work of human beings.
Collapse
Affiliation(s)
- Arjeta Hatia
- Orthodontics Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy; (T.D.); (L.C.)
| | - Tiziana Doldo
- Orthodontics Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy; (T.D.); (L.C.)
| | - Stefano Parrini
- Oral Surgery Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy;
| | - Elettra Chisci
- Orthodontics Postgraduate School, University of Ferrara, 44121 Ferrara, Italy
| | - Linda Cipriani
- Orthodontics Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy; (T.D.); (L.C.)
| | - Livia Montagna
- Orthodontics Postgraduate School, University of Cagliari, 09121 Cagliari, Italy;
| | - Giuseppina Lagana
- Orthodontics Postgraduate School, “Sapienza” University of Rome, 00185 Rome, Italy;
| | - Guia Guenza
- Orthodontics Postgraduate School, University of Milano, 20019 Milan, Italy
| | - Edoardo Agosta
- Orthodontics Postgraduate School, University of Torino, 10024 Turin, Italy
| | - Franceska Vinjolli
- Orthodontics Postgraduate School, University of Roma Tor Vergata, 00133 Rome, Italy;
| | - Meladiona Hoxha
- Orthodontics Postgraduate School, “Cattolica” University of Rome, 00168 Rome, Italy;
| | - Claudio D’Amelio
- Orthodontics Postgraduate School, University of Chieti, 66100 Chieti, Italy;
| | - Nicolò Favaretto
- Orthodontics Postgraduate School, University of Trieste, 34100 Trieste, Italy
| | - Glauco Chisci
- Oral Surgery Postgraduate School, Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy;
| |
Collapse
|
26
|
Gravina AG, Pellegrino R, Cipullo M, Palladino G, Imperio G, Ventura A, Auletta S, Ciamarra P, Federico A. May ChatGPT be a tool producing medical information for common inflammatory bowel disease patients' questions? An evidence-controlled analysis. World J Gastroenterol 2024; 30:17-33. [PMID: 38293321 PMCID: PMC10823903 DOI: 10.3748/wjg.v30.i1.17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/04/2023] [Revised: 12/07/2023] [Accepted: 12/28/2023] [Indexed: 01/06/2024] Open
Abstract
Artificial intelligence is increasingly entering everyday healthcare. Large language model (LLM) systems such as Chat Generative Pre-trained Transformer (ChatGPT) have become potentially accessible to everyone, including patients with inflammatory bowel diseases (IBD). However, significant ethical issues and pitfalls exist in innovative LLM tools. The hype generated by such systems may lead patients to place unwarranted trust in them. Therefore, it is necessary to understand whether LLMs (including popular ones, such as ChatGPT) can produce plausible medical information (MI) for patients. This review examined ChatGPT's potential to provide MI regarding questions commonly posed by patients with IBD to their gastroenterologists. Review of the outputs provided by ChatGPT showed that the tool has some attractive potential but also significant limitations in the currency and detail of its information, and it provided inaccurate information in some cases. Further studies and refinement of ChatGPT, possibly aligning its outputs with the leading medical evidence from reliable databases, are needed.
Collapse
Affiliation(s)
- Antonietta Gerarda Gravina
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Raffaele Pellegrino
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Marina Cipullo
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Giovanna Palladino
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Giuseppe Imperio
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Andrea Ventura
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Salvatore Auletta
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Paola Ciamarra
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| | - Alessandro Federico
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
| |
Collapse
|
27
|
Shemer A, Cohen M, Altarescu A, Atar-Vardi M, Hecht I, Dubinsky-Pertzov B, Shoshany N, Zmujack S, Or L, Einan-Lifshitz A, Pras E. Diagnostic capabilities of ChatGPT in ophthalmology. Graefes Arch Clin Exp Ophthalmol 2024:10.1007/s00417-023-06363-z. [PMID: 38183467 DOI: 10.1007/s00417-023-06363-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Revised: 12/04/2023] [Accepted: 12/23/2023] [Indexed: 01/08/2024] Open
Abstract
PURPOSE The purpose of this study is to assess the diagnostic accuracy of ChatGPT in the field of ophthalmology. METHODS This is a retrospective cohort study conducted in one academic tertiary medical center. We reviewed data of patients admitted to the ophthalmology department from 06/2022 to 01/2023. We then created two clinical cases for each patient: the first based on the medical history alone (Hx), and the second adding the clinical examination findings (Hx and Ex). For each case, we asked ChatGPT, residents, and attendings for the three most likely diagnoses. We then compared the accuracy rates (at least one correct diagnosis) of all groups, and we also evaluated the total time each group needed to complete the assignment. RESULTS ChatGPT, residents, and attendings evaluated 126 cases from 63 patients (history only, or history and exam findings, for each patient). ChatGPT achieved a significantly lower diagnostic accuracy rate (54%) on the Hx cases than the residents (75%; p < 0.01) and attendings (71%; p < 0.01). After adding the clinical examination findings, ChatGPT's accuracy rate was 68%, whereas for the residents and attendings it increased to 94% (p < 0.01) and 86% (p < 0.01), respectively. ChatGPT was 4 to 5 times faster than the attendings and residents. CONCLUSIONS AND RELEVANCE ChatGPT showed lower diagnostic accuracy in ophthalmology cases than residents and attendings, whether based on patient history alone or with additional clinical examination findings. However, ChatGPT completed the task faster than the physicians.
Collapse
Affiliation(s)
- Asaf Shemer
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel.
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
| | - Michal Cohen
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Health Science, Ben-Gurion University of the Negev, South District, Beer-Sheva, Israel
| | - Aya Altarescu
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Maya Atar-Vardi
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Idan Hecht
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Biana Dubinsky-Pertzov
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Nadav Shoshany
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Sigal Zmujack
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Lior Or
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Adi Einan-Lifshitz
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Eran Pras
- Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- The Matlow's Ophthalmo-Genetics Laboratory, Department of Ophthalmology, Shamir Medical Center (Formerly Assaf-Harofeh), Tzrifin, Israel
| |
Collapse
|
28
|
Morales-Ramirez P, Mishek H, Dasgupta A. The Genie Is Out of the Bottle: What ChatGPT Can and Cannot Do for Medical Professionals. Obstet Gynecol 2024; 143:e1-e6. [PMID: 37944140 DOI: 10.1097/aog.0000000000005446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Accepted: 10/12/2023] [Indexed: 11/12/2023]
Abstract
ChatGPT is a cutting-edge artificial intelligence technology that was released for public use in November 2022. Its rapid adoption has raised questions about capabilities, limitations, and risks. This article presents an overview of ChatGPT, and it highlights the current state of this technology for the medical field. The article seeks to provide a balanced perspective on what the model can and cannot do in three specific domains: clinical practice, research, and medical education. It also provides suggestions on how to optimize the use of this tool.
Collapse
|
29
|
Biswas S, Logan NS, Davies LN, Sheppard AL, Wolffsohn JS. Authors' Reply: Assessing the utility of ChatGPT as an artificial intelligence-based large language model for information to answer questions on myopia. Ophthalmic Physiol Opt 2024; 44:233-234. [PMID: 37635297 DOI: 10.1111/opo.13227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Accepted: 08/17/2023] [Indexed: 08/29/2023]
Affiliation(s)
- Sayantan Biswas
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Nicola S Logan
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Leon N Davies
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Amy L Sheppard
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - James S Wolffsohn
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| |
Collapse
|
30
|
Wong M, Lim ZW, Pushpanathan K, Cheung CY, Wang YX, Chen D, Tham YC. Review of emerging trends and projection of future developments in large language models research in ophthalmology. Br J Ophthalmol 2023:bjo-2023-324734. [PMID: 38164563 DOI: 10.1136/bjo-2023-324734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2023] [Accepted: 11/14/2023] [Indexed: 01/03/2024]
Abstract
BACKGROUND Large language models (LLMs) are fast emerging as potent tools in healthcare, including ophthalmology. This systematic review offers a twofold contribution: it summarises current trends in ophthalmology-related LLM research and projects future directions for this burgeoning field. METHODS We systematically searched across various databases (PubMed, Europe PMC, Scopus and Web of Science) for articles related to LLM use in ophthalmology, published between 1 January 2022 and 31 July 2023. Selected articles were summarised, and categorised by type (editorial, commentary, original research, etc) and their research focus (eg, evaluating ChatGPT's performance in ophthalmology examinations or clinical tasks). FINDINGS We identified 32 articles meeting our criteria, published between January and July 2023, with a peak in June (n=12). Most were original research evaluating LLMs' proficiency in clinically related tasks (n=9). Studies demonstrated that ChatGPT-4.0 outperformed its predecessor, ChatGPT-3.5, in ophthalmology exams. Furthermore, ChatGPT excelled in constructing discharge notes (n=2), evaluating diagnoses (n=2) and answering general medical queries (n=6). However, it struggled with generating scientific articles or abstracts (n=3) and answering specific subdomain questions, especially those regarding specific treatment options (n=2). ChatGPT's performance relative to other LLMs (Google's Bard, Microsoft's Bing) varied by study design. Ethical concerns such as data hallucination (n=27), authorship (n=5) and data privacy (n=2) were frequently cited. INTERPRETATION While LLMs hold transformative potential for healthcare and ophthalmology, concerns over accountability, accuracy and data security remain. Future research should focus on application programming interface integration, comparative assessments of popular LLMs, their ability to interpret image-based data and the establishment of standardised evaluation frameworks.
Collapse
Affiliation(s)
| | - Zhi Wei Lim
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Krithi Pushpanathan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Carol Y Cheung
- Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong, Hong Kong
| | - Ya Xing Wang
- Beijing Institute of Ophthalmology, Beijing Tongren Hospital, Capital University of Medical Science, Beijing, China
| | - David Chen
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Department of Ophthalmology, National University Hospital, Singapore
| | - Yih Chung Tham
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore
| |
Collapse
|
31
|
Ittarat M, Cheungpasitporn W, Chansangpetch S. Personalized Care in Eye Health: Exploring Opportunities, Challenges, and the Road Ahead for Chatbots. J Pers Med 2023; 13:1679. [PMID: 38138906 PMCID: PMC10744965 DOI: 10.3390/jpm13121679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Revised: 11/29/2023] [Accepted: 11/30/2023] [Indexed: 12/24/2023] Open
Abstract
In modern eye care, the adoption of ophthalmology chatbots stands out as a pivotal technological progression. These digital assistants present numerous benefits, such as better access to vital information, heightened patient interaction, and streamlined triaging. Recent evaluations have highlighted their performance in both the triage of ophthalmology conditions and ophthalmology knowledge assessment, underscoring their potential and areas for improvement. However, assimilating these chatbots into the prevailing healthcare infrastructures brings challenges. These encompass ethical dilemmas, legal compliance, seamless integration with electronic health records (EHR), and fostering effective dialogue with medical professionals. Addressing these challenges necessitates the creation of bespoke standards and protocols for ophthalmology chatbots. The horizon for these chatbots is illuminated by advancements and anticipated innovations, poised to redefine the delivery of eye care. The synergy of artificial intelligence (AI) and machine learning (ML) with chatbots amplifies their diagnostic prowess. Additionally, their capability to adapt linguistically and culturally ensures they can cater to a global patient demographic. In this article, we explore in detail the utilization of chatbots in ophthalmology, examining their accuracy, reliability, data protection, security, transparency, potential algorithmic biases, and ethical considerations. We provide a comprehensive review of their roles in the triage of ophthalmology conditions and knowledge assessment, emphasizing their significance and future potential in the field.
Collapse
Affiliation(s)
- Mantapond Ittarat
- Surin Hospital and Surin Medical Education Center, Suranaree University of Technology, Surin 32000, Thailand;
| | | | - Sunee Chansangpetch
- Center of Excellence in Glaucoma, Chulalongkorn University, Bangkok 10330, Thailand;
- Department of Ophthalmology, Faculty of Medicine, Chulalongkorn University and King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok 10330, Thailand
| |
Collapse
|
32
|
Kunze KN. Editorial Commentary: Recognizing and Avoiding Medical Misinformation Across Digital Platforms: Smoke, Mirrors (and Streaming). Arthroscopy 2023; 39:2454-2455. [PMID: 37981387 DOI: 10.1016/j.arthro.2023.06.054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/18/2023] [Revised: 06/27/2023] [Accepted: 06/30/2023] [Indexed: 11/21/2023]
Abstract
The evolution of social media and related online sources has substantially increased the ability of patients to query and access publicly available information that may have relevance to a potential musculoskeletal condition of interest. Although increased accessibility to information has several purported benefits, including encouragement of patients to become more invested in their care through self-teaching, a downside to the existence of a vast number of unregulated resources remains the risk of misinformation. As health care providers, we have a moral and ethical obligation to mitigate this risk by directing patients to high-quality resources for medical information and to be aware of resources that are unreliable. To this end, a growing body of evidence has suggested that YouTube lacks reliability and quality in terms of medical information concerning a variety of musculoskeletal conditions.
Collapse
|
33
|
Ferro Desideri L, Roth J, Zinkernagel M, Anguita R. "Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration". Int J Retina Vitreous 2023; 9:71. [PMID: 37980501 PMCID: PMC10657493 DOI: 10.1186/s40942-023-00511-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2023] [Accepted: 11/11/2023] [Indexed: 11/20/2023] Open
Abstract
INTRODUCTION Age-related macular degeneration (AMD) affects millions of people globally, leading to a surge in online research of putative diagnoses and causing potential misinformation and anxiety in patients and their caregivers. This study explores the efficacy of artificial intelligence-derived large language models (LLMs), such as ChatGPT, in addressing AMD patients' questions. METHODS ChatGPT 3.5 (2023), Bing AI (2023), and Google Bard (2023) were adopted as the LLMs. Patients' questions were subdivided into two categories, (a) general medical advice and (b) pre- and post-intravitreal injection advice, and the responses were classified as (1) accurate and sufficient, (2) partially accurate but sufficient, or (3) inaccurate and not sufficient. A non-parametric test was performed to compare mean scores among the three LLMs, and analysis of variance and reliability tests were also performed across the three groups. RESULTS In question category (a), the average score was 1.20 (± 0.41) with ChatGPT 3.5, 1.60 (± 0.63) with Bing AI and 1.60 (± 0.73) with Google Bard, showing no significant differences among the three groups (p = 0.129). The average score in category (b) was 1.07 (± 0.27) with ChatGPT 3.5, 1.69 (± 0.63) with Bing AI and 1.38 (± 0.63) with Google Bard, showing a significant difference among the three groups (p = 0.0042). Reliability statistics showed a Cronbach's α of 0.237 (range 0.448, 0.096-0.544). CONCLUSION ChatGPT 3.5 consistently offered the most accurate and satisfactory responses, particularly to technical queries. LLMs displayed promise in providing precise information about AMD; however, further improvements are needed, especially for more technical questions.
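The non-parametric comparison across the three LLMs' scores described above is commonly run as a Kruskal-Wallis test. A minimal sketch with invented 1-3 ratings (not the study's data):

    from scipy.stats import kruskal

    # Invented 1-3 accuracy ratings for three chatbots (1 = accurate and sufficient).
    chatgpt = [1, 1, 1, 2, 1, 1, 1, 1, 2, 1]
    bing = [2, 1, 2, 2, 1, 2, 3, 1, 2, 1]
    bard = [1, 2, 1, 2, 1, 1, 3, 1, 2, 1]

    h_stat, p_value = kruskal(chatgpt, bing, bard)
    print(f"H={h_stat:.3f}, p={p_value:.4f}")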
Collapse
Affiliation(s)
- Lorenzo Ferro Desideri
- Department of Ophthalmology, Inselspital, University Hospital of Bern, Bern, Switzerland.
- Bern Photographic Reading Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
| | - Janice Roth
- Department of Ophthalmology, Inselspital, University Hospital of Bern, Bern, Switzerland
| | - Martin Zinkernagel
- Department of Ophthalmology, Inselspital, University Hospital of Bern, Bern, Switzerland
- Bern Photographic Reading Center, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
| | - Rodrigo Anguita
- Department of Ophthalmology, Inselspital, University Hospital of Bern, Bern, Switzerland
- Moorfields Eye Hospital NHS Foundation Trust, City Road, London, EC1V 2PD, UK
| |
Collapse
|
34
|
Biswas S, Logan NS, Davies LN, Sheppard AL, Wolffsohn JS. Assessing the utility of ChatGPT as an artificial intelligence-based large language model for information to answer questions on myopia. Ophthalmic Physiol Opt 2023; 43:1562-1570. [PMID: 37476960 DOI: 10.1111/opo.13207] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Revised: 06/04/2023] [Accepted: 07/11/2023] [Indexed: 07/22/2023]
Abstract
PURPOSE ChatGPT is an artificial intelligence language model that uses natural language processing to simulate human conversation. It has seen a wide range of applications, including healthcare education, research and clinical practice. This study evaluated the ability of ChatGPT to provide accurate, good-quality information in answer to questions on myopia. METHODS A series of 11 questions (covering nine categories: general summary, cause, symptom, onset, prevention, complication, natural history, treatment and prognosis) were generated for this cross-sectional study. Each question was entered five times into fresh ChatGPT sessions (free from the influence of prior questions). The responses were evaluated by a five-member team of optometry teaching and research staff. The evaluators individually rated the accuracy and quality of responses on a Likert scale, where a higher score indicated greater quality of information (1: very poor; 2: poor; 3: acceptable; 4: good; 5: very good). Median scores for each question were estimated and compared between evaluators. Agreement between the five evaluators and the reliability statistics of the questions were estimated. RESULTS For 10 of the 11 questions on myopia, ChatGPT provided good-quality information (median score: 4.0), and for one question the response was acceptable (median score: 3.0). Of 275 responses in total, 66 (24%) were rated very good, 134 (49%) were rated good, 60 (22%) were rated acceptable, 10 (3.6%) were rated poor and 5 (1.8%) were rated very poor. A Cronbach's α of 0.807 indicated a good level of internal consistency across test items. Evaluators' ratings demonstrated 'slight agreement' (Fleiss' κ = 0.005), with a significant difference in scoring among the evaluators (Kruskal-Wallis test, p < 0.001). CONCLUSION Overall, ChatGPT generated good-quality information to answer questions on myopia. Although ChatGPT shows great potential in rapidly providing information on myopia, the presence of inaccurate responses demonstrates that further evaluation and awareness of its limitations are crucial to avoid potential misinterpretation.
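For reference, the Cronbach's α reported above is the standard internal-consistency statistic; for k test items with item variances \sigma_i^2 and total-score variance \sigma_t^2 it is

    \alpha = \frac{k}{k-1} \left( 1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_t^2} \right)

Values near 0.8, as here, are conventionally read as good internal consistency.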
Collapse
Affiliation(s)
- Sayantan Biswas
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Nicola S Logan
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Leon N Davies
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - Amy L Sheppard
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| | - James S Wolffsohn
- School of Optometry, College of Health and Life Sciences, Aston University, Birmingham, UK
| |
Collapse
|
35
|
Mese I, Taslicay CA, Sivrioglu AK. Improving radiology workflow using ChatGPT and artificial intelligence. Clin Imaging 2023; 103:109993. [PMID: 37812965 DOI: 10.1016/j.clinimag.2023.109993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Revised: 08/19/2023] [Accepted: 09/28/2023] [Indexed: 10/11/2023]
Abstract
Artificial intelligence is a branch of computer science that aims to create intelligent machines capable of performing tasks that typically require human intelligence. One of its branches is natural language processing, which studies the interaction between computers and human language. ChatGPT is a sophisticated natural language processing tool that can understand and respond to complex questions and commands in natural language. Radiology is a vital aspect of modern medicine that involves the use of imaging technologies to diagnose and treat medical conditions. Artificial intelligence, including ChatGPT, can be integrated into radiology workflows to improve efficiency, accuracy, and patient care. ChatGPT can streamline various radiology workflow steps, including patient registration, scheduling, patient check-in, image acquisition, interpretation, and reporting. While ChatGPT has the potential to transform radiology workflows, the technology has limitations that must be addressed, such as the potential for bias in artificial intelligence algorithms and ethical concerns. As the technology continues to advance, ChatGPT is likely to become an increasingly important tool in the field of radiology, and in healthcare more broadly.
Collapse
Affiliation(s)
- Ismail Mese
- Department of Radiology, Health Sciences University, Erenkoy Mental Health and Neurology Training and Research Hospital, 19 Mayıs, Sinan Ercan Cd. No: 23, Kadıköy/Istanbul 34736, Turkey.
| | | | - Ali Kemal Sivrioglu
- Department of Radiology, Liv Hospital Vadistanbul, Ayazağa Mahallesi, Kemerburgaz Caddesi, Vadistanbul Park Etabı, 7F Blok, 34396 Sarıyer/İstanbul, Turkey
| |
Collapse
|
36
|
Taloni A, Borselli M, Scarsi V, Rossi C, Coco G, Scorcia V, Giannaccare G. Comparative performance of humans versus GPT-4.0 and GPT-3.5 in the self-assessment program of American Academy of Ophthalmology. Sci Rep 2023; 13:18562. [PMID: 37899405 PMCID: PMC10613606 DOI: 10.1038/s41598-023-45837-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2023] [Accepted: 10/24/2023] [Indexed: 10/31/2023] Open
Abstract
To compare the performance of humans, GPT-4.0 and GPT-3.5 in answering multiple-choice questions from the American Academy of Ophthalmology (AAO) Basic and Clinical Science Course (BCSC) self-assessment program, available at https://www.aao.org/education/self-assessments . In June 2023, text-based multiple-choice questions were submitted to GPT-4.0 and GPT-3.5. The AAO provides the percentage of humans who selected the correct answer, which was analyzed for comparison. All questions were classified by 10 subspecialties and 3 practice areas (diagnostics/clinics, medical treatment, surgery). Out of 1023 questions, GPT-4.0 achieved the best score (82.4%), followed by humans (75.7%) and GPT-3.5 (65.9%), with significant differences in accuracy rates (all P < 0.0001). Both GPT-4.0 and GPT-3.5 showed the worst results on surgery-related questions (74.6% and 57.0%, respectively). For difficult questions (answered incorrectly by > 50% of humans), both GPT models compared favorably with humans, without reaching statistical significance. The word count of answers provided by GPT-4.0 was significantly lower than that of GPT-3.5 (160 ± 56 and 206 ± 77 words, respectively; P < 0.0001); however, incorrect responses were longer (P < 0.02). GPT-4.0 represented a substantial improvement over GPT-3.5, achieving better performance than humans on an AAO BCSC self-assessment test. However, ChatGPT is still limited by inconsistency across practice areas, especially when it comes to surgery.
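Accuracy-rate comparisons of this kind reduce to a Pearson chi-squared test on a 2x2 contingency table. An illustrative sketch follows; the counts are reconstructed from the reported percentages of 1023 questions, not taken from the study's own tables:

    from scipy.stats import chi2_contingency

    # Approximate correct/incorrect counts out of 1023 questions,
    # reconstructed from the reported percentages (illustrative only).
    gpt4 = [843, 180]    # ~82.4% correct
    gpt35 = [674, 349]   # ~65.9% correct

    chi2, p, dof, expected = chi2_contingency([gpt4, gpt35])
    print(f"chi2={chi2:.1f}, p={p:.2e}")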
Collapse
Affiliation(s)
- Andrea Taloni
- Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
| | - Massimiliano Borselli
- Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
| | - Valentina Scarsi
- Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
| | - Costanza Rossi
- Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
| | - Giulia Coco
- Department of Clinical Sciences and Translational Medicine, University of Rome Tor Vergata, Rome, Italy
| | - Vincenzo Scorcia
- Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy
| | - Giuseppe Giannaccare
- Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy.
- Department of Surgical Sciences, Eye Clinic, University of Cagliari, Via Università 40, 09124, Cagliari, Italy.
| |
Collapse
|
37
|
Ong J, Hariprasad SM, Chhablani J. ChatGPT and GPT-4 in Ophthalmology: Applications of Large Language Model Artificial Intelligence in Retina. Ophthalmic Surg Lasers Imaging Retina 2023; 54:557-562. [PMID: 37847163 DOI: 10.3928/23258160-20230926-01] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2023]
|
38
|
Rojas-Carabali W, Cifuentes-González C, Wei X, Putera I, Sen A, Thng ZX, Agrawal R, Elze T, Sobrin L, Kempen JH, Lee B, Biswas J, Nguyen QD, Gupta V, de-la-Torre A, Agrawal R. Evaluating the Diagnostic Accuracy and Management Recommendations of ChatGPT in Uveitis. Ocul Immunol Inflamm 2023:1-6. [PMID: 37722842 DOI: 10.1080/09273948.2023.2253471] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2023] [Accepted: 08/25/2023] [Indexed: 09/20/2023]
Abstract
INTRODUCTION Accurate diagnosis and timely management are vital for favorable uveitis outcomes. Artificial intelligence (AI) holds promise in medical decision-making, particularly in ophthalmology. Yet the diagnostic precision and management advice of AI-based uveitis chatbots lack assessment. METHODS We appraised the diagnostic accuracy and management suggestions of an AI-based chatbot, ChatGPT, versus five uveitis-trained ophthalmologists, using 25 standard cases aligned with the new Uveitis Nomenclature guidelines. Participants predicted the most likely diagnosis, two differentials, and the next management steps. Comparative success rates were computed. RESULTS Ophthalmologists excelled in identifying the likely diagnosis (60-92%), exceeding the AI (60%). Considering fully and partially accurate diagnoses, ophthalmologists achieved 76-100% success, while the AI attained 72%. Despite an 8% AI improvement, its overall performance lagged. Ophthalmologists and the AI agreed on the diagnosis in 48% of cases, with 91.6% concurrence in management plans among those cases. CONCLUSIONS The study underscores AI chatbots' potential in uveitis diagnosis and management, indicating their value in reducing diagnostic errors. Further research is essential to enhance the precision of AI chatbots' diagnoses and recommendations.
Collapse
Affiliation(s)
- William Rojas-Carabali
- National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
- Department of Bioinformatics, Lee Kong Chiang School of Medicine, Nanyang Technological University, Singapore, Singapore
| | - Carlos Cifuentes-González
- Neuroscience Research Group (NEUROS), Neurovitae Center for Neuroscience, Institute of Translational Medicine (IMT), Escuela de Medicina y Ciencias de la Salud, Universidad del Rosario, Bogotá, Colombia
| | - Xin Wei
- National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
| | - Ikhwanuliman Putera
- Department of Ophthalmology, Faculty of Medicine Universitas Indonesia - CiptoMangunkusmoKirana Eye Hospital, Jakarta, Indonesia
- Laboratory Medical Immunology, Department of Immunology, ErasmusMC, University Medical Centre, Rotterdam, the Netherlands
- Department of Internal Medicine, Division of Clinical Immunology, Erasmus MC, University Medical Center, Rotterdam, The Netherlands
- Department of Ophthalmology, Erasmus MC, University Medical Center, Rotterdam, The Netherlands
| | - Alok Sen
- Department of Vitreoretina and Uveitis, Sadguru Netra Chikatsalya, Chitrakoot, India
| | - Zheng Xian Thng
- National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
| | - Rajdeep Agrawal
- Department of Bioinformatics, Lee Kong Chiang School of Medicine, Nanyang Technological University, Singapore, Singapore
| | - Tobias Elze
- Department of Ophthalmology, Massachusetts Eye and Ear/Harvard Medical School, and Schepens Eye Research Institute, Boston, Massachusetts, USA
| | - Lucia Sobrin
- Department of Ophthalmology, Massachusetts Eye and Ear/Harvard Medical School, and Schepens Eye Research Institute, Boston, Massachusetts, USA
| | - John H Kempen
- Department of Ophthalmology, Massachusetts Eye and Ear/Harvard Medical School, and Schepens Eye Research Institute, Boston, Massachusetts, USA
- Community Ophthalmology, Sight for Souls, Bellevue, Washington, USA
- Department of Ophthalmology, Addis Ababa University, Addis Ababa, Ethiopia
- MyungSung Christian Medical Center (MCM) Eye Unit, MCM Comprehensive Specialized Hospital, and MyungSung Medical School, Addis Ababa, Ethiopia
| | - Bernett Lee
- National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
| | - Jyotirmay Biswas
- Department of Ocular Pathology and Uveitis, Medical Research Foundation, Sankara Netralaya, Chennai, India
| | - Quan Dong Nguyen
- Byers Eye Institute, Stanford University, Palo Alto, California, USA
| | - Vishali Gupta
- Post Graduate Institute of Medical Education and Research (PGIMER), Advance Eye Centre, Chandigarh, India
| | - Alejandra de-la-Torre
- National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
| | - Rupesh Agrawal
- MyungSung Christian Medical Center (MCM) Eye Unit, MCM Comprehensive Specialized Hospital, and MyungSung Medical School, Addis Ababa, Ethiopia
- Department of Ophthalmology and Visual Sciences, Academic Clinical Program, Duke-NUS Medical School, Singapore, Singapore
- Moorfields Eye Hospital, NHS Foundation Trust, London, UK
- Singapore Eye Research Institute, The Academia, Singapore, Singapore
| |
Collapse
|
39
|
Lim ZW, Pushpanathan K, Yew SME, Lai Y, Sun CH, Lam JSH, Chen DZ, Goh JHL, Tan MCJ, Sheng B, Cheng CY, Koh VTC, Tham YC. Benchmarking large language models' performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard. EBioMedicine 2023; 95:104770. [PMID: 37625267 PMCID: PMC10470220 DOI: 10.1016/j.ebiom.2023.104770] [Citation(s) in RCA: 31] [Impact Index Per Article: 31.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Revised: 07/21/2023] [Accepted: 08/08/2023] [Indexed: 08/27/2023] Open
Abstract
BACKGROUND Large language models (LLMs) are garnering wide interest due to their human-like and contextually relevant responses. However, LLMs' accuracy across specific medical domains has not yet been thoroughly evaluated. Myopia is a frequent topic on which patients and parents commonly seek information online. Our study evaluated the performance of three LLMs, namely ChatGPT-3.5, ChatGPT-4.0, and Google Bard, in delivering accurate responses to common myopia-related queries. METHODS We curated thirty-one commonly asked myopia care-related questions, which were categorised into six domains-pathogenesis, risk factors, clinical presentation, diagnosis, treatment and prevention, and prognosis. Each question was posed to the LLMs, and their responses were independently graded by three consultant-level paediatric ophthalmologists on a three-point accuracy scale (poor, borderline, good). A majority consensus approach was used to determine the final rating for each response. 'Good' rated responses were further evaluated for comprehensiveness on a five-point scale. Conversely, 'poor' rated responses were further prompted for self-correction and then re-evaluated for accuracy. FINDINGS ChatGPT-4.0 demonstrated superior accuracy, with 80.6% of responses rated as 'good', compared to 61.3% for ChatGPT-3.5 and 54.8% for Google Bard (Pearson's chi-squared test, all p ≤ 0.009). All three LLM-chatbots showed high mean comprehensiveness scores (Google Bard: 4.35; ChatGPT-4.0: 4.23; ChatGPT-3.5: 4.11, out of a maximum score of 5). All LLM-chatbots also demonstrated substantial self-correction capabilities: 66.7% (2 in 3) of ChatGPT-4.0's, 40% (2 in 5) of ChatGPT-3.5's, and 60% (3 in 5) of Google Bard's responses improved after self-correction. The LLM-chatbots performed consistently across domains, except for 'treatment and prevention'. However, ChatGPT-4.0 still performed superiorly in this domain, receiving 70% 'good' ratings, compared to 40% for ChatGPT-3.5 and 45% for Google Bard (Pearson's chi-squared test, all p ≤ 0.001). INTERPRETATION Our findings underscore the potential of LLMs, particularly ChatGPT-4.0, for delivering accurate and comprehensive responses to myopia-related queries. Continuous strategies and evaluations to improve LLMs' accuracy remain crucial. FUNDING Dr Yih-Chung Tham was supported by the National Medical Research Council of Singapore (NMRC/MOH/HCSAINV21nov-0001).
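The majority-consensus grading described above (three graders, a three-point scale) can be expressed in a few lines; the tie-breaking rule below is an assumption for illustration, since the paper's handling of three-way splits is not restated here:

    from collections import Counter

    def consensus(grades: list[str]) -> str:
        # Majority vote among three graders; a three-way split falls back to
        # the middle rating ("borderline") -- an assumed rule for illustration.
        top, count = Counter(grades).most_common(1)[0]
        return top if count >= 2 else "borderline"

    print(consensus(["good", "good", "borderline"]))  # -> good
    print(consensus(["poor", "borderline", "good"]))  # -> borderline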
Collapse
Affiliation(s)
- Zhi Wei Lim
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Krithi Pushpanathan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore
| | - Samantha Min Er Yew
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore
| | - Yien Lai
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - Chen-Hsin Sun
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - Janice Sing Harn Lam
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - David Ziyou Chen
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | | | - Marcus Chun Jin Tan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
| | - Bin Sheng
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; Department of Endocrinology and Metabolism, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China; MoE Key Lab of Artificial Intelligence, Artificial Intelligence Institute, Shanghai Jiao Tong University, Shanghai, China
| | - Ching-Yu Cheng
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Eye Academic Clinical Program (Eye ACP), Duke NUS Medical School, Singapore
| | - Victor Teck Chang Koh
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
- Yih-Chung Tham
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Eye Academic Clinical Program (Eye ACP), Duke NUS Medical School, Singapore.
40
Singh S, Watson S. ChatGPT as a tool for conducting literature review for dry eye disease. Clin Exp Ophthalmol 2023;51:731-732. PMID: 37321598. DOI: 10.1111/ceo.14268.
Affiliation(s)
- Swati Singh
- Ophthalmic Plastic Surgery Services, L V Prasad Eye Institute, Hyderabad, India
- Stephanie Watson
- Discipline of Ophthalmology, Sydney Medical School, The University of Sydney, Save Sight Institute, Sydney, New South Wales, Australia
41
Nielsen JPS, von Buchwald C, Grønhøj C. Validity of the large language model ChatGPT (GPT4) as a patient information source in otolaryngology by a variety of doctors in a tertiary otorhinolaryngology department. Acta Otolaryngol 2023;143:779-782. PMID: 37694729. DOI: 10.1080/00016489.2023.2254809.
Abstract
BACKGROUND A high number of patients seek health information online, and large language models (LLMs) may produce a rising amount of it. AIM This study evaluates the quality of the health information provided by ChatGPT, an LLM developed by OpenAI, focusing on its utility as a source of otolaryngology-related patient information. MATERIAL AND METHOD Doctors from a tertiary otorhinolaryngology department used a Likert scale to assess the chatbot's responses for accuracy, relevance, and depth; the responses were also evaluated by ChatGPT itself. RESULTS When rated by the respondents, the composite mean across the three categories was 3.41, with the highest performance in the relevance category (mean = 3.71); the accuracy and depth categories yielded mean scores of 3.51 and 3.00, respectively. ChatGPT rated all categories as 5. CONCLUSION AND SIGNIFICANCE Despite its potential to provide relevant and accurate medical information, the chatbot's responses lacked depth and may perpetuate biases stemming from its training on publicly available text. While LLMs show promise in healthcare, further refinement is necessary to enhance response depth and mitigate potential biases.
Affiliation(s)
- Jacob P S Nielsen
- Department of Otorhinolaryngology-Head and Neck Surgery and Audiology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Christian von Buchwald
- Department of Otorhinolaryngology-Head and Neck Surgery and Audiology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Christian Grønhøj
- Department of Otorhinolaryngology-Head and Neck Surgery and Audiology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
42
Khanna RK, Ducloyer JB, Hage A, Rezkallah A, Durbant E, Bigoteau M, Mouchel R, Guillon-Rolf R, Le L, Tahiri R, Chammas J, Baudouin C. Evaluating the potential of ChatGPT-4 in ophthalmology: The good, the bad and the ugly. J Fr Ophtalmol 2023;46:697-705. PMID: 37573231. DOI: 10.1016/j.jfo.2023.07.001.
Abstract
There is growing interest in artificial intelligence (AI) across all medical fields. Beyond the direct application of AI to medical data, generative AI such as generative pre-trained transformers (GPT) could significantly change the ophthalmology landscape, opening up new avenues for enhancing precision, productivity, and patient outcomes. To date, ChatGPT-4 has been investigated in various ways in ophthalmology for research, medical education, and clinical decision support. This article demonstrates the application of ChatGPT-4 within the field of ophthalmology through a 'mise en abyme' approach. While we explore its potential to enhance the future of ophthalmology care, we also carefully outline its current limitations and potential risks.
Affiliation(s)
- R K Khanna
- Service d'ophtalmologie, hôpital universitaire Bretonneau, Inserm 1253 iBrain, Tours, France.
- J-B Ducloyer
- Service d'ophtalmologie, Nantes université, CHU de Nantes, 44000 Nantes, France
- A Hage
- Service d'ophtalmologie, CHNO 15-20, Paris, France
- A Rezkallah
- Service d'ophtalmologie, hôpital de la Croix Rousse, Lyon, France
- E Durbant
- Cabinet d'ophtalmologie, Paris, France
- M Bigoteau
- Service d'ophtalmologie, hôpital Jacques-Cœur, Bourges, France
- R Mouchel
- Centre ophtalmologique Kléber, Clinique du parc, Lyon, France; Centre ophtalmologique du Grand Lac, Aix-les-Bains, France
- R Guillon-Rolf
- Service d'ophtalmologie, Fondation Adolphe-de-Rothschild, Paris, France
- L Le
- Cabinet d'ophtalmologie, Chartres, France
- R Tahiri
- Service de chirurgie ambulatoire, centre hospitalier d'Avranches-Granville, 849, rue des Menneries, 50400 Granville, France
- J Chammas
- Service d'ophtalmologie, CHU Robert-Debré, Reims, France; Centre ophtalmologique du Rhin, Strasbourg, France
- C Baudouin
- Service d'ophtalmologie, CHNO 15-20, Sorbonne Université, Inserm, CNRS, Institut de la Vision, IHU FOReSIGHT, Paris, France
43
Watters C, Lemanski MK. Universal skepticism of ChatGPT: a review of early literature on chat generative pre-trained transformer. Front Big Data 2023;6:1224976. PMID: 37680954. PMCID: PMC10482048. DOI: 10.3389/fdata.2023.1224976.
Abstract
ChatGPT, a new language model developed by OpenAI, has garnered significant attention in various fields since its release. This literature review provides an overview of early ChatGPT literature across multiple disciplines, exploring its applications, limitations, and ethical considerations. The review encompasses Scopus-indexed publications from November 2022 to April 2023 and includes 156 articles related to ChatGPT. The findings reveal a predominance of negative sentiment across disciplines, though subject-specific attitudes must be considered. The review highlights the implications of ChatGPT in many fields including healthcare, raising concerns about employment opportunities and ethical considerations. While ChatGPT holds promise for improved communication, further research is needed to address its capabilities and limitations. This literature review provides insights into early research on ChatGPT, informing future investigations and practical applications of chatbot technology, as well as development and usage of generative AI.
Affiliation(s)
- Casey Watters
- Faculty of Law, Bond University, Gold Coast, QLD, Australia
44
Bernstein IA, Zhang Y(V), Govil D, Majid I, Chang RT, Sun Y, Shue A, Chou JC, Schehlein E, Christopher KL, Groth SL, Ludwig C, Wang SY. Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions. JAMA Netw Open 2023;6:e2330320. PMID: 37606922. PMCID: PMC10445188. DOI: 10.1001/jamanetworkopen.2023.30320.
Abstract
Importance Large language models (LLMs) like ChatGPT appear capable of performing a variety of tasks, including answering patient eye care questions, but have not yet been evaluated in direct comparison with ophthalmologists. It remains unclear whether LLM-generated advice is accurate, appropriate, and safe for eye patients. Objective To evaluate the quality of ophthalmology advice generated by an LLM chatbot in comparison with ophthalmologist-written advice. Design, Setting, and Participants This cross-sectional study used deidentified data from an online medical forum, in which patient questions received responses written by American Academy of Ophthalmology (AAO)-affiliated ophthalmologists. A masked panel of 8 board-certified ophthalmologists was asked to distinguish between answers generated by the ChatGPT chatbot and human answers. Posts were dated between 2007 and 2016; data were accessed in January 2023, and analysis was performed between March and May 2023. Main Outcomes and Measures Identification of chatbot and human answers on a 4-point scale (likely or definitely artificial intelligence [AI] vs likely or definitely human) and evaluation of responses for presence of incorrect information, alignment with perceived consensus in the medical community, likelihood to cause harm, and extent of harm. Results A total of 200 pairs of user questions and answers by AAO-affiliated ophthalmologists were evaluated. The mean (SD) accuracy for distinguishing between AI and human responses was 61.3% (9.7%). Of 800 evaluations of chatbot-written answers, 168 answers (21.0%) were marked as human-written, while 517 of 800 human-written answers (64.6%) were marked as AI-written. Compared with human answers, chatbot answers were more frequently rated as probably or definitely written by AI (prevalence ratio [PR], 1.72; 95% CI, 1.52-1.93). The likelihood of chatbot answers containing incorrect or inappropriate material was comparable with that of human answers (PR, 0.92; 95% CI, 0.77-1.10), and chatbot answers did not differ from human answers in likelihood of harm (PR, 0.84; 95% CI, 0.67-1.07) or extent of harm (PR, 0.99; 95% CI, 0.80-1.22). Conclusions and Relevance In this cross-sectional study of human-written and AI-generated responses to 200 eye care questions from an online advice forum, a chatbot appeared capable of responding to long user-written eye health posts and largely generated appropriate responses that did not differ significantly from ophthalmologist-written responses in terms of incorrect information, likelihood of harm, extent of harm, or deviation from ophthalmologist community standards. Additional research is needed to assess patient attitudes toward LLM-augmented ophthalmologists vs fully autonomous AI content generation, to evaluate the clarity and acceptability of LLM-generated answers from the patient perspective, to test the performance of LLMs in a greater variety of clinical contexts, and to determine an optimal manner of utilizing LLMs that is ethical and minimizes harm.
Affiliation(s)
- Isaac A. Bernstein
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Youchen (Victor) Zhang
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Devendra Govil
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Iyad Majid
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Robert T. Chang
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Yang Sun
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Ann Shue
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Jonathan C. Chou
- Department of Ophthalmology, Kaiser Permanente San Francisco, San Francisco, California
- Sylvia L. Groth
- Department of Ophthalmology and Visual Sciences, Vanderbilt Eye Institute, Nashville, Tennessee
- Cassie Ludwig
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Sophia Y. Wang
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
45
Liu J, Wang C, Liu S. Utility of ChatGPT in Clinical Practice. J Med Internet Res 2023;25:e48568. PMID: 37379067. PMCID: PMC10365580. DOI: 10.2196/48568.
Abstract
ChatGPT is receiving increasing attention and has a variety of application scenarios in clinical practice. In clinical decision support, ChatGPT has been used to generate accurate differential diagnosis lists, support clinical decision-making, optimize clinical decision support, and provide insights for cancer screening decisions. In addition, ChatGPT has been used for intelligent question-answering to provide reliable information about diseases and medical queries. In terms of medical documentation, ChatGPT has proven effective in generating patient clinical letters, radiology reports, medical notes, and discharge summaries, improving efficiency and accuracy for health care providers. Future research directions include real-time monitoring and predictive analytics, precision medicine and personalized treatment, the role of ChatGPT in telemedicine and remote health care, and integration with existing health care systems. Overall, ChatGPT is a valuable tool that complements the expertise of health care providers and improves clinical decision-making and patient care. However, ChatGPT is a double-edged sword. We need to carefully consider and study the benefits and potential dangers of ChatGPT. In this viewpoint, we discuss recent advances in ChatGPT research in clinical practice and suggest possible risks and challenges of using ChatGPT in clinical practice. It will help guide and support future artificial intelligence research similar to ChatGPT in health.
Affiliation(s)
- Jialin Liu
- Information Center, West China Hospital, Sichuan University, Chengdu, China
- Department of Medical Informatics, West China Medical School, Chengdu, China
- Department of Otolaryngology-Head and Neck Surgery, West China Hospital, Sichuan University, Chengdu, China
- Changyu Wang
- Information Center, West China Hospital, Sichuan University, Chengdu, China
- West China College of Stomatology, Sichuan University, Chengdu, China
- Siru Liu
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
46
Honavar SG. Eye of the AI storm: Exploring the impact of AI tools in ophthalmology. Indian J Ophthalmol 2023;71:2328-2340. PMID: 37322638. PMCID: PMC10418018. DOI: 10.4103/ijo.ijo_1478_23.
Affiliation(s)
- Santosh G Honavar
- Editor, Indian Journal of Ophthalmology, Centre for Sight, Road No. 2, Banjara Hills, Hyderabad, Telangana, India
47
Abstract
Supplemental material is available for this article.
Affiliation(s)
- Rajesh Bhayana
- From University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Mount Sinai Hospital and Women's College Hospital, University of Toronto, 200 Elizabeth St, Peter Munk Building, 1st Fl, Toronto, ON, Canada M5G 2C4
- Robert R Bleakney
- From University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Mount Sinai Hospital and Women's College Hospital, University of Toronto, 200 Elizabeth St, Peter Munk Building, 1st Fl, Toronto, ON, Canada M5G 2C4
- Satheesh Krishna
- From University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Mount Sinai Hospital and Women's College Hospital, University of Toronto, 200 Elizabeth St, Peter Munk Building, 1st Fl, Toronto, ON, Canada M5G 2C4
48
Singh S, Djalilian A, Ali MJ. ChatGPT and Ophthalmology: Exploring Its Potential with Discharge Summaries and Operative Notes. Semin Ophthalmol 2023:1-5. PMID: 37133418. DOI: 10.1080/08820538.2023.2209166.
Abstract
PURPOSE This study aimed to report the abilities of the large language model ChatGPT (OpenAI, San Francisco, USA) in constructing ophthalmic discharge summaries and operative notes. METHODS A set of prompts was constructed from statements incorporating common ophthalmic surgeries across the subspecialties of cornea, retina, glaucoma, paediatric ophthalmology, neuro-ophthalmology, and ophthalmic plastic surgery. Three surgeons carefully assessed the responses of ChatGPT, analyzing them for evidence-based content, specificity, presence of generic text, disclaimers, factual inaccuracies, and the model's ability to admit mistakes and challenge incorrect premises. RESULTS A total of 24 prompts were presented to ChatGPT: twelve assessed its ability to construct discharge summaries, and an equal number explored its potential for preparing operative notes. Responses were provided in a matter of seconds and were tailored to the quality of the inputs given. The ophthalmic discharge summaries were valid but contained significant generic text. ChatGPT could incorporate specific medications, follow-up instructions, consultation time, and location within the discharge summaries when prompted appropriately. While the operative notes were detailed, they required significant tuning. ChatGPT routinely admitted its mistakes and corrected itself immediately when confronted with factual inaccuracies, and the mistakes were avoided in subsequent reports given similar prompts. CONCLUSION The performance of ChatGPT in the context of ophthalmic discharge summaries and operative notes was encouraging, and its outputs were constructed in a matter of seconds. Focused training of ChatGPT on these tasks, with the inclusion of a human verification step, has enormous potential to impact healthcare positively.
Affiliation(s)
- Swati Singh
- Ophthalmic Plastic Surgery Service, L.V. Prasad Eye Institute, Hyderabad, India
- Ali Djalilian
- Department of Ophthalmology, University of Illinois, Chicago, Illinois, USA
- Mohammad Javed Ali
- Govindram Seksaria Institute of Dacryology, L.V. Prasad Eye Institute, Hyderabad, India
49
Dutton JJ. Artificial Intelligence and the Future of Computer-Assisted Medical Research and Writing. Ophthalmic Plast Reconstr Surg 2023;39:203-205. PMID: 37166288. DOI: 10.1097/iop.0000000000002420.
Affiliation(s)
- Jonathan J Dutton
- Department of Ophthalmology, University of North Carolina, Chapel Hill, North Carolina, U.S.A
50
Ali MJ. ChatGPT and Lacrimal Drainage Disorders: Performance and Scope of Improvement. Ophthalmic Plast Reconstr Surg 2023;39:221-225. PMID: 37166289. PMCID: PMC10171282. DOI: 10.1097/iop.0000000000002418.
Abstract
PURPOSE This study aimed to report the performance of the large language model ChatGPT (OpenAI, San Francisco, CA, U.S.A.) in the context of lacrimal drainage disorders. METHODS A set of prompts was constructed through questions and statements spanning common and uncommon aspects of lacrimal drainage disorders. Care was taken to avoid constructing prompts that required significant or new knowledge beyond the year 2020. Each prompt was presented to ChatGPT three times. The questions covered common disorders such as primary acquired nasolacrimal duct obstruction and congenital nasolacrimal duct obstruction and their cause and management. The prompts also tested ChatGPT on specifics such as the history of dacryocystorhinostomy (DCR) surgery, lacrimal pump anatomy, and human canalicular surfactants. ChatGPT was also quizzed on controversial topics such as silicone intubation and the use of mitomycin C in DCR surgery. The responses of ChatGPT were carefully analyzed for evidence-based content, specificity, presence of generic text, disclaimers, factual inaccuracies, and the model's ability to admit mistakes and challenge incorrect premises. Three lacrimal surgeons graded the responses into three categories: correct, partially correct, and factually incorrect. RESULTS A total of 21 prompts were presented to ChatGPT. The responses were detailed and structured according to the prompts. In response to most questions, ChatGPT provided a generic disclaimer that it could not give medical advice or a professional opinion but then answered the question in detail. Specific prompts such as "how can I perform an external DCR?" were answered with a sequential listing of all the surgical steps. However, several factual inaccuracies were noted across many ChatGPT replies. Several responses on controversial topics such as silicone intubation and mitomycin C were generic and not precisely evidence-based. ChatGPT's responses to specific questions on canalicular surfactants and idiopathic canalicular inflammatory disease were poor. Presenting varied prompts on a single topic led to responses that repeated or recycled phrases. Citations were uniformly missing across all responses. Agreement among the three observers was high (95%) in grading the responses. The responses of ChatGPT were graded as correct for only 40% of the prompts, partially correct for 35%, and outright factually incorrect for 25%; hence, some degree of factual inaccuracy was present in 60% of the responses if the partially correct responses are included. Encouragingly, ChatGPT was able to admit mistakes and correct them when presented with counterarguments, and it was capable of challenging incorrect prompts and premises. CONCLUSION The performance of ChatGPT in the context of lacrimal drainage disorders can, at best, be termed average. However, the potential of this AI chatbot to influence medicine is enormous, and it needs to be specifically trained and retrained for individual medical subspecialties.
Affiliation(s)
- Mohammad Javed Ali
- Govindram Seksaria Institute of Dacryology, L.V. Prasad Eye Institute, Hyderabad, India