1.
Wu J, Ma Y, Wang J, Xiao M. The Application of ChatGPT in Medicine: A Scoping Review and Bibliometric Analysis. J Multidiscip Healthc 2024; 17:1681-1692. [PMID: 38650670 PMCID: PMC11034560 DOI: 10.2147/jmdh.s463128] [Received: 02/14/2024] [Accepted: 03/25/2024]
Abstract
Purpose: ChatGPT has a wide range of applications in the medical field. This review therefore aims to define the key issues and provide a comprehensive view of the literature on the application of ChatGPT in medicine.
Methods: This scoping review follows Arksey and O'Malley's five-stage framework. A comprehensive literature search of publications (30 November 2022 to 16 August 2023) was conducted across six databases, and relevant references were systematically catalogued. Attention was focused on the general characteristics of the articles, their fields of application, and the advantages and disadvantages of using ChatGPT. Descriptive statistics and narrative synthesis were used for data analysis.
Results: Of 3,426 studies, 247 met the inclusion criteria for this review. The largest share of articles (31.17%) came from the United States. Editorials (43.32%) ranked first, followed by experimental studies (11.74%). The potential applications of ChatGPT in medicine are varied: the largest number of studies (45.75%) explored clinical practice, including support for clinical decision-making and the provision of disease information and medical advice, followed by medical education (27.13%) and scientific research (16.19%). In the discipline-level statistics, radiology, surgery and dentistry topped the list. However, ChatGPT in medicine also faces issues of data privacy, inaccuracy and plagiarism.
Conclusion: The application of ChatGPT in medicine spans multiple disciplines and general application scenarios. ChatGPT has a paradoxical nature: it offers significant advantages while raising serious concerns about its use in healthcare settings. It is therefore imperative to develop theoretical frameworks that not only address its widespread use in healthcare but also facilitate comprehensive assessment. These frameworks should also inform the development of strict and effective guidelines and regulatory measures.
Affiliation(s)
- Jie Wu: Department of Nursing, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Yingzhuo Ma: Department of Nursing, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Jun Wang: Department of Nursing, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Mingzhao Xiao: Department of Urology, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
2.
Du X, Novoa-Laurentiev J, Plasek JM, Chuang YW, Wang L, Chang F, Datta S, Paek H, Lin B, Wei Q, Wang X, Wang J, Ding H, Manion FJ, Du J, Zhou L. Enhancing Early Detection of Cognitive Decline in the Elderly through Ensemble of NLP Techniques: A Comparative Study Utilizing Large Language Models in Clinical Notes. medRxiv 2024:2024.04.03.24305298. [PMID: 38633810 PMCID: PMC11023645 DOI: 10.1101/2024.04.03.24305298]
Abstract
Background: Early detection of cognitive decline in elderly individuals facilitates clinical trial enrollment and timely medical intervention. This study aims to apply, evaluate, and compare advanced natural language processing (NLP) techniques for identifying signs of cognitive decline in clinical notes.
Methods: This study, conducted at Mass General Brigham (MGB), Boston, MA, included clinical notes from the 4 years prior to an initial mild cognitive impairment (MCI) diagnosis in 2019 for patients aged ≥ 50 years. Note sections regarding cognitive decline were labeled manually. A random sample of 4,949 note sections filtered with keywords related to cognitive function was used for traditional AI model development, and a random subset of 200 was used for LLM and prompt development; another random sample of 1,996 note sections without keyword filtering was used for testing. Prompt templates for two large language models (LLMs), Llama 2 on Amazon Web Services and GPT-4 on Microsoft Azure, were developed with multiple prompting approaches to select the optimal LLM-based method. Baseline comparisons were made with XGBoost and a hierarchical attention-based deep neural network model. An ensemble of the three models was then constructed using majority vote.
Results: GPT-4 demonstrated superior accuracy and efficiency to Llama 2. The ensemble model outperformed the individual models, achieving a precision of 90.3%, recall of 94.2%, and F1-score of 92.2%. Notably, the ensemble model markedly improved precision (from a 70%-79% range to above 90%) over the best-performing single model. Error analysis revealed that 63 samples were wrongly predicted by at least one model; however, only 2 cases (3.2%) were mutual errors across all models, indicating diverse error profiles among them.
Conclusion: Our findings indicate that LLMs and traditional machine learning models exhibit diverse error profiles. An ensemble of LLMs and locally trained machine learning models on EHR data proved complementary, enhancing performance and improving diagnostic accuracy.
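The majority-vote ensembling described in this abstract can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the model names and binary labels below are hypothetical stand-ins for the three classifiers' predictions.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label chosen by most models for each sample.

    `predictions` is a list of per-model label lists, e.g. three
    models' binary cognitive-decline labels for the same note sections.
    """
    voted = []
    for labels in zip(*predictions):          # one tuple of votes per sample
        winner, _ = Counter(labels).most_common(1)[0]
        voted.append(winner)
    return voted

# Three hypothetical models disagree on two samples; the ensemble
# follows the two-vote majority in each case.
gpt4_preds   = [1, 0, 1, 0]
llama2_preds = [1, 0, 0, 0]
xgb_preds    = [1, 1, 0, 0]
print(majority_vote([gpt4_preds, llama2_preds, xgb_preds]))  # → [1, 0, 0, 0]
```

With an odd number of models and binary labels, a strict majority always exists, which is why three-model ensembles are a common choice for this scheme.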
Affiliation(s)
- Xinsong Du: Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts 02115; Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- John Novoa-Laurentiev: Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts 02115
- Joseph M. Plasek: Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts 02115; Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Ya-Wen Chuang: Division of Nephrology, Taichung Veterans General Hospital, Taichung, Taiwan, 407219
- Liqin Wang: Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts 02115; Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Frank Chang: Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts 02115
- Surabhi Datta: Intelligent Medical Objects, Rosemont, Illinois, 60018
- Hunki Paek: Intelligent Medical Objects, Rosemont, Illinois, 60018
- Bin Lin: Intelligent Medical Objects, Rosemont, Illinois, 60018
- Qiang Wei: Intelligent Medical Objects, Rosemont, Illinois, 60018
- Xiaoyan Wang: Intelligent Medical Objects, Rosemont, Illinois, 60018
- Jingqi Wang: Intelligent Medical Objects, Rosemont, Illinois, 60018
- Hao Ding: Intelligent Medical Objects, Rosemont, Illinois, 60018
- Jingcheng Du: Intelligent Medical Objects, Rosemont, Illinois, 60018
- Li Zhou: Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts 02115; Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
3.
Carlà MM, Gambini G, Baldascino A, Boselli F, Giannuzzi F, Margollicci F, Rizzo S. Large language models as assistance for glaucoma surgical cases: a ChatGPT vs. Google Gemini comparison. Graefes Arch Clin Exp Ophthalmol 2024:10.1007/s00417-024-06470-5. [PMID: 38573349 DOI: 10.1007/s00417-024-06470-5] [Received: 01/22/2024] [Revised: 03/11/2024] [Accepted: 03/20/2024]
Abstract
PURPOSE: The aim of this study was to assess the capability of ChatGPT-4 and Google Gemini to analyze detailed glaucoma case descriptions and suggest an accurate surgical plan.
METHODS: A retrospective analysis of 60 medical records of surgical glaucoma cases, divided into "ordinary" (n = 40) and "challenging" (n = 20) scenarios. Case descriptions were entered into the ChatGPT and Gemini interfaces with the question "What kind of surgery would you perform?", repeated three times to analyze the consistency of the answers. After collecting the answers, we assessed the level of agreement with the unified opinion of three glaucoma surgeons. Moreover, we graded the quality of the responses on a scale from 1 (poor quality) to 5 (excellent quality) according to the Global Quality Score (GQS) and compared the results.
RESULTS: ChatGPT's surgical choice was consistent with that of the glaucoma specialists in 35/60 cases (58%), compared to 19/60 (32%) for Gemini (p = 0.0001). Gemini was unable to complete the task in 16 cases (27%). Trabeculectomy was the most frequent choice for both chatbots (53% and 50% for ChatGPT and Gemini, respectively). In "challenging" cases, ChatGPT agreed with the specialists in 9/20 choices (45%), outperforming Google Gemini (4/20, 20%). Overall, GQS scores were 3.5 ± 1.2 for ChatGPT and 2.1 ± 1.5 for Gemini (p = 0.002). This difference was even more marked when focusing only on "challenging" cases (3.0 ± 1.5 vs. 1.5 ± 1.4, p = 0.001).
CONCLUSION: ChatGPT-4 analyzed glaucoma surgical cases well, whether ordinary or challenging. Google Gemini, by contrast, showed strong limitations in this setting, with high rates of imprecise or missing answers.
Affiliation(s)
- Matteo Mario Carlà: Ophthalmology Department, Fondazione Policlinico Universitario A. Gemelli, IRCCS, 00168, Rome, Italy; Ophthalmology Department, Catholic University "Sacro Cuore", Largo A. Gemelli, 8, Rome, Italy
- Gloria Gambini: Ophthalmology Department, Fondazione Policlinico Universitario A. Gemelli, IRCCS, 00168, Rome, Italy; Ophthalmology Department, Catholic University "Sacro Cuore", Largo A. Gemelli, 8, Rome, Italy
- Antonio Baldascino: Ophthalmology Department, Fondazione Policlinico Universitario A. Gemelli, IRCCS, 00168, Rome, Italy; Ophthalmology Department, Catholic University "Sacro Cuore", Largo A. Gemelli, 8, Rome, Italy
- Francesco Boselli: Ophthalmology Department, Fondazione Policlinico Universitario A. Gemelli, IRCCS, 00168, Rome, Italy; Ophthalmology Department, Catholic University "Sacro Cuore", Largo A. Gemelli, 8, Rome, Italy
- Federico Giannuzzi: Ophthalmology Department, Fondazione Policlinico Universitario A. Gemelli, IRCCS, 00168, Rome, Italy; Ophthalmology Department, Catholic University "Sacro Cuore", Largo A. Gemelli, 8, Rome, Italy
- Fabio Margollicci: Ophthalmology Department, Fondazione Policlinico Universitario A. Gemelli, IRCCS, 00168, Rome, Italy; Ophthalmology Department, Catholic University "Sacro Cuore", Largo A. Gemelli, 8, Rome, Italy
- Stanislao Rizzo: Ophthalmology Department, Fondazione Policlinico Universitario A. Gemelli, IRCCS, 00168, Rome, Italy; Ophthalmology Department, Catholic University "Sacro Cuore", Largo A. Gemelli, 8, Rome, Italy
4.
Wu L, Xu J, Thakkar S, Gray M, Qu Y, Li D, Tong W. A framework enabling LLMs into regulatory environment for transparency and trustworthiness and its application to drug labeling document. Regul Toxicol Pharmacol 2024; 149:105613. [PMID: 38570021 DOI: 10.1016/j.yrtph.2024.105613] [Received: 01/02/2024] [Revised: 03/18/2024] [Accepted: 03/26/2024]
Abstract
Regulatory agencies routinely deal with extensive document reviews, ranging from product submissions to internal and external communications. Large language models (LLMs) such as ChatGPT can be invaluable tools for these tasks, but they present several challenges, particularly around proprietary information, combining customized functions with specific review needs, and the transparency and explainability of the model's output. Hence, a localized and customized solution is imperative. To tackle these challenges, we formulated a framework named AskFDALabel, built on FDA drug labeling documents, a crucial resource in the FDA drug review process. AskFDALabel operates within a secure IT environment and comprises two key modules: a semantic search module and a Q&A/text-generation module. Module S is built on word embeddings to enable comprehensive semantic queries within labeling documents. Module T utilizes a tuned LLM to generate responses based on references from Module S. As a result, our framework enabled small LLMs to perform comparably to ChatGPT as a computationally inexpensive solution for regulatory applications. In conclusion, through AskFDALabel, we have showcased a pathway that harnesses LLMs to support agency operations within a secure environment, offering functions tailored to the needs of regulatory research.
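The embedding-based semantic search at the heart of a module like Module S can be illustrated with cosine similarity over section vectors. This is only a sketch: the section names and toy 3-d vectors below are invented, and a real system would use learned word embeddings of far higher dimension.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_search(query_vec, section_vecs, top_k=1):
    """Rank labeling-document sections by similarity to the query vector."""
    ranked = sorted(section_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [section_id for section_id, _ in ranked[:top_k]]

# Toy embeddings standing in for real word-embedding vectors of
# hypothetical labeling sections.
sections = {
    "warnings": [0.9, 0.1, 0.0],
    "dosage":   [0.1, 0.9, 0.2],
    "adverse":  [0.8, 0.2, 0.1],
}
print(semantic_search([1.0, 0.0, 0.0], sections, top_k=2))  # → ['warnings', 'adverse']
```

The top-ranked sections would then be passed to the generation module as grounding references, which is what keeps the generated answer traceable to the source document.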
Affiliation(s)
- Leihong Wu: Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US FDA, 3900 NCTR Rd, Jefferson, AR 72211, USA
- Joshua Xu: Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US FDA, 3900 NCTR Rd, Jefferson, AR 72211, USA
- Shraddha Thakkar: Office of Translational Sciences, Center for Drug Evaluation and Research (CDER), US FDA, 10903 New Hampshire Avenue, Silver Spring, MD 20993, USA
- Magnus Gray: Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US FDA, 3900 NCTR Rd, Jefferson, AR 72211, USA
- Yanyan Qu: Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US FDA, 3900 NCTR Rd, Jefferson, AR 72211, USA
- Dongying Li: Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US FDA, 3900 NCTR Rd, Jefferson, AR 72211, USA
- Weida Tong: Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US FDA, 3900 NCTR Rd, Jefferson, AR 72211, USA
5.
Koga S. The double-edged nature of ChatGPT in self-diagnosis. Wien Klin Wochenschr 2024; 136:243-244. [PMID: 38504058 DOI: 10.1007/s00508-024-02343-3] [Received: 02/24/2024] [Accepted: 02/27/2024]
Affiliation(s)
- Shunsuke Koga: Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA 19104, USA
6.
Elyoseph Z, Levkovich I. Comparing the Perspectives of Generative AI, Mental Health Experts, and the General Public on Schizophrenia Recovery: Case Vignette Study. JMIR Ment Health 2024; 11:e53043. [PMID: 38533615 PMCID: PMC11004608 DOI: 10.2196/53043] [Received: 09/24/2023] [Revised: 01/24/2024] [Accepted: 02/11/2024]
Abstract
Background: The current paradigm in mental health care focuses on clinical recovery and symptom remission. This model's efficacy is influenced by the therapist's trust in the patient's recovery potential and by the depth of the therapeutic relationship. Schizophrenia is a chronic illness with severe symptoms in which the possibility of recovery is a matter of debate. As artificial intelligence (AI) becomes integrated into health care, it is important to examine its ability to assess recovery potential in major psychiatric disorders such as schizophrenia.
Objective: This study aimed to evaluate the ability of large language models (LLMs), in comparison with mental health professionals, to assess the prognosis of schizophrenia with and without professional treatment, as well as its long-term positive and negative outcomes.
Methods: Vignettes were input into the interfaces of 4 AI platforms (ChatGPT-3.5, ChatGPT-4, Google Bard, and Claude) and assessed 10 times each. A total of 80 evaluations were collected and benchmarked against existing norms capturing what mental health professionals (general practitioners, psychiatrists, clinical psychologists, and mental health nurses) and the general public think about schizophrenia prognosis with and without professional treatment and about the positive and negative long-term outcomes of schizophrenia interventions.
Results: For the prognosis of schizophrenia with professional treatment, ChatGPT-3.5 was notably pessimistic, whereas ChatGPT-4, Claude, and Bard aligned with professional views but differed from the general public. All LLMs held that untreated schizophrenia would remain static or worsen. For long-term outcomes, ChatGPT-4 and Claude predicted more negative outcomes than Bard and ChatGPT-3.5. For positive outcomes, ChatGPT-3.5 and Claude were more pessimistic than Bard and ChatGPT-4.
Conclusions: The finding that 3 of the 4 LLMs aligned closely with the predictions of mental health professionals in the "with treatment" condition demonstrates the technology's potential for providing professional clinical prognoses. The pessimistic assessments of ChatGPT-3.5 are a disturbing finding, as they could reduce patients' motivation to start or persist with treatment for schizophrenia. Overall, although LLMs hold promise for augmenting health care, their application requires rigorous validation and a harmonious blend with human expertise.
Affiliation(s)
- Zohar Elyoseph: Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom; The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Inbar Levkovich: Faculty of Graduate Studies, Oranim Academic College, Kiryat Tiv'on, Israel
7.
Raman R, Venugopalan M, Kamal A. Evaluating human resources management literacy: A performance analysis of ChatGPT and Bard. Heliyon 2024; 10:e27026. [PMID: 38486738 PMCID: PMC10937570 DOI: 10.1016/j.heliyon.2024.e27026] [Received: 09/24/2023] [Revised: 02/16/2024] [Accepted: 02/22/2024]
Abstract
This study presents a comprehensive analysis comparing the literacy levels of two generative artificial intelligence (GAI) tools, ChatGPT and Bard, using a dataset of 134 questions from the human resources (HR) domain. The generated responses are evaluated for accuracy, relevance, and clarity. We find that ChatGPT outperforms Bard in overall accuracy (84.3% vs. 82.8%). This difference in performance suggests that ChatGPT could serve as a robotic advisor in transactional HR roles. In contrast, Bard may have additional safeguards against misuse in the HR function, making it less willing to generate responses to certain types of questions. Statistical tests reveal that although the two systems differ in their mean accuracy, relevance, and clarity, the observed differences are not always statistically significant, implying that the tools may be more complementary than competitive. The Pearson correlation coefficients further support this, showing weak to non-existent relationships between the two tools' performance metrics. Confirmation queries do not improve the response accuracy of either ChatGPT or Bard. The study thus contributes to emerging research on the utility of GAI tools in human resources management and suggests that involving certified HR professionals in the design phase could enhance underlying language model performance.
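The "weak to non-existent relationships" reported via Pearson correlation can be illustrated with a small sketch. The per-question scores below are hypothetical, not the study's data; they merely show how near-zero correlation looks when two tools' correct/incorrect patterns are unrelated.

```python
import math

def pearson(x, y):
    """Pearson correlation between two tools' per-question scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical per-question correctness (1 = accurate, 0 = not):
# the two tools succeed on largely unrelated questions, so the
# correlation comes out near zero.
chatgpt_scores = [1, 1, 0, 1, 0, 1]
bard_scores    = [1, 0, 1, 1, 0, 0]
print(round(pearson(chatgpt_scores, bard_scores), 3))
```

A coefficient near zero, as here, is the statistical signature of the complementarity the authors describe: neither tool's successes predict the other's.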
Affiliation(s)
- Raghu Raman: Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, India
- Murale Venugopalan: Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, India
- Anju Kamal: Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, India
8.
Carlà MM, Gambini G, Baldascino A, Giannuzzi F, Boselli F, Crincoli E, D'Onofrio NC, Rizzo S. Exploring AI-chatbots' capability to suggest surgical planning in ophthalmology: ChatGPT versus Google Gemini analysis of retinal detachment cases. Br J Ophthalmol 2024:bjo-2023-325143. [PMID: 38448201 DOI: 10.1136/bjo-2023-325143] [Received: 12/31/2023] [Accepted: 02/16/2024]
Abstract
BACKGROUND: We aimed to assess the capability of three publicly available large language models, Chat Generative Pretrained Transformer (ChatGPT-3.5), ChatGPT-4 and Google Gemini, to analyze retinal detachment cases and suggest the best possible surgical planning.
METHODS: Analysis of 54 retinal detachment records entered into the ChatGPT and Gemini interfaces. After asking 'Specify what kind of surgical planning you would suggest and the eventual intraocular tamponade.' and collecting the answers, we assessed the level of agreement with the common opinion of three expert vitreoretinal surgeons. Moreover, ChatGPT and Gemini answers were graded 1-5 (from poor to excellent quality) according to the Global Quality Score (GQS).
RESULTS: After excluding 4 controversial cases, 50 cases were included. Overall, the surgical choices of ChatGPT-3.5, ChatGPT-4 and Google Gemini agreed with those of the vitreoretinal surgeons in 40/50 (80%), 42/50 (84%) and 35/50 (70%) of cases, respectively. Google Gemini was unable to respond in five cases. Contingency analysis showed significant differences between ChatGPT-4 and Gemini (p=0.03). ChatGPT's GQS scores were 3.9±0.8 and 4.2±0.7 for versions 3.5 and 4, while Gemini scored 3.5±1.1. There was no statistical difference between the two ChatGPT versions (p=0.22), while both outperformed Gemini (p=0.03 and p=0.002, respectively). The main source of error was the choice of endotamponade (14% for ChatGPT-3.5 and 4, and 12% for Google Gemini). Only ChatGPT-4 was able to suggest a combined phacovitrectomy approach.
CONCLUSION: Google Gemini and ChatGPT evaluated vitreoretinal patients' records coherently, showing a good level of agreement with expert surgeons. According to the GQS, ChatGPT's recommendations were more accurate and precise.
Affiliation(s)
- Matteo Mario Carlà: Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy; Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
- Gloria Gambini: Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy; Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
- Antonio Baldascino: Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy; Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
- Federico Giannuzzi: Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy; Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
- Francesco Boselli: Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy; Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
- Emanuele Crincoli: Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy; Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
- Nicola Claudio D'Onofrio: Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy; Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
- Stanislao Rizzo: Ophthalmology Department, Catholic University "Sacro Cuore", Rome, Italy; Ophthalmology Department, Fondazione Policlinico Universitario "A. Gemelli", IRCCS, Rome, Italy
9.
Koga S, Du W. Integrating AI in medicine: Lessons from Chat-GPT's limitations in medical imaging. Dig Liver Dis 2024:S1590-8658(24)00275-5. [PMID: 38429138 DOI: 10.1016/j.dld.2024.02.014] [Received: 02/17/2024] [Accepted: 02/19/2024]
Affiliation(s)
- Shunsuke Koga: Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA 19104, United States
- Wei Du: Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA 19104, United States
10.
Wang L, Chen X, Deng X, Wen H, You M, Liu W, Li Q, Li J. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. NPJ Digit Med 2024; 7:41. [PMID: 38378899 PMCID: PMC10879172 DOI: 10.1038/s41746-024-01029-4] [Received: 09/08/2023] [Accepted: 02/05/2024]
Abstract
The use of large language models (LLMs) in clinical medicine is currently thriving. Effectively transferring LLMs' pertinent theoretical knowledge from computer science to clinical application is crucial, and prompt engineering has shown potential as an effective method in this regard. To explore the application of prompt engineering with LLMs and to examine their reliability, prompts in different styles were designed and used to ask different LLMs about their agreement with the American Academy of Orthopaedic Surgeons (AAOS) osteoarthritis (OA) evidence-based guidelines. Each question was asked 5 times. We compared the consistency of the findings with the guidelines across evidence levels for the different prompts and assessed the reliability of each prompt by asking the same question 5 times. gpt-4-Web with ROT prompting had the highest overall consistency (62.9%) and strong performance for strong recommendations, with a total consistency of 77.5%. The reliability of the different LLMs under different prompts was not stable (Fleiss' kappa ranged from -0.002 to 0.984). This study revealed that different prompts had variable effects across models, with gpt-4-Web plus the ROT prompt being the most consistent. An appropriate prompt can improve the accuracy of responses to professional medical questions.
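The reliability statistic cited here, Fleiss' kappa over 5 repeated answers per question, can be computed as follows. This is a generic implementation of the standard formula, not the authors' code, and the answer categories and counts in the example are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for N items, each rated n times.

    `counts[i][j]` is how many of the n repetitions of question i
    produced answer category j (e.g. agree / disagree / uncertain).
    """
    N = len(counts)                 # number of questions
    n = sum(counts[0])              # repetitions per question (5 here)
    k = len(counts[0])              # number of answer categories
    # Marginal proportion of each category across all ratings.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Per-question observed agreement among the n repetitions.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N            # mean observed agreement
    P_e = sum(p * p for p in p_j)   # chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Perfect repeatability: every question gets the same answer 5/5 times,
# so kappa is 1.0; mixed answers drive it toward (or below) zero.
print(fleiss_kappa([[5, 0, 0], [0, 5, 0], [5, 0, 0]]))  # → 1.0
```

Values near 1 (like the 0.984 reported for some prompt/model pairs) mean the model answers almost identically on every repeat, while values near 0 (like -0.002) mean its repeated answers agree no better than chance.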
Affiliation(s)
- Li Wang: Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China; Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
- Xi Chen: Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China; Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
- XiangWen Deng: Shenzhen International Graduate School, Tsinghua University, Beijing, China
- Hao Wen: Shenzhen International Graduate School, Tsinghua University, Beijing, China
- MingKe You: Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China; Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
- WeiZhi Liu: Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China; Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
- Qi Li: Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China; Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
- Jian Li: Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China; Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
11.
García-Méndez S, de Arriba-Pérez F. Large Language Models and Healthcare Alliance: Potential and Challenges of Two Representative Use Cases. Ann Biomed Eng 2024:10.1007/s10439-024-03454-8. [PMID: 38310159 DOI: 10.1007/s10439-024-03454-8] [Received: 11/23/2023] [Accepted: 01/15/2024]
Abstract
Large language models (LLMs) have emerged as the most promising natural language processing approach for accelerating clinical practice (i.e., diagnosis, prevention and treatment procedures). Similarly, intelligent conversational systems that leverage LLMs have disruptively become the future of therapy in the era of ChatGPT. Accordingly, this research addresses the application of LLMs in healthcare, paying particular attention to two relevant use cases: cognitive decline and depression, more specifically postpartum depression. Finally, the most promising opportunities these systems represent (e.g., augmentation of clinical tasks, personalized healthcare) and related concerns (e.g., data privacy and quality, fairness) are discussed as a contribution to the global debate on their integration into the healthcare system.
12.
Iannantuono GM, Bracken-Clarke D, Karzai F, Choo-Wosoba H, Gulley JL, Floudas CS. Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study. medRxiv 2023:2023.10.31.23297825. [PMID: 38076813 PMCID: PMC10705618 DOI: 10.1101/2023.10.31.23297825]
Abstract
Background: The capability of large language models (LLMs) to understand and generate human-readable text has prompted investigation of their potential as educational and management tools for cancer patients and healthcare providers.
Materials and Methods: We conducted a cross-sectional study evaluating the ability of ChatGPT-4, ChatGPT-3.5, and Google Bard to answer questions related to four domains of immuno-oncology (Mechanisms, Indications, Toxicities, and Prognosis). We generated 60 open-ended questions (15 per domain). Questions were manually submitted to the LLMs, and responses were collected on June 30, 2023. Two reviewers evaluated the answers independently.
Results: ChatGPT-4 and ChatGPT-3.5 answered all questions, whereas Google Bard answered only 53.3% (p < 0.0001). The number of questions with reproducible answers was higher for ChatGPT-4 (95%) and ChatGPT-3.5 (88.3%) than for Google Bard (50%) (p < 0.0001). In terms of accuracy, the proportion of answers deemed fully correct was 75.4%, 58.5%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (p = 0.03). Furthermore, the proportion of responses deemed highly relevant was 71.9%, 77.4%, and 43.8%, respectively (p = 0.04). Regarding readability, the proportion of highly readable answers was higher for ChatGPT-4 (98.1%) and ChatGPT-3.5 (100%) than for Google Bard (87.5%) (p = 0.02).
Conclusion: ChatGPT-4 and ChatGPT-3.5 are potentially powerful tools in immuno-oncology, whereas Google Bard demonstrated relatively poor performance. However, the risk of inaccuracy or incompleteness was evident in all three LLMs, highlighting the importance of expert-driven verification of the outputs returned by these technologies.
Affiliation(s)
- Giovanni Maria Iannantuono: Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Dara Bracken-Clarke: Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Fatima Karzai: Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Hyoyoung Choo-Wosoba: Biostatistics and Data Management Section, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- James L. Gulley: Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Charalampos S. Floudas: Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States