1
Syryca F, Gräßer C, Trenkwalder T, Nicol P. Automated generation of echocardiography reports using artificial intelligence: a novel approach to streamlining cardiovascular diagnostics. Int J Cardiovasc Imaging 2025. [PMID: 40159559] [DOI: 10.1007/s10554-025-03382-1] [Received: 11/28/2024] [Accepted: 03/12/2025] [Indexed: 04/02/2025]
Abstract
Accurate interpretation of echocardiography measurements is essential for diagnosing cardiovascular diseases and guiding clinical management. The emergence of large language models (LLMs) like ChatGPT presents a novel opportunity to automate the generation of echocardiography reports and provide clinical recommendations. This study aimed to evaluate the ability of an LLM (ChatGPT) to 1) generate comprehensive echocardiography reports based solely on provided echocardiographic measurements, and, when enriched with clinical information, 2) formulate accurate diagnoses along with appropriate recommendations for further tests, treatment, and follow-up. Echocardiographic data from n = 13 fictional cases (Group 1) and n = 8 clinical cases (Group 2) were input into the LLM. The model's outputs were compared against standard clinical assessments conducted by experienced cardiologists. Using a dedicated scoring system, the LLM's performance was evaluated and stratified by its accuracy in report generation, diagnostic precision, and the appropriateness of its recommendations. Patterns, frequency, and examples of misinterpretations by the LLM were analysed. Across all cases, the mean total score was 6.86 (SD = 1.12). Group 1 had a mean total score of 6.54 (SD = 1.13) and an accuracy score of 3.92 (SD = 0.86), while Group 2 scored 7.38 (SD = 0.92) and 4.38 (SD = 0.92), respectively. Recommendation scores were 2.62 (SD = 0.51) for Group 1 and 3.00 (SD = 0.00) for Group 2, with no significant difference (p = 0.096). Reports were fully acceptable in 85.7% of cases and borderline acceptable in 14.3%; none were unacceptable. Of 299 parameters, 5.3% were misinterpreted. The LLM demonstrated a high level of accuracy in generating detailed echocardiography reports, mostly correctly identifying normal and abnormal findings, and making accurate diagnoses across a range of cardiovascular conditions.
ChatGPT, as an LLM, shows significant potential in automating the interpretation of echocardiographic data, offering accurate diagnostic insights and clinical recommendations. These findings suggest that LLMs could serve as valuable tools in clinical practice, assisting and streamlining clinical workflow.
Affiliation(s)
- Finn Syryca
- Department of Cardiovascular Diseases, German Heart Centre Munich, School of Medicine and Health, TUM University Hospital, Technical University of Munich, Munich, Germany
- Christian Gräßer
- Department of Cardiovascular Diseases, German Heart Centre Munich, School of Medicine and Health, TUM University Hospital, Technical University of Munich, Munich, Germany
- Teresa Trenkwalder
- Department of Cardiovascular Diseases, German Heart Centre Munich, School of Medicine and Health, TUM University Hospital, Technical University of Munich, Munich, Germany
- Philipp Nicol
- Department of Cardiovascular Diseases, German Heart Centre Munich, School of Medicine and Health, TUM University Hospital, Technical University of Munich, Munich, Germany
- MVZ Med 360 Grad Alter Hof Kardiologe Und Nuklearmedizin, Dienerstraße 12, 80331, Munich, Germany
2
Pedro T, Sousa JM, Fonseca L, Gama MG, Moreira G, Pintalhão M, Chaves PC, Aires A, Alves G, Augusto L, Pinheiro Albuquerque L, Castro P, Silva ML. Exploring the use of ChatGPT in predicting anterior circulation stroke functional outcomes after mechanical thrombectomy: a pilot study. J Neurointerv Surg 2025; 17:261-265. [PMID: 38453462] [DOI: 10.1136/jnis-2024-021556] [Received: 01/31/2024] [Accepted: 02/27/2024] [Indexed: 03/09/2024]
Abstract
BACKGROUND Accurate prediction of functional outcomes is crucial in stroke management, but this remains challenging. OBJECTIVE To evaluate the performance of the generative language model ChatGPT in predicting the functional outcome of patients with acute ischemic stroke (AIS) 3 months after mechanical thrombectomy (MT), in order to assess whether ChatGPT can be used to accurately predict the modified Rankin Scale (mRS) score at 3 months post-thrombectomy. METHODS We conducted a retrospective analysis of clinical, neuroimaging, and procedure-related data from 163 patients with AIS undergoing MT. The agreement between ChatGPT's exact and dichotomized predictions and actual mRS scores was assessed using Cohen's κ. The added value of ChatGPT was measured by evaluating the agreement of predicted dichotomized outcomes using an existing validated score, the MT-DRAGON. RESULTS ChatGPT demonstrated fair (κ=0.354, 95% CI 0.260 to 0.448) and good (κ=0.727, 95% CI 0.620 to 0.833) agreement with the true exact and dichotomized mRS scores at 3 months, respectively, outperforming MT-DRAGON in overall and subgroup predictions. ChatGPT agreement was higher for patients with shorter last-time-seen-well-to-door delay, distal occlusions, and better modified Thrombolysis in Cerebral Infarction scores. CONCLUSIONS ChatGPT adequately predicted short-term functional outcomes in post-thrombectomy patients with AIS and was better than the existing risk score. Integrating AI models into clinical practice holds promise for patient care, yet refining these models is crucial for enhanced accuracy in stroke management.
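The agreement statistic in this study, Cohen's κ, is simple to compute. A minimal pure-Python sketch with made-up dichotomized mRS outcomes (1 = good outcome, mRS 0-2; 0 = poor outcome, mRS 3-6); the data are illustrative, not the study's:

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length label sequences."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    labels = sorted(set(rater_a) | set(rater_b))
    # Observed agreement: fraction of items both raters label identically.
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal frequencies.
    pe = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (po - pe) / (1 - pe)

# Hypothetical predicted vs actual dichotomized outcomes for 10 patients.
predicted = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
actual    = [1, 0, 0, 0, 1, 0, 1, 1, 1, 1]
kappa = cohens_kappa(predicted, actual)  # 8/10 observed vs 0.52 by chance
```

Here po = 0.8 and pe = 0.52, giving κ ≈ 0.58, which would fall in the "moderate agreement" band between the study's exact (0.354) and dichotomized (0.727) results.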
Affiliation(s)
- Tiago Pedro
- Department of Neuroradiology, Centro Hospitalar Universitário de São João, Porto, Portugal
- José Maria Sousa
- Department of Neuroradiology, Centro Hospitalar Universitário de São João, Porto, Portugal
- Luísa Fonseca
- Department of Medicine, University of Porto, Porto, Portugal
- Department of Internal Medicine, Centro Hospitalar Universitário de São João, Porto, Portugal
- Manuel G Gama
- Department of Medicine, University of Porto, Porto, Portugal
- Department of Internal Medicine, Centro Hospitalar Universitário de São João, Porto, Portugal
- Goreti Moreira
- Department of Medicine, University of Porto, Porto, Portugal
- Department of Internal Medicine, Centro Hospitalar Universitário de São João, Porto, Portugal
- Mariana Pintalhão
- Department of Medicine, University of Porto, Porto, Portugal
- Department of Internal Medicine, Centro Hospitalar Universitário de São João, Porto, Portugal
- Paulo C Chaves
- Department of Medicine, University of Porto, Porto, Portugal
- Department of Internal Medicine, Centro Hospitalar Universitário de São João, Porto, Portugal
- Ana Aires
- Department of Internal Medicine, Centro Hospitalar Universitário de São João, Porto, Portugal
- Department of Neurology, Centro Hospitalar Universitário de São João, Porto, Portugal
- Gonçalo Alves
- Department of Neuroradiology, Centro Hospitalar Universitário de São João, Porto, Portugal
- Centro de Referência de Neurorradiologia de Intervenção na Doença Cerebrovascular, Centro Hospitalar Universitário de São João, Porto, Portugal
- Luís Augusto
- Department of Neuroradiology, Centro Hospitalar Universitário de São João, Porto, Portugal
- Centro de Referência de Neurorradiologia de Intervenção na Doença Cerebrovascular, Centro Hospitalar Universitário de São João, Porto, Portugal
- Luís Pinheiro Albuquerque
- Department of Neuroradiology, Centro Hospitalar Universitário de São João, Porto, Portugal
- Centro de Referência de Neurorradiologia de Intervenção na Doença Cerebrovascular, Centro Hospitalar Universitário de São João, Porto, Portugal
- Pedro Castro
- Department of Neurology, Centro Hospitalar Universitário de São João, Porto, Portugal
- Department of Clinical Neurosciences and Mental Health, University of Porto, Porto, Portugal
- Maria Luís Silva
- Department of Neuroradiology, Centro Hospitalar Universitário de São João, Porto, Portugal
- Centro de Referência de Neurorradiologia de Intervenção na Doença Cerebrovascular, Centro Hospitalar Universitário de São João, Porto, Portugal
3
Mehta R, Reitz JG, Venna A, Selcuk A, Dhamala B, Klein J, Sawda C, Haverty M, Yerebakan C, Tongut A, Desai M, d'Udekem Y. Navigating the future of pediatric cardiovascular surgery: Insights and innovation powered by Chat Generative Pre-Trained Transformer (ChatGPT). J Thorac Cardiovasc Surg 2025:S0022-5223(25)00093-5. [PMID: 39894069] [DOI: 10.1016/j.jtcvs.2025.01.022] [Received: 07/22/2024] [Revised: 12/16/2024] [Accepted: 01/10/2025] [Indexed: 02/04/2025]
Abstract
INTRODUCTION Interdisciplinary consultations are essential to decision-making for patients with congenital heart disease. The integration of artificial intelligence (AI) and natural language processing into medical practice is rapidly accelerating, opening new avenues to diagnosis and treatment. The main objective of this study was to consult the AI-trained model Chat Generative Pre-Trained Transformer (ChatGPT) regarding cases discussed during a cardiovascular surgery conference (CSC) at a single tertiary center and compare the ChatGPT suggestions with CSC expert consensus results. METHODS In total, 37 cases discussed at a single CSC were retrospectively identified. Clinical information comprised deidentified data from the last electrocardiogram, echocardiogram, intensive care unit progress note (or cardiology clinic note if outpatient), as well as a patient summary. The diagnosis was removed from the summary and possible treatment options were deleted from all notes. ChatGPT (version 4.0) was asked to summarize the case, identify diagnoses, and recommend surgical procedures and timing of surgery. The responses of ChatGPT were compared with the results of the CSC. RESULTS Of the 37 cases uploaded to ChatGPT, 45.9% (n = 17) were considered less complex, with only 1 treatment option, and 54.1% (n = 20) were considered more complex, with several treatment options. ChatGPT provided a detailed, systematically written summary for each case within 10 to 15 seconds. ChatGPT correctly identified diagnoses in approximately 94.5% (n = 35) of cases. The surgical intervention plan matched the group decision in approximately 40.5% (n = 15) of cases but differed in 27% of cases. In 23 of 37 cases, the timing of surgery was the same between the CSC group and ChatGPT. Overall, the match between ChatGPT responses and CSC decisions was 94.5% for diagnosis, 40.5% for surgical intervention, and 62.2% for timing of surgery. Within complex cases, however, agreement was 25% for surgical intervention and 67% for timing of surgery. CONCLUSIONS ChatGPT can be used as an augmentative tool for surgical conferences to systematically summarize large amounts of patient data from electronic health records and clinical notes in seconds. In addition, our study points to the potential of ChatGPT as an AI-based decision support tool in surgery, particularly for less-complex cases. The discrepancy, particularly in complex cases, emphasizes the need for caution when using ChatGPT in decision-making for complex cases in pediatric cardiovascular surgery. There is little doubt that the public will soon use such comparative tools.
Affiliation(s)
- Rittal Mehta
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
- Justus G Reitz
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
- Alyssia Venna
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
- Arif Selcuk
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
- Bishakha Dhamala
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
- Jennifer Klein
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
- Christine Sawda
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
- Mitchell Haverty
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
- Can Yerebakan
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
- Aybala Tongut
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
- Manan Desai
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
- Yves d'Udekem
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
4
Ghozali MT. Assessing ChatGPT's accuracy and reliability in asthma general knowledge: implications for artificial intelligence use in public health education. J Asthma 2025:1-9. [PMID: 39773167] [DOI: 10.1080/02770903.2025.2450482] [Received: 11/04/2024] [Revised: 12/21/2024] [Accepted: 01/02/2025] [Indexed: 01/11/2025]
Abstract
BACKGROUND Integrating Artificial Intelligence (AI) into public health education represents a pivotal advancement in medical knowledge dissemination, particularly for chronic diseases such as asthma. This study assesses the accuracy and comprehensiveness of ChatGPT, a conversational AI model, in providing asthma-related information. METHODS Employing a rigorous mixed-methods approach, healthcare professionals evaluated ChatGPT's responses to the Asthma General Knowledge Questionnaire for Adults (AGKQA), a standardized instrument covering various asthma-related topics. Responses were graded for accuracy and completeness and analyzed using statistical tests to assess reproducibility and consistency. RESULTS ChatGPT showed notable proficiency in conveying asthma knowledge, with flawless success in the etiology and pathophysiology categories and substantial accuracy in medication information (70%). However, limitations were noted in medication-related responses, where mixed accuracy (30%) highlights the need for further refinement of ChatGPT's capabilities to ensure reliability in critical areas of asthma education. Reproducibility analysis demonstrated a consistent 100% rate across all categories, affirming ChatGPT's reliability in delivering uniform information. Statistical analyses further underscored ChatGPT's stability and reliability. CONCLUSION These findings underscore ChatGPT's promise as a valuable educational tool for asthma while emphasizing the necessity of ongoing improvements to address observed limitations, particularly regarding medication-related information.
Affiliation(s)
- Muhammad Thesa Ghozali
- Department of Pharmaceutical Management, School of Pharmacy, Faculty of Medicine and Health Sciences, Universitas Muhammadiyah Yogyakarta
5
Wang D, Liang J, Ye J, Li J, Li J, Zhang Q, Hu Q, Pan C, Wang D, Liu Z, Shi W, Shi D, Li F, Qu B, Zheng Y. Enhancement of the Performance of Large Language Models in Diabetes Education through Retrieval-Augmented Generation: Comparative Study. J Med Internet Res 2024; 26:e58041. [PMID: 39046096] [PMCID: PMC11584532] [DOI: 10.2196/58041] [Received: 03/04/2024] [Revised: 06/03/2024] [Accepted: 07/15/2024] [Indexed: 07/25/2024]
Abstract
BACKGROUND Large language models (LLMs) have demonstrated advanced performance in processing clinical information. However, commercially available LLMs lack specialized medical knowledge and remain susceptible to generating inaccurate information. Given the need for self-management in diabetes, patients commonly seek information online. We introduce the Retrieval-augmented Information System for Enhancement (RISE) framework and evaluate its performance in enhancing LLMs to provide accurate responses to diabetes-related inquiries. OBJECTIVE This study aimed to evaluate the potential of the RISE framework, an information retrieval and augmentation tool, to improve LLM performance in responding accurately and safely to diabetes-related inquiries. METHODS RISE, a retrieval augmentation framework, comprises 4 steps: query rewriting, information retrieval, summarization, and execution. Using a set of 43 common diabetes-related questions, we evaluated 3 base LLMs (GPT-4, Anthropic Claude 2, Google Bard) and their RISE-enhanced versions. Assessments were conducted by clinicians for accuracy and comprehensiveness and by patients for understandability. RESULTS The integration of RISE significantly improved the accuracy and comprehensiveness of responses from all 3 base LLMs. On average, the percentage of accurate responses increased by 12% (15/129) with RISE. Specifically, the rates of accurate responses increased by 7% (3/43) for GPT-4, 19% (8/43) for Claude 2, and 9% (4/43) for Google Bard. The framework also enhanced response comprehensiveness, with mean scores improving by 0.44 (SD 0.10). Understandability was also enhanced by 0.19 (SD 0.13) on average. Data collection was conducted from September 30, 2023 to February 5, 2024. CONCLUSIONS RISE significantly improves LLMs' performance in responding to diabetes-related inquiries, enhancing accuracy, comprehensiveness, and understandability.
These improvements have crucial implications for RISE's future role in patient education and chronic illness self-management, which contributes to relieving medical resource pressures and raising public awareness of medical knowledge.
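The four RISE steps (query rewriting, retrieval, summarization, execution) follow the standard retrieval-augmented generation pattern. The sketch below is purely illustrative, not the authors' implementation: every function body is a stand-in (a real system would call an LLM and search a vetted medical corpus), and the toy corpus and function names are assumptions.

```python
def rewrite_query(user_query):
    # Step 1: normalize the patient's question into a retrieval-friendly query.
    return user_query.lower().rstrip("?")

def retrieve(query, corpus):
    # Step 2: naive keyword overlap scoring; production systems would use
    # vector search over an indexed medical knowledge base.
    terms = set(query.split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(reverse=True)
    return [doc for score, doc in scored if score > 0]

def summarize(documents):
    # Step 3: condense retrieved passages into context for the prompt.
    return " ".join(documents)

def execute(query, context):
    # Step 4: assemble the augmented prompt sent to the base LLM.
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

corpus = [
    "Monitor blood glucose regularly and keep a log.",
    "Insulin storage requires refrigeration before first use.",
]
prompt = execute(
    rewrite_query("How should I monitor blood glucose?"),
    summarize(retrieve("monitor blood glucose", corpus)),
)
```

The point of the pipeline is that the base LLM answers from retrieved, curated text rather than from its parametric memory alone, which is what drove the accuracy gains reported above.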
Affiliation(s)
- Dingqiao Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Jiangbo Liang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Jinguo Ye
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Jingni Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Jingpeng Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Qikai Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Qiuling Hu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Caineng Pan
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Dongliang Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Zhong Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Wen Shi
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Danli Shi
- Research Centre for SHARP Vision, The Hong Kong Polytechnic University, Hong Kong, China
- Fei Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Bo Qu
- Peking University Third Hospital, Beijing, China
- Yingfeng Zheng
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
6
Leon M, Ruaengsri C, Pelletier G, Bethencourt D, Shibata M, Flores MQ, Shudo Y. Harnessing the Power of ChatGPT in Cardiovascular Medicine: Innovations, Challenges, and Future Directions. J Clin Med 2024; 13:6543. [PMID: 39518681] [PMCID: PMC11546989] [DOI: 10.3390/jcm13216543] [Received: 08/30/2024] [Revised: 10/08/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024]
Abstract
Cardiovascular diseases remain the leading cause of morbidity and mortality globally, posing significant challenges to public health. The rapid evolution of artificial intelligence (AI), particularly with large language models such as ChatGPT, has introduced transformative possibilities in cardiovascular medicine. This review examines ChatGPT's broad applications in enhancing clinical decision-making (covering symptom analysis, risk assessment, and differential diagnosis); advancing medical education for both healthcare professionals and patients; and supporting research and academic communication. Key challenges associated with ChatGPT, including potential inaccuracies, ethical considerations, data privacy concerns, and inherent biases, are discussed. Future directions emphasize improving training data quality, developing specialized models, refining AI technology, and establishing regulatory frameworks to enhance ChatGPT's clinical utility and mitigate associated risks. As cardiovascular medicine embraces AI, ChatGPT stands out as a powerful tool with substantial potential to improve therapeutic outcomes, elevate care quality, and advance research innovation. Fully understanding and harnessing this potential is essential for the future of cardiovascular health.
Affiliation(s)
- Yasuhiro Shudo
- Department of Cardiothoracic Surgery, Stanford University School of Medicine, 300 Pasteur Drive, Falk CVRB, Stanford, CA 94305, USA; (C.R.); (G.P.); (D.B.); (M.Q.F.)
7
Hwai H, Ho YJ, Wang CH, Huang CH. Large language model application in emergency medicine and critical care. J Formos Med Assoc 2024:S0929-6646(24)00400-5. [PMID: 39198112] [DOI: 10.1016/j.jfma.2024.08.032] [Received: 03/15/2024] [Revised: 08/13/2024] [Accepted: 08/23/2024] [Indexed: 09/01/2024]
Abstract
In the rapidly evolving healthcare landscape, artificial intelligence (AI), particularly large language models (LLMs) such as OpenAI's Chat Generative Pretrained Transformer (ChatGPT), has shown transformative potential in emergency medicine and critical care. This review article highlights the advancement and applications of ChatGPT, from diagnostic assistance to clinical documentation and patient communication, demonstrating its ability to perform comparably to human professionals in medical examinations. ChatGPT could assist clinical decision-making and medication selection in critical care, showcasing its potential to optimize patient care management. However, integrating LLMs into healthcare raises legal, ethical, and privacy concerns, including data protection and the necessity for informed consent. Finally, we address challenges related to the accuracy of LLMs, such as the risk of providing incorrect medical advice. These concerns underscore the importance of ongoing research and regulation to ensure their ethical and practical use in healthcare.
Affiliation(s)
- Haw Hwai
- Department of Emergency Medicine, National Taiwan University Hospital, National Taiwan University Medical College, Taipei, Taiwan
- Yi-Ju Ho
- Department of Emergency Medicine, National Taiwan University Hospital, National Taiwan University Medical College, Taipei, Taiwan
- Chih-Hung Wang
- Department of Emergency Medicine, National Taiwan University Hospital, National Taiwan University Medical College, Taipei, Taiwan
- Chien-Hua Huang
- Department of Emergency Medicine, National Taiwan University Hospital, National Taiwan University Medical College, Taipei, Taiwan
8
Wang X, Ye S, Feng J, Feng K, Yang H, Li H. Performance of ChatGPT on prehospital acute ischemic stroke and large vessel occlusion (LVO) stroke screening. Digit Health 2024; 10:20552076241297127. [PMID: 39507012] [PMCID: PMC11539183] [DOI: 10.1177/20552076241297127] [Received: 07/09/2024] [Accepted: 10/17/2024] [Indexed: 11/08/2024]
Abstract
BACKGROUND The management of acute ischemic stroke (AIS) is time-sensitive, yet prehospital delays remain prevalent. The application of large language models (LLMs) for medical text analysis may play a potential role in clinical decision support. We assess the performance of LLMs on prehospital AIS and large vessel occlusion (LVO) stroke screening. METHODS This retrospective study sourced cases from the electronic medical record database of the emergency department (ED) at Maoming People's Hospital, encompassing patients who presented to the ED between June and November 2023. We evaluate the diagnostic accuracy of GPT-3.5 and GPT-4 for the detection of AIS and LVO stroke by comparing the sensitivity, specificity, accuracy, positive predictive value, negative predictive value, positive likelihood ratio, and AUC of both LLMs. The neurological reasoning of the LLMs was rated on a five-point Likert scale for factual correctness and the occurrence of errors. RESULTS On 400 records from 400 patients (mean age, 70.0 years ± 12.5 [SD]; 273 male), GPT-4 outperformed GPT-3.5 in AIS screening (AUC 0.75 (0.65-0.84) vs 0.59 (0.50-0.69), P = 0.015) and LVO identification (AUC 0.71 (0.65-0.77) vs 0.60 (0.53-0.66), P < 0.001). GPT-4 achieved higher accuracy than GPT-3.5 in screening of AIS (89.3% [95% CI: 85.8, 91.9] vs 86.5% [95% CI: 82.8, 89.5]) and LVO stroke identification (67.0% [95% CI: 62.3%, 71.4%] vs 47.3% [95% CI: 42.4%, 52.2%]). In neurological reasoning, GPT-4 had higher Likert scale scores for factual correctness (4.24 vs 3.62), with a lower rate of error (6.8% vs 24.8%) than GPT-3.5 (all P < 0.001). CONCLUSIONS The results demonstrate that LLMs possess diagnostic capability in the prehospital identification of ischemic stroke, with the ability to exhibit neurologically informed reasoning processes. Notably, GPT-4 outperforms GPT-3.5 in the recognition of AIS and LVO stroke, achieving results comparable to prehospital scales. LLMs show promise as a supportive decision-making tool for EMS practitioners in prehospital stroke screening.
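The screening metrics this study reports (sensitivity, specificity, accuracy, predictive values, positive likelihood ratio) all derive from a 2x2 confusion matrix. A minimal sketch with made-up counts for illustration, not the study's data:

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard diagnostic-test metrics from confusion-matrix counts."""
    sens = tp / (tp + fn)             # sensitivity (true positive rate)
    spec = tn / (tn + fp)             # specificity (true negative rate)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),        # positive predictive value
        "npv": tn / (tn + fn),        # negative predictive value
        "lr_plus": sens / (1 - spec), # positive likelihood ratio
    }

# Hypothetical screen of 400 patients: 50 true strokes, 350 non-strokes.
m = screening_metrics(tp=40, fp=30, fn=10, tn=320)
```

With these illustrative counts, sensitivity is 0.80 and accuracy is 0.90, mirroring the kind of trade-off the study quantifies when comparing GPT-3.5 and GPT-4.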
Affiliation(s)
- Xinhao Wang
- Department of Neurology, Maoming People's Hospital, Maoming, Guangdong, China
- Shisheng Ye
- Department of Neurology, Maoming People's Hospital, Maoming, Guangdong, China
- Jinwen Feng
- Department of Neurology, Maoming People's Hospital, Maoming, Guangdong, China
- Kaiyan Feng
- Department of Neurology, Maoming People's Hospital, Maoming, Guangdong, China
- Heng Yang
- Department of Neurology, Maoming People's Hospital, Maoming, Guangdong, China
- Hao Li
- Department of Neurology, Maoming People's Hospital, Maoming, Guangdong, China