1
Bui N, Nguyen G, Nguyen N, Vo B, Vo L, Huynh T, Tang A, Tran VN, Huynh T, Nguyen HQ, Dinh M. Fine-tuning large language models for improved health communication in low-resource languages. Computer Methods and Programs in Biomedicine 2025; 263:108655. [PMID: 39987667 DOI: 10.1016/j.cmpb.2025.108655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Revised: 12/10/2024] [Accepted: 02/05/2025] [Indexed: 02/25/2025]
Abstract
BACKGROUND The reported study illustrates a methodology for compiling training datasets to fine-tune Large Language Models (LLMs) for healthcare information in Vietnamese, a low-resource language. The objective is to bridge the gap in medical information accessibility and enhance healthcare communication in developing countries by adapting LLMs to specific linguistic nuances and domain needs. METHOD The methodology involves selecting a base model, compiling a domain-specific dataset, and fine-tuning the model with this dataset. Three open-source models were selected. The dataset, comprising approximately 337,000 prompt-response pairs in Vietnamese, was compiled from existing datasets, data crawled from Vietnamese medical online forums, and content distilled from Vietnamese medical textbooks. The three models were fine-tuned using the Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA) techniques. The models' performance was evaluated using BERTScore, ROUGE-L, and the "LLM-as-a-Judge" method. RESULTS The fine-tuned models outperformed their base versions across the BERTScore, ROUGE-L, and "LLM-as-a-Judge" evaluations, confirming the effectiveness of the fine-tuning process. This study details the process of fine-tuning open-source LLMs for health information inquiries in Vietnamese, demonstrating its potential to improve healthcare communication in low-resource languages. Deploying the fine-tuned LLM on-premise enhances data privacy and security. However, the significant computing power and costs required pose challenges, especially for organizations in developing countries. CONCLUSION This case study highlights the unique challenges faced by developing countries using low-resource languages. Initiatives are needed to bridge healthcare gaps in underserved areas and contribute to global health equity.
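As an illustration of one of the evaluation metrics named in this abstract, the following is a minimal sketch of a ROUGE-L style F-score based on the longest common subsequence (LCS) of tokens. It is not the authors' evaluation code: the function names, the whitespace tokenization, and the default `beta` are assumptions made for this example, and published ROUGE implementations may tokenize and weight precision and recall differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    """ROUGE-L style F-score between a reference and a candidate answer.

    Assumption: tokens are obtained by lowercasing and splitting on
    whitespace, which is only a rough tokenizer for Vietnamese text.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)
    precision = lcs / len(cand)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
```

A fine-tuned model's answer would be passed as `candidate` and the ground-truth response as `reference`; averaging the score over a held-out set gives one number to compare base and fine-tuned models.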
Affiliation(s)
- Nhat Bui
  - School of Science, Engineering and Technology, RMIT University, Ho Chi Minh City, Vietnam
- Giang Nguyen
  - School of Science, Engineering and Technology, RMIT University, Ho Chi Minh City, Vietnam
- Nguyen Nguyen
  - School of Science, Engineering and Technology, RMIT University, Ho Chi Minh City, Vietnam
- Bao Vo
  - School of Science, Engineering and Technology, RMIT University, Ho Chi Minh City, Vietnam
- Luan Vo
  - School of Science, Engineering and Technology, RMIT University, Ho Chi Minh City, Vietnam
- Tom Huynh
  - School of Science, Engineering and Technology, RMIT University, Ho Chi Minh City, Vietnam
- Arthur Tang
  - School of Science, Engineering and Technology, RMIT University, Ho Chi Minh City, Vietnam
- Van Nhiem Tran
  - AI Research Center, Hon Hai Research Institute, Taipei 114699, Taiwan
- Tuyen Huynh
  - Oxford University Clinical Research Unit (OUCRU), Ho Chi Minh City, Vietnam
- Huy Quang Nguyen
  - Oxford University Clinical Research Unit (OUCRU), Ho Chi Minh City, Vietnam
- Minh Dinh
  - School of Science, Engineering and Technology, RMIT University, Ho Chi Minh City, Vietnam
2
Gilad-Bachrach R, Obolski U. Guidance on reporting the use of natural language processing methods. Clin Microbiol Infect 2025; 31:677-679. [PMID: 39725081 DOI: 10.1016/j.cmi.2024.12.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2024] [Revised: 12/08/2024] [Accepted: 12/18/2024] [Indexed: 12/28/2024]
Affiliation(s)
- Ran Gilad-Bachrach
  - Department of Biomedical Engineering, Tel-Aviv University, Tel-Aviv, Israel; Edmond J. Safra Center for Bioinformatics, Tel Aviv University, Tel-Aviv, Israel
- Uri Obolski
  - Edmond J. Safra Center for Bioinformatics, Tel Aviv University, Tel-Aviv, Israel; Department of Epidemiology and Preventive Medicine, School of Public Health, Faculty of Medical and Health Sciences, Tel-Aviv University, Tel-Aviv, Israel
3
Vrdoljak J, Boban Z, Vilović M, Kumrić M, Božić J. A Review of Large Language Models in Medical Education, Clinical Decision Support, and Healthcare Administration. Healthcare (Basel) 2025; 13:603. [PMID: 40150453 PMCID: PMC11942098 DOI: 10.3390/healthcare13060603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2024] [Revised: 02/07/2025] [Accepted: 03/06/2025] [Indexed: 03/29/2025] Open
Abstract
Background/Objectives: Large language models (LLMs) have shown significant potential to transform various aspects of healthcare. This review aims to explore the current applications, challenges, and future prospects of LLMs in medical education, clinical decision support, and healthcare administration. Methods: A comprehensive literature review was conducted, examining the applications of LLMs across the three key domains. The analysis included their performance, challenges, and advancements, with a focus on techniques like retrieval-augmented generation (RAG). Results: In medical education, LLMs show promise as virtual patients, personalized tutors, and tools for generating study materials. Some models have outperformed junior trainees in specific medical knowledge assessments. Concerning clinical decision support, LLMs exhibit potential in diagnostic assistance, treatment recommendations, and medical knowledge retrieval, though performance varies across specialties and tasks. In healthcare administration, LLMs effectively automate tasks like clinical note summarization, data extraction, and report generation, potentially reducing administrative burdens on healthcare professionals. Despite their promise, challenges persist, including hallucination mitigation, addressing biases, and ensuring patient privacy and data security. Conclusions: LLMs have transformative potential in medicine but require careful integration into healthcare settings. Ethical considerations, regulatory challenges, and interdisciplinary collaboration between AI developers and healthcare professionals are essential. Future advancements in LLM performance and reliability through techniques such as RAG, fine-tuning, and reinforcement learning will be critical to ensuring patient safety and improving healthcare delivery.
Affiliation(s)
- Josip Vrdoljak
  - Department for Pathophysiology, School of Medicine, University of Split, 21000 Split, Croatia
- Zvonimir Boban
  - Department for Medical Physics, School of Medicine, University of Split, 21000 Split, Croatia
- Marino Vilović
  - Department for Pathophysiology, School of Medicine, University of Split, 21000 Split, Croatia
- Marko Kumrić
  - Department for Pathophysiology, School of Medicine, University of Split, 21000 Split, Croatia
- Joško Božić
  - Department for Pathophysiology, School of Medicine, University of Split, 21000 Split, Croatia
4
Ono D, Sekiya H, Maier AR, Murray ME, Koga S, Dickson DW. Parkinsonism in Alzheimer's disease without Lewy bodies in association with nigral neuron loss: A data-driven clinicopathologic study. Alzheimers Dement 2025; 21:e14628. [PMID: 40042515 PMCID: PMC11881629 DOI: 10.1002/alz.14628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2024] [Revised: 12/23/2024] [Accepted: 01/21/2025] [Indexed: 05/13/2025]
Abstract
INTRODUCTION Parkinsonism in patients with Alzheimer's disease (AD) is often attributed to Lewy-related pathology, given its high comorbidity. In the era of anti-amyloid therapy, recognizing parkinsonism caused by AD pathology is needed to optimize the treatment. METHODS This study aimed to quantitatively characterize parkinsonism and nigral neuropathology in AD without Lewy bodies (LB). Nigral neurons were counted automatically. Fine-tuned ChatGPT collected structured clinical data. RESULTS Among 635 AD patients without LB, 62 (9.7%) presented parkinsonism, which correlated with reduced nigral neuron density (p < 0.01). Tau burden did not explain the nigral neuronal loss. TAR DNA-binding protein 43 (TDP-43) pathology correlated with reduced nigral pigmented neuron density (p = 0.03). DISCUSSION Our findings suggest that parkinsonism in AD without LB is related to nigral neuronal loss in association with TDP-43 pathology. Recognition of parkinsonism in AD without LB is crucial for appropriate therapy. HIGHLIGHTS One in 10 Alzheimer's disease (AD) patients without Lewy bodies had parkinsonism. Parkinsonism in AD was correlated with reduced nigral neuron density. TAR DNA-binding protein 43 pathology was associated with nigral degeneration in AD. AD should be included in the differential diagnosis of dementia with parkinsonism.
Affiliation(s)
- Daisuke Ono
  - Department of Neuroscience, Mayo Clinic, Jacksonville, Florida, USA
- Hiroaki Sekiya
  - Department of Neuroscience, Mayo Clinic, Jacksonville, Florida, USA
- Melissa E. Murray
  - Department of Neuroscience, Mayo Clinic, Jacksonville, Florida, USA
  - Department of Laboratory Medicine and Pathology, Mayo Clinic, Jacksonville, Florida, USA
- Shunsuke Koga
  - Department of Neuroscience, Mayo Clinic, Jacksonville, Florida, USA
  - Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Dennis W. Dickson
  - Department of Neuroscience, Mayo Clinic, Jacksonville, Florida, USA
  - Department of Laboratory Medicine and Pathology, Mayo Clinic, Jacksonville, Florida, USA
5
Shaheen A, Afflitto GG, Swaminathan SS. ChatGPT-Assisted Classification of Postoperative Bleeding Following Microinvasive Glaucoma Surgery Using Electronic Health Record Data. Ophthalmology Science 2025; 5:100602. [PMID: 39380881 PMCID: PMC11459071 DOI: 10.1016/j.xops.2024.100602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 07/01/2024] [Accepted: 08/15/2024] [Indexed: 10/10/2024]
Abstract
Purpose To evaluate the performance of a large language model (LLM) in classifying electronic health record (EHR) text, and to use this classification to evaluate the type and resolution of hemorrhagic events (HEs) after microinvasive glaucoma surgery (MIGS). Design Retrospective cohort study. Participants Eyes from the Bascom Palmer Glaucoma Repository. Methods Eyes that underwent MIGS between July 1, 2014 and February 1, 2022 were analyzed. Chat Generative Pre-trained Transformer (ChatGPT) was used to classify deidentified EHR anterior chamber examination text into HE categories (no hyphema, microhyphema, clot, and hyphema). Agreement between classifications by ChatGPT and a glaucoma specialist was evaluated using Cohen's kappa and the precision-recall (PR) curve. Time to resolution of HEs was assessed using Cox proportional-hazards models. Goniotomy HE resolution was evaluated by degree of angle treatment (90°-179°, 180°-269°, 270°-360°). Logistic regression was used to identify HE risk factors. Main Outcome Measures Accuracy of ChatGPT HE classification and incidence and resolution of HEs. Results The study included 434 goniotomy eyes (368 patients) and 528 Schlemm's canal stent (SCS) eyes (390 patients). ChatGPT facilitated excellent HE classification (Cohen's kappa 0.93, area under PR curve 0.968). Using ChatGPT classifications, at postoperative day 1, HEs occurred in 67.8% of goniotomy and 25.2% of SCS eyes (P < 0.001). The 270° to 360° goniotomy group had the highest HE rate (84.0%, P < 0.001). At postoperative week 1, HEs were observed in 43.4% and 11.3% of goniotomy and SCS eyes, respectively (P < 0.001). By postoperative month 1, HE rates were 13.3% and 1.3% among goniotomy and SCS eyes, respectively (P < 0.001). Time to HE resolution differed between the goniotomy angle groups (log-rank P = 0.034); median time to resolution was 10, 10, and 15 days for the 90° to 179°, 180° to 269°, and 270° to 360° groups, respectively. Risk factor analysis demonstrated greater goniotomy angle was the only significant predictor of HEs (odds ratio for 270°-360°: 4.08, P < 0.001). Conclusions Large language models can be effectively used to classify longitudinal EHR free-text examination data with high accuracy, highlighting a promising direction for future LLM-assisted research and clinical decision support. Hemorrhagic events are relatively common self-resolving complications that occur more often in goniotomy cases and with larger goniotomy treatments. Time to HE resolution differs significantly between goniotomy groups. Financial Disclosures Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
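The agreement statistic reported in this abstract, Cohen's kappa, compares observed agreement between two raters with the agreement expected by chance. The sketch below shows the standard unweighted formula in plain Python; it is an illustration rather than the study's analysis code, and the function name and label values are assumptions chosen for this example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

In a setting like the one described, `rater_a` could hold the model's category for each examination note and `rater_b` the specialist's category for the same notes; values near 1 indicate agreement well beyond chance, and 0 indicates chance-level agreement.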
Affiliation(s)
- Abdulla Shaheen
  - Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida
- Gabriele Gallo Afflitto
  - Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida
  - Ophthalmology Unit, Department of Experimental Medicine, Università di Roma "Tor Vergata," Rome, Italy
- Swarup S. Swaminathan
  - Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida
6
Thandla SR, Armstrong GQ, Menon A, Shah A, Gueye DL, Harb C, Hernandez E, Iyer Y, Hotchner AR, Modi R, Mudigonda A, Prokos MA, Rao TM, Thomas OR, Beltran CA, Guerrieri T, LeBlanc S, Moorthy S, Yacoub SG, Gardner JE, Greenberg BM, Hubal A, Lapina YP, Moran J, O'Brien JP, Winnicki AC, Yoka C, Zhang J, Zimmerman PA. Comparing new tools of artificial intelligence to the authentic intelligence of our global health students. BioData Min 2024; 17:58. [PMID: 39696442 PMCID: PMC11656723 DOI: 10.1186/s13040-024-00408-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2024] [Accepted: 11/19/2024] [Indexed: 12/20/2024] Open
Abstract
INTRODUCTION The transformative feature of Artificial Intelligence (AI) is its massive capacity for interpreting and transforming unstructured data into a coherent and meaningful context. In general, the potential for AI to alter traditional approaches to student research and its evaluation appears to be significant. With regard to research in global health, it is important for students and research experts to assess the strengths and limitations of generative AI (GenAI) within this space. Thus, the goal of our research was to evaluate the information literacy of GenAI compared to the expectations that graduate students meet in writing research papers. METHODS After the course Fundamentals of Global Health (INTH 401) at Case Western Reserve University (CWRU), graduate students who had successfully completed their required research paper were recruited to compare their original papers with a paper they generated with ChatGPT-4o using the original assignment prompt. Students also completed a Google Forms survey to evaluate different sections of the AI-generated paper (e.g., Adherence to Introduction guidelines, Presentation of three perspectives, Conclusion) and of their original papers, and to rate their overall satisfaction with the AI work. The comparison of original student papers with ChatGPT-4o papers also enabled evaluation of narrative elements and references. RESULTS Of the 54 students who completed the required research paper, 28 (51.8%) agreed to collaborate in the comparison project. A summary of the survey responses suggested that students evaluated the AI-generated paper as inferior or similar to their own paper (overall satisfaction average = 2.39 (1.61-3.17); Likert scale: 1 to 5 with lower scores indicating inferiority). Evaluating the average individual student responses for 5 Likert item queries showed that 17 scores were < 2.9; 7 scores were between 3.0 and 3.9; 4 scores were ≥ 4.0, consistent with inferiority of the AI-generated paper. Evaluation of reference selection by ChatGPT-4o (n = 729 total references) showed that 54% (n = 396) were authentic and 46% (n = 333) did not exist. Of the authentic references, 26.5% (105/396) were relevant to the paper narrative, corresponding to 14.4% of the 729 total references. DISCUSSION Our findings reveal strengths and limitations in the potential of AI tools to assist in understanding the complexities of global health topics. Strengths mentioned by students included the ability of ChatGPT-4o to produce content very quickly and to suggest topics that they had not considered in the 3-perspective sections of their papers. Consistently presenting up-to-date facts and references, as well as further examining or summarizing the complexities of global health topics, appears to be a current limitation of ChatGPT-4o. Because ChatGPT-4o generated references that did not exist while attributing them to highly credible biomedical research journals, we conclude that ChatGPT-4o failed an important component of using information effectively. Moreover, misrepresenting trusted sources of public health information is highly concerning, particularly given recent experiences from the COVID-19 pandemic and, more recently, in reporting on the impact of, and response to, natural disasters. This is a significant limitation of GenAI's ability to meet the information literacy standards expected of graduate students.
Affiliation(s)
- Shilpa R Thandla
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Grace Q Armstrong
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Adil Menon
  - School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Aashna Shah
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- David L Gueye
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Clara Harb
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
  - Bioethics Department, Case Western Reserve University, Cleveland, OH, USA
- Estefania Hernandez
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Yasaswini Iyer
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Abigail R Hotchner
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Riddhi Modi
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Anusha Mudigonda
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Maria A Prokos
  - Nutrition Department, Case Western Reserve University, Cleveland, OH, USA
- Tharun M Rao
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Olivia R Thomas
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Camilo A Beltran
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Taylor Guerrieri
  - Anthropology Department, Case Western Reserve University, Cleveland, OH, USA
- Sydney LeBlanc
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Skanda Moorthy
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Sara G Yacoub
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Jacob E Gardner
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
  - School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Alyssa Hubal
  - Pathology Department, Case Western Reserve University, Cleveland, OH, USA
  - Division of Infectious Diseases and HIV Medicine, Cleveland, OH, USA
- Yuliana P Lapina
  - Anthropology Department, Case Western Reserve University, Cleveland, OH, USA
- Jacqueline Moran
  - Anthropology Department, Case Western Reserve University, Cleveland, OH, USA
- Joseph P O'Brien
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
  - School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Anna C Winnicki
  - Pathology Department, Case Western Reserve University, Cleveland, OH, USA
  - Center for Global Health and Diseases, Cleveland, OH, USA
- Christina Yoka
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
  - Department of Public Health, Cleveland, OH, USA
- Junwei Zhang
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Peter A Zimmerman
  - Master of Public Health Program, Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
  - Pathology Department, Case Western Reserve University, Cleveland, OH, USA
  - Center for Global Health and Diseases, Cleveland, OH, USA
7
Kwok KO, Huynh T, Wei WI, Wong SYS, Riley S, Tang A. Utilizing large language models in infectious disease transmission modelling for public health preparedness. Comput Struct Biotechnol J 2024; 23:3254-3257. [PMID: 39286528 PMCID: PMC11402906 DOI: 10.1016/j.csbj.2024.08.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/19/2024] [Revised: 08/07/2024] [Accepted: 08/07/2024] [Indexed: 09/19/2024] Open
Abstract
Introduction OpenAI's ChatGPT, a Large Language Model (LLM), is a powerful tool across domains, designed for text and code generation, fostering collaboration, especially in public health. Investigating the role of this advanced LLM chatbot in assisting public health practitioners in shaping disease transmission models to inform infection control strategies marks a new era in infectious disease epidemiology research. This study used a case study to illustrate how ChatGPT collaborates with a public health practitioner in co-designing a mathematical transmission model. Methods Using natural conversation, the practitioner initiated a dialogue involving an iterative process of code generation, refinement, and debugging with ChatGPT to develop a model that fits 10 days of prevalence data and estimates two key epidemiological parameters: i) the basic reproductive number (R0) and ii) the final epidemic size. Verification and validation processes were conducted to ensure the accuracy and functionality of the final model. Results ChatGPT developed a validated transmission model which replicated the epidemic curve and gave an estimated R0 of 4.19 (95% CI: 4.13-4.26) and a final epidemic size of 98.3% of the population within 60 days. It highlighted the advantages of using maximum likelihood estimation with a Poisson distribution over the least squares method. Conclusion Integration of LLMs in medical research accelerates model development, reducing technical barriers for health practitioners, democratizing access to advanced modeling and potentially enhancing pandemic preparedness globally, particularly in resource-constrained populations.
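The two quantities estimated in this abstract are linked in the simplest epidemic models. As a hedged illustration, the sketch below solves the classical final-size relation of a homogeneous SIR model, z = 1 - exp(-R0 * z), by fixed-point iteration. The abstract does not state which model structure the authors used, so the SIR assumption, the function name, and the tolerance settings are all assumptions of this example, and the result describes the long-run attack rate rather than the value at a fixed 60-day horizon.

```python
import math


def final_epidemic_size(r0, tol=1e-12, max_iter=10_000):
    """Fraction of the population ever infected in a simple SIR model.

    Solves z = 1 - exp(-r0 * z) for the positive root by fixed-point
    iteration. Assumes a fully susceptible, homogeneously mixing
    population (an assumption of this sketch, not of the cited study).
    """
    if r0 <= 1.0:
        # Below the epidemic threshold the only solution is z = 0.
        return 0.0
    z = 1.0  # start from the upper bound and iterate down to the root
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z
```

Calling `final_epidemic_size(4.19)` with the point estimate quoted in the abstract gives a value of roughly 0.98, in the same range as the reported final size, which is a quick plausibility check rather than a reproduction of the authors' fitted model.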
Affiliation(s)
- Kin On Kwok
  - JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong Special Administrative Region of China
  - Hong Kong Institute of Asia-Pacific Studies, The Chinese University of Hong Kong, Hong Kong Special Administrative Region of China
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- Tom Huynh
  - School of Science, Engineering and Technology, RMIT University, Viet Nam
- Wan In Wei
  - JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong Special Administrative Region of China
- Samuel Y S Wong
  - JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong Special Administrative Region of China
- Steven Riley
  - MRC Centre for Global Infectious Disease Analysis and Jameel Institute, Imperial College London, London, United Kingdom
  - School of Public Health, Imperial College London, Norfolk Place, London W2 1PG, United Kingdom
- Arthur Tang
  - School of Science, Engineering and Technology, RMIT University, Viet Nam
8
Liu W, Kan H, Jiang Y, Geng Y, Nie Y, Yang M. MED-ChatGPT CoPilot: a ChatGPT medical assistant for case mining and adjunctive therapy. Front Med (Lausanne) 2024; 11:1460553. [PMID: 39478827 PMCID: PMC11521861 DOI: 10.3389/fmed.2024.1460553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2024] [Accepted: 10/03/2024] [Indexed: 11/02/2024] Open
Abstract
Background The large-scale language model, GPT-4-1106-preview, supports text of up to 128 k characters, which has enhanced the capability of processing vast quantities of text. This model can perform efficient and accurate text data mining without the need for retraining, aided by prompt engineering. Method The research approach includes prompt engineering and text vectorization processing. In this study, prompt engineering is applied to assist ChatGPT in text mining. Subsequently, the mined results are vectorized and incorporated into a local knowledge base. After cleansing 306 medical papers, data extraction was performed using ChatGPT. Following a validation and filtering process, 241 medical case data entries were obtained, leading to the construction of a local medical knowledge base. Additionally, drawing upon the Langchain framework and utilizing the local knowledge base in conjunction with ChatGPT, we successfully developed a fast and reliable chatbot. This chatbot is capable of providing recommended diagnostic and treatment information for various diseases. Results The performance of the designed ChatGPT model, which was enhanced by data from the local knowledge base, exceeded that of the original model by 7.90% on a set of medical questions. Conclusion ChatGPT, assisted by prompt engineering, demonstrates effective data mining capabilities for large-scale medical texts. In the future, we plan to incorporate a richer array of medical case data, expand the scale of the knowledge base, and enhance ChatGPT's performance in the medical field.
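The pipeline described in this abstract (vectorize mined entries, store them in a local knowledge base, retrieve relevant entries to support the chatbot) can be illustrated with a toy retrieval step. The sketch below is not the authors' system and does not use the Langchain framework: it substitutes a simple bag-of-words vector and cosine similarity for a learned embedding, and every function name and knowledge-base entry is hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Toy bag-of-words vector; a real system would use a learned embedding."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[term] * v[term] for term in u if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, knowledge_base, k=1):
    """Return the k knowledge-base entries most similar to the query."""
    q = vectorize(query)
    ranked = sorted(knowledge_base, key=lambda entry: cosine(q, vectorize(entry)), reverse=True)
    return ranked[:k]


def build_prompt(query, knowledge_base, k=1):
    """Prepend retrieved entries to the question before sending it to a model."""
    context = "\n".join(retrieve(query, knowledge_base, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

With a hypothetical knowledge base such as `["case a: persistent cough and fever", "case b: knee pain after running"]`, the query `"fever and cough"` retrieves the first entry, and `build_prompt` places it ahead of the question so the language model answers with that context available.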
Affiliation(s)
- Wei Liu
  - School of Medical Information Engineering, Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
  - Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
- Hongxing Kan
  - School of Medical Information Engineering, Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
  - Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
- Yanfei Jiang
  - School of Medical Information Engineering, Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
- Yingbao Geng
  - School of Medical Information Engineering, Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
- Yiqi Nie
  - School of Medical Information Engineering, Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
  - Anhui Computer Application Research Institute of Chinese Medicine, China Academy of Chinese Medical Sciences, Hefei, Anhui, China
- Mingguang Yang
  - School of Medical Information Engineering, Anhui University of Traditional Chinese Medicine, Hefei, Anhui, China
9
Finch L, Broach V, Feinberg J, Al-Niaimi A, Abu-Rustum NR, Zhou Q, Iasonos A, Chi DS. ChatGPT compared to national guidelines for management of ovarian cancer: Did ChatGPT get it right? - A Memorial Sloan Kettering Cancer Center Team Ovary study. Gynecol Oncol 2024; 189:75-79. [PMID: 39042956 PMCID: PMC11402584 DOI: 10.1016/j.ygyno.2024.07.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 07/08/2024] [Accepted: 07/15/2024] [Indexed: 07/25/2024]
Abstract
OBJECTIVES We evaluated the performance of a chatbot compared to the National Comprehensive Cancer Network (NCCN) Guidelines for the management of ovarian cancer. METHODS Using NCCN Guidelines, we generated 10 questions and answers regarding management of ovarian cancer at a single point in time. Questions were thematically divided into risk factors, surgical management, medical management, and surveillance. We asked ChatGPT (GPT-4) to provide responses without prompting (unprompted GPT) and with prompt engineering (prompted GPT). Responses were blinded and evaluated for accuracy and completeness by 5 gynecologic oncologists. A score of 0 was defined as inaccurate, 1 as accurate and incomplete, and 2 as accurate and complete. Evaluations were compared among NCCN, unprompted GPT, and prompted GPT answers. RESULTS Overall, 48% of responses from NCCN, 64% from unprompted GPT, and 66% from prompted GPT were accurate and complete. The percentage of accurate but incomplete responses was higher for NCCN vs GPT-4. The percentage of accurate and complete scores for questions regarding risk factors, surgical management, and surveillance was higher for GPT-4 vs NCCN; however, for questions regarding medical management, the percentage was lower for GPT-4 vs NCCN. Overall, 14% of responses from unprompted GPT, 12% from prompted GPT, and 10% from NCCN were inaccurate. CONCLUSIONS GPT-4 provided accurate and complete responses at a single point in time to a limited set of questions regarding ovarian cancer, with best performance in areas of risk factors, surgical management, and surveillance. Occasional inaccuracies, however, should limit unsupervised use of chatbots at this time.
Affiliation(s)
- Lindsey Finch
  - Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Vance Broach
  - Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA
- Jacqueline Feinberg
  - Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA
- Ahmed Al-Niaimi
  - Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA
- Nadeem R Abu-Rustum
  - Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA
- Qin Zhou
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Alexia Iasonos
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Dennis S Chi
  - Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA
|
10
|
Woo B, Huynh T, Tang A, Bui N, Nguyen G, Tam W. Transforming nursing with large language models: from concept to practice. Eur J Cardiovasc Nurs 2024; 23:549-552. [PMID: 38178303 DOI: 10.1093/eurjcn/zvad120] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/14/2023] [Accepted: 11/19/2023] [Indexed: 01/06/2024]
Abstract
Large language models (LLMs) such as ChatGPT have emerged as potential game-changers in nursing, aiding in patient education, diagnostic assistance, treatment recommendations, and administrative task efficiency. While these advancements signal promising strides in healthcare, integrated LLMs are not without challenges, particularly artificial intelligence hallucination and data privacy concerns. Methodologies such as prompt engineering, temperature adjustments, model fine-tuning, and local deployment are proposed to refine the accuracy of LLMs and ensure data security. While LLMs offer transformative potential, it is imperative to acknowledge that they cannot substitute the intricate expertise of human professionals in the clinical field, advocating for a synergistic approach in patient care.
Affiliation(s)
- Brigitte Woo
  - Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Tom Huynh
  - School of Science, Engineering and Technology, RMIT University, 702 Nguyen Van Linh Blvd., District 7, Ho Chi Minh 756000, Ho Chi Minh City, Vietnam
- Arthur Tang
  - School of Science, Engineering and Technology, RMIT University, 702 Nguyen Van Linh Blvd., District 7, Ho Chi Minh 756000, Ho Chi Minh City, Vietnam
- Nhat Bui
  - School of Science, Engineering and Technology, RMIT University, 702 Nguyen Van Linh Blvd., District 7, Ho Chi Minh 756000, Ho Chi Minh City, Vietnam
- Giang Nguyen
  - School of Science, Engineering and Technology, RMIT University, 702 Nguyen Van Linh Blvd., District 7, Ho Chi Minh 756000, Ho Chi Minh City, Vietnam
- Wilson Tam
  - Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
|
11
|
McMurry AJ, Zipursky AR, Geva A, Olson KL, Jones JR, Ignatov V, Miller TA, Mandl KD. Moving Biosurveillance Beyond Coded Data Using AI for Symptom Detection From Physician Notes: Retrospective Cohort Study. J Med Internet Res 2024; 26:e53367. [PMID: 38573752 PMCID: PMC11027052 DOI: 10.2196/53367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 11/30/2023] [Accepted: 02/27/2024] [Indexed: 04/05/2024] Open
Abstract
BACKGROUND Real-time surveillance of emerging infectious diseases necessitates a dynamically evolving, computable case definition, which frequently incorporates symptom-related criteria. For symptom detection, both population health monitoring platforms and research initiatives primarily depend on structured data extracted from electronic health records. OBJECTIVE This study sought to validate and test an artificial intelligence (AI)-based natural language processing (NLP) pipeline for detecting COVID-19 symptoms from physician notes in pediatric patients. We specifically study patients presenting to the emergency department (ED) who can be sentinel cases in an outbreak. METHODS Subjects in this retrospective cohort study are patients who are 21 years of age and younger, who presented to a pediatric ED at a large academic children's hospital between March 1, 2020, and May 31, 2022. The ED notes for all patients were processed with an NLP pipeline tuned to detect the mention of 11 COVID-19 symptoms based on Centers for Disease Control and Prevention (CDC) criteria. For a gold standard, 3 subject matter experts labeled 226 ED notes and had strong agreement (F1-score=0.986; positive predictive value [PPV]=0.972; and sensitivity=1.0). F1-score, PPV, and sensitivity were used to compare the performance of both NLP and the International Classification of Diseases, 10th Revision (ICD-10) coding to the gold standard chart review. As a formative use case, variations in symptom patterns were measured across SARS-CoV-2 variant eras. RESULTS There were 85,678 ED encounters during the study period, including 4% (n=3420) with patients with COVID-19. NLP was more accurate at identifying encounters with patients that had any of the COVID-19 symptoms (F1-score=0.796) than ICD-10 codes (F1-score =0.451). NLP accuracy was higher for positive symptoms (sensitivity=0.930) than ICD-10 (sensitivity=0.300). 
However, ICD-10 accuracy was higher for negative symptoms (specificity=0.994) than NLP (specificity=0.917). Congestion or runny nose showed the highest accuracy difference (NLP: F1-score=0.828 and ICD-10: F1-score=0.042). For encounters with patients with COVID-19, prevalence estimates of each NLP symptom differed across variant eras. Patients with COVID-19 were more likely to have each NLP symptom detected than patients without this disease. Effect sizes (odds ratios) varied across pandemic eras. CONCLUSIONS This study establishes the value of AI-based NLP as a highly effective tool for real-time COVID-19 symptom detection in pediatric patients, outperforming traditional ICD-10 methods. It also reveals the evolving nature of symptom prevalence across different virus variants, underscoring the need for dynamic, technology-driven approaches in infectious disease surveillance.
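The metrics quoted in this abstract are linked by a standard identity: the F1-score is the harmonic mean of PPV (precision) and sensitivity (recall). A minimal sketch that applies the identity to the inter-annotator agreement figures and to the reported COVID-19 encounter share; the function name is illustrative, and only numbers stated in the abstract are used as inputs:

```python
def f1_score(ppv, sensitivity):
    """Harmonic mean of positive predictive value and sensitivity."""
    if ppv + sensitivity == 0:
        return 0.0  # Convention: F1 is 0 when both inputs are 0.
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Annotator agreement reported in the abstract: PPV=0.972, sensitivity=1.0.
agreement_f1 = f1_score(0.972, 1.0)

# Share of ED encounters with COVID-19: n=3420 of 85,678 encounters.
covid_share_percent = 100 * 3420 / 85678

print(round(agreement_f1, 3))
print(round(covid_share_percent, 1))
```

Rounded to three decimals, the computed agreement F1 matches the reported 0.986, and the encounter share rounds to the reported 4%. The NLP and ICD-10 F1-scores cannot be rechecked this way because the abstract does not give their PPV values.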
Affiliation(s)
- Andrew J McMurry
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States
- Amy R Zipursky
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
  - Division of Pediatric Emergency Medicine, Department of Pediatrics, The Hospital for Sick Children, Toronto, ON, Canada
- Alon Geva
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
  - Division of Critical Care Medicine, Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, MA, United States
  - Department of Anaesthesia, Harvard Medical School, Boston, MA, United States
- Karen L Olson
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States
- James R Jones
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Vladimir Ignatov
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
- Timothy A Miller
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States
- Kenneth D Mandl
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
|
12
|
Cheng J. Applications of Large Language Models in Pathology. Bioengineering (Basel) 2024; 11:342. [PMID: 38671764 PMCID: PMC11047860 DOI: 10.3390/bioengineering11040342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Revised: 03/27/2024] [Accepted: 03/29/2024] [Indexed: 04/28/2024] Open
Abstract
Large language models (LLMs) are transformer-based neural networks that can provide human-like responses to questions and instructions. LLMs can generate educational material, summarize text, extract structured data from free text, create reports, write programs, and potentially assist in case sign-out. LLMs combined with vision models can assist in interpreting histopathology images. LLMs have immense potential in transforming pathology practice and education, but these models are not infallible, so any artificial intelligence generated content must be verified with reputable sources. Caution must be exercised on how these models are integrated into clinical practice, as these models can produce hallucinations and incorrect results, and an over-reliance on artificial intelligence may lead to de-skilling and automation bias. This review paper provides a brief history of LLMs and highlights several use cases for LLMs in the field of pathology.
Affiliation(s)
- Jerome Cheng
  - Department of Pathology, University of Michigan, Ann Arbor, MI 48105, USA
|
13
|
Wang L, Chen X, Deng X, Wen H, You M, Liu W, Li Q, Li J. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. NPJ Digit Med 2024; 7:41. [PMID: 38378899 PMCID: PMC10879172 DOI: 10.1038/s41746-024-01029-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 02/05/2024] [Indexed: 02/22/2024] Open
Abstract
The use of large language models (LLMs) in clinical medicine is currently thriving. Effectively transferring LLMs' pertinent theoretical knowledge from computer science to their application in clinical medicine is crucial. Prompt engineering has shown potential as an effective method in this regard. To explore the application of prompt engineering in LLMs and to examine the reliability of LLMs, different styles of prompts were designed and used to ask different LLMs about their agreement with the American Academy of Orthopedic Surgeons (AAOS) osteoarthritis (OA) evidence-based guidelines. Each question was asked 5 times. We compared the consistency of the findings with guidelines across different evidence levels for different prompts and assessed the reliability of different prompts by asking the same question 5 times. gpt-4-Web with ROT prompting had the highest overall consistency (62.9%) and a significant performance for strong recommendations, with a total consistency of 77.5%. The reliability of the different LLMs for different prompts was not stable (Fleiss kappa ranged from -0.002 to 0.984). This study revealed that different prompts had variable effects across various models, and the gpt-4-Web with ROT prompt was the most consistent. An appropriate prompt could improve the accuracy of responses to professional medical questions.
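The reliability figures in this abstract are Fleiss kappa values, which measure agreement among a fixed number of ratings per item beyond what chance would produce. A minimal sketch of the standard formula follows, treating the 5 repeated askings of each question as the 5 "raters". The answer categories and counts below are hypothetical illustration data, not data from the study, and the study's own implementation is not described in the abstract.

```python
def fleiss_kappa(counts):
    """Fleiss kappa for a table where counts[i][j] is the number of
    ratings that placed item i in category j. Every item must have
    the same total number of ratings (at least 2)."""
    n_items = len(counts)
    n_ratings = sum(counts[0])
    if n_ratings < 2 or any(sum(row) != n_ratings for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Per-item observed agreement, then its mean over items.
    per_item = [
        (sum(c * c for c in row) - n_ratings) / (n_ratings * (n_ratings - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_categories)]
    p_expected = sum((t / (n_items * n_ratings)) ** 2 for t in totals)

    if p_expected == 1:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical example: 2 questions, 5 repeated answers each,
# categories = [agrees with guideline, disagrees with guideline].
stable = [[5, 0], [0, 5]]      # every repeat gives the same answer
unstable = [[3, 2], [2, 3]]    # repeats split on both questions

print(fleiss_kappa(stable))
print(fleiss_kappa(unstable))
```

A value near 1 indicates that repeated askings give the same answer, while values near or below 0 indicate agreement no better than chance, which is the sense in which the reported range of -0.002 to 0.984 reflects unstable reliability.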
Affiliation(s)
- Li Wang
  - Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China
  - Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
- Xi Chen
  - Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China
  - Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
- XiangWen Deng
  - Shenzhen International Graduate School, Tsinghua University, Beijing, China
- Hao Wen
  - Shenzhen International Graduate School, Tsinghua University, Beijing, China
- MingKe You
  - Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China
  - Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
- WeiZhi Liu
  - Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China
  - Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
- Qi Li
  - Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China
  - Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
- Jian Li
  - Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China
  - Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
|
14
|
Kwok KO, Wei WI, Mcneil EB, Tang A, Tang JWT, Wong SYS, Yeoh EK. Comparative analysis of symptom profile and risk of death associated with infection by SARS-CoV-2 and its variants in Hong Kong. J Med Virol 2024; 96:e29326. [PMID: 38345166 DOI: 10.1002/jmv.29326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 11/19/2023] [Accepted: 12/07/2023] [Indexed: 02/15/2024]
Abstract
The recurrent multiwave nature of coronavirus disease 2019 (COVID-19) necessitates updating its symptomatology. We characterize the effect of variants on symptom presentation, identify the symptoms predictive and protective of death, and quantify the effect of vaccination on symptom development. With the COVID-19 cases reported up to August 25, 2022 in Hong Kong, an iterative multitier text-matching algorithm was developed to identify symptoms from free text. Multivariate regression was used to measure associations between variants, symptom development, death, and vaccination status. A least absolute shrinkage and selection operator technique was used to identify a parsimonious set of symptoms jointly associated with death. Overall, 70.9% (54 450/76 762) of cases were symptomatic with 102 symptoms identified. Intrinsically, the wild-type and delta variant caused similar symptoms among unvaccinated symptomatic cases, whereas the wild-type and omicron BA.2 subvariant had heterogeneous patterns, with seven symptoms (fatigue, fever, chest pain, runny nose, sputum production, nausea/vomiting, and sore throat) more frequent in the BA.2 cohort. With ≥2 vaccine doses, BA.2 was more likely than delta to cause fever among symptomatic cases. Fever, blocked nose, pneumonia, and shortness of breath remained jointly predictive of death among unvaccinated symptomatic elderly in the wild-type-to-omicron transition. Number of vaccine doses required for reducing occurrence varied by symptoms. We substantiate that omicron has a different clinical presentation compared to previous variants. Syndromic surveillance can be bettered with reduced reliance on symptom-based case identification, increased weighing on symptoms predictive of death in outcome prediction, individual-based risk assessment in care homes, and incorporating free-text symptom reporting.
Affiliation(s)
- Kin On Kwok
  - JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
  - Stanley Ho Centre for Emerging Infectious Diseases, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
  - Hong Kong Institute of Asia-Pacific Studies, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
- Wan In Wei
  - JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
- Edward B Mcneil
  - JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
- Arthur Tang
  - School of Science, Engineering and Technology, RMIT University, Ho Chi Minh City, Vietnam
- Julian W-T Tang
  - Department of Respiratory Sciences, University of Leicester, Leicester, United Kingdom
  - Department of Clinical Microbiology, Leicester Royal Infirmary, Leicester, United Kingdom
- Samuel Y S Wong
  - JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
- Eng Kiong Yeoh
  - Centre for Health Systems and Policy Research, The Chinese University of Hong Kong, Hong Kong Special Administrative Region, China
|