1. Ganzinger M, Kunz N, Fuchs P, Lyu CK, Loos M, Dugas M, Pausch TM. Automated generation of discharge summaries: leveraging large language models with clinical data. Sci Rep 2025; 15:16466. PMID: 40355506; PMCID: PMC12069548; DOI: 10.1038/s41598-025-01618-7.
Abstract
This study explores the use of open-source large language models (LLMs) to automate the generation of German discharge summaries from structured clinical data. The structured data used to produce AI-generated summaries were manually extracted from electronic health records (EHRs) by a trained medical professional. By leveraging structured documentation collected for research and quality management, the goal is to assist physicians with editable draft summaries. After de-identifying 25 patient datasets, we optimized the output of the LLaMA3 model through prompt engineering and evaluated it using error analysis as well as quantitative and qualitative metrics. The LLM-generated summaries were rated by physicians on comprehensiveness, conciseness, correctness, and fluency. Key results include an error rate of 2.84 mistakes per summary and low-to-moderate alignment between generated and physician-written summaries (ROUGE-1: 0.25, BERTScore: 0.64). Medical professionals rated the summaries 3.72 ± 0.89 for comprehensiveness and 3.88 ± 0.97 for factual correctness on a 5-point Likert scale; however, only 60% rated the comprehensiveness as good (4 or 5 out of 5). Despite overall informativeness, essential details, such as patient history, lifestyle factors, and intraoperative findings, were frequently omitted, reflecting gaps in summary completeness. While the LLaMA3 model captured much of the clinical information, complex cases and temporal reasoning presented challenges, leading to factual inaccuracies such as incorrect age calculations. Limitations include the small dataset, missing structured data elements, and the model's limited proficiency with German medical terminology, highlighting the need for larger, more complete datasets and potential model fine-tuning. In conclusion, this work provides real-world methods, findings, and descriptive results for a focused use case that may guide future work on LLM generation of discharge summaries, especially for German and possibly other non-English content.
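The two alignment metrics reported here, ROUGE-1 and BERTScore, can be reproduced with standard packages. A minimal sketch, assuming the rouge_score and bert_score packages and illustrative English summary text in place of the study's non-public German data:

```python
# Sketch: scoring a generated summary against a physician-written reference
# with ROUGE-1 and BERTScore, the two alignment metrics reported above.
# The texts are illustrative, not from the study's dataset.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "Patient admitted with acute pancreatitis; managed conservatively and discharged after 7 days."
generated = "The patient was treated conservatively for acute pancreatitis and discharged on day 7."

# ROUGE-1: unigram overlap between generated and reference summaries
scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
rouge1 = scorer.score(reference, generated)["rouge1"].fmeasure

# BERTScore: semantic similarity via contextual embeddings; lang="de"
# would select a German-capable model for the study's actual setting
precision, recall, f1 = bert_score([generated], [reference], lang="en")

print(f"ROUGE-1 F1: {rouge1:.2f}, BERTScore F1: {f1.item():.2f}")
```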
Affiliation(s)
- Matthias Ganzinger: Institute of Medical Informatics, Heidelberg University, Heidelberg, Germany
- Nicola Kunz: Institute of Medical Informatics, Heidelberg University, Heidelberg, Germany
- Pascal Fuchs: Department of General, Visceral, and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany
- Cornelia K Lyu: Department of General, Visceral, and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany
- Martin Loos: Department of General, Visceral, and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany
- Martin Dugas: Institute of Medical Informatics, Heidelberg University, Heidelberg, Germany
- Thomas M Pausch: Institute of Medical Informatics, Heidelberg University, Heidelberg, Germany; Department of General, Visceral, and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany
2. Habs M, Knecht S, Schmidt-Wilcke T. Using artificial intelligence (AI) for form and content checks of medical reports: Proofreading by ChatGPT 4.0 in a neurology department. Z Evid Fortbild Qual Gesundhwes 2025:S1865-9217(25)00079-0. PMID: 40107951; DOI: 10.1016/j.zefq.2025.02.007.
Abstract
INTRODUCTION Medical reports contain critical information and require concise language, yet often display errors despite advances in digital tools. This study compared the effectiveness of ChatGPT 4.0 in reporting orthographic, grammatical, and content errors in German neurology reports with that of a human expert. MATERIALS AND METHODS Ten linguistic errors, including typographical and grammatical mistakes, plus one significant content error were embedded in each of ten neurology reports. The reports were reviewed by ChatGPT 4.0 using three prompts: (1) check the text for spelling and grammatical errors and report them in a list format without altering the original text, (2) identify spelling and grammatical errors and generate a revised version of the text, ensuring content integrity, (3) evaluate the text for factual inaccuracies, including incorrect information and treatment errors, and report them without modifying the original text. An experienced medical secretary served as the human control. Outcome parameters were processing time, percentage of identified errors, and overall error detection rate. RESULTS Artificial intelligence (AI) accuracy in error detection was 35% (median) for Prompt 1 and 75% for Prompt 2. The mean word count of the erroneous medical reports was 980 (SD = 180). AI-driven report processing was significantly faster than human review (AI Prompt 1: 102.4 s; AI Prompt 2: 209.4 s; human: 374.0 s; p < 0.0001). Prompt 1, a tabular error report, was faster but less accurate than Prompt 2, a revised version of the report (p = 0.0013). Content analysis by Prompt 3 identified 70% of errors in 34.6 s. CONCLUSIONS AI-driven text processing for medical reports is feasible and effective. ChatGPT 4.0 demonstrated strong performance in detecting and reporting errors, but its effectiveness depends on prompt design, which significantly affects both output quality and processing time. Appropriately integrated AI could enhance supervision, quality control, and efficiency in healthcare documentation.
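The three-prompt design is straightforward to reproduce against any chat-completion API. A minimal sketch, assuming the OpenAI Python client and a stand-in model name; the prompt wording is paraphrased from the abstract, and the report text is a placeholder:

```python
# Sketch: running the study's three review prompts over one medical report,
# timing each, against a chat model. Model name and client are assumptions;
# the study used ChatGPT 4.0 via its own interface.
import time
from openai import OpenAI

client = OpenAI()

PROMPTS = {
    "P1_list_errors": ("Check the following text for spelling and grammatical "
                       "errors and report them in a list format without "
                       "altering the original text."),
    "P2_revise": ("Identify spelling and grammatical errors and generate a "
                  "revised version of the text, ensuring content integrity."),
    "P3_content": ("Evaluate the text for factual inaccuracies, including "
                   "incorrect information and treatment errors, and report "
                   "them without modifying the original text."),
}

def review_report(report_text: str) -> dict:
    """Run all three review prompts over one report, recording output and time."""
    results = {}
    for name, instruction in PROMPTS.items():
        start = time.perf_counter()
        response = client.chat.completions.create(
            model="gpt-4o",  # stand-in model name
            messages=[
                {"role": "system", "content": instruction},
                {"role": "user", "content": report_text},
            ],
        )
        results[name] = {
            "output": response.choices[0].message.content,
            "seconds": time.perf_counter() - start,
        }
    return results
```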
Affiliation(s)
- Maximilian Habs: Department of Neurology, Bezirksklinikum Mainkofen (BKM), Deggendorf, Germany
- Stefan Knecht: University Hospital of Düsseldorf (UKD), Düsseldorf, Germany
- Tobias Schmidt-Wilcke: Department of Neurology, Bezirksklinikum Mainkofen (BKM), Deggendorf, Germany; University Hospital of Düsseldorf (UKD), Düsseldorf, Germany
3. Koh MCY, Ngiam JN, Oon JEL, Lum LHW, Smitasin N, Archuleta S. Using ChatGPT for writing hospital inpatient discharge summaries - perspectives from an inpatient infectious diseases service. BMC Health Serv Res 2025; 25:221. PMID: 39924512; PMCID: PMC11809107; DOI: 10.1186/s12913-025-12373-w.
Abstract
BACKGROUND Hospital discharge summaries are important tools for communication between healthcare professionals. They convey events that occurred during hospitalisation, as well as the subsequent follow-up plans. Artificial intelligence models can be used to summarise information succinctly from large amounts of raw data input. We explored ChatGPT's ability to generate effective discharge summaries to assist junior doctors in writing these documents. METHODS We constructed three hypothetical scenarios of inpatient encounters with three different outcomes: (i) discharge home with follow-up with a general practitioner, (ii) discharge to a stepdown facility for further physical rehabilitation, and (iii) transfer to a tertiary centre for more advanced care. ChatGPT was used to generate discharge summaries for these three scenarios, and the quality of the responses was evaluated. RESULTS ChatGPT provided an effective framework for discharge summaries. It processed large volumes of text, summarising pertinent issues and communicating follow-up plans clearly, making it a potentially useful documentation tool for clinicians. However, pitfalls remain: close reading is still required to ensure the veracity of the output. CONCLUSIONS ChatGPT was able to synthesise patient information from a long prosaic format into a structured discharge summary. A future prospective study could evaluate whether the framework provided by ChatGPT helps junior doctors learn about and write discharge summaries more efficiently.
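A vignette-to-summary pipeline of this kind can be sketched in a few lines. The section headings, template, and model name below are illustrative assumptions, not the study's actual materials:

```python
# Sketch: prompting a chat model to draft a structured discharge summary from
# a free-text inpatient case vignette, covering the three discharge outcomes
# described above. Template and client are assumptions.
from openai import OpenAI

client = OpenAI()

SUMMARY_TEMPLATE = """You are assisting a junior doctor. From the case notes
below, draft a discharge summary with these sections:
1. Admission diagnosis and presenting complaint
2. Significant inpatient events and investigations
3. Discharge condition and destination (home / stepdown facility / tertiary transfer)
4. Follow-up plans and medication changes

Case notes:
{vignette}"""

def draft_discharge_summary(vignette: str) -> str:
    """Return an editable draft summary for one case vignette."""
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in model name
        messages=[{"role": "user",
                   "content": SUMMARY_TEMPLATE.format(vignette=vignette)}],
    )
    return response.choices[0].message.content
```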
Affiliation(s)
- Matthew Chung Yi Koh: Division of Infectious Diseases, Department of Medicine, National University Hospital, National University Health System, 1E Kent Ridge Rd, NUHS Tower Block, Level 10, Singapore 119228, Singapore
- Jinghao Nicholas Ngiam: Division of Infectious Diseases, Department of Medicine, National University Hospital, National University Health System, 1E Kent Ridge Rd, NUHS Tower Block, Level 10, Singapore 119228, Singapore
- Jolene Ee Ling Oon: Division of Infectious Diseases, Department of Medicine, National University Hospital, National University Health System, 1E Kent Ridge Rd, NUHS Tower Block, Level 10, Singapore 119228, Singapore
- Lionel Hon-Wai Lum: Division of Infectious Diseases, Department of Medicine, National University Hospital, National University Health System, 1E Kent Ridge Rd, NUHS Tower Block, Level 10, Singapore 119228, Singapore
- Nares Smitasin: Division of Infectious Diseases, Department of Medicine, National University Hospital, National University Health System, 1E Kent Ridge Rd, NUHS Tower Block, Level 10, Singapore 119228, Singapore
- Sophia Archuleta: Division of Infectious Diseases, Department of Medicine, National University Hospital, National University Health System, 1E Kent Ridge Rd, NUHS Tower Block, Level 10, Singapore 119228, Singapore
4. Swisher AR, Wu AW, Liu GC, Lee MK, Carle TR, Tang DM. Enhancing Health Literacy: Evaluating the Readability of Patient Handouts Revised by ChatGPT's Large Language Model. Otolaryngol Head Neck Surg 2024; 171:1751-1757. PMID: 39105460; DOI: 10.1002/ohn.927.
Abstract
OBJECTIVE To use an artificial intelligence (AI)-powered large language model (LLM) to improve the readability of patient handouts. STUDY DESIGN Review of online material modified by AI. SETTING Academic center. METHODS Five handout materials obtained from the American Rhinologic Society (ARS) and the American Academy of Facial Plastic and Reconstructive Surgery websites were assessed using validated readability metrics. The handouts were entered into OpenAI's ChatGPT-4 after the prompt: "Rewrite the following at a 6th-grade reading level." The understandability and actionability of both native and LLM-revised versions were evaluated using the Patient Education Materials Assessment Tool (PEMAT). Results were compared using Wilcoxon rank-sum tests. RESULTS The mean readability scores of the standard (ARS, American Academy of Facial Plastic and Reconstructive Surgery) materials corresponded to "difficult," with reading categories ranging between high school and university grade levels. Conversely, the LLM-revised handouts had an average seventh-grade reading level. LLM-revised handouts had better readability in nearly all metrics tested: Flesch-Kincaid Reading Ease (70.8 vs 43.9; P < .05), Gunning Fog Score (10.2 vs 14.42; P < .05), Simple Measure of Gobbledygook (9.9 vs 13.1; P < .05), Coleman-Liau (8.8 vs 12.6; P < .05), and Automated Readability Index (8.2 vs 10.7; P = .06). PEMAT scores were significantly higher in the LLM-revised handouts for understandability (91 vs 74%; P < .05), with similar actionability (42 vs 34%; P = .15), when compared to the standard materials. CONCLUSION Patient-facing handouts can be revised by ChatGPT with simple prompting to improve readability. This study demonstrates the utility of LLMs in rewriting patient handouts; such models may serve as tools to help optimize education materials. LEVEL OF EVIDENCE Level VI.
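The readability metrics compared above are all available in common packages. A minimal sketch, assuming the textstat package and an illustrative sample text:

```python
# Sketch: computing the five readability metrics reported in the study for
# one handout, using textstat (pip install textstat). Sample text is illustrative.
import textstat

def readability_report(text: str) -> dict:
    """Score one handout with the metrics compared in the study."""
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(text),          # higher = easier
        "gunning_fog": textstat.gunning_fog(text),                          # lower = easier
        "smog": textstat.smog_index(text),                                  # lower = easier
        "coleman_liau": textstat.coleman_liau_index(text),                  # lower = easier
        "automated_readability": textstat.automated_readability_index(text),# lower = easier
    }

sample = ("Endoscopic sinus surgery is a procedure that opens the drainage "
          "pathways of the sinuses. Most patients go home the same day. "
          "You may have mild congestion and light bleeding for a few days.")
print(readability_report(sample))
```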
Affiliation(s)
- Austin R Swisher: Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona, USA
- Arthur W Wu: Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Gene C Liu: Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Matthew K Lee: Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Taylor R Carle: Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
- Dennis M Tang: Division of Otolaryngology-Head and Neck Surgery, Cedars-Sinai, Los Angeles, California, USA
5. Lee C, Britto S, Diwan K. Evaluating the Impact of Artificial Intelligence (AI) on Clinical Documentation Efficiency and Accuracy Across Clinical Settings: A Scoping Review. Cureus 2024; 16:e73994. PMID: 39703286; PMCID: PMC11658896; DOI: 10.7759/cureus.73994.
Abstract
Artificial intelligence (AI) technologies (natural language processing (NLP), speech recognition (SR), and machine learning (ML)) can transform clinical documentation in healthcare. This scoping review evaluates the impact of AI on the accuracy and efficiency of clinical documentation across various clinical settings (hospital wards, emergency departments, and outpatient clinics). We found 176 articles by applying a specific search string on Ovid. To make the search more comprehensive, we also performed manual searches on PubMed and BMJ, examining relevant references we encountered; this added 46 articles, for a total of 222. After removing duplicates, 208 articles were screened, leading to the inclusion of 36 studies. We focused on articles discussing the impact of AI technologies such as NLP, ML, and SR on the accuracy and efficiency of clinical documentation. To ensure that our research reflected recent work, we restricted inclusion to studies published in 2019 and beyond; this criterion was pilot-tested beforehand and adjusted as necessary. After comparing screened articles independently, we confirmed inter-rater reliability (Cohen's kappa = 1.0), and data extraction was completed on these 36 articles. We conducted this study according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. This scoping review shows improvements in clinical documentation using AI technologies, with an emphasis on accuracy and efficiency. Documentation processes were streamlined, clinician workload was reduced, and doctors consequently had more time for patient care. However, the included articles also raised challenges surrounding the use of AI in clinical settings, including the management of errors, legal liability, integration of AI with electronic health records (EHRs), and ethical concerns regarding the use of AI with patient data. AI shows substantial potential for improving the day-to-day work of doctors across clinical settings, but more research is needed to address the many challenges associated with its use. With better regulatory frameworks, implementation, and research, AI can significantly reduce the documentation burden placed on doctors.
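Inter-rater agreement on screening decisions, reported here as Cohen's kappa = 1.0, can be computed with scikit-learn. A minimal sketch with illustrative include/exclude labels:

```python
# Sketch: checking inter-rater agreement on article screening, as the review
# did (kappa = 1.0 means perfect agreement). Labels below are illustrative.
from sklearn.metrics import cohen_kappa_score

# 1 = include in review, 0 = exclude; one label per screened article
rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.00 when the raters agree on every article
```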
Affiliation(s)
- Craig Lee: General Internal Medicine, University Hospitals Plymouth NHS Trust, Plymouth, GBR
- Shawn Britto: General Internal Medicine, University Hospitals Plymouth NHS Trust, Plymouth, GBR
- Khaled Diwan: General Internal Medicine, University Hospitals Plymouth NHS Trust, Plymouth, GBR
6. Jabin MSR. The need for a refined classification system and national incident reporting system for health information technology-related incidents. Front Digit Health 2024; 6:1422396. PMID: 39131183; PMCID: PMC11310167; DOI: 10.3389/fdgth.2024.1422396.
Affiliation(s)
- Md Shafiqur Rahman Jabin: Department of Medicine & Optometry, Linnaeus University, Kalmar, Sweden; Centre for Digital Innovations in Health and Social Care, University of Bradford, West Yorkshire, United Kingdom
7. Clough RAJ, Sparkes WA, Clough OT, Sykes JT, Steventon AT, King K. Transforming healthcare documentation: harnessing the potential of AI to generate discharge summaries. BJGP Open 2024; 8:BJGPO.2023.0116. PMID: 37699649; PMCID: PMC11169980; DOI: 10.3399/bjgpo.2023.0116.
Abstract
BACKGROUND Hospital discharge summaries play an essential role in informing GPs of recent admissions to ensure continuity of care and prevent adverse events; however, they are notoriously poorly written, time-consuming to produce, and can result in delayed discharge. AIM To evaluate the potential of artificial intelligence (AI) to produce high-quality discharge summaries equivalent to those of a doctor who has completed the UK Foundation Programme. DESIGN & SETTING Feasibility study using 25 mock patient vignettes. METHOD Twenty-five mock patient vignettes were written by the authors. Five junior doctors wrote discharge summaries from the case vignettes (five each), and the same vignettes were input into ChatGPT. In total, 50 discharge summaries were generated: 25 by AI and 25 by junior doctors. Quality and suitability were determined through independent GP evaluation and adherence to a minimum dataset. RESULTS Of the 25 AI-written discharge summaries, 100% were deemed by GPs to be of an acceptable quality, compared with 92% of the junior doctor summaries; both groups showed a mean compliance of 97% with the minimum dataset. GPs were poor at determining whether a summary was written by ChatGPT, with only 60% detection accuracy, and an AI-detection tool rated all summaries as very unlikely to have been written by AI. CONCLUSION AI produced discharge summaries of quality equivalent to those of a junior doctor who has completed the UK Foundation Programme; however, larger studies with real-world patient data and NHS-approved AI tools will need to be conducted.
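Adherence to a minimum dataset lends itself to a simple automated check. A crude sketch; the required fields and keyword matching are illustrative assumptions, not the study's instrument:

```python
# Sketch: scoring a discharge summary's adherence to a minimum dataset, as
# used to compare AI-written and junior-doctor summaries. Field list and
# keyword matching are illustrative assumptions.
REQUIRED_FIELDS = [
    "diagnosis",
    "procedures",
    "medication changes",
    "follow-up",
    "gp actions",
]

def dataset_compliance(summary: str) -> float:
    """Fraction of required fields mentioned in the summary (crude keyword check)."""
    text = summary.lower()
    present = sum(1 for field in REQUIRED_FIELDS if field in text)
    return present / len(REQUIRED_FIELDS)

example = ("Diagnosis: pneumonia. Medication changes: amoxicillin started. "
           "Follow-up with GP in 2 weeks.")
print(f"Compliance: {dataset_compliance(example):.0%}")  # 60% for this toy summary
```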
Affiliation(s)
- Kate King: Academic Department of Military General Practice, Research & Clinical Innovation, Defence Medical Services, ICT Centre, Birmingham, UK
8. Bidlespacher K, Mulkey DC. The Effect of Teach-Back on Readmission Rates in Rehabilitation Patients. Rehabil Nurs 2024; 49:65-72. PMID: 38289196; DOI: 10.1097/rnj.0000000000000452.
Abstract
PURPOSE Thirty-day readmissions are common among rehabilitation patients and occur for many reasons, one of which is that patients do not fully understand how to manage their health effectively after discharge. The purpose of this evidence-based quality improvement project was to determine whether implementing the teach-back intervention from the Agency for Healthcare Research and Quality's (AHRQ) Health Literacy Universal Precautions Toolkit would affect 30-day readmission rates among adult rehabilitation patients. METHODS Data were collected from the electronic health records of rehabilitation patients. The comparative group included all rehabilitation admissions for the 8 weeks prior to the intervention; the implementation group comprised rehabilitation admissions for the 8 weeks post-implementation. All patients were followed for 30 days post-discharge to capture readmissions. RESULTS The total sample size was 79 (n = 43 in the comparative group, n = 36 in the implementation group). The mean 30-day readmission rate was 45% lower in the implementation group than in the comparative group. CONCLUSION Based on these results, using the teach-back intervention from AHRQ's Health Literacy Universal Precautions Toolkit may reduce 30-day readmission rates.
Affiliation(s)
- Kelly Bidlespacher: School of Nursing & Health Sciences, Pennsylvania College of Technology, Williamsport, PA, USA
- David C Mulkey: College of Nursing and Health Care Professions, Grand Canyon University, Phoenix, AZ, USA
9. Sharma SC, Ramchandani JP, Thakker A, Lahiri A. ChatGPT in Plastic and Reconstructive Surgery. Indian J Plast Surg 2023; 56:320-325. PMID: 37705820; PMCID: PMC10497341; DOI: 10.1055/s-0043-1771514.
Abstract
Background Chat Generative Pre-Trained Transformer (ChatGPT) is a versatile large language model-based generative artificial intelligence, proficient in a variety of tasks from drafting emails and writing code to composing music and passing medical licensing exams. While the potential role of ChatGPT in plastic surgery is promising, evidence-based research is needed to guide its implementation in practice. Methods This review aims to summarize the literature surrounding ChatGPT's use in plastic surgery. Results A literature search revealed several applications for ChatGPT in the field of plastic surgery, including the ability to create academic literature and to aid the production of research; however, the ethical implications of using such chatbots in scientific writing require careful consideration. ChatGPT can also generate high-quality patient discharge summaries and operation notes within seconds, freeing up busy junior doctors for other tasks, although clinical information must currently be entered manually and clinicians must consider data privacy implications. Its use in aiding patient communication, education, and training is also widely documented in the literature, but questions have been raised over the accuracy of generated answers, given that current versions of ChatGPT cannot access the most up-to-date sources. Conclusions While one must be aware of its shortcomings, ChatGPT is a useful tool for plastic surgeons to improve productivity across a range of tasks, from manuscript preparation and healthcare communication to drafting teaching sessions and studying. As access improves and the technology becomes more refined, more uses for ChatGPT in plastic surgery will likely become apparent.
Affiliation(s)
- Sanjeev Chaand Sharma: Department of Plastic Surgery, Leicester Royal Infirmary, Infirmary Square, Leicester, United Kingdom
- Jai Parkash Ramchandani: Faculty of Life Sciences & Medicine, King's College London, Guy's Campus, Great Maze Pond, London, United Kingdom
- Arjuna Thakker: Academic Team of Musculoskeletal Surgery, Leicester General Hospital, University Hospitals of Leicester NHS Trust, United Kingdom
- Anindya Lahiri: Department of Plastic Surgery, Sandwell General Hospital, West Bromwich, United Kingdom