1. Yang HY, Raghunathan K, Widera E, Pantilat SZ, Brender T, Heintz TA, Espejo E, Boscardin J, Mills H, Lee A, Berchuck J, Cobert J. Lexical associations can characterize clinical documentation trends related to palliative care and metastatic cancer. Sci Rep 2025; 15:17245. PMID: 40383724; PMCID: PMC12086223; DOI: 10.1038/s41598-025-01828-z.
Abstract
Palliative care is known to improve quality of life in advanced cancer. Natural language processing offers insights into how documentation around palliative care in relation to metastatic cancer has changed. We analyzed inpatient clinical notes using unsupervised language models that learn how words related to metastatic cancer (e.g., "mets", "metastases") and palliative care (e.g., "palliative care", "pal care") appear relationally and change over time. We included any note from adults hospitalized in the University of California, San Francisco system. The primary outcome was how similarly terms related to metastatic cancer and palliative care appeared in notes, measured mathematically with cosine similarity. We used word2vec to model language numerically as vectors; relationships between vectors were captured using cosine similarity. We performed linear regression to identify changes in these term relationships over time. As a sensitivity analysis, we repeated the analysis per year restricted to patients with an ICD-9/10 diagnosis code for metastatic cancer. Metastatic cancer and palliative care terms appeared in similar contexts in clinical notes each year, suggesting a close relationship in documentation. Over time, however, this relationship weakened, with these terms becoming less commonly used together as measured by cosine similarities. We found similar trends when we retrained models only on patients with a diagnosis code for metastatic cancer. Text in clinical notes offers unique insights into how medical providers document palliative care in patients with advanced malignancies and how these documentation practices evolve over time.
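The study's core measurement, cosine similarity between learned word2vec vectors, can be sketched as follows. The vectors below are toy values for illustration only, not the study's trained embeddings:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: u.v / (|u||v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings" standing in for word2vec vectors of
# "mets" and "palliative care"; real models use hundreds of dimensions.
vec_mets = [0.9, 0.1, 0.3]
vec_palliative = [0.8, 0.2, 0.4]

sim = cosine_similarity(vec_mets, vec_palliative)
print(round(sim, 3))  # values near 1.0 indicate words used in similar contexts
```

A weakening relationship over time, as the paper reports, would show up as this value declining across yearly models.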
Affiliation(s)
- Hao Yuan Yang
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
- Karthik Raghunathan
- Department of Anesthesia and Perioperative Care, Duke University, Durham, NC, USA
- Eric Widera
- Division of Geriatrics, San Francisco VA Health Care System, San Francisco, CA, USA
- Steven Z Pantilat
- Division of Palliative Medicine, University of California San Francisco, San Francisco, CA, USA
- Teva Brender
- Department of Internal Medicine, University of California San Francisco, San Francisco, CA, USA
- Timothy A Heintz
- Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Edie Espejo
- Geriatrics, Palliative, and Extended Care, Veterans Affairs Medical Center, San Francisco, CA, USA
- John Boscardin
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
- Division of Geriatrics, San Francisco VA Health Care System, San Francisco, CA, USA
- Hunter Mills
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA
- Albert Lee
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA
- Jacob Berchuck
- Division of Oncology, Winship Cancer Institute, Emory University, Atlanta, GA, USA
- Julien Cobert
- Anesthesia Service, San Francisco VA Health Care System, 4150 Clement St Building 6, Office 206, San Francisco, CA, USA
- Department of Anesthesia and Perioperative Care, University of California San Francisco, San Francisco, CA, USA
2. Shashikumar SP, Mohammadi S, Krishnamoorthy R, Patel A, Wardi G, Ahn JC, Singh K, Aronoff-Spencer E, Nemati S. Development and prospective implementation of a large language model based system for early sepsis prediction. NPJ Digit Med 2025; 8:290. PMID: 40379845; PMCID: PMC12084535; DOI: 10.1038/s41746-025-01689-w.
Abstract
Sepsis is a dysregulated host response to infection with high mortality and morbidity. Early detection and intervention have been shown to improve patient outcomes, but existing computational models relying on structured electronic health record data often miss contextual information from unstructured clinical notes. This study introduces COMPOSER-LLM, an open-source large language model (LLM) integrated with the COMPOSER model to enhance early sepsis prediction. For high-uncertainty predictions, the LLM extracts additional context to assess sepsis-mimics, improving accuracy. Evaluated on 2500 patient encounters, COMPOSER-LLM achieved a sensitivity of 72.1%, positive predictive value of 52.9%, F-1 score of 61.0%, and 0.0087 false alarms per patient hour, outperforming the standalone COMPOSER model. Prospective validation yielded similar results. Manual chart review found 62% of false positives had bacterial infections, demonstrating potential clinical utility. Our findings suggest that integrating LLMs with traditional models can enhance predictive performance by leveraging unstructured data, representing a significant advance in healthcare analytics.
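The reported F1 score can be sanity-checked from the other two metrics, since F1 is the harmonic mean of sensitivity (recall) and positive predictive value (precision):

```python
# Reported operating point of COMPOSER-LLM on the 2500-encounter evaluation.
sensitivity = 0.721  # recall
ppv = 0.529          # precision

# F1 = harmonic mean of precision and recall.
f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
print(round(f1, 3))  # 0.61, matching the reported F1 score of 61.0%
```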
Affiliation(s)
- Sina Mohammadi
- Division of Biomedical Informatics, UC San Diego, San Diego, CA, USA
- Avi Patel
- Department of Emergency Medicine, UC San Diego, San Diego, CA, USA
- Gabriel Wardi
- Department of Emergency Medicine, UC San Diego, San Diego, CA, USA
- Division of Pulmonary, Critical Care and Sleep Medicine, UC San Diego, San Diego, CA, USA
- Joseph C Ahn
- Division of Biomedical Informatics, UC San Diego, San Diego, CA, USA
- Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA
- Karandeep Singh
- Division of Biomedical Informatics, UC San Diego, San Diego, CA, USA
- Jacobs Center for Health Innovation, UC San Diego Health, San Diego, CA, USA
- Eliah Aronoff-Spencer
- Division of Infectious Diseases and Global Public Health, UC San Diego, San Diego, CA, USA
- Shamim Nemati
- Division of Biomedical Informatics, UC San Diego, San Diego, CA, USA
3. Treloar EC, Ting YY, Bruening MH, Reid JL, Edwards S, Bradshaw EL, Ey JD, Wichmann M, Herath M, Maddern GJ. Lost in transcription - how accurately are we documenting the surgical ward round? ANZ J Surg 2025; 95:1005-1010. PMID: 40202286; DOI: 10.1111/ans.70109.
Abstract
BACKGROUND Ward rounds are crucial to providing high-quality patient care in hospitals. Ward round quality is strongly linked to patient outcomes, yet ward round best practice is severely underrepresented in the literature. Accurate and thorough ward round documentation is essential to improving communication and patient outcomes. METHODS A prospective observational cohort study was performed by reviewing 135 audio-visual recordings of surgical ward rounds over 2 years at two hospitals. Recordings were transcribed, and an external reviewer stratified discussion points as Major, Minor, or Not Significant. Discussion was compared to the ward round note to assess the accuracy of documentation of bedside discussion. The primary endpoint was the accuracy of Major discussion in the patient case notes. Secondary objectives involved investigating variables that may have impacted accuracy (e.g., patient age, sex, length of stay in hospital, and individual clinicians). RESULTS Nearly one third (32.4%) of important (Major) spoken information regarding plans and patient care in the ward round was omitted from the patients' written medical record. Further, 11% of patient case notes contained significant errors. Patient age (P = 0.04), the day of the week on which the ward round occurred (P = 0.05), and who the scribing intern was (P ≤ 0.001) were found to impact documentation accuracy. There was large variation in interns' documenting ability (35.5%-88.9% accuracy). CONCLUSIONS This study highlighted that a significant portion of important discussion conducted during the ward round is not documented in the case note. These results suggest that system-wide change is needed to improve patient safety and outcomes.
Affiliation(s)
- Ellie C Treloar
- Department of Surgery, The University of Adelaide, The Queen Elizabeth Hospital, Woodville, South Australia, Australia
- Ying Y Ting
- Department of Surgery, The University of Adelaide, The Queen Elizabeth Hospital, Woodville, South Australia, Australia
- Martin H Bruening
- Department of Surgery, The University of Adelaide, The Queen Elizabeth Hospital, Woodville, South Australia, Australia
- Jessica L Reid
- Department of Surgery, The University of Adelaide, The Queen Elizabeth Hospital, Woodville, South Australia, Australia
- Suzanne Edwards
- Department of Surgery, The University of Adelaide, The Queen Elizabeth Hospital, Woodville, South Australia, Australia
- Emma L Bradshaw
- Department of Surgery, The University of Adelaide, The Queen Elizabeth Hospital, Woodville, South Australia, Australia
- Jesse D Ey
- Department of Surgery, The University of Adelaide, The Queen Elizabeth Hospital, Woodville, South Australia, Australia
- Matthias Wichmann
- Department of General Surgery, Mount Gambier and Districts Health Service, Mount Gambier, South Australia, Australia
- Matheesha Herath
- Department of Surgery, The University of Adelaide, The Queen Elizabeth Hospital, Woodville, South Australia, Australia
- Guy J Maddern
- Department of Surgery, The University of Adelaide, The Queen Elizabeth Hospital, Woodville, South Australia, Australia
4. Shashikumar SP, Mohammadi S, Krishnamoorthy R, Patel A, Wardi G, Ahn JC, Singh K, Aronoff-Spencer E, Nemati S. Development and Prospective Implementation of a Large Language Model based System for Early Sepsis Prediction. medRxiv [Preprint] 2025:2025.03.07.25323589. PMID: 40162268; PMCID: PMC11952477; DOI: 10.1101/2025.03.07.25323589.
Affiliation(s)
- Sina Mohammadi
- Division of Biomedical Informatics, UC San Diego, San Diego, USA
- Avi Patel
- Department of Emergency Medicine, UC San Diego, San Diego, USA
- Gabriel Wardi
- Department of Emergency Medicine, UC San Diego, San Diego, USA
- Division of Pulmonary, Critical Care and Sleep Medicine, UC San Diego, San Diego, USA
- Joseph C Ahn
- Division of Biomedical Informatics, UC San Diego, San Diego, USA
- Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, USA
- Karandeep Singh
- Division of Biomedical Informatics, UC San Diego, San Diego, USA
- Jacobs Center for Health Innovation, UC San Diego Health, San Diego, USA
- Eliah Aronoff-Spencer
- Division of Infectious Diseases and Global Public Health, UC San Diego, San Diego, USA
- Shamim Nemati
- Division of Biomedical Informatics, UC San Diego, San Diego, USA
5. Gao Y, Li R, Croxford E, Caskey J, Patterson BW, Churpek M, Miller T, Dligach D, Afshar M. Leveraging Medical Knowledge Graphs Into Large Language Models for Diagnosis Prediction: Design and Application Study. JMIR AI 2025; 4:e58670. PMID: 39993309; PMCID: PMC11894347; DOI: 10.2196/58670.
Abstract
BACKGROUND Electronic health records (EHRs) and routine documentation practices play a vital role in patients' daily care, providing a holistic record of health, diagnoses, and treatment. However, complex and verbose EHR narratives can overwhelm health care providers, increasing the risk of diagnostic inaccuracies. While large language models (LLMs) have showcased their potential in diverse language tasks, their application in health care must prioritize the minimization of diagnostic errors and the prevention of patient harm. Integrating knowledge graphs (KGs) into LLMs offers a promising approach because structured knowledge from KGs could enhance LLMs' diagnostic reasoning by providing contextually relevant medical information. OBJECTIVE This study introduces DR.KNOWS (Diagnostic Reasoning Knowledge Graph System), a model that integrates Unified Medical Language System-based KGs with LLMs to improve diagnostic predictions from EHR data by retrieving contextually relevant paths aligned with patient-specific information. METHODS DR.KNOWS combines a stack graph isomorphism network for node embedding with an attention-based path ranker to identify and rank knowledge paths relevant to a patient's clinical context. We evaluated DR.KNOWS on 2 real-world EHR datasets from different geographic locations, comparing its performance to baseline models, including QuickUMLS and standard LLMs (Text-to-Text Transfer Transformer and ChatGPT). To assess diagnostic reasoning quality, we designed and implemented a human evaluation framework grounded in clinical safety metrics. RESULTS DR.KNOWS demonstrated notable improvements over baseline models, showing higher accuracy in extracting diagnostic concepts and enhanced diagnostic prediction metrics. Prompt-based fine-tuning of Text-to-Text Transfer Transformer with DR.KNOWS knowledge paths achieved the highest ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation-Longest Common Subsequence) and concept unique identifier F1-scores, highlighting the benefits of KG integration. Human evaluators found the diagnostic rationales of DR.KNOWS to be strongly aligned with correct clinical reasoning, indicating improved abstraction and reasoning. Recognized limitations include potential biases within the KG data, which we addressed by emphasizing case-specific path selection and proposing future bias-mitigation strategies. CONCLUSIONS DR.KNOWS offers a robust approach for enhancing diagnostic accuracy and reasoning by integrating structured KG knowledge into LLM-based clinical workflows. Although further work is required to address KG biases and extend generalizability, DR.KNOWS represents progress toward trustworthy artificial intelligence-driven clinical decision support, with a human evaluation framework focused on diagnostic safety and alignment with clinical standards.
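For reference, ROUGE-L scores a generated text against a reference via their longest common subsequence (LCS). A minimal sketch, using simplified whitespace tokenization and a balanced F-measure rather than the paper's exact evaluation settings:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            table[i][j] = table[i - 1][j - 1] + 1 if x == y else max(
                table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def rouge_l(candidate, reference):
    """ROUGE-L F-score over whitespace tokens (simplified, beta = 1)."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# Toy diagnosis strings for illustration; not examples from the study data.
print(rouge_l("acute heart failure", "acute on chronic heart failure"))
```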
Affiliation(s)
- Yanjun Gao
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Denver, CO, United States
- Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States
- Ruizhe Li
- University of Aberdeen, Aberdeen, United Kingdom
- Emma Croxford
- Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States
- John Caskey
- Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States
- Brian W Patterson
- Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States
- Matthew Churpek
- Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States
- Timothy Miller
- Boston Children's Hospital, Harvard Medical School, Boston, MA, United States
- Majid Afshar
- Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States
6. Liu J, Koopman B, Brown NJ, Chu K, Nguyen A. Generating synthetic clinical text with local large language models to identify misdiagnosed limb fractures in radiology reports. Artif Intell Med 2025; 159:103027. PMID: 39580897; DOI: 10.1016/j.artmed.2024.103027.
Abstract
Large language models (LLMs) demonstrate impressive capabilities in generating human-like content and have much potential to improve the performance and efficiency of healthcare. An important application of LLMs is generating synthetic clinical reports that could alleviate the burden of annotating and collecting real-world data for training AI models. At the same time, there are concerns and limitations around using commercial LLMs to handle sensitive clinical data. In this study, we examined the use of open-source LLMs as an alternative for generating synthetic radiology reports to supplement real-world annotated data. We found that locally hosted LLMs can achieve performance similar to ChatGPT and GPT-4 in augmenting training data for the downstream report classification task of identifying misdiagnosed fractures. We also examined the predictive value of using synthetic reports alone to train downstream models, where our best setting achieved more than 90% of the performance obtained with real-world data. Overall, our findings show that open-source, local LLMs can be a favourable option for creating synthetic clinical reports for downstream tasks.
Affiliation(s)
- Jinghui Liu
- Australian e-Health Research Centre, CSIRO, Brisbane, Queensland, Australia
- Bevan Koopman
- Australian e-Health Research Centre, CSIRO, Brisbane, Queensland, Australia
- Nathan J Brown
- Emergency and Trauma Centre, Royal Brisbane and Women's Hospital, Brisbane, Queensland, Australia
- Kevin Chu
- Emergency and Trauma Centre, Royal Brisbane and Women's Hospital, Brisbane, Queensland, Australia
- Anthony Nguyen
- Australian e-Health Research Centre, CSIRO, Brisbane, Queensland, Australia
7. Ding S, Ye J, Hu X, Zou N. Distilling the knowledge from large-language model for health event prediction. Sci Rep 2024; 14:30675. PMID: 39730390; DOI: 10.1038/s41598-024-75331-2.
Abstract
Health event prediction is empowered by the rapid and wide adoption of electronic health records (EHRs). In the intensive care unit (ICU), precisely predicting health-related events in advance is essential for providing treatment and intervention that improve patient outcomes. EHR data are multi-modal, containing clinical text, time series, structured data, and more. Most health event prediction work focuses on a single modality, e.g., text or tabular EHR data; how to effectively learn from multi-modal EHR data remains a challenge. Inspired by the strong text-processing capability of large language models (LLMs), we propose CKLE, a framework for health event prediction that distills knowledge from an LLM and learns from multi-modal EHR data. Applying LLMs to health event prediction poses two challenges: first, most LLMs can only handle text rather than other modalities, e.g., structured data; second, the privacy requirements of health applications mean the LLM must be locally deployed, which may be limited by computational resources. CKLE addresses these scalability and portability challenges by distilling cross-modality knowledge from the LLM into the health event predictive model. To take full advantage of the LLM, the raw clinical text is refined and augmented with prompt learning, and embeddings of the clinical text are generated by the LLM. To effectively distill the LLM's knowledge into the predictive model, we design a cross-modality knowledge distillation (KD) method with a specially designed training objective that accounts for multiple modalities and patient similarity. The KD loss function consists of two parts: a cross-modality contrastive loss, which models the correlation of different modalities from the same patient, and a patient similarity learning loss, which models the correlations between similar patients. This cross-modality knowledge distillation transfers the rich information in clinical text and the knowledge of the LLM into a predictive model on structured EHR data. To demonstrate the effectiveness of CKLE, we evaluate it on two health event prediction tasks in cardiology: heart failure prediction and hypertension prediction. We selected 7,125 patients from the MIMIC-III dataset and split them into train/validation/test sets, achieving up to a 4.48% improvement in accuracy compared to state-of-the-art predictive models designed for health event prediction. The results demonstrate that CKLE significantly surpasses baseline prediction models in both normal and limited-label settings. We also conducted a case study of cardiology disease analysis in heart failure and hypertension prediction; through feature importance calculation, we analyzed the salient features related to cardiology disease, which correspond to medical domain knowledge. The superior performance and interpretability of CKLE pave a promising way to leverage the power and knowledge of LLMs for health event prediction in real-world clinical settings.
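The cross-modality contrastive part of such a KD objective can be sketched as an InfoNCE-style loss that pulls each patient's structured-EHR embedding toward that same patient's LLM text embedding. This is an illustrative sketch under assumed conventions, not the paper's exact formulation, and all vectors are toy values:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def contrastive_loss(text_embs, ehr_embs, temperature=0.1):
    """InfoNCE-style loss: each patient's EHR embedding should be most
    similar to the LLM text embedding of the same patient (the positive),
    relative to all other patients in the batch (the negatives)."""
    loss = 0.0
    for i, ehr in enumerate(ehr_embs):
        logits = [cosine(ehr, t) / temperature for t in text_embs]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        loss += -(logits[i] - log_denom)  # negative log-softmax of the match
    return loss / len(ehr_embs)

# Two toy patients: aligned pairs should give a lower loss than mismatched.
text = [[1.0, 0.0], [0.0, 1.0]]
ehr_aligned = [[0.9, 0.1], [0.1, 0.9]]
ehr_mismatched = [[0.1, 0.9], [0.9, 0.1]]
print(contrastive_loss(text, ehr_aligned) <
      contrastive_loss(text, ehr_mismatched))  # True
```

The paper's full objective also adds a patient-similarity term, omitted here for brevity.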
Affiliation(s)
- Sirui Ding
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA
- Xia Hu
- Department of Computer Science, Rice University, Houston, TX, USA
- Na Zou
- Department of Industrial Engineering, University of Houston, Houston, TX, USA
8. Jiang S, Lam BD, Agrawal M, Shen S, Kurtzman N, Horng S, Karger DR, Sontag D. Machine learning to predict notes for chart review in the oncology setting: a proof of concept strategy for improving clinician note-writing. J Am Med Inform Assoc 2024; 31:1578-1582. PMID: 38700253; PMCID: PMC11187428; DOI: 10.1093/jamia/ocae092.
Abstract
OBJECTIVE Leverage electronic health record (EHR) audit logs to develop a machine learning (ML) model that predicts which notes a clinician wants to review when seeing oncology patients. MATERIALS AND METHODS We trained logistic regression models using note metadata and a Term Frequency-Inverse Document Frequency (TF-IDF) text representation. We evaluated performance with precision, recall, F1, AUC, and a qualitative clinical assessment. RESULTS The metadata-only model achieved an AUC of 0.930 and the combined metadata and TF-IDF model an AUC of 0.937. Qualitative assessment revealed a need for better text representation and for further customizing predictions to the user. DISCUSSION Our model effectively surfaces the top 10 notes a clinician wants to review when seeing an oncology patient. Further studies can characterize different types of clinician users and better tailor the task for different care settings. CONCLUSION EHR audit logs can provide important relevance data for training ML models that assist with note-writing in the oncology setting.
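The TF-IDF text representation used as model input can be sketched in a few lines. This is a generic illustration with invented toy notes and a simple smoothed IDF, not the study's feature pipeline (which also included note metadata):

```python
import math
from collections import Counter

def tfidf(docs):
    """Map each document to a {term: tf-idf weight} dict (smoothed idf)."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df[t]) + 1 for t in df}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (count / len(toks)) * idf[t]
                        for t, count in tf.items()})
    return vectors

# Invented toy notes for illustration only.
notes = ["oncology follow up note", "oncology treatment plan", "radiology report"]
vecs = tfidf(notes)
# "oncology" appears in 2 of 3 notes, so it is down-weighted relative to
# terms unique to a single note, such as "radiology".
```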
Affiliation(s)
- Sharon Jiang
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- Barbara D Lam
- Division of Hematology and Oncology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02215, United States
- Division of Clinical Informatics, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02215, United States
- Monica Agrawal
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- Shannon Shen
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- Nicholas Kurtzman
- Department of Emergency Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02215, United States
- Steven Horng
- Division of Clinical Informatics, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02215, United States
- Department of Emergency Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02215, United States
- David R Karger
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- David Sontag
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
9. Cobert J, Mills H, Lee A, Gologorskaya O, Espejo E, Jeon SY, Boscardin WJ, Heintz TA, Kennedy CJ, Ashana DC, Chapman AC, Raghunathan K, Smith AK, Lee SJ. Measuring Implicit Bias in ICU Notes Using Word-Embedding Neural Network Models. Chest 2024; 165:1481-1490. PMID: 38199323; PMCID: PMC11317817; DOI: 10.1016/j.chest.2023.12.031.
Abstract
BACKGROUND Language in nonmedical data sets is known to transmit human-like biases when used in natural language processing (NLP) algorithms, which can reinforce disparities. It is unclear whether NLP algorithms trained on medical notes could lead to similar transmission of biases. RESEARCH QUESTION Can we identify implicit bias in clinical notes, and are biases stable across time and geography? STUDY DESIGN AND METHODS To determine whether different racial and ethnic descriptors are contextually similar to stigmatizing language in ICU notes, and whether these relationships are stable across time and geography, we identified notes on critically ill adults admitted to the University of California, San Francisco (UCSF), from 2012 through 2022 and to Beth Israel Deaconess Medical Center (BIDMC) from 2001 through 2012. Because word meaning is derived largely from context, we trained unsupervised word-embedding algorithms to quantitatively measure the contextual similarity (cosine similarity) between a racial or ethnic descriptor (e.g., African-American) and a stigmatizing target word (e.g., non-cooperative) or group of words (violence, passivity, noncompliance, nonadherence). RESULTS In UCSF notes, Black descriptors were less likely to be contextually similar to violent words compared with White descriptors. Contrastingly, in BIDMC notes, Black descriptors were more likely to be contextually similar to violent words compared with White descriptors. The UCSF data set also showed that Black descriptors were more contextually similar to passivity and noncompliance words compared with Latinx descriptors. INTERPRETATION Implicit bias is identifiable in ICU notes. Racial and ethnic group descriptors carry different contextual relationships to stigmatizing words, depending on when and where notes were written. Because NLP models seem able to transmit implicit bias from training data, use of NLP algorithms in clinical prediction could reinforce disparities. Active debiasing strategies may be necessary to achieve algorithmic fairness when using language models in clinical research.
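The comparison described here, a descriptor's contextual similarity to a whole group of stigmatizing words, can be sketched as the mean cosine similarity between the descriptor's embedding and each word in the group. The vectors below are toy values for illustration, not embeddings trained on ICU notes:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def group_similarity(descriptor_vec, target_vecs):
    """Mean cosine similarity between one descriptor and a word group."""
    return sum(cosine(descriptor_vec, t) for t in target_vecs) / len(target_vecs)

# Toy embeddings: one racial/ethnic descriptor and a hypothetical
# "noncompliance" word group of three stigmatizing terms.
descriptor = [0.5, 0.5]
noncompliance_group = [[0.6, 0.4], [0.4, 0.6], [0.5, 0.5]]
print(round(group_similarity(descriptor, noncompliance_group), 3))
```

Comparing this statistic across descriptors (and across sites or years) is the kind of contrast the study reports.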
Affiliation(s)
- Julien Cobert
- Anesthesia Service, San Francisco VA Health Care System, University of California, San Francisco, San Francisco, CA
- Department of Anesthesia and Perioperative Care, University of California, San Francisco, San Francisco, CA
- Hunter Mills
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA
- Albert Lee
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA
- Oksana Gologorskaya
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA
- Edie Espejo
- Division of Geriatrics, University of California, San Francisco, San Francisco, CA
- Sun Young Jeon
- Division of Geriatrics, University of California, San Francisco, San Francisco, CA
- W John Boscardin
- Division of Geriatrics, University of California, San Francisco, San Francisco, CA
- Timothy A Heintz
- School of Medicine, University of California, San Diego, San Diego, CA
- Christopher J Kennedy
- Department of Psychiatry, Harvard Medical School, Boston, MA
- Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA
- Deepshikha C Ashana
- Division of Pulmonary, Allergy, and Critical Care Medicine, Duke University, Durham, NC
- Allyson Cook Chapman
- Department of Medicine, Division of Critical Care and Palliative Medicine, University of California, San Francisco, San Francisco, CA
- Department of Surgery, University of California, San Francisco, San Francisco, CA
- Karthik Raghunathan
- Department of Anesthesia and Perioperative Care, Duke University, Durham, NC
- Alex K Smith
- Department of Geriatrics, Palliative, and Extended Care, Veterans Affairs Medical Center, University of California, San Francisco, San Francisco, CA
- Division of Geriatrics, University of California, San Francisco, San Francisco, CA
- Sei J Lee
- Division of Geriatrics, University of California, San Francisco, San Francisco, CA
10
|
Boonstra MJ, Weissenbacher D, Moore JH, Gonzalez-Hernandez G, Asselbergs FW. Artificial intelligence: revolutionizing cardiology with large language models. Eur Heart J 2024; 45:332-345. [PMID: 38170821 PMCID: PMC10834163 DOI: 10.1093/eurheartj/ehad838] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/20/2023] [Revised: 12/01/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024] Open
Abstract
Natural language processing techniques are having an increasing impact on clinical care from the patient, clinician, administrator, and research perspectives. Applications include automated generation of clinical notes and discharge letters, medical term coding for billing, medical chatbots for both patients and clinicians, data enrichment through identification of disease symptoms or diagnoses, and cohort selection for clinical trials and auditing purposes. This review presents an overview of the history of natural language processing techniques along with brief technical background. It then discusses implementation strategies for natural language processing tools, focusing specifically on large language models, and concludes with future opportunities for applying such techniques in cardiology.
Affiliation(s)
- Machteld J Boonstra
- Department of Cardiology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centre, University of Amsterdam, Amsterdam, Netherlands
- Davy Weissenbacher
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Jason H Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Folkert W Asselbergs
- Department of Cardiology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centre, University of Amsterdam, Amsterdam, Netherlands
- Institute of Health Informatics, University College London, London, UK
- The National Institute for Health Research University College London Hospitals Biomedical Research Centre, University College London, London, UK
11
Liu J, Capurro D, Nguyen A, Verspoor K. Attention-based multimodal fusion with contrast for robust clinical prediction in the face of missing modalities. J Biomed Inform 2023; 145:104466. [PMID: 37549722 DOI: 10.1016/j.jbi.2023.104466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2023] [Revised: 06/09/2023] [Accepted: 08/01/2023] [Indexed: 08/09/2023]
Abstract
OBJECTIVE With the increasing amount and growing variety of healthcare data, multimodal machine learning that supports integrated modeling of structured and unstructured data is an increasingly important tool for clinical machine learning tasks. However, it is non-trivial to manage the differences in dimensionality, volume, and temporal characteristics of data modalities in the context of a shared target task. Furthermore, patients can vary substantially in the availability of data, while existing multimodal modeling methods typically assume data completeness and lack a mechanism to handle missing modalities. METHODS We propose a Transformer-based fusion model with modality-specific tokens that summarize the corresponding modalities, achieving effective cross-modal interaction while accommodating missing modalities in the clinical context. The model is further refined by inter-modal, inter-sample contrastive learning to improve the representations for better predictive performance. We denote the model Attention-based cRoss-MOdal fUsion with contRast (ARMOUR). We evaluate ARMOUR using two input modalities (structured measurements and unstructured text), six clinical prediction tasks, and two evaluation regimes, either including or excluding samples with missing modalities. RESULTS Our model shows improved performance over unimodal and multimodal baselines in both evaluation regimes. Contrastive learning improves the representation power and is shown to be essential for better results. The simple setup of modality-specific tokens enables ARMOUR to handle patients with missing modalities and allows comparison with existing unimodal benchmark results. CONCLUSION We propose a multimodal model for robust clinical prediction that achieves improved performance while accommodating patients with missing modalities.
This work could inspire future research to study the effective incorporation of multiple, more complex modalities of clinical data into a single model.
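ARMOUR's attention-based fusion is beyond a short sketch, but the core idea of producing a prediction input from whichever modalities are present can be illustrated with a masked mean over the available modality summary vectors (a simplified stand-in for the paper's Transformer fusion; the `fuse_modalities` helper and its dict layout are hypothetical):

```python
def fuse_modalities(modality_embeddings):
    """Fuse per-modality summary vectors, skipping missing modalities.

    modality_embeddings maps a modality name (e.g. "measurements", "text")
    to its summary vector, or to None when that modality is missing.
    A masked mean stands in for the paper's attention-based fusion.
    """
    available = [v for v in modality_embeddings.values() if v is not None]
    if not available:
        raise ValueError("at least one modality must be present")
    dim = len(available[0])
    return [sum(v[i] for v in available) / len(available) for i in range(dim)]

# A patient whose clinical text is missing still gets a fused representation.
fused = fuse_modalities({"measurements": [0.4, 0.6], "text": None})
```

The design point being illustrated: because absent modalities are masked out rather than imputed, the same model can score patients with complete and incomplete data alike.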
Affiliation(s)
- Jinghui Liu
- Australian e-Health Research Centre, CSIRO, Queensland, Australia; School of Computing and Information Systems, The University of Melbourne, Victoria, Australia
- Daniel Capurro
- School of Computing and Information Systems, The University of Melbourne, Victoria, Australia; Centre for Digital Transformation of Health, The University of Melbourne, Victoria, Australia
- Anthony Nguyen
- Australian e-Health Research Centre, CSIRO, Queensland, Australia
- Karin Verspoor
- School of Computing and Information Systems, The University of Melbourne, Victoria, Australia; School of Computing Technologies, RMIT University, Victoria, Australia
12
Houssein EH, Mohamed RE, Ali AA. Heart disease risk factors detection from electronic health records using advanced NLP and deep learning techniques. Sci Rep 2023; 13:7173. [PMID: 37138014 PMCID: PMC10156668 DOI: 10.1038/s41598-023-34294-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2022] [Accepted: 04/27/2023] [Indexed: 05/05/2023] Open
Abstract
Heart disease remains the major cause of death, despite recent improvements in prediction and prevention. Risk factor identification is the main step in diagnosing and preventing heart disease. Automatically detecting heart disease risk factors in clinical notes can help with disease progression modeling and clinical decision-making. Many studies have attempted to detect these risk factors, but none have identified all of them. These studies have proposed hybrid systems that combine knowledge-driven and data-driven techniques, based on dictionaries, rules, and machine learning methods that require significant human effort. Informatics for Integrating Biology and the Bedside (i2b2) proposed a clinical natural language processing (NLP) challenge in 2014, with a track (track 2) focused on detecting risk factors for heart disease in clinical notes over time. Clinical narratives provide a wealth of information that can be extracted using NLP and deep learning techniques. The objective of this paper is to improve on previous work in this area as part of the 2014 i2b2 challenge by identifying tags and attributes relevant to disease diagnosis, risk factors, and medications using advanced stacked word embedding techniques. On the i2b2 heart disease risk factors challenge dataset, stacking, which combines various embeddings, yielded significant improvement. Our model achieved an F1 score of 93.66% by stacking BERT and character embeddings (CHARACTER-BERT embedding). The proposed model achieves significant results compared with all other models and systems developed for the 2014 i2b2 challenge.
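The stacking approach concatenates each token's vectors from several embedding sources into one longer vector before downstream tagging. A minimal sketch (the toy maps and the `stack_embeddings` helper are illustrative, not the paper's implementation):

```python
def stack_embeddings(*embedding_maps):
    """Concatenate per-token vectors from several embedding sources.

    Each argument maps token -> vector (e.g. one contextual BERT-style
    source, one character-level source). Tokens absent from a source get
    a zero vector of that source's dimensionality.
    """
    dims = [len(next(iter(m.values()))) for m in embedding_maps]
    stacked = {}
    for tok in set().union(*embedding_maps):
        vec = []
        for m, d in zip(embedding_maps, dims):
            vec.extend(m.get(tok, [0.0] * d))
        stacked[tok] = vec
    return stacked

# Toy 2-d "contextual" and 3-d "character" embeddings for two tokens.
contextual = {"smoker": [0.9, 0.1], "statin": [0.2, 0.7]}
character = {"smoker": [0.3, 0.3, 0.4], "statin": [0.5, 0.1, 0.4]}
stacked = stack_embeddings(contextual, character)  # 5-d vectors per token
```

Each stacked vector then feeds the sequence tagger, letting it draw on both contextual and subword-level signal at once.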
Affiliation(s)
- Essam H Houssein
- Faculty of Computers and Information, Minia University, Minia, Egypt
- Rehab E Mohamed
- Faculty of Computers and Information, Minia University, Minia, Egypt
- Abdelmgeid A Ali
- Faculty of Computers and Information, Minia University, Minia, Egypt
13
Derton A, Guevara M, Chen S, Moningi S, Kozono DE, Liu D, Miller TA, Savova GK, Mak RH, Bitterman DS. Natural Language Processing Methods to Empirically Explore Social Contexts and Needs in Cancer Patient Notes. JCO Clin Cancer Inform 2023; 7:e2200196. [PMID: 37235847 DOI: 10.1200/cci.22.00196] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2022] [Revised: 02/22/2023] [Accepted: 03/23/2023] [Indexed: 05/28/2023] Open
Abstract
PURPOSE There is an unmet need to empirically explore and understand drivers of cancer disparities, particularly social determinants of health. We explored natural language processing methods to automatically and empirically extract clinical documentation of social contexts and needs that may underlie disparities. METHODS This was a retrospective analysis of 230,325 clinical notes from 5,285 patients treated with radiotherapy from 2007 to 2019. We compared linguistic features among White versus non-White, low-income insurance versus other insurance, and male versus female patients' notes. Log odds ratios with an informative Dirichlet prior were calculated to compare words over-represented in each group. A variational autoencoder topic model was applied, and topic probability was compared between groups. The presence of machine-learnable bias was explored by developing statistical and neural demographic group classifiers. RESULTS Terms associated with varied social contexts and needs were identified for all demographic group comparisons. For example, notes of non-White and low-income insurance patients were over-represented with terms associated with housing and transportation, whereas notes of White and other insurance patients were over-represented with terms related to physical activity. Topic models identified a social history topic, and topic probability varied significantly between the demographic group comparisons. Classification models performed poorly at classifying notes of non-White and low-income insurance patients (F1 of 0.30 and 0.23, respectively). CONCLUSION Exploration of linguistic differences in clinical notes between patients of different race/ethnicity, insurance status, and sex identified social contexts and needs in patients with cancer and revealed high-level differences in notes. Future work is needed to validate whether these findings may play a role in cancer disparities.
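The per-word group comparison can be sketched with the log odds ratio estimator smoothed by an informative Dirichlet prior (the Monroe et al. "Fightin' Words" formulation, which this abstract cites in spirit; the counts below are invented for illustration):

```python
from math import log, sqrt

def log_odds_z(count_a, total_a, count_b, total_b, prior_count, prior_total):
    """Z-scored log odds ratio of one word between two note corpora,
    smoothed by an informative Dirichlet prior drawn from a background
    corpus (prior_count occurrences out of prior_total words)."""
    la = log((count_a + prior_count)
             / (total_a + prior_total - count_a - prior_count))
    lb = log((count_b + prior_count)
             / (total_b + prior_total - count_b - prior_count))
    variance = 1.0 / (count_a + prior_count) + 1.0 / (count_b + prior_count)
    return (la - lb) / sqrt(variance)

# A word appearing 30 times in 10,000 words of group A notes vs. 5 times
# in 10,000 words of group B notes, with a weak background prior.
z = log_odds_z(30, 10_000, 5, 10_000, prior_count=1, prior_total=2_000)
```

Positive z-scores flag words over-represented in group A's notes, negative ones in group B's; the prior shrinks estimates for rare words so they do not dominate the ranking.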
Affiliation(s)
- Abigail Derton
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Marco Guevara
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- Shan Chen
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- Shalini Moningi
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- David E Kozono
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- Dianbo Liu
- Mila-Quebec AI Institute, Montreal, QC, Canada
- Timothy A Miller
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA
- Guergana K Savova
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA
- Raymond H Mak
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- Danielle S Bitterman
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
14
Venkatesh KP, Raza MM, Kvedar JC. Automating the overburdened clinical coding system: challenges and next steps. NPJ Digit Med 2023; 6:16. [PMID: 36737496 PMCID: PMC9898522 DOI: 10.1038/s41746-023-00768-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2023] [Accepted: 01/27/2023] [Indexed: 02/05/2023] Open