1. Nyongesa CA, Hogarth M, Pa J. Artificial intelligence-driven natural language processing for identifying linguistic patterns in Alzheimer's disease and mild cognitive impairment: A study of lexical, syntactic, and cohesive features of speech through picture description tasks. J Alzheimers Dis 2025:13872877251339756. [PMID: 40336266] [DOI: 10.1177/13872877251339756] [Indexed: 05/09/2025]
Abstract
BACKGROUND Language deficits often occur early in the neurodegenerative process, yet traditional methods frequently fail to detect subtle changes. Natural language processing (NLP) offers a novel approach to identifying linguistic patterns associated with cognitive impairment. OBJECTIVE We aimed to analyze linguistic features that differentiate cognitively unimpaired (CU), mild cognitive impairment (MCI), and Alzheimer's disease (AD) groups. METHODS Data were extracted from picture description tasks performed by 336 participants in the DementiaBank datasets. Using NLP toolkits, we identified 53 linguistic features aggregated into 4 categories: lexical, structural, syntactic, and discourse domains. Cognitive function was evaluated with the Mini-Mental State Examination (MMSE) and Montreal Cognitive Assessment (MoCA), using standard diagnostic cutoffs. RESULTS With age and education as covariates, ANOVA and post hoc Tukey's HSD tests revealed that linguistic features such as pronoun usage, syntactic complexity, and lexical sophistication differed significantly between CU, MCI, and AD groups (p < 0.05). Notably, past tense and personal references were higher in AD than in both CU and MCI (p < 0.001), while pronoun usage differed between AD and CU (p < 0.0001). Correlations indicated that higher pronoun rates and lower syntactic complexity were associated with lower MMSE scores; some features, such as conjunctions and determiners, approached significance but lacked consistent differentiation. CONCLUSIONS With the growing adoption of artificial intelligence (AI)-based scribing, these results emphasize the potential of targeted linguistic analysis as a digital biomarker to enable continuous screening for cognitive impairment.
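Lexical features of the kind analyzed above can be approximated in a few lines of code. A minimal sketch, assuming a plain-text transcript and a hand-picked pronoun list; the study's 53-feature NLP toolkit pipeline is far richer:

```python
import re

# Simplified stand-ins for the paper's lexical features, not the
# authors' actual toolkit. The pronoun list is illustrative only.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
            "me", "him", "her", "us", "them", "his", "their", "my"}

def lexical_features(transcript: str) -> dict:
    tokens = re.findall(r"[a-z']+", transcript.lower())
    n = len(tokens)
    pronouns = sum(1 for t in tokens if t in PRONOUNS)
    return {
        "n_tokens": n,
        # pronoun rate: proportion of tokens that are pronouns
        "pronoun_rate": pronouns / n if n else 0.0,
        # type-token ratio: a crude proxy for lexical sophistication
        "type_token_ratio": len(set(tokens)) / n if n else 0.0,
    }
```

Feature vectors like these, computed per transcript, are what the group comparisons (ANOVA, Tukey's HSD) operate on.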
Affiliation(s)
- Cynthia A Nyongesa
- Alzheimer's Disease Cooperative Study (ADCS), Department of Neurosciences, University of California, San Diego, CA, USA
- Mike Hogarth
- Division of Biomedical Informatics, Department of Medicine, University of California, San Diego, CA, USA
- Judy Pa
- Alzheimer's Disease Cooperative Study (ADCS), Department of Neurosciences, University of California, San Diego, CA, USA
2. Guan H, Novoa-Laurentiev J, Zhou L. CD-Tron: Leveraging large clinical language model for early detection of cognitive decline from electronic health records. J Biomed Inform 2025; 166:104830. [PMID: 40320101] [DOI: 10.1016/j.jbi.2025.104830] [Received: 10/28/2024] [Revised: 03/28/2025] [Accepted: 04/13/2025] [Indexed: 05/08/2025]
Abstract
BACKGROUND Early detection of cognitive decline during the preclinical stage of Alzheimer's disease and related dementias (AD/ADRD) is crucial for timely intervention and treatment. Clinical notes in the electronic health record contain valuable information that can aid in the early identification of cognitive decline. In this study, we utilize advanced large clinical language models, fine-tuned on clinical notes, to improve the early detection of cognitive decline. METHODS We collected clinical notes from 2,166 patients spanning the 4 years preceding their initial mild cognitive impairment (MCI) diagnosis from the Enterprise Data Warehouse of Mass General Brigham. To train the model, we developed CD-Tron, built upon a large clinical language model that was fine-tuned using 4,949 expert-labeled note sections. For evaluation, the trained model was applied to 1,996 independent note sections to assess its performance on real-world unstructured clinical data. Additionally, we used explainable AI techniques, specifically SHAP values (SHapley Additive exPlanations), to interpret the model's predictions and provide insight into the most influential features. Error analysis was also performed to further examine the model's predictions. RESULTS CD-Tron significantly outperforms baseline models, achieving notable improvements in precision, recall, and AUC metrics for detecting cognitive decline (CD). Tested on real-world clinical notes, CD-Tron demonstrated high sensitivity with only one false negative, which is crucial for clinical applications that prioritize early and accurate CD detection. SHAP-based interpretability analysis highlighted key textual features contributing to model predictions, supporting transparency and clinician understanding. CONCLUSION CD-Tron offers a novel approach to early cognitive decline detection by applying large clinical language models to free-text EHR data. Pretrained on real-world clinical notes, it accurately identifies early cognitive decline and integrates SHAP for interpretability, enhancing transparency in predictions.
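AUROC, one of the metrics reported for CD-Tron, can be computed directly from its rank interpretation: the probability that a randomly chosen positive is scored above a randomly chosen negative. A minimal sketch for context, not the study's evaluation code:

```python
def auroc(labels, scores):
    """AUROC via pairwise ranking: fraction of (positive, negative)
    pairs where the positive receives the higher score (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

This O(n²) form is fine for illustration; production code would sort once and use rank sums.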
Affiliation(s)
- Hao Guan
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- John Novoa-Laurentiev
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA 02115, USA
- Li Zhou
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
3. Ge W, Godeiro Coelho LM, Donahue MA, Rice HJ, Blacker D, Hsu J, Newhouse JP, Hernández-Díaz S, Haneuse S, Westover B, Moura LMVR. Automated identification of fall-related injuries in unstructured clinical notes. Am J Epidemiol 2025; 194:1097-1105. [PMID: 39060160] [PMCID: PMC11978607] [DOI: 10.1093/aje/kwae240] [Received: 07/22/2023] [Revised: 05/17/2024] [Accepted: 07/22/2024] [Indexed: 07/28/2024]
Abstract
Fall-related injuries (FRIs) are a major cause of hospitalizations among older patients, but identifying them in unstructured clinical notes poses challenges for large-scale research. In this study, we developed and evaluated natural language processing (NLP) models to address this issue. We utilized all available clinical notes from the Mass General Brigham health-care system for 2,100 older adults, identifying 154,949 paragraphs of interest through automatic scanning for FRI-related keywords. Two clinical experts directly labeled 5,000 paragraphs to generate benchmark-standard labels, while 3,689 validated patterns were annotated, indirectly labeling 93,157 paragraphs as validated-standard labels. Five NLP models, including vanilla bidirectional encoder representations from transformers (BERT), the robustly optimized BERT approach (RoBERTa), ClinicalBERT, DistilBERT, and support vector machine (SVM), were trained using 2,000 benchmark paragraphs and all validated paragraphs. BERT-based models were trained in 3 stages: masked language modeling, general boolean question-answering, and question-answering for FRIs. For validation, 500 benchmark paragraphs were used, and the remaining 2,500 were used for testing. Performance metrics (precision, recall, F1 score, area under the receiver operating characteristic curve [AUROC], and area under the precision-recall curve [AUPR]) were used for comparison, and RoBERTa showed the best performance: precision was 0.90 (95% CI, 0.88-0.91), recall was 0.91 (95% CI, 0.90-0.93), the F1 score was 0.91 (95% CI, 0.89-0.92), and the AUROC and AUPR were both 0.96 (95% CI, 0.95-0.97). These NLP models accurately identify FRIs from unstructured clinical notes, potentially enhancing clinical-notes-based research efficiency.
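The first stage, automatically scanning notes for paragraphs containing FRI-related keywords, could look like the following sketch. The keyword list here is a hypothetical stand-in, since the study's actual FRI lexicon is not reproduced above:

```python
import re

# Hypothetical fall-related keyword patterns (illustrative only).
FRI_PATTERNS = [r"\bfall\w*\b", r"\bfell\b", r"\bslip\w*\b",
                r"\btrip\w*\b", r"\bfracture\w*\b"]
FRI_RE = re.compile("|".join(FRI_PATTERNS), re.IGNORECASE)

def paragraphs_of_interest(note: str) -> list[str]:
    """Return the note paragraphs that mention any fall-related keyword;
    only these are passed on for expert or pattern-based labeling."""
    return [p for p in note.split("\n\n") if FRI_RE.search(p)]
```

Screening this way cuts the labeling workload from whole notes down to candidate paragraphs, which is what made the 154,949-paragraph corpus tractable.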
Affiliation(s)
- Wendong Ge
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, United States
- Maria A Donahue
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, United States
- Hunter J Rice
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, United States
- Deborah Blacker
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA 02114, United States
- John Hsu
- Department of Health Care Policy, Harvard Medical School, Boston, MA 02115, United States
- Mongan Institute, Massachusetts General Hospital, Boston, MA 02114, United States
- Department of Medicine, Harvard Medical School, Boston, MA 02115, United States
- Joseph P Newhouse
- Department of Health Care Policy, Harvard Medical School, Boston, MA 02115, United States
- National Bureau of Economic Research, Cambridge, MA 02138, United States
- Department of Health Policy and Management, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States
- John F. Kennedy School of Government, Harvard University, Cambridge, MA 02138, United States
- Sonia Hernández-Díaz
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States
- Sebastien Haneuse
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States
- Department of Neurology, Beth Israel Deaconess Medical Center, Boston, MA 02215, United States
- Brandon Westover
- Department of Neurology, Harvard Medical School, Boston, MA 02115, United States
- Department of Psychiatry, Harvard Medical School, Boston, MA 02215, United States
- Lidia M V R Moura
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, United States
- Department of Neurology, Harvard Medical School, Boston, MA 02115, United States
4. Pan J, Fan Z, Smith GE, Guo Y, Bian J, Xu J. Federated learning with multi-cohort real-world data for predicting the progression from mild cognitive impairment to Alzheimer's disease. Alzheimers Dement 2025; 21:e70128. [PMID: 40219846] [PMCID: PMC11992589] [DOI: 10.1002/alz.70128] [Received: 01/08/2025] [Revised: 03/03/2025] [Accepted: 03/03/2025] [Indexed: 04/14/2025]
Abstract
INTRODUCTION Leveraging routinely collected electronic health records (EHRs) from multiple health-care institutions, this study aimed to assess the feasibility of using federated learning (FL) to predict the progression from mild cognitive impairment (MCI) to Alzheimer's disease (AD). METHODS We analyzed EHR data from the OneFlorida+ consortium, simulating six sites, and used a long short-term memory (LSTM) model with a federated averaging (FedAvg) algorithm. A personalized FL approach was used to address between-site heterogeneity. Model performance was assessed using the area under the receiver operating characteristic curve (AUC) and feature importance techniques. RESULTS Of 44,899 MCI patients, 6,391 progressed to AD. FL models achieved a 6% improvement in AUC compared to local models. Key predictive features included body mass index, vitamin B12, blood pressure, and others. DISCUSSION FL showed promise in predicting AD progression by integrating heterogeneous data across multiple institutions while preserving privacy. Despite limitations, it offers potential for future clinical applications. HIGHLIGHTS We applied long short-term memory and federated learning (FL) to predict mild cognitive impairment to Alzheimer's disease progression using electronic health record data from multiple institutions. FL improved prediction performance, with a 6% increase in area under the receiver operating characteristic curve compared to local models. We identified key predictive features, such as body mass index, vitamin B12, and blood pressure. FL shows effectiveness in handling data heterogeneity across multiple sites while ensuring data privacy. Personalized and pooled FL models generally performed better than global and local models.
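The FedAvg aggregation step used here can be sketched as a sample-size-weighted mean of per-site model parameters. A schematic with flat parameter vectors; in the actual study the parameters are LSTM weight tensors and the averaging repeats over many communication rounds:

```python
def fed_avg(site_weights, site_sizes):
    """Federated averaging: combine per-site parameter vectors into a
    global model, weighting each site by its number of training samples.
    Raw patient data never leaves the sites; only parameters are shared."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(dim)]
```

The personalized FL variant mentioned above would then adapt this global model locally at each site to handle between-site heterogeneity.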
Affiliation(s)
- Jinqian Pan
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Zhengkang Fan
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Glenn E. Smith
- Department of Clinical and Health Psychology, University of Florida, Gainesville, Florida, USA
- Yi Guo
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Jiang Bian
- Department of Biostatistics and Health Data Science, Indiana University, Indianapolis, Indiana, USA
- Jie Xu
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, USA
5. Alba C, Xue B, Abraham J, Kannampallil T, Lu C. The foundational capabilities of large language models in predicting postoperative risks using clinical notes. NPJ Digit Med 2025; 8:95. [PMID: 39934379] [DOI: 10.1038/s41746-025-01489-2] [Received: 08/31/2024] [Accepted: 01/28/2025] [Indexed: 02/13/2025]
Abstract
Clinical notes recorded during a patient's perioperative journey hold immense informational value, yet much of it goes untapped. Advances in large language models (LLMs) offer opportunities for bridging this gap. Using 84,875 preoperative notes and their associated surgical cases from 2018 to 2021, we examined the performance of LLMs in predicting six postoperative risks under various fine-tuning strategies. Pretrained LLMs outperformed traditional word embeddings by an absolute 38.3% in AUROC and 33.2% in AUPRC. Self-supervised fine-tuning further improved performance by 3.2% and 1.5%, respectively. Incorporating labels into training further increased AUROC by 1.8% and AUPRC by 2%. The highest performance was achieved with a unified foundation model, with improvements of 3.6% in AUROC and 2.6% in AUPRC compared to self-supervision, highlighting the foundational capabilities of LLMs in predicting postoperative risks and their potential benefit when deployed for perioperative care.
Affiliation(s)
- Charles Alba
- AI for Health Institute, Washington University in St. Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- McKelvey School of Engineering, Washington University in St Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- Brown School, Washington University in St Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- Bing Xue
- AI for Health Institute, Washington University in St. Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- McKelvey School of Engineering, Washington University in St Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- Joanna Abraham
- AI for Health Institute, Washington University in St. Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- School of Medicine, Washington University in St Louis, 660 S Euclid Ave, St. Louis, 63110, MO, USA
- Institute for Informatics, Data Science, and Biostatistics, Washington University in St Louis, 660 S Euclid Ave, St. Louis, 63110, MO, USA
- Thomas Kannampallil
- AI for Health Institute, Washington University in St. Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- McKelvey School of Engineering, Washington University in St Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- School of Medicine, Washington University in St Louis, 660 S Euclid Ave, St. Louis, 63110, MO, USA
- Institute for Informatics, Data Science, and Biostatistics, Washington University in St Louis, 660 S Euclid Ave, St. Louis, 63110, MO, USA
- Chenyang Lu
- AI for Health Institute, Washington University in St. Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- McKelvey School of Engineering, Washington University in St Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- School of Medicine, Washington University in St Louis, 660 S Euclid Ave, St. Louis, 63110, MO, USA
6. Feng R, Brennan KA, Azizi Z, Goyal J, Deb B, Chang HJ, Ganesan P, Clopton P, Pedron M, Ruipérez-Campillo S, Desai Y, De Larochellière H, Baykaner T, Perez M, Rodrigo M, Rogers AJ, Narayan SM. Engineering of Generative Artificial Intelligence and Natural Language Processing Models to Accurately Identify Arrhythmia Recurrence. Circ Arrhythm Electrophysiol 2025; 18:e013023. [PMID: 39676642] [PMCID: PMC11771986] [DOI: 10.1161/circep.124.013023] [Received: 04/23/2024] [Accepted: 11/11/2024] [Indexed: 12/17/2024]
Abstract
BACKGROUND Large language models (LLMs) such as Chat Generative Pre-trained Transformer (ChatGPT) excel at interpreting unstructured data from public sources, yet are limited when responding to queries on private repositories, such as electronic health records (EHRs). We hypothesized that prompt engineering could enhance the accuracy of LLMs for interpreting EHR data without requiring domain knowledge, thus expanding their utility for patients and personalized diagnostics. METHODS We designed and systematically tested prompt engineering techniques to improve the ability of LLMs to interpret EHRs for nuanced diagnostic questions, referenced to a panel of medical experts. In 490 full-text EHR notes from 125 patients with prior life-threatening heart rhythm disorders, we asked GPT-4-turbo to identify recurrent arrhythmias distinct from prior events and tested 220,563 queries. To provide context, results were compared with rule-based natural language processing and Bidirectional Encoder Representations from Transformers (BERT)-based language models. Experiments were repeated for 2 additional LLMs. RESULTS In an independent hold-out set of 389 notes, GPT-4-turbo had a balanced accuracy of 64.3%±4.7% out-of-the-box at baseline. This increased to a balanced accuracy of 91.4%±3.8% (P<0.05) when asking GPT-4-turbo to provide a rationale for its answers, a structured data output, and in-context exemplars. This surpassed the traditional logic-based natural language processing and BERT-based models (P<0.05). Results were consistent for GPT-3.5-turbo and Jurassic-2 LLMs. CONCLUSIONS The use of prompt engineering strategies enables LLMs to identify clinical end points from EHRs with an accuracy that surpassed natural language processing and approximated experts, yet without the need for expert knowledge. These approaches could be applied to LLM queries for other domains, to facilitate automated analysis of nuanced data sets with high accuracy by nonexperts.
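The three prompt-engineering strategies that raised balanced accuracy (asking for a rationale, requesting structured output, and prepending in-context exemplars) can be combined in one prompt template. An illustrative sketch only; the wording, JSON schema, and exemplar format are assumptions, not the authors' exact prompts:

```python
import json

def build_prompt(note: str, exemplars: list) -> str:
    """Assemble an LLM query combining in-context exemplars, a structured
    (JSON) output request, and a rationale request. Hypothetical template."""
    parts = ["You are reviewing an EHR note for arrhythmia recurrence."]
    # In-context exemplars: (note text, label) pairs shown before the query.
    for text, label in exemplars:
        parts.append("Example note: " + text + "\nExample answer: "
                     + json.dumps({"recurrence": label, "rationale": "..."}))
    parts.append("Note: " + note)
    # Structured output plus rationale request.
    parts.append('Answer in JSON as {"recurrence": "yes"|"no", '
                 '"rationale": "<one sentence>"} and explain your reasoning.')
    return "\n\n".join(parts)
```

The JSON constraint also makes the model's answers machine-parseable, which matters when scoring hundreds of thousands of queries.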
Affiliation(s)
- Ruibin Feng
- Department of Medicine, Stanford University, Stanford, CA
- Zahra Azizi
- Department of Medicine, Stanford University, Stanford, CA
- Jatin Goyal
- Department of Medicine, Stanford University, Stanford, CA
- Brototo Deb
- Department of Medicine, Stanford University, Stanford, CA
- School of Information Science, University of California, Berkeley, CA
- Hui Ju Chang
- Department of Medicine, Stanford University, Stanford, CA
- Paul Clopton
- Department of Medicine, Stanford University, Stanford, CA
- Maxime Pedron
- Department of Medicine, Stanford University, Stanford, CA
- Samuel Ruipérez-Campillo
- Department of Medicine, Stanford University, Stanford, CA
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Yaanik Desai
- Department of Medicine, Stanford University, Stanford, CA
- Tina Baykaner
- Department of Medicine, Stanford University, Stanford, CA
- Marco Perez
- Department of Medicine, Stanford University, Stanford, CA
- Miguel Rodrigo
- Department of Medicine, Stanford University, Stanford, CA
- CoMMLab, Universitat Politècnica de València, Valencia, Spain
- Sanjiv M. Narayan
- Department of Medicine, Stanford University, Stanford, CA
- School of Information Science, University of California, Berkeley, CA
7. Sprint G, Schmitter-Edgecombe M, Cook D. Building a Human Digital Twin (HDTwin) Using Large Language Models for Cognitive Diagnosis: Algorithm Development and Validation. JMIR Form Res 2024; 8:e63866. [PMID: 39715540] [PMCID: PMC11704625] [DOI: 10.2196/63866] [Received: 07/01/2024] [Revised: 09/30/2024] [Accepted: 11/07/2024] [Indexed: 12/25/2024]
Abstract
BACKGROUND Human digital twins have the potential to change the practice of personalizing cognitive health diagnosis because these systems can integrate multiple sources of health information into a unified model. Cognitive health is multifaceted, yet researchers and clinical professionals struggle to align diverse sources of information into a single model. OBJECTIVE This study aims to introduce HDTwin, a method for unifying heterogeneous data using large language models. HDTwin is designed to predict cognitive diagnoses and offer explanations for its inferences. METHODS HDTwin integrates cognitive health data from multiple sources, including demographic, behavioral, ecological momentary assessment, n-back test, speech, and baseline experimenter testing session markers. Data are converted into text prompts for a large language model. The system then combines these inputs with relevant external knowledge from scientific literature to construct a predictive model. The model's performance is validated using data from 3 studies involving 124 participants, comparing its diagnostic accuracy with baseline machine learning classifiers. RESULTS HDTwin achieves a peak accuracy of 0.81 based on the automated selection of markers, significantly outperforming baseline classifiers. On average, HDTwin yielded accuracy=0.77, precision=0.88, recall=0.63, and Matthews correlation coefficient=0.57. In comparison, the baseline classifiers yielded average accuracy=0.65, precision=0.86, recall=0.35, and Matthews correlation coefficient=0.36. The experiments also reveal that HDTwin yields superior predictive accuracy when information sources are fused compared to single sources. HDTwin's chatbot interface provides interactive dialogues, aiding in diagnosis interpretation and allowing further exploration of patient data. CONCLUSIONS HDTwin integrates diverse cognitive health data, enhancing the accuracy and explainability of cognitive diagnoses. This approach outperforms traditional models and provides an interface for navigating patient information. The approach shows promise for improving early detection and intervention strategies in cognitive health.
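The Matthews correlation coefficient reported above follows directly from confusion-matrix counts; a minimal sketch of the metric itself, not the study's code:

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient: +1 is perfect prediction,
    0 is chance level, -1 is total disagreement."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike accuracy, MCC stays informative under class imbalance, which is why it is a useful companion to the precision and recall figures quoted above.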
Affiliation(s)
- Gina Sprint
- Department of Computer Science, Gonzaga University, Spokane, WA, United States
- Maureen Schmitter-Edgecombe
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States
- Diane Cook
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States
8. Guan H, Novoa-Laurentiev J, Zhou L. SCD-Tron: Leveraging Large Clinical Language Model for Early Detection of Cognitive Decline from Electronic Health Records. medRxiv [Preprint] 2024:2024.10.31.24316386. [PMID: 39574862] [PMCID: PMC11581067] [DOI: 10.1101/2024.10.31.24316386] [Indexed: 12/01/2024]
Abstract
Background Early detection of cognitive decline during the preclinical stage of Alzheimer's disease is crucial for timely intervention and treatment. Clinical notes, often found in unstructured electronic health records (EHRs), contain valuable information that can aid in the early identification of cognitive decline. In this study, we utilize advanced large clinical language models, fine-tuned on clinical notes, to improve the early detection of cognitive decline. Methods We collected clinical notes from 2,166 patients spanning the 4 years preceding their initial mild cognitive impairment (MCI) diagnosis from the Enterprise Data Warehouse (EDW) of Mass General Brigham (MGB). To train the model, we developed SCD-Tron, built upon a large clinical language model fine-tuned on 4,949 expert-labeled note sections. For evaluation, the trained model was applied to 1,996 independent note sections to assess its performance on real-world unstructured clinical data. Additionally, we used explainable AI techniques, specifically SHAP (SHapley Additive exPlanations) values, to interpret the model's predictions and provide insight into the most influential features. Error analysis was also performed to further examine the model's predictions. Results SCD-Tron significantly outperforms baseline models, achieving notable improvements in precision, recall, and AUC metrics for detecting Subjective Cognitive Decline (SCD). Tested on real-world clinical notes, SCD-Tron demonstrated high sensitivity with only one false negative, which is crucial for clinical applications that prioritize early and accurate SCD detection. SHAP-based interpretability analysis highlighted key textual features contributing to model predictions, supporting transparency and clinician understanding. Conclusion SCD-Tron offers a novel approach to early cognitive decline detection by applying large clinical language models to unstructured EHR data. Pretrained on real-world clinical notes, it accurately identifies early cognitive decline and integrates SHAP for interpretability, enhancing transparency in predictions.
Affiliation(s)
- Hao Guan
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115, USA
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA
- John Novoa-Laurentiev
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115, USA
- Li Zhou
- Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115, USA
- Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA
9. Poor FF, Dodge HH, Mahoor MH. A multimodal cross-transformer-based model to predict mild cognitive impairment using speech, language and vision. Comput Biol Med 2024; 182:109199. [PMID: 39332117] [DOI: 10.1016/j.compbiomed.2024.109199] [Received: 06/12/2024] [Revised: 09/02/2024] [Accepted: 09/22/2024] [Indexed: 09/29/2024]
Abstract
Mild Cognitive Impairment (MCI) is an early stage of memory loss or other cognitive decline in individuals who retain the ability to independently perform most activities of daily living. It is considered a transitional stage between normal cognition and more severe cognitive decline such as dementia or Alzheimer's disease. According to reports from the National Institute on Aging (NIA), people with MCI are at greater risk of developing dementia, so it is important to detect MCI as early as possible to mitigate its progression to Alzheimer's disease and dementia. Recent studies have harnessed Artificial Intelligence (AI) to develop automated methods to predict and detect MCI. The majority of the existing research is based on unimodal data (e.g., only speech or prosody), but recent studies have shown that multimodality leads to a more accurate prediction of MCI. However, effectively exploiting different modalities remains a major challenge due to the lack of efficient fusion methods. This study proposes a robust fusion architecture utilizing embedding-level fusion via a co-attention mechanism to leverage multimodal data for MCI prediction. This approach addresses the limitations of early and late fusion methods, which often fail to preserve inter-modal relationships. Our embedding-level fusion aims to capture complementary information across modalities, enhancing predictive accuracy. We used the I-CONECT dataset, in which a large number of semi-structured conversations via internet/webcam between participants aged 75+ years and interviewers were recorded. We introduce a multimodal speech-language-vision Deep Learning-based method to differentiate MCI from Normal Cognition (NC). Our proposed architecture includes co-attention blocks to fuse three different modalities at the embedding level, finding the potential interactions between speech (audio), language (transcribed speech), and vision (facial videos) within the cross-Transformer layer.
Experimental results demonstrate that our fusion method achieves an average AUC of 85.3% in detecting MCI from NC, significantly outperforming unimodal (60.9%) and bimodal (76.3%) baseline models. This superior performance highlights the effectiveness of our model in capturing and utilizing the complementary information from multiple modalities, offering a more accurate and reliable approach for MCI prediction.
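The co-attention idea, tokens of one modality attending over another's embeddings, can be illustrated with a single unprojected cross-attention head. A schematic only: the paper's cross-Transformer uses learned query/key/value projections, multiple heads, and stacked layers:

```python
import math

def cross_attention(queries, keys_values):
    """Minimal single-head cross-attention: each query vector (e.g. a
    language-token embedding) takes a softmax-weighted average of the
    other modality's vectors (e.g. audio embeddings). No learned weights."""
    d = len(queries[0])
    fused = []
    for q in queries:
        # scaled dot-product scores against every cross-modal vector
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        # numerically stable softmax
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        fused.append([sum(w * k[i] for w, k in zip(weights, keys_values))
                      for i in range(d)])
    return fused
```

Embedding-level fusion applies blocks like this in both directions (language attending to audio, audio to language, and so on for vision) before classification, which is how inter-modal relationships are preserved.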
Affiliation(s)
- Farida Far Poor
- Department of Electrical and Computer Engineering, University of Denver, Denver, CO, USA
- Hiroko H Dodge
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Mohammad H Mahoor
- Department of Electrical and Computer Engineering, University of Denver, Denver, CO, USA
10. Nunes M, Boné J, Ferreira JC, Chaves P, Elvas LB. MediAlbertina: An European Portuguese medical language model. Comput Biol Med 2024; 182:109233. [PMID: 39362002] [DOI: 10.1016/j.compbiomed.2024.109233] [Received: 03/28/2024] [Revised: 09/28/2024] [Accepted: 09/30/2024] [Indexed: 10/05/2024]
Abstract
BACKGROUND Patient medical information often exists in unstructured text containing abbreviations and acronyms that conserve time and space but pose challenges for automated interpretation. Leveraging the efficacy of Transformers in natural language processing, our objective was to use the knowledge acquired by a language model and continue its pre-training to develop a European Portuguese (PT-PT) healthcare-domain language model. METHODS After a filtering process, Albertina PT-PT 900M was selected as our base language model, and we continued its pre-training using more than 2.6 million electronic medical records from Portugal's largest public hospital. MediAlbertina 900M was created through domain adaptation on these data using masked language modelling. RESULTS The comparison with our baseline used both perplexity, which decreased from about 20 to 1.6, and the fine-tuning and evaluation of information extraction models such as Named Entity Recognition and Assertion Status. MediAlbertina PT-PT outperformed Albertina PT-PT in both tasks by 4-6% in recall and F1 score. CONCLUSIONS This study contributes the first publicly available medical language model trained on PT-PT data. It underscores the efficacy of domain adaptation and helps the scientific community overcome the obstacles facing non-English languages. With MediAlbertina, further steps can be taken to assist physicians, such as creating decision support systems or building medical timelines for patient profiling, by fine-tuning MediAlbertina for PT-PT medical tasks.
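Perplexity, the metric that dropped from about 20 to 1.6 after domain adaptation, is the exponential of the mean per-token negative log-likelihood under the language-modelling objective; a minimal sketch of the metric:

```python
import math

def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (natural log).
    Lower values mean the model finds the domain text less surprising."""
    return math.exp(sum(token_nlls) / len(token_nlls))
```

A perplexity near 1.6 means the adapted model is, on average, choosing among fewer than two effectively plausible next tokens on in-domain clinical text, versus about 20 for the general-domain baseline.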
Affiliation(s)
- Miguel Nunes
- ISTAR, Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026, Lisbon, Portugal
- João Boné
- Select Data, Anaheim, CA, 92807, USA
- João C Ferreira
- Department of Logistics, Molde University College, Molde, 6410, Norway; ISTAR, Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026, Lisbon, Portugal; Inov Inesc Inovação - Instituto de Novas Tecnologias, 1000-029, Lisbon, Portugal
- Luis B Elvas
- Department of Logistics, Molde University College, Molde, 6410, Norway; ISTAR, Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026, Lisbon, Portugal; Inov Inesc Inovação - Instituto de Novas Tecnologias, 1000-029, Lisbon, Portugal.
11
Nunes M, Bone J, Ferreira JC, Elvas LB. Health Care Language Models and Their Fine-Tuning for Information Extraction: Scoping Review. JMIR Med Inform 2024; 12:e60164. [PMID: 39432345 PMCID: PMC11535799 DOI: 10.2196/60164] [Received: 05/03/2024] [Revised: 07/14/2024] [Accepted: 08/06/2024] [Indexed: 10/22/2024]
Abstract
BACKGROUND In response to the intricate language, specialized terminology far removed from everyday usage, and frequent abbreviations and acronyms inherent in health care text data, domain adaptation techniques have become crucial for transformer-based models. This refinement of language models (LMs) allows a better understanding of medical textual data, which improves performance on medical downstream tasks such as information extraction (IE). We identified a gap in the literature regarding health care LMs. This study therefore presents a scoping literature review of domain adaptation methods for transformers in health care, differentiating between English and non-English languages, with a focus on Portuguese. More specifically, we investigated the development of health care LMs, aiming to compare Portuguese with other, more developed languages to guide the path of a non-English language with fewer resources. OBJECTIVE This study aimed to survey health care IE models, regardless of language, to understand the efficacy of transformers and which medical entities are most commonly extracted. METHODS This scoping review was conducted using the PRISMA-ScR (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews) methodology on the Scopus and Web of Science Core Collection databases. Only studies that mentioned the creation of health care LMs or health care IE models were included; large language models (LLMs) were excluded because our focus was LMs rather than LLMs, which are architecturally different and serve distinct purposes. RESULTS Our search query retrieved 137 studies, 60 of which met the inclusion criteria; none were systematic literature reviews. English and Chinese are the languages with the most health care LMs developed. These languages already have disease-specific LMs, while others have only general health care LMs. European Portuguese does not have any public health care LM and should draw on examples from other languages to develop, first, general health care LMs and then, in an advanced phase, disease-specific LMs. Regarding IE models, transformers were the most commonly used method, and named entity recognition was the most popular topic, with only a few studies mentioning Assertion Status or addressing medical lexical problems. The most extracted entities were diagnosis, posology, and symptoms. CONCLUSIONS The findings indicate that domain adaptation is beneficial, achieving better results in downstream tasks. Our analysis shows that the use of transformers is more developed for English and Chinese. European Portuguese lacks relevant studies and should draw on examples from other non-English languages to develop these models and drive progress in AI. Health care professionals could benefit from highlighting of medically relevant information and streamlined reading of textual data, or this information could be used to create patient medical timelines for profiling.
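The NER systems this review surveys typically fine-tune a transformer to emit one BIO tag per token; a small post-processing step then recovers entity spans such as diagnosis, posology, and symptoms. A minimal sketch of that decoding step, with entity labels and example text invented for illustration:

```python
def decode_bio(tokens, tags):
    """Collapse a per-token BIO tag sequence into (label, text) entity
    spans: 'B-X' starts an entity of type X, 'I-X' continues it, and
    'O' (or a mismatched tag) closes any open entity."""
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(current)
            current = (tag[2:], [token])
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[1].append(token)
        else:
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return [(label, " ".join(words)) for label, words in entities]

# Hypothetical clinical snippet with invented SYMPTOM/POSOLOGY labels:
tokens = ["Patient", "reports", "persistent", "cough", ";", "takes", "ibuprofen", "400", "mg"]
tags = ["O", "O", "B-SYMPTOM", "I-SYMPTOM", "O", "O", "B-POSOLOGY", "I-POSOLOGY", "I-POSOLOGY"]
print(decode_bio(tokens, tags))
# [('SYMPTOM', 'persistent cough'), ('POSOLOGY', 'ibuprofen 400 mg')]
```

The tagging model itself is swappable (the review finds transformers dominant), but this decoding convention is shared across most of the pipelines covered.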
Affiliation(s)
- Miguel Nunes
- ISTAR, Instituto Universitário de Lisboa (ISCTE-IUL), Lisbon, Portugal
- Joao Bone
- Select Data, Anaheim, CA, United States
- Joao C Ferreira
- ISTAR, Instituto Universitário de Lisboa (ISCTE-IUL), Lisbon, Portugal
- Department of Logistics, Molde University College, Molde, Norway
- INOV Inesc Inovação, Instituto de Novas Tecnologias, Lisbon, Portugal
- Luis B Elvas
- ISTAR, Instituto Universitário de Lisboa (ISCTE-IUL), Lisbon, Portugal
- Department of Logistics, Molde University College, Molde, Norway
- INOV Inesc Inovação, Instituto de Novas Tecnologias, Lisbon, Portugal
12
Bian J, Peng Y, Mendonca E, Banerjee I, Xu H. Call for papers: Special issue on biomedical multimodal large language models - novel approaches and applications. J Biomed Inform 2024; 157:104703. [PMID: 39111608 DOI: 10.1016/j.jbi.2024.104703] [Received: 07/31/2024] [Accepted: 07/31/2024] [Indexed: 08/10/2024]
Affiliation(s)
- Jiang Bian
- Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, USA.
- Yifan Peng
- Population Health Sciences, Weill Cornell Medical College, New York, NY, USA.
- Eneida Mendonca
- Division of Biomedical Informatics, Cincinnati Children's, Cincinnati, OH, USA.
- Imon Banerjee
- Department of Radiology, Mayo Clinic, Scottsdale, AZ, USA.
- Hua Xu
- Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA.
13
Iqbal MS, Belal Bin Heyat M, Parveen S, Ammar Bin Hayat M, Roshanzamir M, Alizadehsani R, Akhtar F, Sayeed E, Hussain S, Hussein HS, Sawan M. Progress and trends in neurological disorders research based on deep learning. Comput Med Imaging Graph 2024; 116:102400. [PMID: 38851079 DOI: 10.1016/j.compmedimag.2024.102400] [Received: 01/02/2024] [Revised: 05/07/2024] [Accepted: 05/13/2024] [Indexed: 06/10/2024]
Abstract
In recent years, deep learning (DL) has emerged as a powerful tool in clinical imaging, offering unprecedented opportunities for the diagnosis and treatment of neurological disorders (NDs). This comprehensive review explores the multifaceted role of DL techniques in leveraging vast datasets to advance our understanding of NDs and improve clinical outcomes. Beginning with a systematic literature review, we delve into the utilization of DL, focusing particularly on multimodal neuroimaging data analysis, a domain that has witnessed rapid progress and garnered significant scientific interest. Our study categorizes and critically analyses numerous DL models, including Convolutional Neural Networks (CNNs), LSTM-CNN, GAN, and VGG, to understand their performance across different types of neurological diseases. Through this analysis, we identify key benchmarks and datasets used for training and testing DL models, shedding light on the challenges and opportunities in clinical neuroimaging research. Moreover, we discuss the effectiveness of DL in real-world clinical scenarios, emphasizing its potential to revolutionize ND diagnosis and therapy. By synthesizing existing literature and outlining future directions, this review not only provides insights into the current state of DL applications in ND analysis but also paves the way for the development of more efficient and accessible DL techniques. Finally, our findings underscore the transformative impact of DL in reshaping the landscape of clinical neuroimaging, offering hope for enhanced patient care and groundbreaking discoveries in neurology. This review is useful for neuropathologists and new researchers in the field.
Affiliation(s)
- Muhammad Shahid Iqbal
- Department of Computer Science and Information Technology, Women University of Azad Jammu & Kashmir, Bagh, Pakistan.
- Md Belal Bin Heyat
- CenBRAIN Neurotech Center of Excellence, School of Engineering, Westlake University, Hangzhou, Zhejiang, China.
- Saba Parveen
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China.
- Mohamad Roshanzamir
- Department of Computer Engineering, Faculty of Engineering, Fasa University, Fasa, Iran.
- Roohallah Alizadehsani
- Institute for Intelligent Systems Research and Innovation, Deakin University, VIC 3216, Australia.
- Faijan Akhtar
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China.
- Eram Sayeed
- Kisan Inter College, Dhaurahara, Kushinagar, India.
- Sadiq Hussain
- Department of Examination, Dibrugarh University, Assam 786004, India.
- Hany S Hussein
- Electrical Engineering Department, Faculty of Engineering, King Khalid University, Abha 61411, Saudi Arabia; Electrical Engineering Department, Faculty of Engineering, Aswan University, Aswan 81528, Egypt.
- Mohamad Sawan
- CenBRAIN Neurotech Center of Excellence, School of Engineering, Westlake University, Hangzhou, Zhejiang, China.
14
Luo X, Deng Z, Yang B, Luo MY. Pre-trained language models in medicine: A survey. Artif Intell Med 2024; 154:102904. [PMID: 38917600 DOI: 10.1016/j.artmed.2024.102904] [Received: 12/15/2023] [Revised: 04/15/2024] [Accepted: 06/03/2024] [Indexed: 06/27/2024]
Abstract
With the rapid progress in Natural Language Processing (NLP), Pre-trained Language Models (PLMs) such as BERT, BioBERT, and ChatGPT have shown great potential in various medical NLP tasks. This paper surveys the cutting-edge achievements in applying PLMs to medical NLP tasks. Specifically, we first briefly introduce PLMs and outline the research on PLMs in medicine. Next, we categorise and discuss the types of tasks in medical NLP, covering text summarisation, question-answering, machine translation, sentiment analysis, named entity recognition, information extraction, medical education, relation extraction, and text mining. For each type of task, we provide an overview of the basic concepts, the main methodologies, the advantages of applying PLMs, the basic steps of applying PLMs, the datasets for training and testing, and the metrics for task evaluation. We then summarise recent important research findings, analysing their motivations, strengths and weaknesses, and similarities and differences, and discussing potential limitations. We also assess the quality and influence of the research reviewed by comparing the citation counts of the papers and the reputation and impact of the conferences and journals in which they were published. Through these indicators, we further identify the research topics currently attracting the most attention. Finally, we look forward to future research directions, including enhancing models' reliability, explainability, and fairness, to promote the application of PLMs in clinical practice. In addition, this survey collects download links for model code and relevant datasets, which are valuable references for researchers applying NLP techniques in medicine and for medical professionals seeking to enhance their expertise and health care services through AI technology.
Affiliation(s)
- Xudong Luo
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining, Guangxi Normal University, Guilin 541004, China; Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China.
- Zhiqi Deng
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining, Guangxi Normal University, Guilin 541004, China; Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China.
- Binxia Yang
- School of Computer Science and Engineering, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining, Guangxi Normal University, Guilin 541004, China; Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China.
- Michael Y Luo
- Emmanuel College, Cambridge University, Cambridge, CB2 3AP, UK.
15
Kim H, Park H, Kang S, Kim J, Kim J, Jung J, Taira R. Evaluating the validity of the nursing statements algorithmically generated based on the International Classifications of Nursing Practice for respiratory nursing care using large language models. J Am Med Inform Assoc 2024; 31:1397-1403. [PMID: 38630586 PMCID: PMC11105147 DOI: 10.1093/jamia/ocae070] [Received: 12/15/2023] [Revised: 02/21/2024] [Accepted: 03/19/2024] [Indexed: 04/19/2024]
Abstract
OBJECTIVE This study aims to facilitate the creation of quality standardized nursing statements in South Korea's hospitals through algorithmic generation based on the International Classifications of Nursing Practice (ICNP) and evaluation with large language models. MATERIALS AND METHODS We algorithmically generated 15 972 statements related to acute respiratory care using 117 concepts and the concept composition models of ICNP. Human reviewers, Generative Pre-trained Transformer 4.0 (GPT-4.0), and Bio_Clinical Bidirectional Encoder Representations from Transformers (Bio_ClinicalBERT) evaluated the validity of the generated statements. The evaluation by GPT-4.0 and Bio_ClinicalBERT was conducted with and without contextual information and training. RESULTS Of the generated statements, 2207 were deemed valid by expert reviewers. GPT-4.0 showed a zero-shot AUC of 0.857, which worsened when contextual information was added. Bio_ClinicalBERT improved markedly after training, reaching an AUC of 0.998. CONCLUSION Bio_ClinicalBERT effectively validates auto-generated nursing statements, offering a promising way to enhance and streamline health care documentation processes.
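The AUC values reported above (0.857 zero-shot, 0.998 after training) can be read as a rank statistic: the probability that the model scores a randomly chosen valid statement above a randomly chosen invalid one, with ties counting half. A minimal sketch of that computation; the labels and scores below are illustrative, not the study's data:

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank (Mann-Whitney) formulation: the fraction
    of positive/negative pairs in which the positive example receives
    the higher score, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative only: 1 = statement judged valid, 0 = invalid; scores
# stand in for a classifier's validity probabilities.
labels = [1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.1]
print(roc_auc(labels, scores))  # 1.0 -- every valid statement outranks every invalid one
```

An AUC near 0.998, as reported for the fine-tuned Bio_ClinicalBERT, means almost every such pair is ranked correctly; 0.5 would indicate chance-level discrimination.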
Affiliation(s)
- Hyeoneui Kim
- College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- The Research Institute of Nursing Science, Seoul National University, Seoul, 03080, Republic of Korea
- Center for Human-Caring Nurse Leaders for the Future by Brain Korea 21 (BK 21) Four Project, College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- Hyewon Park
- College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- Samsung Medical Center, Seoul, 06351, Republic of Korea
- Sunghoon Kang
- The Department of Science Studies, Seoul National University, Seoul, 08826, Republic of Korea
- Jinsol Kim
- College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- Center for Human-Caring Nurse Leaders for the Future by Brain Korea 21 (BK 21) Four Project, College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- Jeongha Kim
- College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- Asan Medical Center, Seoul, 05505, Republic of Korea
- Jinsun Jung
- College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- Center for Human-Caring Nurse Leaders for the Future by Brain Korea 21 (BK 21) Four Project, College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- Ricky Taira
- The Department of Radiological Science, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, United States
16
Treder MS, Lee S, Tsvetanov KA. Introduction to Large Language Models (LLMs) for dementia care and research. FRONTIERS IN DEMENTIA 2024; 3:1385303. [PMID: 39081594 PMCID: PMC11285660 DOI: 10.3389/frdem.2024.1385303] [Received: 02/12/2024] [Accepted: 04/23/2024] [Indexed: 08/02/2024]
Abstract
Introduction Dementia is a progressive neurodegenerative disorder that affects cognitive abilities including memory, reasoning, and communication skills, leading to a gradual decline in daily activities and social engagement. In light of the recent advent of Large Language Models (LLMs) such as ChatGPT, this paper aims to thoroughly analyse their potential applications and usefulness in dementia care and research. Method To this end, we offer an introduction to LLMs, outlining their key features, capabilities, limitations, potential risks, and practical considerations for deployment as easy-to-use software (e.g., smartphone apps). We then explore various domains related to dementia, identifying opportunities for LLMs to enhance understanding, diagnostics, and treatment, with a broader emphasis on improving patient care. For each domain, the specific contributions of LLMs are examined, such as their ability to engage users in meaningful conversations, deliver personalized support, and offer cognitive enrichment. Potential benefits encompass improved social interaction, enhanced cognitive functioning, increased emotional well-being, and reduced caregiver burden. The deployment of LLMs in caregiving frameworks also raises a number of concerns and considerations, including privacy and safety, the need for empirical validation, user-centered design, adaptation to the user's unique needs, and the integration of multimodal inputs to create more immersive and personalized experiences. Additionally, ethical guidelines and privacy protocols must be established to ensure responsible deployment of LLMs. Results We report the results of a questionnaire completed by people with dementia (PwD) and their supporters, in which we surveyed the usefulness of different application scenarios of LLMs as well as the features that LLM-powered apps should have. Both PwD and supporters were largely positive about the prospect of LLMs in care, although concerns were raised regarding bias, data privacy, and transparency. Discussion Overall, this review supports the promising use of LLMs to positively impact dementia care by boosting cognitive abilities, enriching social interaction, and supporting caregivers. The findings underscore the importance of further research and development in this field to fully harness the benefits of LLMs and maximize their potential for improving the lives of individuals living with dementia.
Affiliation(s)
- Matthias S. Treder
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Sojin Lee
- Olive AI Limited, London, United Kingdom
- Kamen A. Tsvetanov
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, United Kingdom
- Department of Psychology, University of Cambridge, Cambridge, United Kingdom