1
Tejani AS, Rauschecker AM. One System to Rule Them All? Task- and Data-specific Considerations for Automated Data Extraction. Radiol Artif Intell 2025; 7:e250175. [PMID: 40304575 DOI: 10.1148/ryai.250175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2025]
Affiliation(s)
- Ali S Tejani
- Department of Radiology, The University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390
- Center for Intelligent Imaging, Department of Radiology & Biomedical Imaging, University of California, San Francisco (UCSF), San Francisco, Calif
- Andreas M Rauschecker
- Center for Intelligent Imaging, Department of Radiology & Biomedical Imaging, University of California, San Francisco (UCSF), San Francisco, Calif
2
Petri J, Barbeira PB, Pesce M, Xhardez V, Laje R, Cotik V. Low-cost algorithms for clinical notes phenotype classification to enhance epidemiological surveillance: A case study. J Biomed Inform 2025; 166:104795. [PMID: 40209919 DOI: 10.1016/j.jbi.2025.104795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2024] [Revised: 01/19/2025] [Accepted: 01/24/2025] [Indexed: 04/12/2025]
Abstract
OBJECTIVE Our study aims to enhance epidemic intelligence through event-based surveillance in an emerging pandemic context. We classified electronic health records (EHRs) from La Rioja, Argentina, focusing on predicting COVID-19-related categories in a scenario with limited disease knowledge, evolving symptoms, non-standardized coding practices, and restricted training data due to privacy issues. METHODS Using natural language processing techniques, we developed rapid, cost-effective methods suitable for implementation with limited resources. We annotated a corpus for training and testing classification models, ranging from simple logistic regression to more complex fine-tuned transformers. RESULTS The transformer-based, Spanish-adapted models BETO Clínico and RoBERTa Clínico, further pre-trained with an unannotated portion of our corpus, were the best-performing models (F1 = 88.13% and 87.01%). A simple logistic regression (LR) model ranked third (F1 = 85.09%), outperforming more complex models like XGBoost and BiLSTM. Data classified as COVID-confirmed using LR and BETO Clínico exhibit stronger time-series Pearson correlation with official COVID-19 case counts from the National Health Surveillance System (SNVS 2.0) in La Rioja province compared to the correlations observed between the International Classification of Diseases (ICD-10) codes and the SNVS 2.0 data (0.840, 0.873, and 0.663, respectively; p-values ≤3×10⁻⁷). Both models have a good Pearson correlation with ICD-10 codes assigned to the clinical notes for confirmed (0.940 and 0.902) and for suspected cases (0.960 and 0.954), p-values ≤1.7×10⁻¹⁸. CONCLUSION This study shows that simple, resource-efficient methods can achieve results comparable to complex approaches. BETO Clínico and LR strongly correlate with official data, revealing uncoded confirmed cases at the pandemic's onset. Our results suggest that annotating a smaller set of EHRs and training a simple model may be more cost-effective than manual coding. This points to potentially efficient strategies in public health emergencies, particularly in resource-limited settings, and provides valuable insights for future epidemic response efforts.
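The time-series comparison described in this abstract rests on the Pearson correlation coefficient. The following is a minimal sketch of that calculation in plain Python; it is not the authors' code, and the function name `pearson` and the weekly counts are hypothetical values made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalized; the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weekly counts (invented for illustration, not study data):
# notes a classifier flagged as confirmed vs. officially reported cases.
classifier_counts = [2, 5, 9, 14, 11, 7]
official_counts = [1, 4, 10, 15, 12, 6]
r = pearson(classifier_counts, official_counts)
print(f"Pearson r = {r:.3f}")
```

A value near 1 means the two weekly series rise and fall together; the p-values quoted in the abstract would need a separate significance test, which this sketch does not cover.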
Affiliation(s)
- Javier Petri
- Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Computación, Argentina
- Pilar Barcena Barbeira
- Universidad de Buenos Aires, Facultad de Medicina, Departamento de Salud Pública, Programa de Innovación Tecnológica en Salud Pública, Argentina
- Martina Pesce
- Universidad de Buenos Aires, Facultad de Medicina, Departamento de Salud Pública, Programa de Innovación Tecnológica en Salud Pública, Argentina
- Verónica Xhardez
- Proyecto ARPHAI, Centro Interdisciplinario de Estudios en Ciencia, Tecnología e Innovación, Argentina
- Rodrigo Laje
- Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Computación, Argentina; Universidad Nacional de Quilmes, Departamento de Ciencia y Tecnología, Argentina; CONICET, Argentina
- Viviana Cotik
- Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Computación, Argentina; CONICET - Universidad de Buenos Aires, Instituto de Investigación en Ciencias de la Computación (ICC), Argentina; CONICET, Argentina.
3
Clay B, Bergman HI, Salim S, Pergola G, Shalhoub J, Davies AH. Natural language processing techniques applied to the electronic health record in clinical research and practice - an introduction to methodologies. Comput Biol Med 2025; 188:109808. [PMID: 39946783 DOI: 10.1016/j.compbiomed.2025.109808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Revised: 02/03/2025] [Accepted: 02/04/2025] [Indexed: 03/05/2025]
Abstract
Natural Language Processing (NLP) has the potential to revolutionise clinical research utilising Electronic Health Records (EHR) through the automated analysis of unstructured free text. Despite this potential, relatively few applications have entered real-world clinical practice. This paper aims to introduce the whole pipeline of NLP methodologies for EHR analysis to the clinical researcher, with case studies to demonstrate the application of these methods in the existing literature. Essential pre-processing steps are introduced, followed by the two major classes of analytical frameworks: statistical methods and Artificial Neural Networks (ANNs). Case studies which apply statistical and ANN-based methods are then provided and discussed, illustrating information extraction tasks for objective and subjective information, and classification/prediction tasks using supervised and unsupervised approaches. State-of-the-art large language models and future directions for research are then discussed. This educational article aims to bridge the gap between the clinical researcher and the NLP expert, providing clinicians with a background understanding of the NLP techniques relevant to EHR analysis, allowing engagement with this rapidly evolving area of research, which is likely to have a major impact on clinical practice in coming years.
Affiliation(s)
- Benjamin Clay
- Department of Trauma and Orthopaedic Surgery, East Suffolk and North Essex NHS Foundation Trust, Ipswich Hospital, Heath Road, Ipswich, IP4 5PD, United Kingdom; Department of Public Health and Primary Care, University of Cambridge, Forvie Site, Robinson Way, Cambridge, CB2 0SR, United Kingdom.
- Henry I Bergman
- Academic Section of Vascular Surgery, Department of Surgery and Cancer, Imperial College London, London, SW7 2AZ, United Kingdom.
- Safa Salim
- Academic Section of Vascular Surgery, Department of Surgery and Cancer, Imperial College London, London, SW7 2AZ, United Kingdom.
- Gabriele Pergola
- Department of Computer Science, University of Warwick, Coventry, CV4 7AL, United Kingdom.
- Joseph Shalhoub
- Academic Section of Vascular Surgery, Department of Surgery and Cancer, Imperial College London, London, SW7 2AZ, United Kingdom.
- Alun H Davies
- Academic Section of Vascular Surgery, Department of Surgery and Cancer, Imperial College London, London, SW7 2AZ, United Kingdom.
4
Golder S, O’Connor K, Lopez-Garcia G, Tatonetti N, Gonzalez-Hernandez G. Leveraging unstructured data in electronic health records to detect adverse events from pediatric drug use - a scoping review. medRxiv: the preprint server for health sciences 2025:2025.03.20.25324320. [PMID: 40166566 PMCID: PMC11957175 DOI: 10.1101/2025.03.20.25324320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Adverse drug events (ADEs) in pediatric populations pose significant public health challenges, yet research on their detection and monitoring remains limited. This scoping review evaluates the use of unstructured data from electronic health records (EHRs) to identify ADEs in children. We searched six databases, including MEDLINE, Embase and IEEE Xplore, in September 2024. From 984 records, only nine studies met our inclusion criteria, indicating a significant gap in research on identifying ADEs in children. We found that unstructured data in EHRs can indeed be of value and enhance pediatric pharmacovigilance, although its use has so far been very limited. Traditional Natural Language Processing (NLP) methods have been employed to extract ADEs, but the approaches used face challenges in generalizability and context interpretation. These challenges could be addressed with recent advances in transformer-based models and large language models (LLMs), unlocking the use of EHR data at scale for pediatric pharmacovigilance.
Affiliation(s)
- Su Golder
- Department of Health Sciences, University of York, York, United Kingdom
- Karen O’Connor
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Guillermo Lopez-Garcia
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, West Hollywood, CA, USA
- Nicholas Tatonetti
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, West Hollywood, CA, USA
5
Alba C, Xue B, Abraham J, Kannampallil T, Lu C. The foundational capabilities of large language models in predicting postoperative risks using clinical notes. NPJ Digit Med 2025; 8:95. [PMID: 39934379 DOI: 10.1038/s41746-025-01489-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2024] [Accepted: 01/28/2025] [Indexed: 02/13/2025] Open
Abstract
Clinical notes recorded during a patient's perioperative journey hold immense informational value. Advances in large language models (LLMs) offer opportunities for bridging this gap. Using 84,875 preoperative notes and their associated surgical cases from 2018 to 2021, we examine the performance of LLMs in predicting six postoperative risks using various fine-tuning strategies. Pretrained LLMs outperformed traditional word embeddings by an absolute AUROC of 38.3% and AUPRC of 33.2%. Self-supervised fine-tuning further improved performance by 3.2% and 1.5%. Incorporating labels into training further increased AUROC by 1.8% and AUPRC by 2%. The highest performance was achieved with a unified foundation model, with improvements of 3.6% for AUROC and 2.6% for AUPRC compared to self-supervision, highlighting the foundational capabilities of LLMs in predicting postoperative risks, which could be potentially beneficial when deployed for perioperative care.
Affiliation(s)
- Charles Alba
- AI for Health Institute, Washington University in St. Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- McKelvey School of Engineering, Washington University in St Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- Brown School, Washington University in St Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- Bing Xue
- AI for Health Institute, Washington University in St. Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- McKelvey School of Engineering, Washington University in St Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- Joanna Abraham
- AI for Health Institute, Washington University in St. Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- School of Medicine, Washington University in St Louis, 660 S Euclid Ave, St. Louis, 63110, MO, USA
- Institute for Informatics, Data Science, and Biostatistics, Washington University in St Louis, 660 S Euclid Ave, St. Louis, 63110, MO, USA
- Thomas Kannampallil
- AI for Health Institute, Washington University in St. Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- McKelvey School of Engineering, Washington University in St Louis, 1 Brookings Drive, St Louis, 63130, MO, USA
- School of Medicine, Washington University in St Louis, 660 S Euclid Ave, St. Louis, 63110, MO, USA
- Institute for Informatics, Data Science, and Biostatistics, Washington University in St Louis, 660 S Euclid Ave, St. Louis, 63110, MO, USA
- Chenyang Lu
- AI for Health Institute, Washington University in St. Louis, 1 Brookings Drive, St Louis, 63130, MO, USA.
- McKelvey School of Engineering, Washington University in St Louis, 1 Brookings Drive, St Louis, 63130, MO, USA.
- School of Medicine, Washington University in St Louis, 660 S Euclid Ave, St. Louis, 63110, MO, USA.
6
Lopez I, Swaminathan A, Vedula K, Narayanan S, Nateghi Haredasht F, Ma SP, Liang AS, Tate S, Maddali M, Gallo RJ, Shah NH, Chen JH. Clinical entity augmented retrieval for clinical information extraction. NPJ Digit Med 2025; 8:45. [PMID: 39828800 PMCID: PMC11743751 DOI: 10.1038/s41746-024-01377-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Accepted: 12/08/2024] [Indexed: 01/22/2025] Open
Abstract
Large language models (LLMs) with retrieval-augmented generation (RAG) have improved information extraction over previous methods, yet their reliance on embeddings often leads to inefficient retrieval. We introduce CLinical Entity Augmented Retrieval (CLEAR), a RAG pipeline that retrieves information using entities. We compared CLEAR to embedding RAG and full-note approaches for extracting 18 variables using six LLMs across 20,000 clinical notes. Average F1 scores were 0.90, 0.86, and 0.79; inference times were 4.95, 17.41, and 20.08 s per note; average model queries were 1.68, 4.94, and 4.18 per note; and average input tokens were 1.1k, 3.8k, and 6.1k per note for CLEAR, embedding RAG, and full-note approaches, respectively. In conclusion, CLEAR utilizes clinical entities for information retrieval and achieves >70% reduction in token usage and inference time with improved performance compared to modern methods.
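The idea of retrieving note text by entity mention rather than by embedding similarity can be sketched in a few lines. This is an illustrative toy and not the CLEAR pipeline itself: the function names, the regex-based sentence splitter, the exact-term matching, and the example note are all assumptions introduced here.

```python
import re

def split_sentences(note):
    """Naive sentence splitter; a real pipeline would use a clinical tokenizer."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]

def retrieve_by_entity(note, entity_terms, window=1):
    """Return sentences that mention any entity term, plus `window` neighbours."""
    sentences = split_sentences(note)
    patterns = [
        re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in entity_terms
    ]
    keep = set()
    for i, sentence in enumerate(sentences):
        if any(p.search(sentence) for p in patterns):
            lo = max(0, i - window)
            hi = min(len(sentences), i + window + 1)
            keep.update(range(lo, hi))
    # Preserve the original sentence order in the returned context.
    return [sentences[i] for i in sorted(keep)]

# Hypothetical note text, invented for illustration.
note = (
    "Patient seen for follow-up. Reports occasional headaches. "
    "History of type 2 diabetes managed with metformin. "
    "No chest pain. Family lives nearby. Plans to travel next month."
)
context = retrieve_by_entity(note, ["diabetes", "metformin"], window=0)
print(context)
```

Only the matched sentences would then be passed to the language model, which is one plausible way a pipeline could cut input tokens relative to sending the full note.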
Affiliation(s)
- Ivan Lopez
- Stanford University School of Medicine, Stanford, CA, USA.
- Department of Biomedical Data Science, Stanford, CA, USA.
- Akshay Swaminathan
- Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford, CA, USA
- Sanjana Narayanan
- Stanford Center for Biomedical Informatics Research, Stanford, CA, USA
- Stephen P Ma
- Division of Hospital Medicine, Stanford University School of Medicine, Stanford, CA, USA
- April S Liang
- Division of Clinical Informatics, Stanford University School of Medicine, Stanford, CA, USA
- Steven Tate
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Manoj Maddali
- Department of Biomedical Data Science, Stanford, CA, USA
- Division of Pulmonary, Allergy, and Critical Care Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Robert Joseph Gallo
- Center for Innovation to Implementation, VA Palo Alto Healthcare System, Menlo Park, CA, USA
- Department of Health Policy, Stanford University, Stanford, CA, USA
- Nigam H Shah
- Stanford Center for Biomedical Informatics Research, Stanford, CA, USA
- Technology and Digital Solutions, Stanford Healthcare, Palo Alto, USA
- Clinical Excellence Research Center, Stanford University School of Medicine, Stanford, CA, USA
- Jonathan H Chen
- Department of Biomedical Data Science, Stanford, CA, USA
- Stanford Center for Biomedical Informatics Research, Stanford, CA, USA
- Division of Hospital Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Clinical Excellence Research Center, Stanford University School of Medicine, Stanford, CA, USA
- Department of Medicine, Stanford, CA, USA
7
Basubrin O. Current Status and Future of Artificial Intelligence in Medicine. Cureus 2025; 17:e77561. [PMID: 39958114 PMCID: PMC11830112 DOI: 10.7759/cureus.77561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/30/2024] [Indexed: 02/18/2025] Open
Abstract
Artificial intelligence (AI) has rapidly emerged as a transformative force in medicine, revolutionizing various aspects of healthcare from diagnostics and treatment to public health and patient care. This narrative review synthesizes evidence from diverse study designs, exploring the current and future applications of AI in medicine. We highlight AI's role in improving diagnostic accuracy, optimizing treatment strategies, and enhancing patient care through personalized interventions and remote monitoring, drawing upon recent advancements and landmark studies. Emerging trends such as explainable AI and federated learning are also examined. While acknowledging the tremendous potential of AI in medicine, the review also addresses the barriers and ethical challenges that need to be overcome, including concerns about algorithmic bias, transparency, over-reliance, and the potential impact on the healthcare workforce. We emphasize the importance of establishing regulatory guidelines, fostering collaboration between clinicians and AI developers, and ensuring ongoing education for healthcare professionals. Despite these challenges, the future of AI in medicine holds immense promise, with the potential to significantly improve patient outcomes, transform healthcare delivery, and address healthcare disparities.
Affiliation(s)
- Omar Basubrin
- Department of Medicine, Umm Al-Qura University, Makkah, SAU
8
Herman Bernardim Andrade G, Nishiyama T, Fujimaki T, Yada S, Wakamiya S, Takagi M, Kato M, Miyashiro I, Aramaki E. Assessing domain adaptation in adverse drug event extraction on real-world breast cancer records. Int J Med Inform 2024; 191:105539. [PMID: 39084086 DOI: 10.1016/j.ijmedinf.2024.105539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 06/21/2024] [Accepted: 07/01/2024] [Indexed: 08/02/2024]
Abstract
BACKGROUND Adverse Drug Events (ADE) are key information present in unstructured portions of Electronic Health Records. These pose a significant challenge in healthcare, ranging from mild discomfort to severe complications, and can impact patient safety and treatment outcomes. METHODS We explore the influence of domain shift between a set of dummy clinical notes and a real-world hospital corpus of Japanese clinical notes of breast cancer treatment when extracting ADEs from free text. We annotated a subset of the hospital dataset and used it to fine-tune a Named Entity Recognition (NER) model, initially trained with the set of dummy documents. We used increasing amounts of the annotated data and evaluated the impact on the model's performance. Additionally, we examined the extracted information to identify combinations of drugs that are likely to cause ADEs. RESULTS We show that domain adaptation can significantly improve model performance in the new domain: fine-tuning on a small subset of 100 documents yielded a 40% improvement in model performance. However, we also noticed diminishing returns when fine-tuning the model with a larger dataset. For instance, by feeding eight times more data, we saw only a further 18% improvement in extraction performance. CONCLUSION Variations in writing style and vocabulary in clinical corpora can significantly impact the quality of NER results. We show that domain adaptation can be of great aid in mitigating these discrepancies and achieving better performance. Yet, while providing in-domain data to a model helps, there are diminishing returns when fine-tuning with large amounts of data.
Affiliation(s)
- Tomohiro Nishiyama
- Department of Information Science, Nara Institute of Science and Technology, 8916-5 Takayamacho, Ikoma, 630-0101, Nara, Japan
- Takako Fujimaki
- Department of Information Science, Nara Institute of Science and Technology, 8916-5 Takayamacho, Ikoma, 630-0101, Nara, Japan
- Shuntaro Yada
- Department of Information Science, Nara Institute of Science and Technology, 8916-5 Takayamacho, Ikoma, 630-0101, Nara, Japan
- Shoko Wakamiya
- Department of Information Science, Nara Institute of Science and Technology, 8916-5 Takayamacho, Ikoma, 630-0101, Nara, Japan
- Mari Takagi
- Department of Pharmacy, Osaka International Cancer Institute, 3-1-69 Otemae, Chuo-ku, 541-8567, Osaka, Japan
- Mizuki Kato
- Cancer Control Center, Osaka International Cancer Institute, 3-1-69 Otemae, Chuo-ku, 541-8567, Osaka, Japan
- Isao Miyashiro
- Cancer Control Center, Osaka International Cancer Institute, 3-1-69 Otemae, Chuo-ku, 541-8567, Osaka, Japan
- Eiji Aramaki
- Department of Information Science, Nara Institute of Science and Technology, 8916-5 Takayamacho, Ikoma, 630-0101, Nara, Japan.
9
Kim K, Park S, Min J, Park S, Kim JY, Eun J, Jung K, Park YE, Kim E, Lee EY, Lee J, Choi J. Multifaceted Natural Language Processing Task-Based Evaluation of Bidirectional Encoder Representations From Transformers Models for Bilingual (Korean and English) Clinical Notes: Algorithm Development and Validation. JMIR Med Inform 2024; 12:e52897. [PMID: 39475725 PMCID: PMC11539635 DOI: 10.2196/52897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 07/08/2024] [Accepted: 08/17/2024] [Indexed: 11/08/2024] Open
Abstract
Background The bidirectional encoder representations from transformers (BERT) model has attracted considerable attention in clinical applications, such as patient classification and disease prediction. However, current studies have typically progressed to application development without a thorough assessment of the model's comprehension of clinical context. Furthermore, limited comparative studies have been conducted on BERT models using medical documents from non-English-speaking countries. Therefore, the applicability of BERT models trained on English clinical notes to non-English contexts is yet to be confirmed. To address these gaps in literature, this study focused on identifying the most effective BERT model for non-English clinical notes. Objective In this study, we evaluated the contextual understanding abilities of various BERT models applied to mixed Korean and English clinical notes. The objective of this study was to identify the BERT model that excels in understanding the context of such documents. Methods Using data from 164,460 patients in a South Korean tertiary hospital, we pretrained BERT-base, BERT for Biomedical Text Mining (BioBERT), Korean BERT (KoBERT), and Multilingual BERT (M-BERT) to improve their contextual comprehension capabilities and subsequently compared their performances in 7 fine-tuning tasks. Results The model performance varied based on the task and token usage. First, BERT-base and BioBERT excelled in tasks using classification ([CLS]) token embeddings, such as document classification. BioBERT achieved the highest F1-score of 89.32. Both BERT-base and BioBERT demonstrated their effectiveness in document pattern recognition, even with limited Korean tokens in the dictionary. Second, M-BERT exhibited a superior performance in reading comprehension tasks, achieving an F1-score of 93.77. Better results were obtained when fewer words were replaced with unknown ([UNK]) tokens. 
Third, M-BERT excelled in the knowledge inference task in which correct disease names were inferred from 63 candidate disease names in a document with disease names replaced with [MASK] tokens. M-BERT achieved the highest hit@10 score of 95.41. Conclusions This study highlighted the effectiveness of various BERT models in a multilingual clinical domain. The findings can be used as a reference in clinical and language-based applications.
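The hit@10 score reported for the knowledge inference task counts how often the correct answer appears among a model's top-ranked candidates. The sketch below shows the general metric in plain Python; the function names and the tiny candidate lists are hypothetical, and the study's own evaluation code may differ.

```python
def hit_at_k(ranked_candidates, gold, k=10):
    """Return 1 if the gold label is among the top-k ranked candidates, else 0."""
    return int(gold in ranked_candidates[:k])

def mean_hit_at_k(examples, k=10):
    """Average hit@k over (ranked_candidates, gold) pairs, as a percentage."""
    if not examples:
        raise ValueError("need at least one example")
    hits = sum(hit_at_k(ranked, gold, k) for ranked, gold in examples)
    return 100.0 * hits / len(examples)

# Hypothetical ranked predictions for masked disease names (illustration only).
examples = [
    (["gout", "lupus", "rheumatoid arthritis"], "lupus"),
    (["lupus", "gout", "rheumatoid arthritis"], "rheumatoid arthritis"),
    (["gout", "rheumatoid arthritis", "lupus"], "gout"),
    (["rheumatoid arthritis", "lupus", "gout"], "gout"),
]
score_at_2 = mean_hit_at_k(examples, k=2)
score_at_3 = mean_hit_at_k(examples, k=3)
print(score_at_2, score_at_3)
```

Raising k can only keep the score the same or increase it, so a hit@10 figure should be read alongside the size of the candidate set (63 disease names in the abstract).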
Affiliation(s)
- Kyungmo Kim
- Interdisciplinary Program for Bioengineering, Seoul National University, Seoul, Republic of Korea
- Seongkeun Park
- Seoul National University Medical Research Center, Seoul, Republic of Korea
- Jeongwon Min
- Interdisciplinary Program for Bioengineering, Seoul National University, Seoul, Republic of Korea
- Sumin Park
- Institute of Medical and Biological Engineering, Medical Research Center, Seoul National University, Seoul, Republic of Korea
- Ju Yeon Kim
- Division of Rheumatology, Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Jinsu Eun
- Human Computer Interaction and Design Lab, Seoul National University, Seoul, Republic of Korea
- Kyuha Jung
- Human Computer Interaction and Design Lab, Seoul National University, Seoul, Republic of Korea
- Yoobin Elyson Park
- Human Computer Interaction and Design Lab, Seoul National University, Seoul, Republic of Korea
- Esther Kim
- Human Computer Interaction and Design Lab, Seoul National University, Seoul, Republic of Korea
- Eun Young Lee
- Division of Rheumatology, Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Joonhwan Lee
- Human Computer Interaction and Design Lab, Seoul National University, Seoul, Republic of Korea
- Jinwook Choi
- Institute of Medical and Biological Engineering, Medical Research Center, Seoul National University, Seoul, Republic of Korea
- Seoul National University College of Medicine, 103 Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea, 82 2-766-3421
10
Sushil M, Zack T, Mandair D, Zheng Z, Wali A, Yu YN, Quan Y, Lituiev D, Butte AJ. A comparative study of large language model-based zero-shot inference and task-specific supervised classification of breast cancer pathology reports. J Am Med Inform Assoc 2024; 31:2315-2327. [PMID: 38900207 DOI: 10.1093/jamia/ocae146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 05/27/2024] [Accepted: 06/03/2024] [Indexed: 06/21/2024] Open
Abstract
OBJECTIVE Although supervised machine learning is popular for information extraction from clinical notes, creating large annotated datasets requires extensive domain expertise and is time-consuming. Meanwhile, large language models (LLMs) have demonstrated promising transfer learning capability. In this study, we explored whether recent LLMs could reduce the need for large-scale data annotations. MATERIALS AND METHODS We curated a dataset of 769 breast cancer pathology reports, manually labeled with 12 categories, to compare zero-shot classification capability of the following LLMs: GPT-4, GPT-3.5, Starling, and ClinicalCamel, with task-specific supervised classification performance of 3 models: random forests, long short-term memory networks with attention (LSTM-Att), and the UCSF-BERT model. RESULTS Across all 12 tasks, the GPT-4 model performed either significantly better than or as well as the best supervised model, LSTM-Att (average macro F1-score of 0.86 vs 0.75), with advantage on tasks with high label imbalance. Other LLMs demonstrated poor performance. Frequent GPT-4 error categories included incorrect inferences from multiple samples and from history, and complex task design, and several LSTM-Att errors were related to poor generalization to the test set. DISCUSSION On tasks where large annotated datasets cannot be easily collected, LLMs can reduce the burden of data labeling. However, if the use of LLMs is prohibitive, the use of simpler models with large annotated datasets can provide comparable results. CONCLUSIONS GPT-4 demonstrated the potential to speed up the execution of clinical NLP studies by reducing the need for large annotated datasets. This may increase the utilization of NLP-based variables and outcomes in clinical studies.
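The macro F1-score used in this comparison averages per-class F1 without weighting by class frequency, which is why it is informative on tasks with high label imbalance. The following is a minimal plain-Python sketch of that metric, not the study's evaluation code; the function names and the toy labels are hypothetical.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for one label, treating that label as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when the label never occurs.
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, label) for label in labels) / len(labels)

# Toy imbalanced example (invented): a model that always predicts the majority class.
y_true = ["negative"] * 8 + ["positive"] * 2
y_pred = ["negative"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}, macro F1 = {score:.2f}")
```

In this toy case accuracy looks respectable while macro F1 is pulled down by the missed minority class, which illustrates why a metric of this kind suits imbalanced labels.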
Affiliation(s)
- Madhumita Sushil
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
- Travis Zack
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94158, United States
- Divneet Mandair
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94158, United States
- Zhiwei Zheng
- University of California, Berkeley, Berkeley, CA 94720, United States
- Ahmed Wali
- University of California, Berkeley, Berkeley, CA 94720, United States
- Yan-Ning Yu
- University of California, Berkeley, Berkeley, CA 94720, United States
- Yuwei Quan
- University of California, Berkeley, Berkeley, CA 94720, United States
- Dmytro Lituiev
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
- Atul J Butte
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94158, United States
- Center for Data-driven Insights and Innovation, University of California, Office of the President, Oakland, CA 94607, United States
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94158, United States
11
Liu Y, Wang H, Zhou H, Li M, Hou Y, Zhou S, Wang F, Hoetzlein R, Zhang R. A review of reinforcement learning for natural language processing and applications in healthcare. J Am Med Inform Assoc 2024; 31:2379-2393. [PMID: 39208319 PMCID: PMC11413430 DOI: 10.1093/jamia/ocae215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Revised: 07/01/2024] [Accepted: 08/01/2024] [Indexed: 09/04/2024] Open
Abstract
IMPORTANCE Reinforcement learning (RL) represents a pivotal avenue within natural language processing (NLP), offering a potent mechanism for acquiring optimal strategies in task completion. This literature review studies various NLP applications where RL has demonstrated efficacy, with notable applications in healthcare settings. OBJECTIVES To systematically explore the applications of RL in NLP, focusing on its effectiveness in acquiring optimal strategies, particularly in healthcare settings, and provide a comprehensive understanding of RL's potential in NLP tasks. MATERIALS AND METHODS Adhering to the PRISMA guidelines, an exhaustive literature review was conducted to identify instances where RL has exhibited success in NLP applications, encompassing dialogue systems, machine translation, question-answering, text summarization, and information extraction. Our methodological approach involves closely examining the technical aspects of RL methodologies employed in these applications, analyzing algorithms, states, rewards, actions, datasets, and encoder-decoder architectures. RESULTS The review of 93 papers yields insights into RL algorithms, prevalent techniques, emergent trends, and the fusion of RL methods in NLP healthcare applications. It clarifies the strategic approaches employed, datasets utilized, and the dynamic terrain of RL-NLP systems, thereby offering a roadmap for research and development in RL and machine learning techniques in healthcare. The review also addresses ethical concerns to ensure equity, transparency, and accountability in the evolution and application of RL-based NLP technologies, particularly within sensitive domains such as healthcare. DISCUSSION The findings underscore the promising role of RL in advancing NLP applications, particularly in healthcare, where its potential to optimize decision-making and enhance patient outcomes is significant. 
However, the ethical challenges and technical complexities associated with RL demand careful consideration and ongoing research to ensure responsible and effective implementation. CONCLUSIONS By systematically exploring RL's applications in NLP and providing insights into technical analysis, ethical implications, and potential advancements, this review contributes to a deeper understanding of RL's role for language processing.
Affiliation(s)
- Ying Liu
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
- Haozhu Wang
- Amazon Web Service, Seattle, WA 98109, United States
- Huixue Zhou
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
- Mingchen Li
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
- Yu Hou
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
- Sicheng Zhou
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
- Fang Wang
- Amazon Web Service, Seattle, WA 98109, United States
- Rui Zhang
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
|
12
|
Pham TD, Teh MT, Chatzopoulou D, Holmes S, Coulthard P. Artificial Intelligence in Head and Neck Cancer: Innovations, Applications, and Future Directions. Curr Oncol 2024; 31:5255-5290. [PMID: 39330017 PMCID: PMC11430806 DOI: 10.3390/curroncol31090389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2024] [Revised: 09/01/2024] [Accepted: 09/03/2024] [Indexed: 09/28/2024] Open
Abstract
Artificial intelligence (AI) is revolutionizing head and neck cancer (HNC) care by providing innovative tools that enhance diagnostic accuracy and personalize treatment strategies. This review highlights the advancements in AI technologies, including deep learning and natural language processing, and their applications in HNC. The integration of AI with imaging techniques, genomics, and electronic health records is explored, emphasizing its role in early detection, biomarker discovery, and treatment planning. Despite noticeable progress, challenges such as data quality, algorithmic bias, and the need for interdisciplinary collaboration remain. Emerging innovations like explainable AI, AI-powered robotics, and real-time monitoring systems are poised to further advance the field. Addressing these challenges and fostering collaboration among AI experts, clinicians, and researchers is crucial for developing equitable and effective AI applications. The future of AI in HNC holds significant promise, offering potential breakthroughs in diagnostics, personalized therapies, and improved patient outcomes.
Affiliation(s)
- Tuan D. Pham
- Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Turner Street, London E1 2AD, UK; (M.-T.T.); (D.C.); (S.H.); (P.C.)
|
13
|
Klug K, Beckh K, Antweiler D, Chakraborty N, Baldini G, Laue K, Hosch R, Nensa F, Schuler M, Giesselbach S. From admission to discharge: a systematic review of clinical natural language processing along the patient journey. BMC Med Inform Decis Mak 2024; 24:238. [PMID: 39210370 PMCID: PMC11360876 DOI: 10.1186/s12911-024-02641-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 08/20/2024] [Indexed: 09/04/2024] Open
Abstract
BACKGROUND Medical text, as part of an electronic health record, is an essential information source in healthcare. Although natural language processing (NLP) techniques for medical text are developing fast, successful transfer into clinical practice has been rare. Especially the hospital domain offers great potential while facing several challenges including many documents per patient, multiple departments and complex interrelated processes. METHODS In this work, we survey relevant literature to identify and classify approaches which exploit NLP in the clinical context. Our contribution involves a systematic mapping of related research onto a prototypical patient journey in the hospital, along which medical documents are created, processed and consumed by hospital staff and patients themselves. Specifically, we reviewed which dataset types, dataset languages, model architectures and tasks are researched in current clinical NLP research. Additionally, we extract and analyze major obstacles during development and implementation. We discuss options to address them and argue for a focus on bias mitigation and model explainability. RESULTS While a patient's hospital journey produces a significant amount of structured and unstructured documents, certain steps and documents receive more research attention than others. Diagnosis, Admission and Discharge are clinical patient steps that are researched often across the surveyed paper. In contrast, our findings reveal significant under-researched areas such as Treatment, Billing, After Care, and Smart Home. Leveraging NLP in these stages can greatly enhance clinical decision-making and patient outcomes. Additionally, clinical NLP models are mostly based on radiology reports, discharge letters and admission notes, even though we have shown that many other documents are produced throughout the patient journey. 
There is a significant opportunity in analyzing a wider range of medical documents produced throughout the patient journey to improve the applicability and impact of NLP in healthcare. CONCLUSIONS Our findings suggest that there is a significant opportunity to leverage NLP approaches to advance clinical decision-making systems, as there remains a considerable understudied potential for the analysis of patient journey data.
Grants
- 5-2011-0041/2 Ministry for Economic Affairs, Industry, Climate Action and Energy of the State of North-Rhine-Westphalia, Germany
- Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS (1050)
Affiliation(s)
- Giulia Baldini
- Institute of Interventional and Diagnostic Radiology and Neuroradiology, University Hospital Essen, Essen, Germany
- Institute for Artificial Intelligence in Medicine, University Hospital Essen, Essen, Germany
- Katharina Laue
- West German Cancer Centre, University Hospital Essen, Essen, Germany
- René Hosch
- Institute of Interventional and Diagnostic Radiology and Neuroradiology, University Hospital Essen, Essen, Germany
- Institute for Artificial Intelligence in Medicine, University Hospital Essen, Essen, Germany
- Felix Nensa
- Institute of Interventional and Diagnostic Radiology and Neuroradiology, University Hospital Essen, Essen, Germany
- Institute for Artificial Intelligence in Medicine, University Hospital Essen, Essen, Germany
- Martin Schuler
- West German Cancer Centre, University Hospital Essen, Essen, Germany
|
14
|
Powell N, Blank M, Luintel A, Elkhalifa S, Bhogal R, Wilcock M, Wakefield M, Sandoe J. Narrative review of recent developments and the future of penicillin allergy de-labelling by non-allergists. NPJ Antimicrobials and Resistance 2024; 2:18. [PMID: 39843524 PMCID: PMC11721385 DOI: 10.1038/s44259-024-00035-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 06/11/2024] [Indexed: 01/24/2025]
Abstract
This article outlines recent developments in non-allergist delivered penicillin allergy de-labelling (PADL), discusses remaining controversies and uncertainties, and explores the future for non-allergist delivered PADL. Recent developments include national guidelines for non-allergist delivered PADL and validation of penicillin allergy risk assessment tools. Controversies remain over which penicillin allergy features indicate a low risk of genuine allergy. In the future, genetic or immunological tests may facilitate PADL.
Affiliation(s)
- Neil Powell
- Pharmacy Department, Royal Cornwall Hospital Trust, Truro, Cornwall, UK
- Akish Luintel
- Institute of Global Health Innovation, Imperial College London, London, UK
- Shuayb Elkhalifa
- Centre for Musculoskeletal Research, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK
- Allergy and Immunology Department, Respiratory Institute, Cleveland Clinic Abu Dhabi, Abu Dhabi, United Arab Emirates
- Rashmeet Bhogal
- The School of Pharmacy and Institute of Clinical Sciences, University of Birmingham, Birmingham, UK
- Michael Wilcock
- Pharmacy Department, Royal Cornwall Hospital Trust, Truro, Cornwall, UK
- Michael Wakefield
- Respiratory Department, Harrogate and District NHS Foundation Trust, Harrogate, UK
- Jonathan Sandoe
- Healthcare Associated Infection Group, Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Department of Microbiology, Leeds Teaching Hospitals NHS Trust, Leeds, UK
|
15
|
Osman M, Cooper R, Sayer AA, Witham MD. The use of natural language processing for the identification of ageing syndromes including sarcopenia, frailty and falls in electronic healthcare records: a systematic review. Age Ageing 2024; 53:afae135. [PMID: 38970549 PMCID: PMC11227113 DOI: 10.1093/ageing/afae135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Indexed: 07/08/2024] Open
Abstract
BACKGROUND Recording and coding of ageing syndromes in hospital records is known to be suboptimal. Natural Language Processing algorithms may be useful to identify diagnoses in electronic healthcare records to improve the recording and coding of these ageing syndromes, but the feasibility and diagnostic accuracy of such algorithms are unclear. METHODS We conducted a systematic review according to a predefined protocol and in line with Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines. Searches were run from the inception of each database to the end of September 2023 in PubMed, Medline, Embase, CINAHL, ACM digital library, IEEE Xplore and Scopus. Eligible studies were identified via independent review of search results by two coauthors and data extracted from each study to identify the computational method, source of text, testing strategy and performance metrics. Data were synthesised narratively by ageing syndrome and computational method in line with the Studies Without Meta-analysis guidelines. RESULTS From 1030 titles screened, 22 studies were eligible for inclusion. One study focussed on identifying sarcopenia, one frailty, twelve falls, five delirium, five dementia and four incontinence. Sensitivity (57.1%-100%) of algorithms compared with a reference standard was reported in 20 studies, and specificity (84.0%-100%) was reported in only 12 studies. Study design quality was variable with results relevant to diagnostic accuracy not always reported, and few studies undertaking external validation of algorithms. CONCLUSIONS Current evidence suggests that Natural Language Processing algorithms can identify ageing syndromes in electronic health records. However, algorithms require testing in rigorously designed diagnostic accuracy studies with appropriate metrics reported.
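For readers less familiar with the diagnostic accuracy metrics quoted in this abstract, sensitivity and specificity are both derived from a comparison of algorithm output against a reference standard. The following is a minimal Python sketch; the function name and the labels are illustrative assumptions, not data or code from any of the reviewed studies.

```python
def sensitivity_specificity(reference, predicted):
    """Return (sensitivity, specificity) for binary labels, where 1 = syndrome present."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # true positives
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # false negatives
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)  # true negatives
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false positives
    # Guard against empty classes, where the metric is undefined.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels for ten records (illustration only).
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(reference, predicted)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

A study that reports only sensitivity, as several did according to the abstract, leaves the false-positive rate unknown, which is why both values are needed to judge an algorithm.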
Affiliation(s)
- Mo Osman
- AGE Research Group, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- NIHR Newcastle Biomedical Research Centre, Newcastle upon Tyne NHS Foundation Trust, Cumbria Northumberland Tyne and Wear NHS Foundation Trust and Newcastle University, Newcastle upon Tyne, UK
- Rachel Cooper
- AGE Research Group, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- NIHR Newcastle Biomedical Research Centre, Newcastle upon Tyne NHS Foundation Trust, Cumbria Northumberland Tyne and Wear NHS Foundation Trust and Newcastle University, Newcastle upon Tyne, UK
- Avan A Sayer
- AGE Research Group, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- NIHR Newcastle Biomedical Research Centre, Newcastle upon Tyne NHS Foundation Trust, Cumbria Northumberland Tyne and Wear NHS Foundation Trust and Newcastle University, Newcastle upon Tyne, UK
- Miles D Witham
- AGE Research Group, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- NIHR Newcastle Biomedical Research Centre, Newcastle upon Tyne NHS Foundation Trust, Cumbria Northumberland Tyne and Wear NHS Foundation Trust and Newcastle University, Newcastle upon Tyne, UK
|
16
|
Huang J, Yang DM, Rong R, Nezafati K, Treager C, Chi Z, Wang S, Cheng X, Guo Y, Klesse LJ, Xiao G, Peterson ED, Zhan X, Xie Y. A critical assessment of using ChatGPT for extracting structured data from clinical notes. NPJ Digit Med 2024; 7:106. [PMID: 38693429 PMCID: PMC11063058 DOI: 10.1038/s41746-024-01079-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 03/14/2024] [Indexed: 05/03/2024] Open
Abstract
Existing natural language processing (NLP) methods to convert free-text clinical notes into structured data often require problem-specific annotations and model training. This study aims to evaluate ChatGPT's capacity to extract information from free-text medical notes efficiently and comprehensively. We developed a large language model (LLM)-based workflow, utilizing systems engineering methodology and spiral "prompt engineering" process, leveraging OpenAI's API for batch querying ChatGPT. We evaluated the effectiveness of this method using a dataset of more than 1000 lung cancer pathology reports and a dataset of 191 pediatric osteosarcoma pathology reports, comparing the ChatGPT-3.5 (gpt-3.5-turbo-16k) outputs with expert-curated structured data. ChatGPT-3.5 demonstrated the ability to extract pathological classifications with an overall accuracy of 89%, in lung cancer dataset, outperforming the performance of two traditional NLP methods. The performance is influenced by the design of the instructive prompt. Our case analysis shows that most misclassifications were due to the lack of highly specialized pathology terminology, and erroneous interpretation of TNM staging rules. Reproducibility shows the relatively stable performance of ChatGPT-3.5 over time. In pediatric osteosarcoma dataset, ChatGPT-3.5 accurately classified both grades and margin status with accuracy of 98.6% and 100% respectively. Our study shows the feasibility of using ChatGPT to process large volumes of clinical notes for structured information extraction without requiring extensive task-specific human annotation and model training. The results underscore the potential role of LLMs in transforming unstructured healthcare data into structured formats, thereby supporting research and aiding clinical decision-making.
Affiliation(s)
- Jingwei Huang
- Quantitative Biomedical Research Center, Peter O'Donnell School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
- Donghan M Yang
- Quantitative Biomedical Research Center, Peter O'Donnell School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
- Ruichen Rong
- Quantitative Biomedical Research Center, Peter O'Donnell School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
- Kuroush Nezafati
- Quantitative Biomedical Research Center, Peter O'Donnell School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
- Colin Treager
- Quantitative Biomedical Research Center, Peter O'Donnell School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
- Zhikai Chi
- Department of Pathology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
- Shidan Wang
- Quantitative Biomedical Research Center, Peter O'Donnell School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
- Xian Cheng
- Quantitative Biomedical Research Center, Peter O'Donnell School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
- Yujia Guo
- Quantitative Biomedical Research Center, Peter O'Donnell School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
- Laura J Klesse
- Department of Pediatrics, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
- Guanghua Xiao
- Quantitative Biomedical Research Center, Peter O'Donnell School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
- Eric D Peterson
- Department of Internal Medicine, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
- Xiaowei Zhan
- Quantitative Biomedical Research Center, Peter O'Donnell School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
- Yang Xie
- Quantitative Biomedical Research Center, Peter O'Donnell School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA
|
17
|
Sushil M, Zack T, Mandair D, Zheng Z, Wali A, Yu YN, Quan Y, Butte AJ. A comparative study of zero-shot inference with large language models and supervised modeling in breast cancer pathology classification. Research Square 2024:rs.3.rs-3914899. [PMID: 38405831 PMCID: PMC10889046 DOI: 10.21203/rs.3.rs-3914899/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Although supervised machine learning is popular for information extraction from clinical notes, creating large, annotated datasets requires extensive domain expertise and is time-consuming. Meanwhile, large language models (LLMs) have demonstrated promising transfer learning capability. In this study, we explored whether recent LLMs can reduce the need for large-scale data annotations. We curated a manually labeled dataset of 769 breast cancer pathology reports, labeled with 13 categories, to compare zero-shot classification capability of the GPT-4 model and the GPT-3.5 model with supervised classification performance of three model architectures: random forests classifier, long short-term memory networks with attention (LSTM-Att), and the UCSF-BERT model. Across all 13 tasks, the GPT-4 model performed either significantly better than or as well as the best supervised model, the LSTM-Att model (average macro F1 score of 0.83 vs. 0.75). On tasks with a high imbalance between labels, the differences were more prominent. Frequent sources of GPT-4 errors included inferences from multiple samples and complex task design. On complex tasks where large annotated datasets cannot be easily collected, LLMs can reduce the burden of large-scale data labeling. However, if the use of LLMs is prohibitive, the use of simpler supervised models with large annotated datasets can provide comparable results. LLMs demonstrated the potential to speed up the execution of clinical NLP studies by reducing the need for curating large annotated datasets. This may increase the utilization of NLP-based variables and outcomes in observational clinical studies.
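The abstract compares models by average macro F1 score, a metric that weights every class equally and is therefore sensitive to performance on rare labels. As a minimal sketch of how macro F1 is computed, the Python below uses a hypothetical helper and made-up labels; it is not the authors' evaluation code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, imbalanced two-class example (illustration only).
y_true = ["pos", "pos", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg", "neg", "neg"]

score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

In this toy example the single missed "pos" case lowers the minority-class F1 to 2/3 and the macro average to 7/9, even though five of six predictions are correct, which illustrates why macro F1 is informative on tasks with high label imbalance.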
Affiliation(s)
- Madhumita Sushil
- Bakar Computational Health Sciences Institute, University of California, San Francisco, USA
- Travis Zack
- Bakar Computational Health Sciences Institute, University of California, San Francisco, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, USA
- Divneet Mandair
- Bakar Computational Health Sciences Institute, University of California, San Francisco, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, USA
- Atul J. Butte
- Bakar Computational Health Sciences Institute, University of California, San Francisco, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, USA
- Center for Data-driven Insights and Innovation, University of California, Office of the President, Oakland, CA, USA
- Department of Pediatrics, University of California, San Francisco, CA, USA
|
18
|
Liu J, Ito S, Ngo TM, Lawate A, Ong QC, Fox TE, Chang SY, Phung D, Nair E, Palaiyan M, Joty S, Abisheganaden J, Lee CP, Lwin MO, Theng YL, Ho MHR, Chia M, Bojic I, Car J. A pilot randomised controlled trial exploring the feasibility and efficacy of a human-AI sleep coaching model for improving sleep among university students. Digit Health 2024; 10:20552076241241244. [PMID: 38638406 PMCID: PMC11025445 DOI: 10.1177/20552076241241244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/05/2024] [Indexed: 04/20/2024] Open
Abstract
Objective Sleep quality is a crucial concern, particularly among youth. The integration of health coaching with question-answering (QA) systems presents the potential to foster behavioural changes and enhance health outcomes. This study proposes a novel human-AI sleep coaching model, combining health coaching by peers and a QA system, and assesses its feasibility and efficacy in improving university students' sleep quality. Methods In a four-week unblinded pilot randomised controlled trial, 59 university students (mean age: 21.9; 64% males) were randomly assigned to the intervention (health coaching and QA system; n = 30) or the control conditions (QA system; n = 29). Outcomes included efficacy of the intervention on sleep quality (Pittsburgh Sleep Quality Index; PSQI), objective and self-reported sleep measures (obtained from Fitbit and sleep diaries) and feasibility of the study procedures and the intervention. Results Analysis revealed no significant differences in sleep quality (PSQI) between intervention and control groups (adjusted mean difference = -0.51, 95% CI: [-1.55-0.77], p = 0.40). The intervention group demonstrated significant improvements in Fitbit measures of total sleep time (adjusted mean difference = 32.5, 95% CI: [5.9-59.1], p = 0.02) and time in bed (adjusted mean difference = 32.3, 95% CI: [2.7-61.9], p = 0.03) compared to the control group, although other sleep measures were insignificant. Adherence was high, with the majority of the intervention group attending all health coaching sessions. Most participants completed baseline and post-intervention self-report measures, all diary entries, and consistently wore Fitbits during sleep. Conclusions The proposed model showed improvements in specific sleep measures for university students and the feasibility of the study procedures and intervention. Future research may extend the intervention period to see substantive sleep quality improvements.
Affiliation(s)
- Jintana Liu
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Sakura Ito
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Tra My Ngo
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Ashwini Lawate
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Qi Chwen Ong
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Tatiana Erlikh Fox
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Si Yuan Chang
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Duy Phung
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Malar Palaiyan
- University Counselling Centre, Nanyang Technological University, Singapore, Singapore
- Shafiq Joty
- Salesforce AI Research, San Francisco, CA, USA
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- John Abisheganaden
- Department of Respiratory and Critical Care Medicine, Tan Tock Seng Hospital, Singapore, Singapore
- Chuen Peng Lee
- Department of Respiratory and Critical Care Medicine, Tan Tock Seng Hospital, Singapore, Singapore
- May Oo Lwin
- Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore, Singapore
- Yin Leng Theng
- Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore, Singapore
- Moon-Ho Ringo Ho
- School of Social Sciences, Nanyang Technological University, Singapore, Singapore
- Michael Chia
- Physical Education and Sports Science, National Institute of Education, Nanyang Technological University, Singapore, Singapore
- Iva Bojic
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Josip Car
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- School of Life Course & Population Sciences, King's College London, London, UK
|
19
|
Stewart R, Chaturvedi J, Roberts A. Natural language processing - relevance to patient outcomes and real-world evidence. Expert Rev Pharmacoecon Outcomes Res 2024; 24:5-9. [PMID: 37874661 DOI: 10.1080/14737167.2023.2275670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 10/23/2023] [Indexed: 10/26/2023]
Affiliation(s)
- Robert Stewart
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, London, UK
- South London and Maudsley NHS Foundation Trust, London, UK
- Jaya Chaturvedi
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, London, UK
- Angus Roberts
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, London, UK
|
20
|
Egli A. ChatGPT, GPT-4, and Other Large Language Models: The Next Revolution for Clinical Microbiology? Clin Infect Dis 2023; 77:1322-1328. [PMID: 37399030 PMCID: PMC10640689 DOI: 10.1093/cid/ciad407] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 03/08/2023] [Revised: 06/16/2023] [Accepted: 06/30/2023] [Indexed: 07/04/2023] Open
Abstract
ChatGPT, GPT-4, and Bard are highly advanced natural language process-based computer programs (chatbots) that simulate and process human conversation in written or spoken form. Recently released by the company OpenAI, ChatGPT was trained on billions of unknown text elements (tokens) and rapidly gained wide attention for its ability to respond to questions in an articulate manner across a wide range of knowledge domains. These potentially disruptive large language model (LLM) technologies have a broad range of conceivable applications in medicine and medical microbiology. In this opinion article, I describe how chatbot technologies work and discuss the strengths and weaknesses of ChatGPT, GPT-4, and other LLMs for applications in the routine diagnostic laboratory, focusing on various use cases for the pre- to post-analytical process.
Affiliation(s)
- Adrian Egli
- Institute of Medical Microbiology, University of Zurich, Zurich, Switzerland
|
21
|
Frei J, Frei-Stuber L, Kramer F. GERNERMED++: Semantic annotation in German medical NLP through transfer-learning, translation and word alignment. J Biomed Inform 2023; 147:104513. [PMID: 37838290 DOI: 10.1016/j.jbi.2023.104513] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/14/2022] [Revised: 09/27/2023] [Accepted: 10/04/2023] [Indexed: 10/16/2023]
Abstract
We present a statistical model, GERNERMED++, for German medical natural language processing trained for named entity recognition (NER) as an open, publicly available model. We demonstrate the effectiveness of combining multiple techniques in order to achieve strong results in entity recognition performance by the means of transfer-learning on pre-trained deep language models (LM), word-alignment and neural machine translation, outperforming a pre-existing baseline model on several datasets. Due to the sparse situation of open, public medical entity recognition models for German texts, this work offers benefits to the German research community on medical NLP as a baseline model. The work serves as a refined successor to our first GERNERMED model. Similar to our previous work, our trained model is publicly available to other researchers. The sample code and the statistical model is available at: https://github.com/frankkramer-lab/GERNERMED-pp.
Collapse
Affiliation(s)
- Johann Frei
- IT-Infrastructure for Translational Medical Research, University of Augsburg, Alter Postweg 101, 86159 Augsburg, Germany.
| | - Ludwig Frei-Stuber
- Institute and Outpatient Clinic for Occupational, Social and Environmental Medicine, 80336 Munich, Germany.
| | - Frank Kramer
- IT-Infrastructure for Translational Medical Research, University of Augsburg, Alter Postweg 101, 86159 Augsburg, Germany.
22
Homburg M, Meijer E, Berends M, Kupers T, Olde Hartman T, Muris J, de Schepper E, Velek P, Kuiper J, Berger M, Peters L. A Natural Language Processing Model for COVID-19 Detection Based on Dutch General Practice Electronic Health Records by Using Bidirectional Encoder Representations From Transformers: Development and Validation Study. J Med Internet Res 2023; 25:e49944. [PMID: 37792444 PMCID: PMC10563863 DOI: 10.2196/49944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/16/2023] [Accepted: 08/23/2023] [Indexed: 10/05/2023] Open
Abstract
BACKGROUND Natural language processing (NLP) models such as bidirectional encoder representations from transformers (BERT) hold promise in revolutionizing disease identification from electronic health records (EHRs) by potentially enhancing efficiency and accuracy. However, their application in practice settings demands a comprehensive and multidisciplinary approach to development and validation. The COVID-19 pandemic highlighted challenges in disease identification due to limited testing availability and challenges in handling unstructured data. In the Netherlands, where general practitioners (GPs) serve as the first point of contact for health care, EHRs generated by these primary care providers contain a wealth of potentially valuable information. Nonetheless, the unstructured nature of free-text entries in EHRs poses challenges in identifying trends, detecting disease outbreaks, or accurately pinpointing COVID-19 cases. OBJECTIVE This study aims to develop and validate a BERT model for detecting COVID-19 consultations in general practice EHRs in the Netherlands. METHODS The BERT model was initially pretrained on Dutch language data and fine-tuned using a comprehensive EHR data set comprising confirmed COVID-19 GP consultations and non-COVID-19-related consultations. The data set was partitioned into a training and development set, and the model's performance was evaluated on an independent test set that served as the primary measure of its effectiveness in COVID-19 detection. To validate the final model, its performance was assessed through 3 approaches. First, external validation was applied on an EHR data set from a different geographic region in the Netherlands. Second, validation was conducted using results of polymerase chain reaction (PCR) test data obtained from municipal health services. Lastly, correlation between predicted outcomes and COVID-19-related hospitalizations in the Netherlands was assessed, encompassing the period around the outbreak of the pandemic in the Netherlands, that is, the period before widespread testing. RESULTS The model development used 300,359 GP consultations. We developed a highly accurate model for COVID-19 consultations (accuracy 0.97, F1-score 0.90, precision 0.85, recall 0.85, specificity 0.99). External validations showed comparable high performance. Validation on PCR test data showed high recall but low precision and specificity. Validation using hospital data showed significant correlation between COVID-19 predictions of the model and COVID-19-related hospitalizations (F1-score 96.8; P<.001; R2=0.69). Most importantly, the model was able to predict COVID-19 cases weeks before the first confirmed case in the Netherlands. CONCLUSIONS The developed BERT model was able to accurately identify COVID-19 cases among GP consultations even preceding confirmed cases. The validated efficacy of our BERT model highlights the potential of NLP models to identify disease outbreaks early, exemplifying the power of multidisciplinary efforts in harnessing technology for disease identification. Moreover, the implications of this study extend beyond COVID-19 and offer a blueprint for the early recognition of various illnesses, revealing that such models could revolutionize disease surveillance.
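The evaluation measures named in this abstract (accuracy, precision, recall, specificity, F1-score) are all functions of the four cells of a binary confusion matrix. The following is a minimal Python sketch of those definitions, not the authors' code; the function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration, not taken from the study.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Assumes every denominator below is non-zero.
    """
    precision = tp / (tp + fp)          # share of predicted positives that are correct
    recall = tp / (tp + fn)             # share of actual positives that are found
    specificity = tn / (tn + fp)        # share of actual negatives that are found
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not data from the study).
example = classification_metrics(tp=80, fp=20, fn=10, tn=890)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With a rare positive class, as in the hypothetical counts above, accuracy and specificity can be high while precision and recall are noticeably lower, which is why abstracts of this kind typically report all five values.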
Affiliation(s)
- Maarten Homburg
- Department of Primary- and Long-Term Care, University Medical Center Groningen, Groningen, Netherlands
| | - Eline Meijer
- Department of Primary- and Long-Term Care, University Medical Center Groningen, Groningen, Netherlands
- Data Science Center in Health, University Medical Center Groningen, Groningen, Netherlands
| | - Matthijs Berends
- Department of Primary- and Long-Term Care, University Medical Center Groningen, Groningen, Netherlands
- Department of Medical Microbiology and Infection Prevention, University Medical Center Groningen, Groningen, Netherlands
- Department of Medical Epidemiology, Certe Foundation, Groningen, Netherlands
| | - Thijmen Kupers
- Department of Primary- and Long-Term Care, University Medical Center Groningen, Groningen, Netherlands
- Data Science Center in Health, University Medical Center Groningen, Groningen, Netherlands
| | - Tim Olde Hartman
- Department of Primary and Community Care, Radboud University Nijmegen Medical Center, Nijmegen, Netherlands
| | - Jean Muris
- Care and Public Health Research Institute, Department of Family Medicine, Maastricht University Medical Center, Maastricht, Netherlands
| | - Evelien de Schepper
- Department of General Practice, Erasmus Medical Center, Rotterdam, Netherlands
| | - Premysl Velek
- Department of General Practice, Erasmus Medical Center, Rotterdam, Netherlands
| | - Jeroen Kuiper
- Municipal Health Service Groningen, Groningen, Netherlands
| | - Marjolein Berger
- Department of Primary- and Long-Term Care, University Medical Center Groningen, Groningen, Netherlands
| | - Lilian Peters
- Department of Primary- and Long-Term Care, University Medical Center Groningen, Groningen, Netherlands
- Data Science Center in Health, University Medical Center Groningen, Groningen, Netherlands
- Midwifery Science, Amsterdam Public Health, Vrije Universiteit Amsterdam, Amsterdam University Medical Center, Amsterdam, Netherlands
23
Casey A, Davidson E, Grover C, Tobin R, Grivas A, Zhang H, Schrempf P, O’Neil AQ, Lee L, Walsh M, Pellie F, Ferguson K, Cvoro V, Wu H, Whalley H, Mair G, Whiteley W, Alex B. Understanding the performance and reliability of NLP tools: a comparison of four NLP tools predicting stroke phenotypes in radiology reports. Front Digit Health 2023; 5:1184919. [PMID: 37840686 PMCID: PMC10569314 DOI: 10.3389/fdgth.2023.1184919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Accepted: 09/06/2023] [Indexed: 10/17/2023] Open
Abstract
Background Natural language processing (NLP) has the potential to automate the reading of radiology reports, but there is a need to demonstrate that NLP methods are adaptable and reliable for use in real-world clinical applications. Methods We tested the F1 score, precision, and recall to compare NLP tools on a cohort from a study on delirium using images and radiology reports from NHS Fife and a population-based cohort (Generation Scotland) that spans multiple National Health Service health boards. We compared four off-the-shelf rule-based and neural NLP tools (namely, EdIE-R, ALARM+, ESPRESSO, and Sem-EHR) and reported on their performance for three cerebrovascular phenotypes, namely, ischaemic stroke, small vessel disease (SVD), and atrophy. Clinical experts from the EdIE-R team defined phenotypes using labelling techniques devised during the development of EdIE-R, in conjunction with an expert researcher who read underlying images. Results EdIE-R obtained the highest F1 score in both cohorts for ischaemic stroke, ≥93%, followed by ALARM+, ≥87%. The F1 score of ESPRESSO was ≥74%, whilst that of Sem-EHR was ≥66%, although ESPRESSO had the highest precision in both cohorts, 90% and 98%. For SVD, EdIE-R achieved F1 scores of ≥98% and ALARM+ ≥90%. ESPRESSO scored lowest with ≥77%, and Sem-EHR scored ≥81%. In NHS Fife, F1 scores for atrophy by EdIE-R and ALARM+ were 99%, dropping in Generation Scotland to 96% for EdIE-R and 91% for ALARM+. Sem-EHR performed lowest for atrophy at 89% in NHS Fife and 73% in Generation Scotland. When comparing NLP tool output with brain image reads using F1 scores, ALARM+ scored 80%, outperforming EdIE-R at 66% in ischaemic stroke. For SVD, EdIE-R performed best, scoring 84%, with Sem-EHR 82%. For atrophy, EdIE-R and both ALARM+ versions were comparable at 80%. Conclusions The four NLP tools show varying F1 (and precision/recall) scores across all three phenotypes, with the variation most apparent for ischaemic stroke. If NLP tools are to be used in clinical settings, they cannot be used "out of the box." It is essential to understand the context of their development to assess whether they are suitable for the task at hand or whether further training, re-training, or modification is required to adapt tools to the target task.
Affiliation(s)
- Arlene Casey
- Advanced Care Research Centre, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
| | - Emma Davidson
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
| | - Claire Grover
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
| | - Richard Tobin
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
| | - Andreas Grivas
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
| | - Huayu Zhang
- Advanced Care Research Centre, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
| | - Patrick Schrempf
- Canon Medical Research Europe Ltd., AI Research, Edinburgh, United Kingdom
- School of Computer Science, University of St Andrews, St Andrews, United Kingdom
| | - Alison Q. O’Neil
- Canon Medical Research Europe Ltd., AI Research, Edinburgh, United Kingdom
- School of Engineering, University of Edinburgh, Edinburgh, United Kingdom
| | - Liam Lee
- Medical School, University of Edinburgh, Edinburgh, United Kingdom
| | - Michael Walsh
- Intensive Care Department, University Hospitals Bristol and Weston, Bristol, United Kingdom
| | - Freya Pellie
- National Horizons Centre, Teesside University, Darlington, United Kingdom
- School of Health and Life Sciences, Teesside University, Middlesbrough, United Kingdom
| | - Karen Ferguson
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
| | - Vera Cvoro
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Department of Geriatric Medicine, NHS Fife, Fife, United Kingdom
| | - Honghan Wu
- Institute of Health Informatics, University College London, London, United Kingdom
- Alan Turing Institute, London, United Kingdom
| | - Heather Whalley
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Generation Scotland, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
| | - Grant Mair
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Neuroradiology, Department of Clinical Neurosciences, NHS Lothian, Edinburgh, United Kingdom
| | - William Whiteley
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Neuroradiology, Department of Clinical Neurosciences, NHS Lothian, Edinburgh, United Kingdom
| | - Beatrice Alex
- Edinburgh Futures Institute, University of Edinburgh, Edinburgh, United Kingdom
- School of Literatures, Languages and Cultures, University of Edinburgh, Edinburgh, United Kingdom
24
Lareyre F, Nasr B, Chaudhuri A, Di Lorenzo G, Carlier M, Raffort J. Comprehensive Review of Natural Language Processing (NLP) in Vascular Surgery. EJVES Vasc Forum 2023; 60:57-63. [PMID: 37822918 PMCID: PMC10562666 DOI: 10.1016/j.ejvsvf.2023.09.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 05/11/2023] [Revised: 07/13/2023] [Accepted: 09/08/2023] [Indexed: 10/13/2023] Open
Abstract
Objective The use of Natural Language Processing (NLP) has attracted increased interest in healthcare with various potential applications including identification and extraction of health information, development of chatbots and virtual assistants. The aim of this comprehensive literature review was to provide an overview of NLP applications in vascular surgery, identify current limitations, and discuss future perspectives in the field. Data sources The MEDLINE database was searched in April 2023. Review methods The database was searched using a combination of keywords to identify studies reporting the use of NLP and chatbots in three main vascular diseases. Keywords used included Natural Language Processing, chatbot, chatGPT, aortic disease, carotid, peripheral artery disease, vascular, and vascular surgery. Results Given the heterogeneity of study design, techniques, and aims, a comprehensive literature review was performed to provide an overview of NLP applications in vascular surgery. By enabling identification and extraction of information on patients with vascular diseases, such technology could help to analyse data from healthcare information systems to provide feedback on current practice and help in optimising patient care. In addition, chatbots and NLP-driven techniques have the potential to be used as virtual assistants for both health professionals and patients. Conclusion While Artificial Intelligence and NLP technology could be used to enhance care for patients with vascular diseases, many challenges remain including the need to define guidelines and clear consensus on how to evaluate and validate these innovations before their implementation into clinical practice.
Affiliation(s)
- Fabien Lareyre
- Department of Vascular Surgery, Hospital of Antibes Juan-les-Pins, France
- Université Côte d'Azur, Inserm, U1065, C3M, Nice, France
| | - Bahaa Nasr
- Department of Vascular and Endovascular Surgery, Brest University Hospital, Brest, France
- INSERM, UMR 1101, LaTIM, Brest, France
| | - Arindam Chaudhuri
- Bedfordshire - Milton Keynes Vascular Centre, Bedfordshire Hospitals, NHS Foundation Trust, Bedford, UK
| | - Gilles Di Lorenzo
- Department of Vascular Surgery, Hospital of Antibes Juan-les-Pins, France
| | - Mathieu Carlier
- Department of Urology, University Hospital of Nice, Nice, France
| | - Juliette Raffort
- Université Côte d'Azur, Inserm, U1065, C3M, Nice, France
- Institute 3IA Côte d’Azur, Université Côte d’Azur, France
- Clinical Chemistry Laboratory, University Hospital of Nice, France
25
Roumengas R, Di Lorenzo G, Salhi A, de Buyer P, Chaudhuri A, Lareyre F, Raffort J. Natural Language Processing for Literature Search in Vascular Surgery: A Pilot Study Testing an Artificial Intelligence Based Application. EJVES Vasc Forum 2023; 60:48-52. [PMID: 37799295 PMCID: PMC10550400 DOI: 10.1016/j.ejvsvf.2023.09.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 08/01/2023] [Accepted: 09/12/2023] [Indexed: 10/07/2023] Open
Abstract
Introduction The use of natural language processing (NLP) for a literature search has been poorly investigated in vascular surgery so far. The aim of this pilot study was to test the applicability of an artificial intelligence (AI) based mobile application for literature searching in a topic related to vascular surgery. Technique A focused scientific question was defined to evaluate the performance of the AI application for a literature search and compare the results with the ground truth provided via a traditional literature search performed by human experts. Using pre-defined keywords, the literature search was performed automatically by the AI application through different steps, including quality assessment based on evaluation of the information available and quality filters using indicators of level of evidence, selection of publications based on relevancy filters using NLP, summarisation, and visualisation of the publications via the mobile app. A traditional literature search performed by human experts required 10 hours to check 154 original articles, among which 26 (16.9%) were truly related to the question, 63 (40.9%) related to the field but not to the specific question, and 65 (42.2%) were unrelated. The AI based search was performed in less than one hour, and, compared with traditional search, the method identified 17 original articles (48.6%) truly related to the question (p < .010), 18 (51.4%) related to the field but not to the specific question (p = .26), and no unrelated publications (p < .001). Fifteen truly related articles (88.2%) were identified jointly by the two methods. No significant difference was observed regarding the median number of citations, year of publications, and impact factor of journals. Discussion The AI based method enabled a targeted, focused, and time saving literature search, although the selection of publications was not completely exhaustive. These results suggest that such an AI driven application is a complementary tool to help researchers and clinicians for continuous education and dissemination of knowledge.
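The percentages quoted in this abstract can be reproduced from the article counts it reports. The short Python sketch below is an arithmetic check only, not part of the study; the helper name `pct` is hypothetical, and the AI-search denominator of 35 is an assumption inferred from the reported counts (17 + 18 + 0), since the abstract does not state it explicitly.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts as reported in the abstract.
manual = {"truly related": 26, "field only": 63, "unrelated": 65}
manual_total = 154

ai = {"truly related": 17, "field only": 18, "unrelated": 0}
ai_total = sum(ai.values())  # assumption: 35, inferred from the counts

for label, count in manual.items():
    print(f"manual search, {label}: {count}/{manual_total} = {pct(count, manual_total)}%")

for label, count in ai.items():
    print(f"AI search, {label}: {count}/{ai_total} = {pct(count, ai_total)}%")

# Overlap: 15 of the 17 truly related AI hits were also found by the human experts.
print(f"jointly identified: 15/17 = {pct(15, 17)}%")
```

Under that inferred denominator the computed values match the abstract (16.9%, 40.9%, 42.2% for the manual search; 48.6% and 51.4% for the AI search; 88.2% for the overlap).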
Affiliation(s)
| | - Gilles Di Lorenzo
- Department of Vascular Surgery, Hospital of Antibes-Juan-les-Pins, Antibes, France
| | - Amel Salhi
- Juisci (Juisci SAS), Neuilly-sur-Seine, France
| | | | - Arindam Chaudhuri
- Bedfordshire – Milton Keynes Vascular Centre, Bedfordshire Hospitals, NHS Foundation Trust, Bedford, UK
| | - Fabien Lareyre
- Department of Vascular Surgery, Hospital of Antibes-Juan-les-Pins, Antibes, France
- Université Côte d'Azur, CHU, Inserm U1065, C3M, Nice, France
| | - Juliette Raffort
- Université Côte d'Azur, CHU, Inserm U1065, C3M, Nice, France
- Institute 3IA Côte d’Azur, Université Côte d’Azur, France
- Clinical Chemistry Laboratory, University Hospital of Nice, France
26
Papachristou N, Kotronoulas G, Dikaios N, Allison SJ, Eleftherochorinou H, Rai T, Kunz H, Barnaghi P, Miaskowski C, Bamidis PD. Digital Transformation of Cancer Care in the Era of Big Data, Artificial Intelligence and Data-Driven Interventions: Navigating the Field. Semin Oncol Nurs 2023; 39:151433. [PMID: 37137770 DOI: 10.1016/j.soncn.2023.151433] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/28/2023] [Accepted: 03/29/2023] [Indexed: 05/05/2023]
Abstract
OBJECTIVES To navigate the field of digital cancer care and define and discuss key aspects and applications of big data analytics, artificial intelligence (AI), and data-driven interventions. DATA SOURCES Peer-reviewed scientific publications and expert opinion. CONCLUSION The digital transformation of cancer care, enabled by big data analytics, AI, and data-driven interventions, presents a significant opportunity to revolutionize the field. An increased understanding of the lifecycle and ethics of data-driven interventions will enhance development of innovative and applicable products to advance digital cancer care services. IMPLICATIONS FOR NURSING PRACTICE As digital technologies become integrated into cancer care, nurse practitioners and scientists will be required to increase their knowledge and skills to effectively use these tools to the patient's benefit. An enhanced understanding of the core concepts of AI and big data, confident use of digital health platforms, and ability to interpret the outputs of data-driven interventions are key competencies. Nurses in oncology will play a crucial role in patient education around big data and AI, with a focus on addressing any arising questions, concerns, or misconceptions to foster trust in these technologies. Successful integration of data-driven innovations into oncology nursing practice will empower practitioners to deliver more personalized, effective, and evidence-based care.
Affiliation(s)
- Nikolaos Papachristou
- Medical Physics and Digital Innovation Laboratory, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece.
| | | | - Nikolaos Dikaios
- Centre for Vision Speech and Signal Processing, University of Surrey, Guildford, UK; Mathematics Research Centre, Academy of Athens, Athens, Greece
| | - Sarah J Allison
- Department of Sport, Exercise and Rehabilitation, Faculty of Health and Life Sciences, Northumbria University, Newcastle, UK; School of Bioscience and Medicine, Faculty of Health & Medical Sciences, University of Surrey, Guildford, UK
| | | | - Taranpreet Rai
- Centre for Vision Speech and Signal Processing, University of Surrey, Guildford, UK; Datalab, The Veterinary Health Innovation Engine (vHive), Guildford, UK
| | - Holger Kunz
- Institute of Health Informatics, University College London, London, UK
| | - Payam Barnaghi
- UK Dementia Research Institute Care Research and Technology Centre, Imperial College London, London, UK
| | - Christine Miaskowski
- School of Nursing, University California San Francisco, San Francisco, California, USA
| | - Panagiotis D Bamidis
- Medical Physics and Digital Innovation Laboratory, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
27
Serna García G, Al Khalaf R, Invernici F, Ceri S, Bernasconi A. CoVEffect: interactive system for mining the effects of SARS-CoV-2 mutations and variants based on deep learning. Gigascience 2022; 12:giad036. [PMID: 37222749 PMCID: PMC10205000 DOI: 10.1093/gigascience/giad036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 04/11/2023] [Accepted: 04/27/2023] [Indexed: 05/25/2023] Open
Abstract
BACKGROUND Literature about SARS-CoV-2 widely discusses the effects of variations that have spread in the past 3 years. Such information is dispersed in the texts of several research articles, hindering the possibility of practically integrating it with related datasets (e.g., millions of SARS-CoV-2 sequences available to the community). We aim to fill this gap by mining literature abstracts to extract, for each variant/mutation, its related effects (in epidemiological, immunological, clinical, or viral kinetics terms) with labeled higher/lower levels in relation to the nonmutated virus. RESULTS The proposed framework comprises (i) the provisioning of abstracts from a COVID-19-related big data corpus (CORD-19) and (ii) the identification of mutation/variant effects in abstracts using a GPT2-based prediction model. The above techniques enable the prediction of mutations/variants with their effects and levels in 2 distinct scenarios: (i) the batch annotation of the most relevant CORD-19 abstracts and (ii) the on-demand annotation of any user-selected CORD-19 abstract through the CoVEffect web application (http://gmql.eu/coveffect), which assists expert users with semiautomated data labeling. On the interface, users can inspect the predictions and correct them; user inputs can then extend the training dataset used by the prediction model. Our prototype model was trained through a carefully designed process, using a minimal and highly diversified pool of samples. CONCLUSIONS The CoVEffect interface serves for the assisted annotation of abstracts, allowing the download of curated datasets for further use in data integration or analysis pipelines. The overall framework can be adapted to resolve similar unstructured-to-structured text translation tasks, which are typical of biomedical domains.
Affiliation(s)
- Giuseppe Serna García
- Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
| | - Ruba Al Khalaf
- Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
| | - Francesco Invernici
- Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
| | - Stefano Ceri
- Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
| | - Anna Bernasconi
- Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy