1
Mahmoudi E, Vahdati S, Chao CJ, Khosravi B, Misra A, Lopez-Jimenez F, Erickson BJ. A comparative analysis of privacy-preserving large language models for automated echocardiography report analysis. J Am Med Inform Assoc 2025:ocaf056. [PMID: 40334045 DOI: 10.1093/jamia/ocaf056] [Received: 12/10/2024] [Revised: 01/29/2025] [Accepted: 03/18/2025] [Indexed: 05/09/2025]
Abstract
BACKGROUND Automated data extraction from echocardiography reports could facilitate large-scale registry creation and clinical surveillance of valvular heart diseases (VHD). We evaluated the performance of open-source large language models (LLMs) guided by prompt instructions and chain of thought (CoT) for this task. METHODS From consecutive transthoracic echocardiographies performed in our center, we utilized 200 random reports from 2019 for prompt optimization and 1000 from 2023 for evaluation. Five instruction-tuned LLMs (Qwen2.0-72B, Llama3.0-70B, Mixtral8-46.7B, Llama3.0-8B, and Phi3.0-3.8B) were guided by prompt instructions with and without CoT to classify prosthetic valve presence and VHD severity. Performance was evaluated using classification metrics against expert-labeled ground truth. Mean squared error (MSE) was also calculated for predicted severity's deviation from actual severity. RESULTS With CoT prompting, Llama3.0-70B and Qwen2.0 achieved the highest performance (accuracy: 99.1% and 98.9% for VHD severity; 100% and 99.9% for prosthetic valve; MSE: 0.02 and 0.05, respectively). Smaller models showed lower accuracy for VHD severity (54.1%-85.9%) but maintained high accuracy for prosthetic valve detection (>96%). Chain of thought reasoning yielded higher accuracy for larger models while increasing processing time from 2-25 to 67-154 seconds per report. Based on CoT reasonings, the wrong predictions were mainly due to model outputs being influenced by irrelevant information in the text or failure to follow the prompt instructions. CONCLUSIONS Our study demonstrates the near-perfect performance of open-source LLMs for automated echocardiography report interpretation with the purpose of registry formation and disease surveillance. While larger models achieved exceptional accuracy through prompt optimization, practical implementation requires balancing performance with computational efficiency.
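The two evaluation quantities named in this abstract, classification accuracy and the mean squared error (MSE) between predicted and actual severity, can be illustrated with a minimal Python sketch. The four-level `SEVERITY` encoding and the example labels are assumptions made for illustration; the abstract does not state the grading scheme or the evaluation code the authors used.

```python
# Hypothetical ordinal encoding of valvular disease severity.
# The study's actual grading scheme is not given in the abstract.
SEVERITY = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted label equals the expert label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def severity_mse(y_true, y_pred):
    """Mean squared distance between predicted and actual severity grades."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(SEVERITY[t] - SEVERITY[p]) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)


# Made-up labels for four reports: one prediction is off by one grade.
expert = ["none", "mild", "severe", "moderate"]
model = ["none", "moderate", "severe", "moderate"]

print(accuracy(expert, model))      # 0.75
print(severity_mse(expert, model))  # 0.25
```

Unlike accuracy, the MSE penalizes a prediction more the further it lands from the true grade, which is presumably why the abstract reports it alongside accuracy.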
Affiliation(s)
- Elham Mahmoudi
  - Department of Radiology, Radiology Informatics Lab, Mayo Clinic, Rochester, MN 55905, United States
  - Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN 55905, United States
- Sanaz Vahdati
  - Department of Radiology, Radiology Informatics Lab, Mayo Clinic, Rochester, MN 55905, United States
- Chieh-Ju Chao
  - Department of Radiology, Radiology Informatics Lab, Mayo Clinic, Rochester, MN 55905, United States
  - Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN 55905, United States
- Bardia Khosravi
  - Department of Radiology, Radiology Informatics Lab, Mayo Clinic, Rochester, MN 55905, United States
- Ajay Misra
  - Department of Radiology, Radiology Informatics Lab, Mayo Clinic, Rochester, MN 55905, United States
- Francisco Lopez-Jimenez
  - Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN 55905, United States
- Bradley J Erickson
  - Department of Radiology, Radiology Informatics Lab, Mayo Clinic, Rochester, MN 55905, United States
2
Huhtanen HJ, Nyman MJ, Karlsson A, Hirvonen J. Machine Learning and Deep Learning Models for Automated Protocoling of Emergency Brain MRI Using Text from Clinical Referrals. Radiol Artif Intell 2025; 7:e230620. [PMID: 39969276 DOI: 10.1148/ryai.230620] [Indexed: 02/20/2025]
Abstract
Purpose To develop and evaluate machine learning and deep learning-based models for automated protocoling of emergency brain MRI scans based on clinical referral text. Materials and Methods In this single-institution, retrospective study of 1953 emergency brain MRI referrals from January 2016 to January 2019, two neuroradiologists labeled the imaging protocol and use of contrast agent as the reference standard. Three machine learning algorithms (naive Bayes, support vector machine, and XGBoost) and two pretrained deep learning models (Finnish bidirectional encoder representations from transformers [BERT] and generative pretrained transformer [GPT]-3.5 [GPT-3.5 Turbo; Open AI]) were developed to predict the MRI protocol and need for a contrast agent. Each model was trained with three datasets (100% of training data, 50% of training data, and 50% plus augmented training data). Prediction accuracy was assessed with a test set. Results The GPT-3.5 models trained with 100% of the training data performed best in both tasks, achieving an accuracy of 84% (95% CI: 80, 88) for the correct protocol and 91% (95% CI: 88, 94) for the contrast agent. BERT had an accuracy of 78% (95% CI: 74, 82) for the protocol and 89% (95% CI: 86, 92) for the contrast agent. The best machine learning model in the protocol task was XGBoost (accuracy, 78%; 95% CI: 73, 82), and the best machine learning models in the contrast agent task were support vector machine and XGBoost (accuracy, 88%; 95% CI: 84, 91 for both). The accuracies of two nonneuroradiologists were 80%-83% in the protocol task and 89%-91% in the contrast medium task. Conclusion Machine learning and deep learning models demonstrated high performance in automatic protocoling of emergency brain MRI scans based on text from clinical referrals. Keywords: Natural Language Processing, Automatic Protocoling, Deep Learning, Machine Learning, Emergency Brain MRI Supplemental material is available for this article. 
Published under a CC BY 4.0 license. See also commentary by Strotzer in this issue.
Affiliation(s)
- Heidi J Huhtanen
  - Department of Radiology, Turku University Hospital & University of Turku, Kiinamyllynkatu 4-8, 20521 Turku, Finland
- Mikko J Nyman
  - Department of Radiology, Turku University Hospital & University of Turku, Kiinamyllynkatu 4-8, 20521 Turku, Finland
- Antti Karlsson
  - Department of Radiology, University of Turku, Turku, Finland, and Pihlajalinna Turku, Turku, Finland
- Jussi Hirvonen
  - Department of Radiology, Turku University Hospital & University of Turku, Kiinamyllynkatu 4-8, 20521 Turku, Finland
3
López-Úbeda P, Martín-Noguerol T, Escartín J, Cabrera-Zubizarreta A, Luna A. Automated MRI pituitary structured reporting from free-text using a fine-tuned Llama model: a feasibility study. Jpn J Radiol 2025; 43:770-778. [PMID: 39730936 DOI: 10.1007/s11604-024-01721-1] [Received: 09/05/2024] [Accepted: 12/11/2024] [Indexed: 12/29/2024]
Abstract
BACKGROUND AND OBJECTIVE Structured reports in radiology have demonstrated substantial advantages over unstructured ones. However, the transition from unstructured to structured reporting can face challenges, as experienced radiologists worry about the potential loss of valuable information. In this study, we fine-tuned a Llama 2 model to generate structured pituitary MRI reports from unstructured reports. METHODS We used a training set comprising 104 pituitary MRI reports to fine-tune Llama 2 and 26 reports as a test set to evaluate the system. The dataset was annotated manually by three expert radiologists. For this annotation, the radiologists used the unstructured report and structured it into eight anatomical landmarks: adenohypophysis, pituitary stalk, optic chiasm, suprasellar cistern, neurohypophysis, cavernous sinuses, sphenoid sinuses and other findings. RESULTS Llama 2 achieved a ROUGE-L score greater than 0.79 in four anatomical landmarks from free-text pituitary MRI reports. The other anatomical landmarks exceeded a ROUGE-L score of 0.61, except for the other findings section. CONCLUSIONS Our study suggests good performance in structuring anatomical landmarks on pituitary MRI reports using the fine-tuned Llama 2 model.
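ROUGE-L, the metric reported in this abstract, scores a generated text against a reference by the length of their longest common subsequence (LCS) of tokens. The Python sketch below is illustrative only: the lowercase whitespace tokenization, the balanced F1 variant, and the example sentences are assumptions, and the abstract does not say which ROUGE implementation or settings the authors used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L as the F1 of LCS-based precision and recall.

    Assumption: lowercase whitespace tokenization and equal weighting of
    precision and recall; other ROUGE-L variants weight recall more heavily.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up example for a single landmark section.
reference = "pituitary stalk is midline and normal"
candidate = "pituitary stalk is normal"
print(round(rouge_l_f1(reference, candidate), 3))  # 0.8
```

In the example the candidate omits two reference tokens, so LCS-based precision is 1.0 while recall is 4/6, giving an F1 of 0.8.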
Affiliation(s)
- Jorge Escartín
  - Neurorradiología Diagnostica E Intervencionista, HT Médica Córdoba-Sevilla, Sevilla, Spain
- Antonio Luna
  - MRI Unit, Radiology Department, HT Medica. Carmelo Torres nº2, 23007, Jaén, Spain
4
Yao J, Alabousi A, Mironov O. Evaluation of a BERT Natural Language Processing Model for Automating CT and MRI Triage and Protocol Selection. Can Assoc Radiol J 2025; 76:265-272. [PMID: 38832645 DOI: 10.1177/08465371241255895] [Indexed: 06/05/2024]
Abstract
Purpose: To evaluate the accuracy of a Bidirectional Encoder Representations from Transformers (BERT) Natural Language Processing (NLP) model for automating triage and protocol selection of cross-sectional image requisitions. Methods: A retrospective study was completed using 222 392 CT and MRI studies from a single Canadian university hospital database (January 2018-September 2022). Three hundred unique protocols (116 CT and 184 MRI) were included. A BERT model was trained, validated, and tested using an 80%-10%-10% stratified split. Naive Bayes (NB) and Support Vector Machine (SVM) machine learning models were used as comparators. Models were assessed using F1 score, precision, recall, and area under the receiver operating characteristic curve (AUROC). The BERT model was also assessed for multi-class protocol suggestion and subgroups based on referral location, modality, and imaging section. Results: BERT was superior to SVM for protocol selection (F1 score: BERT-0.901 vs SVM-0.881). However, BERT was not significantly different from SVM for triage prediction (F1 score: BERT-0.844 vs SVM-0.845). Both models outperformed NB for protocol and triage. BERT had superior performance on minority classes compared to SVM and NB. For multiclass prediction, BERT accuracy was up to 0.991 for top-5 protocol suggestion, and 0.981 for top-2 triage suggestion. Emergency department patients had the highest F1 scores for both protocol (0.957) and triage (0.986), compared to inpatients and outpatients. Conclusion: The BERT NLP model demonstrated strong performance in automating the triage and protocol selection of radiology studies, showing potential to enhance radiologist workflows. These findings suggest the feasibility of using advanced NLP models to streamline radiology operations.
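The top-5 and top-2 figures in this abstract are top-k accuracies: a prediction counts as correct when the true label appears among the model's k highest-ranked suggestions. A minimal Python sketch of that calculation follows; the protocol names and rankings are invented for illustration and do not come from the study.

```python
def top_k_accuracy(y_true, ranked_predictions, k):
    """Share of cases whose true label is among the first k ranked suggestions.

    `ranked_predictions` holds, for each case, the model's labels ordered from
    most to least likely.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(y_true) != len(ranked_predictions) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t in ranked[:k] for t, ranked in zip(y_true, ranked_predictions))
    return hits / len(y_true)


# Hypothetical requisitions with ranked protocol suggestions.
true_protocols = ["CT head", "MRI brain", "CT abdomen"]
suggestions = [
    ["CT head", "CT neck"],      # correct at rank 1
    ["MRI spine", "MRI brain"],  # correct at rank 2
    ["CT chest", "CT pelvis"],   # not among the suggestions
]

top1 = top_k_accuracy(true_protocols, suggestions, k=1)
top2 = top_k_accuracy(true_protocols, suggestions, k=2)
print(round(top1, 3), round(top2, 3))  # 0.333 0.667
```

Top-k accuracy can only stay the same or rise as k grows, so it fits a workflow where the model proposes a short list and a radiologist makes the final choice.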
Affiliation(s)
- Jason Yao
  - Department of Radiology, McMaster University, Hamilton, ON, Canada
- Abdullah Alabousi
  - Department of Radiology, McMaster University, Hamilton, ON, Canada
  - St Joseph's Healthcare Hamilton, Hamilton, ON, Canada
- Oleg Mironov
  - Department of Radiology, McMaster University, Hamilton, ON, Canada
  - St Joseph's Healthcare Hamilton, Hamilton, ON, Canada
5
Mahyoub M, Dougherty K, Shukla A. Extracting Pulmonary Embolism Diagnoses From Radiology Impressions Using GPT-4o: Large Language Model Evaluation Study. JMIR Med Inform 2025; 13:e67706. [PMID: 40203306 PMCID: PMC12018862 DOI: 10.2196/67706] [Received: 10/18/2024] [Revised: 01/30/2025] [Accepted: 03/13/2025] [Indexed: 04/11/2025]
Abstract
BACKGROUND Pulmonary embolism (PE) is a critical condition requiring rapid diagnosis to reduce mortality. Extracting PE diagnoses from radiology reports manually is time-consuming, highlighting the need for automated solutions. Advances in natural language processing, especially transformer models like GPT-4o, offer promising tools to improve diagnostic accuracy and workflow efficiency in clinical settings. OBJECTIVE This study aimed to develop an automatic extraction system using GPT-4o to extract PE diagnoses from radiology report impressions, enhancing clinical decision-making and workflow efficiency. METHODS In total, 2 approaches were developed and evaluated: a fine-tuned Clinical Longformer as a baseline model and a GPT-4o-based extractor. Clinical Longformer, an encoder-only model, was chosen for its robustness in text classification tasks, particularly on smaller scales. GPT-4o, a decoder-only instruction-following LLM, was selected for its advanced language understanding capabilities. The study aimed to evaluate GPT-4o's ability to perform text classification compared to the baseline Clinical Longformer. The Clinical Longformer was trained on a dataset of 1000 radiology report impressions and validated on a separate set of 200 samples, while the GPT-4o extractor was validated using the same 200-sample set. Postdeployment performance was further assessed on an additional 200 operational records to evaluate model efficacy in a real-world setting. RESULTS GPT-4o outperformed the Clinical Longformer in 2 of the metrics, achieving a sensitivity of 1.0 (95% CI 1.0-1.0; Wilcoxon test, P<.001) and an F1-score of 0.975 (95% CI 0.9495-0.9947; Wilcoxon test, P<.001) across the validation dataset. Postdeployment evaluations also showed strong performance of the deployed GPT-4o model with a sensitivity of 1.0 (95% CI 1.0-1.0), a specificity of 0.94 (95% CI 0.8913-0.9804), and an F1-score of 0.97 (95% CI 0.9479-0.9908). 
This high level of accuracy supports a reduction in manual review, streamlining clinical workflows and improving diagnostic precision. CONCLUSIONS The GPT-4o model provides an effective solution for the automatic extraction of PE diagnoses from radiology reports, offering a reliable tool that aids timely and accurate clinical decision-making. This approach has the potential to significantly improve patient outcomes by expediting diagnosis and treatment pathways for critical conditions like PE.
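Sensitivity, specificity, and F1-score, the metrics reported in this abstract, all derive from the four cells of a binary confusion matrix. The Python sketch below shows the arithmetic on made-up labels; the data, the function name `binary_metrics`, and the convention of returning 0.0 for an undefined ratio are assumptions, not details from the study.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and F1 for boolean labels (True = PE present)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)
    fn = sum(t and not p for t, p in pairs)
    fp = sum(not t and p for t, p in pairs)
    tn = sum(not t and not p for t, p in pairs)

    def ratio(num, den):
        # Assumed convention: report 0.0 when a ratio is undefined.
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),       # recall on positive cases
        "specificity": ratio(tn, tn + fp),       # recall on negative cases
        "f1": ratio(2 * tp, 2 * tp + fp + fn),   # harmonic mean of precision and recall
    }


# Made-up impressions: every true PE is caught, with one false alarm.
truth = [True, True, True, False, False, False, False]
preds = [True, True, True, True, False, False, False]

metrics = binary_metrics(truth, preds)
print({name: round(value, 3) for name, value in metrics.items()})
# {'sensitivity': 1.0, 'specificity': 0.75, 'f1': 0.857}
```

The example mirrors the pattern in the abstract, where sensitivity is perfect while specificity is somewhat lower: no positive case is missed, at the cost of some false positives that still require manual review.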
Affiliation(s)
- Mohammed Mahyoub
  - Virtua Health, Marlton, NJ, United States
  - School of Systems Science and Industrial Engineering, Binghamton University, Binghamton, NY, United States
6
Clunie DA, Flanders A, Taylor A, Erickson B, Bialecki B, Brundage D, Gutman D, Prior F, Seibert JA, Perry J, Gichoya JW, Kirby J, Andriole K, Geneslaw L, Moore S, Fitzgerald TJ, Tellis W, Xiao Y, Farahani K. Report of the Medical Image De-Identification (MIDI) Task Group -- Best Practices and Recommendations. ARXIV 2025:arXiv:2303.10473v3. [PMID: 37033463 PMCID: PMC10081345] [Indexed: 04/11/2023]
Abstract
This report addresses the technical aspects of de-identification of medical images of human subjects and biospecimens, such that re-identification risk of ethical, moral, and legal concern is sufficiently reduced to allow unrestricted public sharing for any purpose, regardless of the jurisdiction of the source and distribution sites. All medical images, regardless of the mode of acquisition, are considered, though the primary emphasis is on those with accompanying data elements, especially those encoded in formats in which the data elements are embedded, particularly Digital Imaging and Communications in Medicine (DICOM). These images include image-like objects such as Segmentations, Parametric Maps, and Radiotherapy (RT) Dose objects. The scope also includes related non-image objects, such as RT Structure Sets, Plans and Dose Volume Histograms, Structured Reports, and Presentation States. Only de-identification of publicly released data is considered, and alternative approaches to privacy preservation, such as federated learning for artificial intelligence (AI) model development, are out of scope, as are issues of privacy leakage from AI model sharing. Only technical issues of public sharing are addressed.
7
Shahid F, Hsu MH, Chang YC, Jian WS. Using Generative AI to Extract Structured Information from Free Text Pathology Reports. J Med Syst 2025; 49:36. [PMID: 40080229 PMCID: PMC11906504 DOI: 10.1007/s10916-025-02167-2] [Received: 09/11/2024] [Accepted: 03/03/2025] [Indexed: 03/15/2025]
Abstract
Manually converting unstructured text pathology reports into structured pathology reports is very time-consuming and prone to errors. This study demonstrates the transformative potential of generative AI in automating the analysis of free-text pathology reports. Employing the ChatGPT Large Language Model within a Streamlit web application, we automated the extraction and structuring of information from 33 unstructured breast cancer pathology reports from Taipei Medical University Hospital. Achieving a 99.61% accuracy rate, the AI system notably reduced the processing time compared to traditional methods. This not only underscores the efficacy of AI in converting unstructured medical text into structured data but also highlights its potential to enhance the efficiency and reliability of medical text analysis. However, this study is limited to breast cancer pathology reports and was conducted using data obtained from hospitals associated with a single institution. In the future, we plan to expand the scope of this research to include pathology reports for other cancer types incrementally and conduct external validation to further substantiate the robustness and generalizability of the proposed system. Through this technological integration, we aimed to substantiate the capabilities of generative AI in improving both the speed and reliability of data processing. The outcomes of this study affirm that generative AI can significantly transform the handling of pathology reports, promising substantial advancements in biomedical research by facilitating the structured analysis of complex medical data.
Affiliation(s)
- Fahad Shahid
  - Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan
- Min-Huei Hsu
  - Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan
  - Department of Neurosurgery, Shuang-Ho Hospital-Taipei Medical University, Taipei, Taiwan
  - International Ph.D. Program in Biotech and Healthcare Management, Taipei Medical University, Taipei, Taiwan
  - School of Healthcare Administration, Taipei Medical University, Taipei, Taiwan
  - Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taipei, Taiwan
- Yung-Chun Chang
  - Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan
  - International Ph.D. Program in Biotech and Healthcare Management, Taipei Medical University, Taipei, Taiwan
  - School of Healthcare Administration, Taipei Medical University, Taipei, Taiwan
  - Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taipei, Taiwan
- Wen-Shan Jian
  - Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan
  - International Ph.D. Program in Biotech and Healthcare Management, Taipei Medical University, Taipei, Taiwan
  - School of Healthcare Administration, Taipei Medical University, Taipei, Taiwan
  - Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taipei, Taiwan
8
Bala W, Li H, Moon J, Trivedi H, Gichoya J, Balthazar P. Enhancing radiology training with GPT-4: Pilot analysis of automated feedback in trainee preliminary reports. Curr Probl Diagn Radiol 2025; 54:151-158. [PMID: 39179466 PMCID: PMC11802295 DOI: 10.1067/j.cpradiol.2024.08.003] [Received: 04/09/2024] [Revised: 07/29/2024] [Accepted: 08/08/2024] [Indexed: 08/26/2024]
Abstract
RATIONALE AND OBJECTIVES Radiology residents often receive limited feedback on preliminary reports issued during independent call. This study aimed to determine if Large Language Models (LLMs) can supplement traditional feedback by identifying missed diagnoses in radiology residents' preliminary reports. MATERIALS & METHODS A randomly selected subset of 500 (250 train/250 validation) paired preliminary and final reports between 12/17/2022 and 5/22/2023 were extracted and de-identified from our institutional database. The prompts and report text were input into the GPT-4 language model via the GPT-4 API (gpt-4-0314 model version). Iterative prompt tuning was used on a subset of the training/validation sets to direct the model to identify important findings in the final report that were absent in preliminary reports. For testing, a subset of 10 reports with confirmed diagnostic errors were randomly selected. Fourteen residents with on-call experience assessed the LLM-generated discrepancies and completed a survey on their experience using a 5-point Likert scale. RESULTS The model identified 24 unique missed diagnoses across 10 test reports with i% model prediction accuracy as rated by 14 residents. Five additional diagnoses were identified by users, resulting in a model sensitivity of 79.2 %. Post-evaluation surveys showed a mean satisfaction rating of 3.50 and perceived accuracy rating of 3.64 out of 5 for LLM-generated feedback. Most respondents (71.4 %) favored a combination of LLM-generated and traditional feedback. CONCLUSION This pilot study on the use of LLM-generated feedback for radiology resident preliminary reports demonstrated notable accuracy in identifying missed diagnoses and was positively received, highlighting LLMs' potential role in supplementing conventional feedback methods.
Affiliation(s)
- Wasif Bala
  - Department of Radiology and Imaging Sciences, Emory University School of Medicine, USA
- Hanzhou Li
  - Department of Radiology and Imaging Sciences, Emory University School of Medicine, USA
- John Moon
  - Department of Radiology and Imaging Sciences, Emory University School of Medicine, USA
- Hari Trivedi
  - Department of Radiology and Imaging Sciences, Emory University School of Medicine, USA
- Judy Gichoya
  - Department of Radiology and Imaging Sciences, Emory University School of Medicine, USA
- Patricia Balthazar
  - Department of Radiology and Imaging Sciences, Emory University School of Medicine, USA
9
Omar M, Levkovich I. Exploring the efficacy and potential of large language models for depression: A systematic review. J Affect Disord 2025; 371:234-244. [PMID: 39581383 DOI: 10.1016/j.jad.2024.11.052] [Received: 05/05/2024] [Revised: 10/21/2024] [Accepted: 11/15/2024] [Indexed: 11/26/2024]
Abstract
BACKGROUND AND OBJECTIVE Depression is a substantial public health issue, with global ramifications. While initial literature reviews explored the intersection between artificial intelligence (AI) and mental health, they have not yet critically assessed the specific contributions of Large Language Models (LLMs) in this domain. The objective of this systematic review was to examine the usefulness of LLMs in diagnosing and managing depression, as well as to investigate their incorporation into clinical practice. METHODS This review was based on a thorough search of the PubMed, Embase, Web of Science, and Scopus databases for the period January 2018 through March 2024. The search used PROSPERO and adhered to PRISMA guidelines. Original research articles, preprints, and conference papers were included, while non-English and non-research publications were excluded. Data extraction was standardized, and the risk of bias was evaluated using the ROBINS-I, QUADAS-2, and PROBAST tools. RESULTS Our review included 34 studies that focused on the application of LLMs in detecting and classifying depression through clinical data and social media texts. LLMs such as RoBERTa and BERT demonstrated high effectiveness, particularly in early detection and symptom classification. Nevertheless, the integration of LLMs into clinical practice is in its nascent stage, with ongoing concerns about data privacy and ethical implications. CONCLUSION LLMs exhibit significant potential for transforming strategies for diagnosing and treating depression. Nonetheless, full integration of LLMs into clinical practice requires rigorous testing, ethical considerations, and enhanced privacy measures to ensure their safe and effective use.
Affiliation(s)
- Mahmud Omar
  - Tel-Aviv University, Faculty of Medicine, Israel
10
Cruz-Gonzalez P, He AWJ, Lam EP, Ng IMC, Li MW, Hou R, Chan JNM, Sahni Y, Vinas Guasch N, Miller T, Lau BWM, Sánchez Vidaña DI. Artificial intelligence in mental health care: a systematic review of diagnosis, monitoring, and intervention applications. Psychol Med 2025; 55:e18. [PMID: 39911020 PMCID: PMC12017374 DOI: 10.1017/s0033291724003295] [Received: 04/15/2024] [Revised: 10/26/2024] [Accepted: 11/26/2024] [Indexed: 02/07/2025]
Abstract
Artificial intelligence (AI) has been recently applied to different mental health illnesses and healthcare domains. This systematic review presents the application of AI in mental health in the domains of diagnosis, monitoring, and intervention. A database search (CCTR, CINAHL, PsycINFO, PubMed, and Scopus) was conducted from inception to February 2024, and a total of 85 relevant studies were included according to preestablished inclusion criteria. The AI methods most frequently used were support vector machine and random forest for diagnosis, machine learning for monitoring, and AI chatbot for intervention. AI tools appeared to be accurate in detecting, classifying, and predicting the risk of mental health conditions as well as predicting treatment response and monitoring the ongoing prognosis of mental health disorders. Future directions should focus on developing more diverse and robust datasets and on enhancing the transparency and interpretability of AI models to improve clinical practice.
Affiliation(s)
- Pablo Cruz-Gonzalez
  - Rehabilitation Research Institute of Singapore, Nanyang Technological University, Singapore, Singapore
- Aaron Wan-Jia He
  - School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, Hong Kong
- Elly PoPo Lam
  - Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
- Ingrid Man Ching Ng
  - Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
- Mandy Wingman Li
  - Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
- Rangchun Hou
  - Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
- Jackie Ngai-Man Chan
  - Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
- Yuvraj Sahni
  - Department of Building Environment and Energy Engineering, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
- Nestor Vinas Guasch
  - Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
- Tiev Miller
  - Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
- Benson Wui-Man Lau
  - Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
  - Mental Health Research Center, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
- Dalinda Isabel Sánchez Vidaña
  - Department of Rehabilitation Sciences, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
  - Mental Health Research Center, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
11
Cheng CT, Ooyang CH, Liao CH, Kang SC. Applications of deep learning in trauma radiology: A narrative review. Biomed J 2025; 48:100743. [PMID: 38679199 PMCID: PMC11751421 DOI: 10.1016/j.bj.2024.100743] [Received: 11/13/2023] [Revised: 03/26/2024] [Accepted: 04/24/2024] [Indexed: 05/01/2024]
Abstract
Diagnostic imaging is essential in modern trauma care for initial evaluation and identifying injuries requiring intervention. Deep learning (DL) has become mainstream in medical image analysis and has shown promising efficacy for classification, segmentation, and lesion detection. This narrative review provides the fundamental concepts for developing DL algorithms in trauma imaging and presents an overview of current progress in each modality. DL has been applied to detect free fluid on Focused Assessment with Sonography for Trauma (FAST), traumatic findings on chest and pelvic X-rays, and computed tomography (CT) scans, identify intracranial hemorrhage on head CT, detect vertebral fractures, and identify injuries to organs like the spleen, liver, and lungs on abdominal and chest CT. Future directions involve expanding dataset size and diversity through federated learning, enhancing model explainability and transparency to build clinician trust, and integrating multimodal data to provide more meaningful insights into traumatic injuries. Though some commercial artificial intelligence products are Food and Drug Administration-approved for clinical use in the trauma field, adoption remains limited, highlighting the need for multi-disciplinary teams to engineer practical, real-world solutions. Overall, DL shows immense potential to improve the efficiency and accuracy of trauma imaging, but thoughtful development and validation are critical to ensure these technologies positively impact patient care.
Affiliation(s)
- Chi-Tung Cheng
  - Department of Trauma and Emergency Surgery, Chang Gung Memorial Hospital, Linkou, Taoyuan, Taiwan; School of Medicine, Chang Gung University, Taoyuan, Taiwan
- Chun-Hsiang Ooyang
  - Department of Trauma and Emergency Surgery, Chang Gung Memorial Hospital, Linkou, Taoyuan, Taiwan
- Chien-Hung Liao
  - Department of Trauma and Emergency Surgery, Chang Gung Memorial Hospital, Linkou, Taoyuan, Taiwan
- Shih-Ching Kang
  - Department of Trauma and Emergency Surgery, Chang Gung Memorial Hospital, Linkou, Taoyuan, Taiwan
12
Jorg T, Halfmann MC, Graafen D, Hobohm L, Düber C, Mildenberger P, Müller L. [Structured reporting for efficient epidemiological and in-hospital prevalence analysis of pulmonary embolisms]. ROFO-FORTSCHR RONTG 2025; 197:186-195. [PMID: 38806150 DOI: 10.1055/a-2301-3349] [Indexed: 05/30/2024]
Abstract
Structured reporting (SR) not only offers advantages regarding report quality but, as an IT-based method, also the opportunity to aggregate and analyze large, highly structured datasets (data mining). In this study, a data mining algorithm was used to calculate epidemiological data and in-hospital prevalence statistics of pulmonary embolism (PE) by analyzing structured CT reports. All structured reports for PE CT scans from the last 5 years (n = 2790) were extracted from the SR database and analyzed. The prevalence of PE was calculated for the entire cohort and stratified by referral type and clinical referrer. Distributions of the manifestation of PEs (central, lobar, segmental, subsegmental, as well as left-sided, right-sided, bilateral) were calculated, and the occurrence of right heart strain was correlated with the manifestation. The prevalence of PE in the entire cohort was 24% (n = 678). The median age of PE patients was 71 years (IQR 58-80), and the sex distribution was 1.2/1 (M/F). Outpatients showed a lower prevalence of 23% compared to patients from regular wards (27%) and intensive care units (30%). Surgically referred patients had a higher prevalence than patients from internal medicine (34% vs. 22%). Patients with central and bilateral PEs had a significantly higher occurrence of right heart strain compared to patients with peripheral and unilateral embolisms. Data mining of structured reports is a simple method for obtaining prevalence statistics, epidemiological data, and the distribution of disease characteristics, as demonstrated by the PE use case. The generated data can be helpful for multiple purposes, such as for internal clinical quality assurance and scientific analyses. To benefit from this, consistent use of SR is required and is therefore recommended.
· SR-based data mining allows simple epidemiologic analyses for PE.
· The prevalence of PE differs between outpatients and inpatients.
· Central and bilateral PEs have an increased risk of right heart strain.
Affiliation(s)
- Tobias Jorg
- Department of Diagnostic and Interventional Radiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Moritz C Halfmann
- Department of Diagnostic and Interventional Radiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Dirk Graafen
- Department of Diagnostic and Interventional Radiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Lukas Hobohm
- Center for Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Christoph Düber
- Department of Diagnostic and Interventional Radiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Peter Mildenberger
- Department of Diagnostic and Interventional Radiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Lukas Müller
- Department of Diagnostic and Interventional Radiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
|
13
|
Sajjadi SM, Mohebbi A, Ehsani A, Marashi A, Azhdarimoghaddam A, Karami S, Karimi MA, Sadeghi M, Firoozi K, Mohammad Zamani A, Rigi A, Nayebagha M, Asadi Anar M, Eini P, Salehi S, Rostami Ghezeljeh M. Identifying abdominal aortic aneurysm size and presence using Natural Language Processing of radiology reports: a systematic review and meta-analysis. Abdom Radiol (NY) 2025:10.1007/s00261-025-04810-5. [PMID: 39883167 DOI: 10.1007/s00261-025-04810-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2024] [Revised: 01/10/2025] [Accepted: 01/16/2025] [Indexed: 01/31/2025]
Abstract
BACKGROUND AND AIM Prior investigations of the natural history of abdominal aortic aneurysms (AAAs) have been constrained by small sample sizes or uneven assessments of aggregated data. Natural language processing (NLP) can significantly enhance the investigation and treatment of patients with AAAs by swiftly and effectively collecting imaging data from health records. This meta-analysis aimed to evaluate the efficacy of NLP techniques in reliably identifying the presence or absence of AAAs and measuring the maximal abdominal aortic diameter in extensive datasets of radiology study reports. METHOD The PubMed, Scopus, Web of Science, Embase, and Science Direct databases were searched until March 2024 to obtain pertinent papers. The RAYYAN intelligent tool for systematic reviews was utilized to screen the studies. The meta-analysis was conducted using STATA v18 software. Egger's test was employed to evaluate publication bias. The Newcastle Ottawa Scale was employed to assess the quality of the included studies. A plot digitizer was employed to extract digital data. RESULT A total of 39,094 individuals with AAA were included in this analysis; 27,326 patients were male and 11,383 were female. The mean age of the total participants was 73.1 ± 1.25 years. The pooled sensitivity, specificity, precision, and accuracy of the implemented NLP models were 0.89 (0.88-0.91), 0.88 (0.87-0.89), 0.92 (0.89-0.95), and 0.91 (0.89-0.93), respectively. The difference in aneurysm diameter reported at follow-up before and after NLP implementation in the included studies was a 0.05 cm reduction, which was statistically significant. CONCLUSION NLP holds great potential for automating the detection of AAA size and presence in radiology reports, enhancing efficiency and scalability over manual review. However, challenges persist. Variability in report formats, terminology, and unstructured data can compromise accuracy. Additionally, NLP models rely on high-quality, annotated training datasets, which may be incomplete or unrepresentative. While NLP aids in identifying AAA-related data, human oversight is essential to ensure decisions are informed by the patient's broader clinical context. Ongoing algorithm refinement and seamless integration into clinical workflows are key to improving NLP's utility and reliability in this field.
Affiliation(s)
- Alisa Mohebbi
- Tehran University of Medical Sciences, Tehran, Islamic Republic of Iran
- Amir Marashi
- Shahid Beheshti University of Medical Sciences, Tehran, Islamic Republic of Iran
- Shaghayegh Karami
- Tehran University of Medical Sciences, Tehran, Islamic Republic of Iran
- Mohammad Amin Karimi
- Shahid Beheshti University of Medical Sciences, Tehran, Islamic Republic of Iran
- Mahsa Sadeghi
- Tehran University of Medical Sciences, Tehran, Islamic Republic of Iran
- Kiana Firoozi
- Gonabad University of Medical Sciences, Gonābād, Islamic Republic of Iran
- Amir Mohammad Zamani
- Ahvaz Jundishapur University of Medical Sciences, Ahvāz, Islamic Republic of Iran
- Amirhossein Rigi
- Shahid Beheshti University of Medical Sciences, Tehran, Islamic Republic of Iran
- Melika Nayebagha
- Shahid Beheshti University of Medical Sciences, Tehran, Islamic Republic of Iran
- Pooya Eini
- Shahid Beheshti University of Medical Sciences, Tehran, Islamic Republic of Iran
- Sadaf Salehi
- Iran University of Medical Sciences, Tehran, Islamic Republic of Iran
|
14
|
Omar M, Nassar S, SharIf K, Glicksberg BS, Nadkarni GN, Klang E. Emerging applications of NLP and large language models in gastroenterology and hepatology: a systematic review. Front Med (Lausanne) 2025; 11:1512824. [PMID: 39917263 PMCID: PMC11799763 DOI: 10.3389/fmed.2024.1512824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2024] [Accepted: 12/09/2024] [Indexed: 02/09/2025] Open
Abstract
Background and aim In recent years, natural language processing (NLP) has transformed significantly with the introduction of large language models (LLMs). This review provides an update on NLP and LLM applications and challenges in gastroenterology and hepatology. Methods Registered with PROSPERO (CRD42024542275) and adhering to PRISMA guidelines, we searched six databases for relevant studies published from 2003 to 2024, ultimately including 57 studies. Results Our review of 57 studies notes an increase in relevant publications in 2023-2024 compared to previous years, reflecting growing interest in newer models such as GPT-3 and GPT-4. The results demonstrate that NLP models have enhanced data extraction from electronic health records and other unstructured medical data sources. Key findings include high precision in identifying disease characteristics from unstructured reports and ongoing improvement in clinical decision-making. Risk of bias assessments using ROBINS-I, QUADAS-2, and PROBAST tools confirmed the methodological robustness of the included studies. Conclusion NLP and LLMs can enhance diagnosis and treatment in gastroenterology and hepatology. They enable extraction of data from unstructured medical records, such as endoscopy reports and patient notes, and can enhance clinical decision-making. Despite these advancements, integrating these tools into routine practice is still challenging. Future work should prospectively demonstrate real-world value.
Affiliation(s)
- Mahmud Omar
- Maccabi Health Services, Tel Aviv, Israel
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Kassem SharIf
- Department of Gastroenterology, Sheba Medical Center, Tel HaShomer, Israel
- Benjamin S. Glicksberg
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Girish N. Nadkarni
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Eyal Klang
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, United States
|
15
|
Fu T, Berlin S, Gupta A, Sommer J. Automated Incidental Findings Notification Through the Electronic Health Record Utilizing Dictation Macros. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2025:10.1007/s10278-024-01357-7. [PMID: 39806184 DOI: 10.1007/s10278-024-01357-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Revised: 11/11/2024] [Accepted: 11/24/2024] [Indexed: 01/16/2025]
Abstract
The objective of this study is to implement an actionable incidental findings (AIFs) communication workflow integrated into the electronic health record (EHR) using dictation macros to improve the quality of radiology reports and facilitate delivery of findings to clinicians. The workflow was implemented across an academic multi-hospital health system and used by over 100 radiologists from 12 divisions. Standardized macros were created for different organ systems including the thyroid, lungs, liver, pancreas, spleen, kidney, female reproductive, and others, designed based on the ACR Novel Quality Measure Set. All macros contained special codes enabling automated notification of clinicians in Epic EHR and unique codes to allow for tracking. When notified, clinicians can fast-track ordering of follow-up imaging exams. All alerts were monitored by radiology operations, who ensured messages were acknowledged within 73 h. From September 2023 to March 2024, 12,919 AIF alerts were filed for 10,766 patients. Median age was 65 years; 63.6% were female and 36.4% were male. Most alerts were submitted for outpatients (73.5%), and a majority originated from CT exams (57.3%), followed by radiographs (12.2%) and ultrasound (11.5%). The number of submissions per radiologist ranged from 0 to 930, with a median of 62. Median time to alert acknowledgment was 8.1 h, and 93.9% were acknowledged within 73 h. Follow-up orders were placed for 62.3% of patients. A standardized AIFs communication workflow utilizing dictation macros can help facilitate delivery of findings and follow-up recommendations to clinicians.
Affiliation(s)
- Tianyuan Fu
- University Hospitals Cleveland Medical Center, Case Western Reserve University, 11100 Euclid Avenue, BSH 5056, Cleveland, OH, 44106, USA
- Sheila Berlin
- University Hospitals Cleveland Medical Center, Case Western Reserve University, 11100 Euclid Avenue, BSH 5056, Cleveland, OH, 44106, USA
- Amit Gupta
- University Hospitals Cleveland Medical Center, Case Western Reserve University, 11100 Euclid Avenue, BSH 5056, Cleveland, OH, 44106, USA
- Jennifer Sommer
- University Hospitals Cleveland Medical Center, Case Western Reserve University, 11100 Euclid Avenue, BSH 5056, Cleveland, OH, 44106, USA
|
16
|
Fathi M, Vakili K, Hajibeygi R, Bahrami A, Behzad S, Tafazolimoghadam A, Aghabozorgi H, Eshraghi R, Bhatt V, Gholamrezanezhad A. Cultivating diagnostic clarity: The importance of reporting artificial intelligence confidence levels in radiologic diagnoses. Clin Imaging 2025; 117:110356. [PMID: 39566394 DOI: 10.1016/j.clinimag.2024.110356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2024] [Revised: 11/01/2024] [Accepted: 11/09/2024] [Indexed: 11/22/2024]
Abstract
Accurate image interpretation in radiology is essential for the healthcare team to provide optimal patient care. This article discusses the use of artificial intelligence (AI) confidence levels to enhance the accuracy and dependability of AI-based radiological diagnoses. Recent advances in AI technologies have changed how radiologists and clinicians diagnose pathological conditions such as aneurysms, hemorrhages, pneumothorax, pneumoperitoneum, and particularly fractures. To enhance the utility of these AI models, radiologists need a more comprehensive understanding of the levels of confidence and certainty behind the results the models produce. This allows radiologists to make more informed decisions that have the potential to drastically change a patient's clinical management. Several AI models, especially deep learning (DL) models with convolutional neural networks (CNNs), have demonstrated significant potential in identifying subtle findings in medical imaging that are often missed by radiologists. Standardized confidence-level metrics are needed for AI systems to be relevant and reliable in the clinical setting. Incorporating AI into clinical practice faces certain obstacles, such as the need for clinical validation, concerns regarding the interpretability of AI system results, and confusion and misunderstandings within the medical community. This study emphasizes the importance of AI systems clearly conveying their level of confidence in radiological diagnosis. This paper highlights the importance of conducting research to establish AI confidence level metrics that are limited to a specific anatomical region or lesion type.
KEY POINT OF THE VIEW: Accurate fracture diagnosis relies on radiologic certainty, where Artificial intelligence (AI), especially convolutional neural networks (CNNs) and deep learning (DL), shows promise in enhancing X-ray interpretation amidst a shortage of radiologists. Overcoming integration challenges through improved AI interpretability and education is crucial for widespread acceptance and better patient outcomes.
Affiliation(s)
- Mobina Fathi
- Advanced Diagnostic and Interventional Radiology Research Center (ADIR), Tehran University of Medical Science, Tehran, Iran; School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Kimia Vakili
- School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ramtin Hajibeygi
- Advanced Diagnostic and Interventional Radiology Research Center (ADIR), Tehran University of Medical Science, Tehran, Iran; Tehran University of Medical Science (TUMS), School of Medicine, Tehran, Iran
- Ashkan Bahrami
- Faculty of Medicine, Kashan University of Medical Science, Kashan, Iran
- Shima Behzad
- Advanced Diagnostic and Interventional Radiology Research Center (ADIR), Tehran University of Medical Science, Tehran, Iran
- Hadiseh Aghabozorgi
- Student Research Committee, Shahrekord University of Medical Sciences, Shahrekord, Iran
- Reza Eshraghi
- Faculty of Medicine, Kashan University of Medical Science, Kashan, Iran
- Vivek Bhatt
- University of California, Riverside, School of Medicine, Riverside, CA, United States of America
- Ali Gholamrezanezhad
- Keck School of Medicine of University of Southern California, Los Angeles, CA, United States of America; Department of Radiology, Cedars Sinai Hospital, Los Angeles, CA, United States of America
|
17
|
Breitwieser M, Moore V, Wiesner T, Wichlas F, Deininger C. NLP-Driven Analysis of Pneumothorax Incidence Following Central Venous Catheter Procedures: A Data-Driven Re-Evaluation of Routine Imaging in Value-Based Medicine. Diagnostics (Basel) 2024; 14:2792. [PMID: 39767153 PMCID: PMC11674588 DOI: 10.3390/diagnostics14242792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2024] [Revised: 11/14/2024] [Accepted: 12/10/2024] [Indexed: 01/11/2025] Open
Abstract
Background: This study presents a systematic approach using a natural language processing (NLP) algorithm to assess the necessity of routine imaging after central venous catheter (CVC) placement and removal. With pneumothorax being a key complication of CVC procedures, this research aims to provide evidence-based recommendations for optimizing imaging protocols and minimizing unnecessary imaging risks. Methods: We analyzed electronic health records from four university hospitals in Salzburg, Austria, focusing on X-rays performed between 2012 and 2021 following CVC procedures. A custom-built NLP algorithm identified cases of pneumothorax from radiologists' reports and clinician requests, while excluding cases with contraindications such as chest injuries, prior pneumothorax, or missing data. Chi-square tests were used to compare pneumothorax rates between CVC insertion and removal, and multivariate logistic regression identified risk factors, with a focus on age and gender. Results: This study analyzed 17,175 cases of patients aged 18 and older, with 95.4% involving CVC insertion and 4.6% involving CVC removal. Pneumothorax was observed in 106 cases post-insertion (1.3%) and in 3 cases post-removal (0.02%), with no statistically significant difference between procedures (p = 0.5025). The NLP algorithm achieved an accuracy of 93%, with a sensitivity of 97.9%, a specificity of 87.9%, and an area under the ROC curve (AUC) of 0.9283. Conclusions: The findings indicate no significant difference in pneumothorax incidence between CVC insertion and removal, supporting existing recommendations against routine imaging post-removal for asymptomatic patients and suggesting that routine imaging after CVC insertion may also be unnecessary in similar cases. This study demonstrates how advanced NLP techniques can support value-based medicine by enhancing clinical decision making and optimizing resources.
Affiliation(s)
- Martin Breitwieser
- Department for Orthopedic Surgery and Traumatology, Paracelsus Medical University, 5020 Salzburg, Austria; (V.M.); (F.W.); (C.D.)
|
18
|
Guellil I, Wu J, Pradipta Gema A, Francis F, Berrachedi Y, Chenni N, Tobin R, Llewellyn C, Arakelyan S, Wu H, Guthrie B, Alex B. Natural language processing for detecting adverse drug events: A systematic review protocol. NIHR OPEN RESEARCH 2024; 3:67. [PMID: 39931191 PMCID: PMC11808655 DOI: 10.3310/nihropenres.13504.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/07/2025] [Indexed: 02/13/2025]
Abstract
Background Detecting Adverse Drug Events (ADEs) is an emerging research area, attracting great interest in the research community. Better anticipatory management of predisposing factors has considerable potential to improve outcomes. Automatic extraction of ADEs using Natural Language Processing (NLP) has great potential to facilitate efficient and effective distillation of such knowledge, to better understand and predict risk of adverse events. Methods This systematic review follows a six-stage approach and includes literature from 6 databases (Embase, Medline, Web Of Science Core Collection, ACM Guide to Computing Literature, IEEE Digital Library and Scopus). Following the title, abstract and full-text screenings, characteristics and main findings of the included studies and resources will be tabulated and summarized. The risk of bias and reporting quality were assessed using the PROBAST tool. Results We developed our search strategy and collected all relevant publications. As of December 2024, we have completed all the stages of the systematic review. We identified 178 studies for inclusion through the academic literature search (data were extracted from all of the papers). We are currently writing up the systematic review paper, in which we synthesise the different findings. Further refinement of the eligibility criteria and data extraction has been ongoing since August 2022. Conclusion In this systematic review, we will identify and consolidate information and evidence related to the use and effectiveness of existing NLP approaches and tools for automatically detecting ADEs from free text (discharge summaries, General Practitioner notes, social media, etc.). Our findings will improve the understanding of the current landscape of the use of NLP for extracting ADEs and should lead to better anticipatory management of predisposing factors, with the potential to improve outcomes considerably. Our results will also be valuable both to NLP researchers developing methods to extract ADEs and to translational/clinical researchers who use NLP for this purpose and in healthcare in general. For example, from our initial analysis of the studies, we can conclude that the majority of the proposed works concern the detection (extraction) of ADEs from text. A substantial portion of studies also focus on binary classification of text (flagging whether or not it includes ADEs). The studied papers also mentioned challenges related to unbalanced datasets, abbreviations and acronyms, and lower results for rare ADEs.
Affiliation(s)
- Imane Guellil
- The University of Edinburgh, Edinburgh, Scotland, UK
- Jinge Wu
- University College London, London, England, UK
- Farah Francis
- The University of Edinburgh, Edinburgh, Scotland, UK
- Yousra Berrachedi
- Ecole nationale Superieure d'Informatique, ESI, Alger, Algiers, Algeria
- Richard Tobin
- The University of Edinburgh, Edinburgh, Scotland, UK
- Honghan Wu
- University College London, London, England, UK
- Bruce Guthrie
- The University of Edinburgh, Edinburgh, Scotland, UK
- Beatrice Alex
- The University of Edinburgh, Edinburgh, Scotland, UK
|
19
|
Lastrucci A, Wandael Y, Barra A, Ricci R, Pirrera A, Lepri G, Gulino RA, Miele V, Giansanti D. Revolutionizing Radiology with Natural Language Processing and Chatbot Technologies: A Narrative Umbrella Review on Current Trends and Future Directions. J Clin Med 2024; 13:7337. [PMID: 39685793 DOI: 10.3390/jcm13237337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2024] [Revised: 11/18/2024] [Accepted: 11/26/2024] [Indexed: 12/18/2024] Open
Abstract
The application of chatbots and NLP in radiology is an emerging field, currently characterized by a growing body of research. An umbrella review has been proposed utilizing a standardized checklist and quality control procedure for including scientific papers. This review explores the early developments and potential future impact of these technologies in radiology. The current literature, comprising 15 systematic reviews, highlights potentialities, opportunities, areas needing improvements, and recommendations. This umbrella review offers a comprehensive overview of the current landscape of natural language processing (NLP) and natural language models (NLMs), including chatbots, in healthcare. These technologies show potential for improving clinical decision-making, patient engagement, and communication across various medical fields. However, significant challenges remain, particularly the lack of standardized protocols, which raises concerns about the reliability and consistency of these tools in different clinical contexts. Without uniform guidelines, variability in outcomes may hinder the broader adoption of NLP/NLM technologies by healthcare providers. Moreover, the limited research on how these technologies intersect with medical devices (MDs) is a notable gap in the literature. Future research must address these challenges to fully realize the potential of NLP/NLM applications in healthcare. Key future research directions include the development of standardized protocols to ensure the consistent and safe deployment of NLP/NLM tools, particularly in high-stakes areas like radiology. Investigating the integration of these technologies with MD workflows will be crucial to enhance clinical decision-making and patient care. Ethical concerns, such as data privacy, informed consent, and algorithmic bias, must also be explored to ensure responsible use in clinical settings. Longitudinal studies are needed to evaluate the long-term impact of these technologies on patient outcomes, while interdisciplinary collaboration between healthcare professionals, data scientists, and ethicists is essential for driving innovation in an ethically sound manner. Addressing these areas will advance the application of NLP/NLM technologies and improve patient care in this emerging field.
Affiliation(s)
- Andrea Lastrucci
- Department of Allied Health Professions, Azienda Ospedaliero-Universitaria Careggi, 50134 Florence, Italy
- Yannick Wandael
- Department of Allied Health Professions, Azienda Ospedaliero-Universitaria Careggi, 50134 Florence, Italy
- Angelo Barra
- Department of Allied Health Professions, Azienda Ospedaliero-Universitaria Careggi, 50134 Florence, Italy
- Renzo Ricci
- Department of Allied Health Professions, Azienda Ospedaliero-Universitaria Careggi, 50134 Florence, Italy
- Graziano Lepri
- Azienda Unità Sanitaria Locale Umbria 1, Via Guerriero Guerra 21, 06127 Perugia, Italy
- Rosario Alfio Gulino
- Facoltà di Ingegneria, Università di Tor Vergata, Via del Politecnico, 1, 00133 Rome, Italy
- Vittorio Miele
- Department of Experimental Clinical and Biomedical Sciences, University of Florence, 50134 Florence, Italy
- Department of Radiology, Careggi University Hospital, 50134 Florence, Italy
|
20
|
Lee JJ, Zepeda A, Arbour G, Isaac KV, Ng RT, Nichol AM. Automated Identification of Breast Cancer Relapse in Computed Tomography Reports Using Natural Language Processing. JCO Clin Cancer Inform 2024; 8:e2400107. [PMID: 39705642 DOI: 10.1200/cci.24.00107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Revised: 08/15/2024] [Accepted: 10/18/2024] [Indexed: 12/22/2024] Open
Abstract
PURPOSE Breast cancer relapses are rarely collected by cancer registries because of logistical and financial constraints. Hence, we investigated natural language processing (NLP), enhanced with state-of-the-art deep learning transformer tools and large language models, to automate relapse identification in the text of computed tomography (CT) reports. METHODS We analyzed follow-up CT reports from patients diagnosed with breast cancer between January 1, 2005, and December 31, 2014. The reports were curated and annotated for the presence or absence of local, regional, and distant breast cancer relapses. We performed 10-fold cross-validation to evaluate models identifying different types of relapses in CT reports. Model performance was assessed with classification metrics, reported with 95% confidence intervals. RESULTS In our data set of 1,445 CT reports, 799 (55.3%) described any relapse, 72 (5.0%) local relapses, 97 (6.7%) regional relapses, and 743 (51.4%) distant relapses. The any-relapse model achieved an accuracy of 89.6% (87.8-91.1), with a sensitivity of 93.2% (91.4-94.9) and a specificity of 84.2% (80.9-87.1). The local relapse model achieved an accuracy of 94.6% (93.3-95.7), a sensitivity of 44.4% (32.8-56.3), and a specificity of 97.2% (96.2-98.0). The regional relapse model showed an accuracy of 93.6% (92.3-94.9), a sensitivity of 70.1% (60.0-79.1), and a specificity of 95.3% (94.2-96.5). Finally, the distant relapse model demonstrated an accuracy of 88.1% (86.2-89.7), a sensitivity of 91.8% (89.9-93.8), and a specificity of 83.7% (80.5-86.4). CONCLUSION We developed NLP models to identify local, regional, and distant breast cancer relapses from CT reports. Automating the identification of breast cancer relapses can enhance data collection about patient outcomes.
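The local relapse figures above pair 94.6% accuracy with 44.4% sensitivity. As a hedged illustration of how class imbalance can produce that combination, the sketch below computes the three metrics from a confusion matrix. The counts are not reported in the abstract; they are back-calculated here from the stated 72 local relapses among 1,445 reports and the rounded percentages, so treat them as assumptions. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of true relapses that were found
        "specificity": tn / (tn + fp),  # share of non-relapses correctly cleared
    }


# Assumed counts, back-calculated from the abstract's rounded figures:
# 72 reports with local relapse, 1,373 without (1,445 in total).
local = classification_metrics(tp=32, fn=40, tn=1335, fp=38)

for name, value in local.items():
    print(f"{name}: {value:.1%}")
```

Because only about 5% of reports describe a local relapse, the many true negatives dominate accuracy even when more than half of the relapses are missed.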
Affiliation(s)
- Jaimie J Lee
- Department of Radiation Oncology, BC Cancer, Vancouver, BC, Canada
- Department of Surgery, University of British Columbia, Vancouver, BC, Canada
- Andres Zepeda
- Department of Computer Science, University of British Columbia, Vancouver, BC, Canada
- Gregory Arbour
- Department of Computer Science, University of British Columbia, Vancouver, BC, Canada
- Kathryn V Isaac
- Department of Surgery, University of British Columbia, Vancouver, BC, Canada
- Raymond T Ng
- Department of Computer Science, University of British Columbia, Vancouver, BC, Canada
- Alan M Nichol
- Department of Radiation Oncology, BC Cancer, Vancouver, BC, Canada
- Department of Surgery, University of British Columbia, Vancouver, BC, Canada
|
21
|
Xiang RF. Use of n-grams and K-means clustering to classify data from free text bone marrow reports. J Pathol Inform 2024; 15:100358. [PMID: 38292072 PMCID: PMC10825612 DOI: 10.1016/j.jpi.2023.100358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 12/10/2023] [Accepted: 12/23/2023] [Indexed: 02/01/2024] Open
Abstract
Natural language processing (NLP) has been used to extract information from and summarize medical reports. Currently, the most advanced NLP models require large training datasets of accurately labeled medical text. One approach to creating these large datasets is to use classical NLP algorithms that require few resources. In this manuscript, we examined how an automated classical NLP algorithm was able to classify portions of bone marrow report text into their appropriate sections. A total of 1480 bone marrow reports were extracted from the laboratory information system of a tertiary healthcare network. The free text of these bone marrow reports was preprocessed by separating the reports into text blocks and then removing the section headers. A natural language processing algorithm involving n-grams and K-means clustering was used to classify the text blocks into their appropriate bone marrow sections. We assessed the impact of token replacement of numerical values, accession numbers, and clusters of differentiation; of varying the number of centroids (1-19) and n-grams (1-5); and of utilizing an ensemble algorithm. The optimal NLP model was found to employ an ensemble algorithm that incorporated token replacement, used 1-grams (bag of words), and used 10 centroids for K-means clustering. This optimal model was able to classify text blocks with an accuracy of 89%, suggesting that classical NLP models can accurately classify portions of marrow report text.
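The pipeline described above (token replacement, n-gram counts, K-means clustering) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the example text blocks, the `numtok` placeholder, the farthest-point initialisation, and all function names are hypothetical, only numeric values are replaced, and the ensemble step is omitted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and replace every number with a 'numtok' placeholder."""
    text = re.sub(r"\d+(?:\.\d+)?", " numtok ", text.lower())
    return re.findall(r"[a-z]+", text)


def ngrams(tokens, n):
    """Return the list of n-grams (joined by spaces) for a token sequence."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vectorize(blocks, n=1):
    """Turn text blocks into n-gram count vectors over a shared vocabulary."""
    counts = [Counter(ngrams(tokenize(block), n)) for block in blocks]
    vocab = sorted(set().union(*counts))
    return [[count[term] for term in vocab] for count in counts]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-means with deterministic farthest-point initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical text blocks standing in for two report sections.
blocks = [
    "Hemoglobin 92 g/L, platelets 140, white cell count 5.2",
    "Hemoglobin 110 g/L, platelets 210, white cell count 7.9",
    "Flow cytometry shows CD34 positive blasts",
    "Flow cytometry shows CD19 positive lymphocytes",
]
labels = kmeans(vectorize(blocks, n=1), k=2)
print(labels)
```

Replacing numbers with a shared token makes blocks that differ only in their measured values look identical to the clustering step, which is the intuition behind the token replacement the abstract evaluates.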
Affiliation(s)
- Richard F. Xiang
- Department of Pathology and Laboratory Medicine, Dalhousie University, Halifax, Nova Scotia, Canada
|
22
|
Chen LC, Zack T, Demirci A, Sushil M, Miao B, Kasap C, Butte A, Collisson EA, Hong JC. Assessing Large Language Models for Oncology Data Inference From Radiology Reports. JCO Clin Cancer Inform 2024; 8:e2400126. [PMID: 39661914 DOI: 10.1200/cci.24.00126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Revised: 08/14/2024] [Accepted: 09/23/2024] [Indexed: 12/13/2024] Open
Abstract
PURPOSE We examined the effectiveness of proprietary and open large language models (LLMs) in detecting disease presence, location, and treatment response in pancreatic cancer from radiology reports. METHODS We analyzed 203 deidentified radiology reports, manually annotated for disease status, location, and indeterminate nodules needing follow-up. Using generative pre-trained transformer (GPT)-4, GPT-3.5-turbo, and open models such as Gemma-7B and Llama3-8B, we employed strategies such as ablation and prompt engineering to boost accuracy. Discrepancies between human and model interpretations were reviewed by a secondary oncologist. RESULTS Among 164 patients with pancreatic tumor, GPT-4 showed the highest accuracy in inferring disease status, achieving 75.5% correctness (F1-micro). Open models Mistral-7B and Llama3-8B performed comparably, with accuracies of 68.6% and 61.4%, respectively. Mistral-7B excelled in deriving correct inferences directly from objective findings. Most tested models demonstrated proficiency in identifying disease-containing anatomic locations from a list of choices, with GPT-4 and Llama3-8B showing near-parity in precision and recall for disease site identification. However, open models struggled with differentiating benign from malignant postsurgical changes, affecting their precision in identifying findings indeterminate for cancer. A secondary review occasionally favored GPT-3.5's interpretations, indicating the variability in human judgment. CONCLUSION LLMs, especially GPT-4, are proficient in deriving oncologic insights from radiology reports. Their performance is enhanced by effective summarization strategies, demonstrating their potential in clinical support and health care analytics. This study also underscores the possibility of zero-shot open model utility in environments where proprietary models are restricted. Finally, by providing a set of annotated radiology reports, this paper presents a valuable data set for further LLM research in oncology.
Affiliation(s)
- Li-Ching Chen
- University of California, Berkeley, Berkeley, CA
- University of California, San Francisco, San Francisco, CA
- Travis Zack
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA
- Arda Demirci
- University of California, Berkeley, Berkeley, CA
- Madhumita Sushil
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA
- Brenda Miao
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA
- Corynn Kasap
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA
- Atul Butte
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA
- Eric A Collisson
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA
- Julian C Hong
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA
|
23
|
Valente AS, Trunfio TA, Aiello M, Baldi D, Baldi M, Imbò S, Russo MA, Cavaliere C, Franzese M. Text mining approach for feature extraction and cartilage disease grade classification using knee MRI radiology reports. Comput Struct Biotechnol J 2024; 24:622-629. [PMID: 39963548 PMCID: PMC11832019 DOI: 10.1016/j.csbj.2024.10.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2024] [Revised: 10/01/2024] [Accepted: 10/01/2024] [Indexed: 02/20/2025] Open
Abstract
MRI radiology reporting processes can be improved by exploiting structured and semantically labelled data that can be fed to artificial intelligence (AI) tools. AI-based tools assisting radiology reporting can help to automatically individuate cartilage grading in textual magnetic resonance imaging (MRI) reports, thus supporting clinicians' decisions regarding medical imaging utilisation, diagnosis and treatment. In this study, we extracted information (clinical findings, observations, anatomical regions, etc.) and classified knee cartilage degradation from medical reports utilising transfer-learning techniques applied to the Bidirectional Encoder Representations from Transformers (BERT) model and its variants, pre-trained on an Italian-language corpus. To realise this objective, we used a dataset of 750 MRI knee reports written by three radiologists who contributed to a manual annotation process to perform text classification (TC) and named entity recognition (NER) tasks. The dataset was obtained from an internal database of the IRCCS SYNLAB SDN. Seventy percent of the dataset was used for training, 10% was used for validation and 20% was used for testing. The best-performing configurations for NER and TC tasks were based on the pre-trained BERT model. The macro F1-scores obtained with the NER and TC models are 0.89 and 0.81, respectively. The accuracies calculated on the test set for both tasks are 0.96 and 0.99, respectively.
Affiliation(s)
- Teresa Angela Trunfio
- University of Naples Federico II, Department of Advanced Biomedical Sciences, Via Pansini, 5, 80131, Naples, Italy
- Marco Aiello
- IRCCS SYNLAB SDN, Via E. Gianturco, 113, 80143, Naples, Italy
- Dario Baldi
- IRCCS SYNLAB SDN, Via E. Gianturco, 113, 80143, Naples, Italy
- Marilena Baldi
- GESAN SRL, R&D Department, Via Torino, 14, 81020, San Nicola La Strada, Caserta, Italy
- Silvio Imbò
- GESAN SRL, R&D Department, Via Torino, 14, 81020, San Nicola La Strada, Caserta, Italy
- Carlo Cavaliere
- IRCCS SYNLAB SDN, Via E. Gianturco, 113, 80143, Naples, Italy
- Monica Franzese
- IRCCS SYNLAB SDN, Via E. Gianturco, 113, 80143, Naples, Italy
|
24
|
Cho HN, Jun TJ, Kim YH, Kang H, Ahn I, Gwon H, Kim Y, Seo J, Choi H, Kim M, Han J, Kee G, Park S, Ko S. Task-Specific Transformer-Based Language Models in Health Care: Scoping Review. JMIR Med Inform 2024; 12:e49724. [PMID: 39556827 PMCID: PMC11612605 DOI: 10.2196/49724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Revised: 07/10/2023] [Accepted: 10/21/2024] [Indexed: 11/20/2024] Open
Abstract
BACKGROUND Transformer-based language models have shown great potential to revolutionize health care by advancing clinical decision support, patient interaction, and disease prediction. However, despite their rapid development, the implementation of transformer-based language models in health care settings remains limited. This is partly due to the lack of a comprehensive review, which hinders a systematic understanding of their applications and limitations. Without clear guidelines and consolidated information, both researchers and physicians face difficulties in using these models effectively, resulting in inefficient research efforts and slow integration into clinical workflows. OBJECTIVE This scoping review addresses this gap by examining studies on medical transformer-based language models and categorizing them into 6 tasks: dialogue generation, question answering, summarization, text classification, sentiment analysis, and named entity recognition. METHODS We conducted a scoping review following the Cochrane scoping review protocol. A comprehensive literature search was performed across databases, including Google Scholar and PubMed, covering publications from January 2017 to September 2024. Studies involving transformer-derived models in medical tasks were included. Data were categorized into 6 key tasks. RESULTS Our key findings revealed both advancements and critical challenges in applying transformer-based models to health care tasks. For example, models like MedPIR involving dialogue generation show promise but face privacy and ethical concerns, while question-answering models like BioBERT improve accuracy but struggle with the complexity of medical terminology. The BioBERTSum summarization model aids clinicians by condensing medical texts but needs better handling of long sequences. 
CONCLUSIONS This review attempted to provide a consolidated understanding of the role of transformer-based language models in health care and to guide future research directions. By addressing current challenges and exploring the potential for real-world applications, we envision significant improvements in health care informatics. Addressing the identified challenges and implementing proposed solutions can enable transformer-based language models to significantly improve health care delivery and patient outcomes. Our review provides valuable insights for future research and practical applications, setting the stage for transformative advancements in medical informatics.
Affiliation(s)
- Ha Na Cho
- Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
- Tae Joon Jun
- Big Data Research Center, Asan Institute for Life Sciences, Asan Medical Center, Seoul, Republic of Korea
- Young-Hak Kim
- Division of Cardiology, Department of Information Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Heejun Kang
- Division of Cardiology, Asan Medical Center, Seoul, Republic of Korea
- Imjin Ahn
- Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
- Hansle Gwon
- Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
- Yunha Kim
- Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Jiahn Seo
- Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Heejung Choi
- Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Minkyoung Kim
- Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Jiye Han
- Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Gaeun Kee
- Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
- Seohyun Park
- Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
- Soyoung Ko
- Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
|
25
|
Lotfian G, Parekh K, Abdul Sami M, Suthar PP. Evaluation of ChatGPT 4.0 in Thoracic Imaging and Diagnostics. Cureus 2024; 16:e73741. [PMID: 39677135 PMCID: PMC11646414 DOI: 10.7759/cureus.73741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/15/2024] [Indexed: 12/17/2024] Open
Abstract
Recent advancements in natural language processing (NLP) have profoundly transformed the medical industry, enhancing large cohort data analysis, improving diagnostic capabilities, and streamlining clinical workflows. Among the leading tools in this domain is ChatGPT 4.0 (OpenAI, San Francisco, California, US), a commercial NLP model widely used across various applications. This study evaluates the diagnostic performance of ChatGPT 4.0 specifically in thoracic imaging by assessing its ability to answer diagnostic questions related to this field. We utilized the model to respond to multiple-choice questions derived from thoracic imaging scenarios, followed by rigorous statistical analysis to assess its accuracy and variability across different subgroups. Our analysis revealed significant variability across different subgroups. Overall, the model achieved an impressive accuracy of 84.9% in diagnosing thoracic radiology questions. It excelled in terminology and diagnostic signs, achieving perfect scores, and demonstrated strong performance in the intensive care and normal anatomy categories, with accuracies of 90% and 80%, respectively. In pathology subgroups, ChatGPT achieved an average accuracy of 89.1%, particularly excelling in diagnosing infectious pneumonia and atelectasis, though it scored lower in diffuse alveolar disease (66.7%). For disease-related questions, the mean accuracy was 79.1%, with perfect scores in several specific subcategories. However, accuracy was notably lower for vascular disease (50%) and lung cancer (66.7%). In conclusion, while ChatGPT 4.0 shows strong potential in diagnosing thoracic conditions, the variability identified underscores the necessity for ongoing research and refinement of its transformer architecture. This will enhance its reliability and applicability in broader clinical and patient care settings.
Affiliation(s)
- Golnaz Lotfian
- Department of Diagnostic Radiology and Nuclear Medicine, Rush University Medical Center, Chicago, USA
- Keyur Parekh
- Department of Diagnostic Radiology and Nuclear Medicine, Rush University Medical Center, Chicago, USA
- Mohammed Abdul Sami
- Department of Diagnostic Radiology and Nuclear Medicine, Rush University Medical Center, Chicago, USA
- Pokhraj P Suthar
- Department of Diagnostic Radiology and Nuclear Medicine, Rush University Medical Center, Chicago, USA
|
26
|
Martín-Noguerol T, López-Úbeda P, Paulano-Godino F, Luna A. Natural language processing-based analysis of the level of adoption by expert radiologists of the ASSR, ASNR and NASS version 2.0 of lumbar disc nomenclature: an eight-year survey. Quant Imaging Med Surg 2024; 14:7780-7790. [PMID: 39544464 PMCID: PMC11558493 DOI: 10.21037/qims-23-1294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Accepted: 12/26/2023] [Indexed: 11/17/2024]
Abstract
Background The American Society of Spine Radiology (ASSR), American Society of Neuroradiology (ASNR), and North American Spine Society (NASS) published a consensus paper with recommendations for lumbar disc nomenclature reports in 2014. We aimed to evaluate the degree of adoption in our radiology department of the ASSR, ASNR, and NASS 2.0 lumbar spine consensus paper using natural language processing (NLP). Methods In March 2015, we gave a lecture in our radiology department at HT Medica in Jaén (Spain) detailing the changes proposed in version 2.0 of the ASSR, ASNR, and NASS consensus on lumbar disc nomenclature. We analyzed 34,064 lumbar spine magnetic resonance imaging (MRI) reports from three different expert radiologists (A, B, and C), performed from May 2010 to February 2015 (15,813 studies) and from March 2015 to February 2022 (18,251 studies). Using an NLP algorithm, we evaluated 29 old and new terms related to 4 different categories: disc with fissures of the annulus, degenerated disc, herniated disc, and location of the disc. Results A relevant decrease in the percentage of use of old terms was found for the degenerated disc category (44.63% for radiologist B and 18.95% for radiologist C) and disc localization (18.86% for radiologist A and 27.73% for radiologist C). Relevant increases in the percentage of use of the new lexicon were found for terms related to degenerated disc (32.48% for radiologist C), herniated disc (7.27% for radiologist A) and disc localization (36.53% for radiologist C). Conclusions NLP algorithms may help to manage large radiological report datasets to evaluate the impact and degree of adherence of radiologists to recommendations for the use of ASSR, ASNR and NASS lumbar disc nomenclature version 2.0.
Affiliation(s)
- Antonio Luna
- MRI Unit, Radiology Department, HT Medica, Jaén, Spain
|
27
|
Yao J, Chu LC, Patlas M. Applications of Artificial Intelligence in Acute Abdominal Imaging. Can Assoc Radiol J 2024; 75:761-770. [PMID: 38715249 DOI: 10.1177/08465371241250197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2024] Open
Abstract
Artificial intelligence (AI) is a rapidly growing field with significant implications for radiology. Acute abdominal pain is a common clinical presentation that can range from benign conditions to life-threatening emergencies. The critical nature of these situations renders emergent abdominal imaging an ideal candidate for AI applications. CT, radiographs, and ultrasound are the most common modalities for imaging evaluation of these patients. For each modality, numerous studies have assessed the performance of AI models for detecting common pathologies, such as appendicitis, bowel obstruction, and cholecystitis. The capabilities of these models range from simple classification to detailed severity assessment. This narrative review explores the evolution, trends, and challenges in AI applications for evaluating acute abdominal pathologies. We review implementations of AI for non-traumatic and traumatic abdominal pathologies, with discussion of potential clinical impact, challenges, and future directions for the technology.
Affiliation(s)
- Jason Yao
- Department of Radiology, McMaster University, Hamilton, ON, Canada
- Linda C Chu
- Department of Radiology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Michael Patlas
- Department of Medical Imaging, University of Toronto, Toronto, ON, Canada
|
28
|
Shankar SV, Dhingra LS, Aminorroaya A, Adejumo P, Nadkarni GN, Xu H, Brandt C, Oikonomou EK, Pedroso AF, Khera R. Automated Transformation of Unstructured Cardiovascular Diagnostic Reports into Structured Datasets Using Sequentially Deployed Large Language Models. medRxiv 2024:2024.10.08.24315035. [PMID: 39417094 PMCID: PMC11482995 DOI: 10.1101/2024.10.08.24315035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2024]
Abstract
Background Rich data in cardiovascular diagnostic testing are often sequestered in unstructured reports, with the necessity of manual abstraction limiting their use in real-time applications in patient care and research. Methods We developed a two-step process that sequentially deploys generative and interpretative large language models (LLMs; Llama2 70b and Llama2 13b). Using a Llama2 70b model, we generated varying formats of transthoracic echocardiogram (TTE) reports from 3,000 real-world echo reports with paired structured elements, leveraging temporal changes in reporting formats to define the variations. Subsequently, we fine-tuned Llama2 13b using sequentially larger batches of generated echo reports as inputs, to extract data from free-text narratives across 18 clinically relevant echocardiographic fields. This was set up as a prompt-based supervised training task. We evaluated the fine-tuned Llama2 13b model, HeartDx-LM, on several distinct echocardiographic datasets: (i) reports across the different time periods and formats at Yale New Haven Health System (YNHHS), (ii) the Medical Information Mart for Intensive Care (MIMIC) III dataset, and (iii) the MIMIC IV dataset. We used the accuracy of extracted fields and Cohen's Kappa as the metrics and have publicly released the HeartDX-LM model. Results The HeartDX-LM model was trained on 2,000 randomly selected synthetic echo reports with varying formats and paired structured labels, with a wide range of clinical findings. We identified a lower threshold of 500 annotated reports required for fine-tuning Llama2 13b to achieve stable and consistent performance. At YNHHS, the HeartDx-LM model accurately extracted 69,144 out of 70,032 values (98.7%) across 18 clinical fields from unstructured reports in the test set from contemporary records where paired structured data were also available. In older echo reports where only unstructured reports were available, the model achieved 87.1% accuracy against expert annotations for the same 18 fields for a random sample of 100 reports. Similarly, in expert-annotated external validation sets from MIMIC-IV and MIMIC-III, HeartDx-LM correctly extracted 201 out of 220 available values (91.3%) and 615 out of 707 available values (87.9%), respectively, from 100 randomly chosen and expert annotated echo reports from each set. Conclusion We developed a novel method using paired large and moderate-sized LLMs to automate the extraction of unstructured echocardiographic reports into tabular datasets. Our approach represents a scalable strategy that transforms unstructured reports into computable elements that can be leveraged to improve cardiovascular care quality and enable research.
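The two metrics named in this abstract, field-level accuracy and Cohen's kappa, can be illustrated with a short sketch. The labels below are invented for illustration and are not data from the study; the function names are likewise assumptions, and the study may have computed its metrics differently (for example, per field or with a library implementation).

```python
from collections import Counter


def accuracy(extracted, reference):
    """Fraction of extracted values that exactly match the reference values."""
    matches = sum(e == r for e, r in zip(extracted, reference))
    return matches / len(reference)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical values for one echocardiographic field across six reports.
model = ["mild", "none", "severe", "none", "mild", "none"]
expert = ["mild", "none", "severe", "mild", "mild", "none"]
print(f"accuracy = {accuracy(model, expert):.3f}")
print(f"kappa    = {cohens_kappa(model, expert):.3f}")
```

Kappa is lower than raw accuracy here because part of the observed agreement is what two raters with these label frequencies would reach by chance.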
Affiliation(s)
- Sumukh Vasisht Shankar
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Lovedeep S Dhingra
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Arya Aminorroaya
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Philip Adejumo
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Girish N Nadkarni
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Hua Xu
- Section of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT
- Cynthia Brandt
- Section of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT
- Evangelos K Oikonomou
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Aline F Pedroso
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Rohan Khera
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Section of Health Informatics, Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Center for Outcomes Research and Evaluation (CORE), Yale New Haven Hospital, New Haven, CT, USA
- Section of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT
|
29
|
Cai W. Uncovering Demographic Bias in Natural Language Processing Tools for Radiology. Radiology 2024; 313:e242723. [PMID: 39436296 DOI: 10.1148/radiol.242723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2024]
Affiliation(s)
- Wenli Cai
- From the Global Alliance for Intelligent Oncology, 62 Edgemere Rd, Quincy, MA 02169
|
30
|
Mittermeier A, Aßenmacher M, Schachtner B, Grosu S, Dakovic V, Kandratovich V, Sabel B, Ingrisch M. [Automatic ICD-10 coding: Natural language processing for German MRI reports]. Radiologie (Heidelberg, Germany) 2024; 64:793-800. [PMID: 39120724 DOI: 10.1007/s00117-024-01349-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/27/2024] [Indexed: 08/10/2024]
Abstract
BACKGROUND The medical coding of radiology reports is essential for good quality of care and correct billing, but at the same time it is a complex and error-prone task. OBJECTIVE To assess the performance of natural language processing (NLP) for ICD-10 coding of German radiology reports using fine-tuning of suitable language models. MATERIAL AND METHODS This retrospective study included all magnetic resonance imaging (MRI) radiology reports acquired at our institution between 2010 and 2020. The ICD-10 discharge codes were matched to the corresponding reports to construct a dataset for multiclass classification. Fine-tuning of GermanBERT and flanT5 was carried out on the total dataset (dstotal) containing 1035 different ICD-10 codes and 2 reduced subsets containing the 100 (ds100) and 50 (ds50) most frequent codes. The performance of the model was assessed using top‑k accuracy for k = 1, 3 and 5. In an ablation study both models were trained on the accompanying metadata and the radiology report alone. RESULTS The total dataset consisted of 100,672 radiology reports, the reduced subsets ds100 of 68,103 and ds50 of 52,293 reports. The performance of the model increased when several of the best predictions of the model were taken into consideration, when the number of target classes was reduced, and when the metadata were combined with the report. The flanT5 model outperformed GermanBERT across all datasets and metrics and is suited as a medical coding assistant, achieving a top-3 accuracy of nearly 70% in the real-world dataset dstotal. CONCLUSION Fine-tuned language models can reliably predict ICD-10 codes of German magnetic resonance imaging (MRI) radiology reports across various settings. As a coding assistant, flanT5 can guide medical coders to make informed decisions and potentially reduce the workload.
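Top-k accuracy, the metric used in this abstract, counts a prediction as correct when the true code appears anywhere among the model's k highest-ranked candidates. A minimal sketch follows; the ICD-10 codes and rankings are invented for illustration, and the function name `top_k_accuracy` is an assumption rather than anything taken from the study.

```python
def top_k_accuracy(ranked_predictions, true_codes, k):
    """Share of reports whose true code is among the top-k ranked predictions."""
    hits = sum(
        true in ranked[:k]
        for ranked, true in zip(ranked_predictions, true_codes)
    )
    return hits / len(true_codes)


# Hypothetical ranked model outputs (best guess first) for three reports.
ranked = [
    ["M51.1", "M54.5", "M47.8"],
    ["G35", "I63.9", "G40.9"],
    ["S83.2", "M23.2", "M17.1"],
]
truth = ["M51.1", "I63.9", "M17.9"]

for k in (1, 3):
    print(f"top-{k} accuracy = {top_k_accuracy(ranked, truth, k):.2f}")
```

Because the top-k candidate set only grows with k, top-k accuracy can never decrease as k increases, which is consistent with the abstract's observation that performance rose when several of the best predictions were considered.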
Affiliation(s)
- Andreas Mittermeier
- Klinik und Poliklinik für Radiologie, LMU Klinikum, LMU München, Munich, Germany
- Munich Center for Machine Learning (MCML), Munich, Germany
- Balthasar Schachtner
- Klinik und Poliklinik für Radiologie, LMU Klinikum, LMU München, Munich, Germany
- Munich Center for Machine Learning (MCML), Munich, Germany
- Sergio Grosu
- Klinik und Poliklinik für Radiologie, LMU Klinikum, LMU München, Munich, Germany
- Vladana Dakovic
- Klinik und Poliklinik für Radiologie, LMU Klinikum, LMU München, Munich, Germany
- Viktar Kandratovich
- Klinik und Poliklinik für Radiologie, LMU Klinikum, LMU München, Munich, Germany
- Bastian Sabel
- Klinik und Poliklinik für Radiologie, LMU Klinikum, LMU München, Munich, Germany
- Michael Ingrisch
- Klinik und Poliklinik für Radiologie, LMU Klinikum, LMU München, Munich, Germany
- Munich Center for Machine Learning (MCML), Munich, Germany
|
31
|
Su Y, Babore YB, Kahn CE. A Large Language Model to Detect Negated Expressions in Radiology Reports. Journal of Imaging Informatics in Medicine 2024:10.1007/s10278-024-01274-9. [PMID: 39322813 DOI: 10.1007/s10278-024-01274-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Revised: 08/28/2024] [Accepted: 09/12/2024] [Indexed: 09/27/2024]
Abstract
Natural language processing (NLP) is crucial to extract information accurately from unstructured text to provide insights for clinical decision-making, quality improvement, and medical research. This study compared the performance of a rule-based NLP system and a medical-domain transformer-based model to detect negated concepts in radiology reports. Using a corpus of 984 de-identified radiology reports from a large U.S.-based academic health system (1000 consecutive reports, excluding 16 duplicates), the investigators compared the rule-based medspaCy system and the Clinical Assertion and Negation Classification Bidirectional Encoder Representations from Transformers (CAN-BERT) system to detect negated expressions of terms from RadLex, the Unified Medical Language System Metathesaurus, and the Radiology Gamuts Ontology. Power analysis determined a sample size of 382 terms to achieve α = 0.05 and β = 0.8 for McNemar's test; based on an estimate of 15% negated terms, 2800 randomly selected terms were annotated manually as negated or not negated. Precision, recall, and F1 of the two models were compared using McNemar's test. Of the 2800 terms, 387 (13.8%) were negated. For negation detection, medspaCy attained a recall of 0.795, precision of 0.356, and F1 of 0.492. CAN-BERT achieved a recall of 0.785, precision of 0.768, and F1 of 0.777. Although recall was not significantly different, CAN-BERT had significantly better precision (χ2 = 304.64; p < 0.001). The transformer-based CAN-BERT model detected negated terms in radiology reports with high precision and recall; its precision significantly exceeded that of the rule-based medspaCy system. Use of this system will improve data extraction from textual reports to support information retrieval, AI model training, and discovery of causal relationships.
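The F1 values and the McNemar comparison reported in this abstract can be illustrated with a short sketch. F1 is the harmonic mean of precision and recall; McNemar's test compares two systems on the same items using only the discordant pairs. The discordant counts used below are hypothetical (the abstract does not report them), and the function names and the optional continuity correction are assumptions about one common formulation, not the authors' exact procedure.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def mcnemar(only_a_correct, only_b_correct, continuity=True):
    """McNemar chi-square statistic and p-value (1 degree of freedom).

    only_a_correct / only_b_correct are the discordant counts: items that
    exactly one of the two systems classified correctly.
    """
    diff = abs(only_a_correct - only_b_correct)
    if continuity:
        diff = max(diff - 1, 0)
    statistic = diff ** 2 / (only_a_correct + only_b_correct)
    # Survival function of chi-square with 1 dof: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Precision and recall for medspaCy as stated in the abstract.
print(f"medspaCy F1 = {f1_score(0.356, 0.795):.3f}")

# Hypothetical discordant counts, for illustration only.
stat, p = mcnemar(10, 30, continuity=False)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

Items that both systems get right, or both get wrong, do not enter the statistic, which is why two systems with similar recall can still differ sharply on precision-related comparisons.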
Affiliation(s)
- Yvonne Su
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, 3400 Spruce Street, Philadelphia, 19104, PA, USA
- Yonatan B Babore
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, 3400 Spruce Street, Philadelphia, 19104, PA, USA
- Charles E Kahn
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, 3400 Spruce Street, Philadelphia, 19104, PA, USA
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
|
32
|
Omar M, Naffaa ME, Glicksberg BS, Reuveni H, Nadkarni GN, Klang E. Advancing rheumatology with natural language processing: insights and prospects from a systematic review. Rheumatol Adv Pract 2024; 8:rkae120. [PMID: 39399162 PMCID: PMC11467191 DOI: 10.1093/rap/rkae120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Accepted: 08/14/2024] [Indexed: 10/15/2024] Open
Abstract
Objectives Natural language processing (NLP) and large language models (LLMs) have emerged as powerful tools in healthcare, offering advanced methods for analysing unstructured clinical texts. This systematic review aims to evaluate the current applications of NLP and LLMs in rheumatology, focusing on their potential to improve disease detection, diagnosis and patient management. Methods We screened seven databases. We included original research articles that evaluated the performance of NLP models in rheumatology. Data extraction and risk of bias assessment were performed independently by two reviewers, following Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. The Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies was used to evaluate the risk of bias. Results Of 1491 articles initially identified, 35 studies met the inclusion criteria. These studies utilized various data types, including electronic medical records and clinical notes, and employed models like Bidirectional Encoder Representations from Transformers and Generative Pre-trained Transformers. High accuracy was observed in detecting conditions such as RA, SpAs and gout. The use of NLP also showed promise in managing diseases and predicting flares. Conclusion NLP showed significant potential in enhancing rheumatology by improving diagnostic accuracy and personalizing patient care. While applications in detecting diseases like RA and gout are well developed, further research is needed to extend these technologies to rarer and more complex clinical conditions. Overcoming current limitations through targeted research is essential for fully realizing NLP's potential in clinical practice.
Affiliation(s)
- Mahmud Omar
  - Faculty of Medicine, Tel-Aviv University, Tel Aviv, Israel
- Benjamin S Glicksberg
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Hagar Reuveni
  - Division of Diagnostic Imaging, Sheba Medical Center, Affiliated to Tel-Aviv University, Ramat Gan, Israel
- Girish N Nadkarni
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Eyal Klang
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, New York, USA
33
Zhao Y, Coppola A, Karamchandani U, Amiras D, Gupte CM. Artificial intelligence applied to magnetic resonance imaging reliably detects the presence, but not the location, of meniscus tears: a systematic review and meta-analysis. Eur Radiol 2024; 34:5954-5964. [PMID: 38386028 PMCID: PMC11364796 DOI: 10.1007/s00330-024-10625-7] [Received: 12/24/2023] [Revised: 12/24/2023] [Accepted: 01/13/2024] [Indexed: 02/23/2024]
Abstract
OBJECTIVES To review and compare the accuracy of convolutional neural networks (CNN) for the diagnosis of meniscal tears in the current literature and analyze the decision-making processes utilized by these CNN algorithms. MATERIALS AND METHODS PubMed, MEDLINE, EMBASE, and Cochrane databases up to December 2022 were searched in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA) statement. Risk of bias analysis was performed for all identified articles. Predictive performance values, including sensitivity and specificity, were extracted for quantitative analysis. The meta-analysis was divided between AI prediction models identifying the presence of meniscus tears and the location of meniscus tears. RESULTS Eleven articles were included in the final review, with a total of 13,467 patients and 57,551 images. Heterogeneity was large and statistically significant for the sensitivity of the tear identification analysis (I² = 79%). A higher level of accuracy was observed in identifying the presence of a meniscal tear over locating tears in specific regions of the meniscus (AUC, 0.939 vs 0.905). Pooled sensitivity and specificity were 0.87 (95% confidence interval (CI) 0.80-0.91) and 0.89 (95% CI 0.83-0.93) for meniscus tear identification and 0.88 (95% CI 0.82-0.91) and 0.84 (95% CI 0.81-0.85) for locating the tears. CONCLUSIONS AI prediction models achieved favorable performance in the diagnosis, but not location, of meniscus tears. Further studies on the clinical utilities of deep learning should include standardized reporting, external validation, and full reports of the predictive performances of these models, with a view to localizing tears more accurately. CLINICAL RELEVANCE STATEMENT Meniscus tears are hard to diagnose on knee magnetic resonance images. AI prediction models may play an important role in improving the diagnostic accuracy of clinicians and radiologists.
KEY POINTS • Artificial intelligence (AI) provides great potential in improving the diagnosis of meniscus tears. • The pooled diagnostic performance for artificial intelligence (AI) in identifying meniscus tears was better (sensitivity 87%, specificity 89%) than locating the tears (sensitivity 88%, specificity 84%). • AI is good at confirming the diagnosis of meniscus tears, but future work is required to guide the management of the disease.
Affiliation(s)
- Yi Zhao
  - Imperial College London School of Medicine, Exhibition Rd, South Kensington, London, SW7 2BU, UK
- Andrew Coppola
  - Imperial College London School of Medicine, Exhibition Rd, South Kensington, London, SW7 2BU, UK
- Dimitri Amiras
  - Imperial College London School of Medicine, Exhibition Rd, South Kensington, London, SW7 2BU, UK
  - Imperial College London NHS Trust, London, UK
- Chinmay M Gupte
  - Imperial College London School of Medicine, Exhibition Rd, South Kensington, London, SW7 2BU, UK
  - Imperial College London NHS Trust, London, UK
34
Mostafa E, Hui A, Aasman B, Chowdary K, Mani K, Mardakhaev E, Zampolin R, Blumfield E, Berman J, Ramos RDLG, Fourman M, Yassari R, Eleswarapu A, Mirhaji P. Development of a natural language processing algorithm for the detection of spinal metastasis based on magnetic resonance imaging reports. North American Spine Society Journal 2024; 19:100513. [PMID: 39149563 PMCID: PMC11325227 DOI: 10.1016/j.xnsj.2024.100513] [Received: 05/01/2024] [Accepted: 06/25/2024] [Indexed: 08/17/2024]
Abstract
Background Metastasis to the spinal column is a common complication of malignancy, potentially causing pain and neurologic injury. An automated system to identify and refer patients with spinal metastases can help overcome barriers to timely treatment. We describe the training, optimization and validation of a natural language processing algorithm to identify the presence of vertebral metastasis and metastatic epidural cord compression (MECC) from radiology reports of spinal MRIs. Methods Reports from patients with spine MRI studies performed between January 1, 2008 and April 14, 2019 were reviewed by a team of radiologists to assess for the presence of cancer and generate a labeled dataset for model training. Using regular expressions, impression sections were extracted from the reports and converted to all lower-case letters with all nonalphabetic characters removed. The reports were then tokenized and vectorized using the doc2vec algorithm. These were then used to train a neural network to predict the likelihood of spinal tumor or MECC. For each report, the model provided a number from 0 to 1 corresponding to its impression. We then obtained 111 MRI reports from outside the test set, 92 manually labeled negative and 19 labeled with MECC, to test the model's performance. Results A total of 37,579 radiology reports were reviewed. Of these, 36,676 were labeled negative and 903 were labeled with MECC. We chose a cutoff of 0.02 as a positive result to optimize for a low false negative rate. At this threshold we found a 100% sensitivity rate with a low false positive rate of 2.2%. Conclusions The NLP model described predicts the presence of spinal tumor and MECC in spine MRI reports with high accuracy. We plan to implement the algorithm into our EMR to allow for faster referral of these patients to appropriate specialists, allowing for reduced morbidity and increased survival.
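The text-preparation steps named in this abstract (regular-expression extraction of the impression section, lower-casing, removal of nonalphabetic characters, tokenization, and a 0.02 cutoff on the model score) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the `IMPRESSION:` header pattern, the function names, the choice to replace nonalphabetic runs with spaces before splitting, and the inclusive cutoff are all assumptions. The doc2vec vectorization and the neural network itself are omitted.

```python
import re
from typing import List

# Assumed layout: the impression section starts at an "IMPRESSION:" header
# and runs to the end of the report. Real reports may be laid out differently.
IMPRESSION_RE = re.compile(r"impression\s*:\s*(.*)", re.IGNORECASE | re.DOTALL)


def extract_impression(report: str) -> str:
    """Return the text after the impression header, or '' if no header is found."""
    match = IMPRESSION_RE.search(report)
    return match.group(1).strip() if match else ""


def normalize(text: str) -> List[str]:
    """Lower-case, turn every run of nonalphabetic characters into a space, then split."""
    letters_only = re.sub(r"[^a-z]+", " ", text.lower())
    return letters_only.split()


def is_positive(score: float, cutoff: float = 0.02) -> bool:
    """Flag a report as positive when the model score reaches the cutoff.

    The 0.02 value comes from the abstract; treating it as inclusive is an assumption.
    """
    return score >= cutoff
```

A low cutoff such as 0.02 trades extra false positives for fewer missed cases, which matches the stated aim of minimizing the false negative rate.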
Affiliation(s)
- Evan Mostafa
  - Department of Orthopaedic Surgery, Montefiore Medical Center, 111 E 210th St, Bronx, NY, 10467, United States
- Aaron Hui
  - Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, 10461, NY, United States
- Boudewijn Aasman
  - Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, 10461, NY, United States
- Kamlesh Chowdary
  - Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, 10461, NY, United States
- Kyle Mani
  - Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, 10461, NY, United States
- Edward Mardakhaev
  - Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, 10461, NY, United States
- Richard Zampolin
  - Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, 10461, NY, United States
- Einat Blumfield
  - Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, 10461, NY, United States
- Jesse Berman
  - Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, 10461, NY, United States
- Rafael De La Garza Ramos
  - Department of Neurological Surgery, Montefiore Medical Center, 111 E 210th St, Bronx, NY, 10467, United States
- Mitchell Fourman
  - Department of Orthopaedic Surgery, Montefiore Medical Center, 111 E 210th St, Bronx, NY, 10467, United States
- Reza Yassari
  - Department of Neurological Surgery, Montefiore Medical Center, 111 E 210th St, Bronx, NY, 10467, United States
- Ananth Eleswarapu
  - Department of Orthopaedic Surgery, Montefiore Medical Center, 111 E 210th St, Bronx, NY, 10467, United States
- Parsa Mirhaji
  - Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, 10461, NY, United States
35
Reichenpfader D, Müller H, Denecke K. A scoping review of large language model based approaches for information extraction from radiology reports. NPJ Digit Med 2024; 7:222. [PMID: 39182008 PMCID: PMC11344824 DOI: 10.1038/s41746-024-01219-0] [Received: 02/21/2024] [Accepted: 08/09/2024] [Indexed: 08/27/2024]
Abstract
Radiological imaging is a globally prevalent diagnostic method, yet the free text contained in radiology reports is not frequently used for secondary purposes. Natural Language Processing can provide structured data retrieved from these reports. This paper provides a summary of the current state of research on Large Language Model (LLM) based approaches for information extraction (IE) from radiology reports. We conduct a scoping review that follows the PRISMA-ScR guideline. Queries of five databases were conducted on August 1st 2023. Among the 34 studies that met inclusion criteria, only pre-transformer and encoder-based models are described. External validation shows a general performance decrease, although LLMs might improve generalizability of IE approaches. Reports related to CT and MRI examinations, as well as thoracic reports, prevail. Most common challenges reported are missing validation on external data and augmentation of the described methods. Different reporting granularities affect the comparability and transparency of approaches.
Affiliation(s)
- Daniel Reichenpfader
  - Institute for Patient-Centered Digital Health, Bern University of Applied Sciences, Biel/Bienne, Switzerland
  - Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Henning Müller
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
  - Informatics Institute, HES-SO Valais-Wallis, Sierre, Switzerland
- Kerstin Denecke
  - Institute for Patient-Centered Digital Health, Bern University of Applied Sciences, Biel/Bienne, Switzerland
36
Bergomi L, Buonocore TM, Antonazzo P, Alberghi L, Bellazzi R, Preda L, Bortolotto C, Parimbelli E. Reshaping free-text radiology notes into structured reports with generative question answering transformers. Artif Intell Med 2024; 154:102924. [PMID: 38964194 DOI: 10.1016/j.artmed.2024.102924] [Received: 04/01/2024] [Revised: 06/22/2024] [Accepted: 06/25/2024] [Indexed: 07/06/2024]
Abstract
BACKGROUND Radiology reports are typically written in a free-text format, making clinical information difficult to extract and use. Recently, the adoption of structured reporting (SR) has been recommended by various medical societies thanks to the advantages it offers, e.g. standardization, completeness, and information retrieval. We propose a pipeline to extract information from Italian free-text radiology reports that fits with the items of the reference SR registry proposed by a national society of interventional and medical radiology, focusing on CT staging of patients with lymphoma. METHODS Our work aims to leverage the potential of Natural Language Processing and Transformer-based models to deal with automatic SR registry filling. With the availability of 174 Italian radiology reports, we investigate a rule-free generative Question Answering approach based on the Italian-specific version of T5: IT5. To address information content discrepancies, we focus on the six most frequently filled items in the annotations made on the reports: three categorical (multichoice), one free-text (free-text), and two continuous numerical (factual). In the preprocessing phase, we encode also information that is not supposed to be entered. Two strategies (batch-truncation and ex-post combination) are implemented to comply with the IT5 context length limitations. Performance is evaluated in terms of strict accuracy, f1, and format accuracy, and compared with the widely used GPT-3.5 Large Language Model. Unlike multichoice and factual, free-text answers do not have 1-to-1 correspondence with their reference annotations. For this reason, we collect human-expert feedback on the similarity between medical annotations and generated free-text answers, using a 5-point Likert scale questionnaire (evaluating the criteria of correctness and completeness). 
RESULTS The combination of fine-tuning and batch splitting allows IT5 ex-post combination to achieve notable results in terms of information extraction of different types of structured data, performing on par with GPT-3.5. Human-based assessment scores of free-text answers show a high correlation with the AI performance metrics f1 (Spearman's correlation coefficients>0.5, p-values<0.001) for both IT5 ex-post combination and GPT-3.5. The latter is better at generating plausible human-like statements, even if it systematically provides answers even when they are not supposed to be given. CONCLUSIONS In our experimental setting, a fine-tuned Transformer-based model with a modest number of parameters (i.e., IT5, 220 M) performs well as a clinical information extraction system for automatic SR registry filling task. It can extract information from more than one place in the report, elaborating it in a manner that complies with the response specifications provided by the SR registry (for multichoice and factual items), or that closely approximates the work of a human-expert (free-text items); with the ability to discern when an answer is supposed to be given or not to a user query.
Affiliation(s)
- Laura Bergomi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Tommaso M Buonocore
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Paolo Antonazzo
  - Diagnostic Imaging Unit, Department of Clinical, Surgical, Diagnostic, and Pediatric Sciences, University of Pavia, Pavia, Italy
- Lorenzo Alberghi
  - Diagnostic Imaging Unit, Department of Clinical, Surgical, Diagnostic, and Pediatric Sciences, University of Pavia, Pavia, Italy
- Riccardo Bellazzi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy; LIM-IA - Laboratory of Medical Informatics and AI, IRCCS Istituti Clinici Scientifici Maugeri, Pavia, Italy
- Lorenzo Preda
  - Diagnostic Imaging Unit, Department of Clinical, Surgical, Diagnostic, and Pediatric Sciences, University of Pavia, Pavia, Italy; Radiology Unit - Diagnostic Imaging I, Department of Diagnostic Medicine, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Chandra Bortolotto
  - Diagnostic Imaging Unit, Department of Clinical, Surgical, Diagnostic, and Pediatric Sciences, University of Pavia, Pavia, Italy; Radiology Unit - Diagnostic Imaging I, Department of Diagnostic Medicine, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Enea Parimbelli
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
37
|
Kim S, Kim SS, Kim E, Cecchini M, Park MS, Choi JA, Kim SH, Hwang HK, Kang CM, Choi HJ, Shin SJ, Kang J, Lee CK. Deep-Transfer-Learning-Based Natural Language Processing of Serial Free-Text Computed Tomography Reports for Predicting Survival of Patients With Pancreatic Cancer. JCO Clin Cancer Inform 2024; 8:e2400021. [PMID: 39151114 DOI: 10.1200/cci.24.00021] [Received: 01/26/2024] [Revised: 04/22/2024] [Accepted: 07/10/2024] [Indexed: 08/18/2024]
Abstract
PURPOSE To explore the predictive potential of serial computed tomography (CT) radiology reports for pancreatic cancer survival using natural language processing (NLP). METHODS Deep-transfer-learning-based NLP models were retrospectively trained and tested with serial, free-text CT reports, and survival information of consecutive patients diagnosed with pancreatic cancer in a Korean tertiary hospital was extracted. Randomly selected patients with pancreatic cancer and their serial CT reports from an independent tertiary hospital in the United States were included in the external testing data set. The concordance index (c-index) of predicted survival and actual survival, and area under the receiver operating characteristic curve (AUROC) for predicting 1-year survival were calculated. RESULTS Between January 2004 and June 2021, 2,677 patients with 12,255 CT reports and 670 patients with 3,058 CT reports were allocated to training and internal testing data sets, respectively. ClinicalBERT (Bidirectional Encoder Representations from Transformers) model trained on the single, first CT reports showed a c-index of 0.653 and AUROC of 0.722 in predicting the overall survival of patients with pancreatic cancer. ClinicalBERT trained on up to 15 consecutive reports from the initial report showed an improved c-index of 0.811 and AUROC of 0.911. On the external testing set with 273 patients with 1,947 CT reports, the AUROC was 0.888, indicating the generalizability of our model. Further analyses showed our model's contextual interpretation beyond specific phrases. CONCLUSION Deep-transfer-learning-based NLP model of serial CT reports can predict the survival of patients with pancreatic cancer. Clinical decisions can be supported by the developed model, with survival information extracted solely from serial radiology reports.
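For readers unfamiliar with the concordance index reported above, the sketch below computes it for right-censored survival data. It assumes Harrell's pairwise definition, which the abstract does not specify, and the function and argument names are illustrative only. A pair of patients is comparable when the one with the shorter follow-up had an observed death; the pair is concordant when that patient also received the higher predicted risk.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's c-index (assumed definition) for right-censored data.

    times:       follow-up time per patient
    events:      1 if death was observed, 0 if censored
    risk_scores: model output, higher meaning shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the reported rise from 0.653 to 0.811.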
Affiliation(s)
- Sunkyu Kim
  - Department of Computer Science and Engineering, Korea University, Seoul, Korea
- Seung-Seob Kim
  - Department of Radiology and Research Institute of Radiological Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
  - Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea
- Eejung Kim
  - Department of Internal Medicine (Medical Oncology), Yale University School of Medicine, New Haven, CT
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Michael Cecchini
  - Department of Internal Medicine (Medical Oncology), Yale University School of Medicine, New Haven, CT
- Mi-Suk Park
  - Department of Radiology and Research Institute of Radiological Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
  - Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea
- Ji A Choi
  - Song-dang Institute for Cancer Research, Yonsei University College of Medicine, Seoul, Korea
- Sung Hyun Kim
  - Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea
  - Department of Surgery, Yonsei University College of Medicine, Seoul, Korea
- Ho Kyoung Hwang
  - Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea
  - Department of Surgery, Yonsei University College of Medicine, Seoul, Korea
- Chang Moo Kang
  - Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea
  - Department of Surgery, Yonsei University College of Medicine, Seoul, Korea
- Hye Jin Choi
  - Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea
  - Division of Medical Oncology, Department of Internal Medicine, Yonsei University College of Medicine, Seoul, Korea
- Sang Joon Shin
  - Song-dang Institute for Cancer Research, Yonsei University College of Medicine, Seoul, Korea
  - Division of Medical Oncology, Department of Internal Medicine, Yonsei University College of Medicine, Seoul, Korea
- Jaewoo Kang
  - Department of Computer Science and Engineering, Korea University, Seoul, Korea
  - AIGEN Sciences Inc, Seoul, Korea
- Choong-Kun Lee
  - Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea
  - Song-dang Institute for Cancer Research, Yonsei University College of Medicine, Seoul, Korea
  - Division of Medical Oncology, Department of Internal Medicine, Yonsei University College of Medicine, Seoul, Korea
38
Tejani AS, Bialecki B, O’Donnell K, Sippel Schmidt T, Kohli MD, Alkasab T. Standardizing imaging findings representation: harnessing Common Data Elements semantics and Fast Healthcare Interoperability Resources structures. J Am Med Inform Assoc 2024; 31:1735-1742. [PMID: 38900188 PMCID: PMC11258419 DOI: 10.1093/jamia/ocae134] [Received: 01/29/2024] [Revised: 05/22/2024] [Accepted: 05/24/2024] [Indexed: 06/21/2024]
Abstract
OBJECTIVES Designing a framework representing radiology results in a standards-based data structure using joint Radiological Society of North America/American College of Radiology Common Data Elements (CDEs) as the semantic labels on standard structures. This allows radiologist-created report data to integrate with artificial intelligence-generated results for use throughout downstream systems. MATERIALS AND METHODS We developed a framework modeling radiology findings as Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) observations using CDE set/element identifiers as standardized semantic labels. This framework deploys CDE identifiers to specify radiology findings and attributes, providing consistent labels for radiology report concepts (diagnoses, recommendations, tabular/quantitative data) with built-in integration with RadLex, SNOMED CT, LOINC, and other ontologies. Observation structures fit within larger HL7 FHIR DiagnosticReport resources, providing output including both nuanced text and structured data. RESULTS Labeling radiology findings as discrete data for interchange between systems requires two components: structure and semantics. CDE definitions provide semantic identifiers for findings and their component values. The FHIR observation resource specifies a structure for associating identifiers with radiology findings in the context of reports, with CDE-encoded observations referring to definitions for CDE identifiers in a central repository. The discussion includes an example of encoding pulmonary nodules on a chest CT as CDE-labeled observations, demonstrating the application of this framework to exchange findings throughout the imaging workflow, making imaging data available to downstream clinical systems. DISCUSSION CDE-labeled observations establish a lingua franca for encoding, exchanging, and consuming radiology data at the level of individual findings, facilitating use throughout healthcare systems.
IMPORTANCE CDE-labeled FHIR observation objects can increase the value of radiology results by facilitating their use throughout patient care.
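To make the structure-plus-semantics idea concrete, the sketch below builds a pulmonary nodule finding as a FHIR Observation and links it from a DiagnosticReport, using plain Python dictionaries. It is an illustration under stated assumptions and not the framework's actual encoding: the coding system URI and the `RDES-EXAMPLE` and `RDE-EXAMPLE-1` identifiers are placeholders rather than real CDE set or element IDs, and the helper names are hypothetical. Only standard FHIR field names (`resourceType`, `status`, `code`, `component`, `valueQuantity`, `result`, `conclusion`) are used.

```python
import json
from typing import Dict, List

# Placeholder coding system; a real deployment would point at the CDE repository.
CDE_SYSTEM = "https://example.org/cde"


def make_nodule_observation(obs_id: str, size_mm: float) -> Dict:
    """Observation whose code names the finding and whose component holds one attribute."""
    return {
        "resourceType": "Observation",
        "id": obs_id,
        "status": "final",
        # Hypothetical CDE set identifier labeling the finding itself.
        "code": {"coding": [{"system": CDE_SYSTEM, "code": "RDES-EXAMPLE",
                             "display": "Pulmonary nodule"}]},
        "component": [{
            # Hypothetical CDE element identifier labeling one attribute of the finding.
            "code": {"coding": [{"system": CDE_SYSTEM, "code": "RDE-EXAMPLE-1",
                                 "display": "Nodule size"}]},
            "valueQuantity": {"value": size_mm, "unit": "mm",
                              "system": "http://unitsofmeasure.org", "code": "mm"},
        }],
    }


def make_report(report_id: str, observations: List[Dict], narrative: str) -> Dict:
    """DiagnosticReport carrying the narrative text plus references to structured findings."""
    return {
        "resourceType": "DiagnosticReport",
        "id": report_id,
        "status": "final",
        "code": {"text": "CT chest"},
        "result": [{"reference": "Observation/" + o["id"]} for o in observations],
        "conclusion": narrative,
    }


observation = make_nodule_observation("nodule-1", 6.0)
report = make_report("report-1", [observation], "6 mm pulmonary nodule.")
serialized = json.dumps(report, indent=2)
```

Keeping the narrative in the report and the discrete values in referenced observations mirrors the abstract's point that the output includes both nuanced text and structured data.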
Affiliation(s)
- Ali S Tejani
  - Department of Radiology, UT Southwestern Medical Center, Dallas, TX 75390, United States
- Brian Bialecki
  - Informatics, American College of Radiology, Reston, VA 20191, United States
- Kevin O’Donnell
  - Connectivity, Standards, & Interoperability, Canon Medical Research United States Inc, Vernon Hills, IL 60061, United States
- Teri Sippel Schmidt
  - Biomedical Informatics and Data Sciences Department, Johns Hopkins School of Medicine, Baltimore, MD 21205, United States
- Marc D Kohli
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA 94143, United States
- Tarik Alkasab
  - Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, United States
39
Wieland-Jorna Y, van Kooten D, Verheij RA, de Man Y, Francke AL, Oosterveld-Vlug MG. Natural language processing systems for extracting information from electronic health records about activities of daily living. A systematic review. JAMIA Open 2024; 7:ooae044. [PMID: 38798774 PMCID: PMC11126158 DOI: 10.1093/jamiaopen/ooae044] [Received: 01/03/2024] [Revised: 03/21/2024] [Accepted: 05/07/2024] [Indexed: 05/29/2024]
Abstract
Objective Natural language processing (NLP) can enhance research on activities of daily living (ADL) by extracting structured information from unstructured electronic health record (EHR) notes. This review aims to give insight into the state-of-the-art, usability, and performance of NLP systems to extract information on ADL from EHRs. Materials and Methods A systematic review was conducted based on searches in PubMed, Embase, CINAHL, Web of Science, and Scopus. Studies published between 2017 and 2022 were selected based on predefined eligibility criteria. Results The review identified 22 studies. Most studies (65%) used NLP for classifying unstructured EHR data on 1 or 2 ADL. Deep learning, combined with a rule-based method or machine learning, was the approach most commonly used. NLP systems varied widely in terms of the pre-processing and algorithms. Common performance evaluation methods were cross-validation and train/test datasets, with F1, precision, and sensitivity as the most frequently reported evaluation metrics. Most studies reported relatively high overall scores on the evaluation metrics. Discussion NLP systems are valuable for the extraction of unstructured EHR data on ADL. However, comparing the performance of NLP systems is difficult due to the diversity of the studies and challenges related to the dataset, including restricted access to EHR data, inadequate documentation, lack of granularity, and small datasets. Conclusion This systematic review indicates that NLP is promising for deriving information on ADL from unstructured EHR notes. However, which NLP system performs best depends on the characteristics of the dataset, the research question, and the type of ADL.
Affiliation(s)
- Yvonne Wieland-Jorna
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
  - Tranzo, School of Social Sciences and Behavioural Research, Tilburg University, Tilburg, Postbus 90153, 5000 LE, The Netherlands
- Daan van Kooten
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
- Robert A Verheij
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
  - Tranzo, School of Social Sciences and Behavioural Research, Tilburg University, Tilburg, Postbus 90153, 5000 LE, The Netherlands
- Yvonne de Man
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
- Anneke L Francke
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
  - Department of Public and Occupational Health, Location Vrije Universiteit Amsterdam, Amsterdam UMC, Amsterdam, Postbus 7057, 1007 MB, The Netherlands
- Mariska G Oosterveld-Vlug
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
40
Lam BD, Chrysafi P, Chiasakul T, Khosla H, Karagkouni D, McNichol M, Adamski A, Reyes N, Abe K, Mantha S, Vlachos IS, Zwicker JI, Patell R. Machine learning natural language processing for identifying venous thromboembolism: systematic review and meta-analysis. Blood Adv 2024; 8:2991-3000. [PMID: 38522096 PMCID: PMC11215191 DOI: 10.1182/bloodadvances.2023012200] [Received: 11/16/2023] [Revised: 02/22/2024] [Accepted: 02/22/2024] [Indexed: 03/26/2024]
Abstract
Venous thromboembolism (VTE) is a leading cause of preventable in-hospital mortality. Monitoring VTE cases is limited by the challenges of manual medical record review and diagnosis code interpretation. Natural language processing (NLP) can automate the process. Rule-based NLP methods are effective but time consuming. Machine learning (ML)-NLP methods present a promising solution. We conducted a systematic review and meta-analysis of studies published before May 2023 that use ML-NLP to identify VTE diagnoses in the electronic health records. Four reviewers screened all manuscripts, excluding studies that only used a rule-based method. A meta-analysis evaluated the pooled performance of each study's best performing model that evaluated for pulmonary embolism and/or deep vein thrombosis. Pooled sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) with confidence interval (CI) were calculated by DerSimonian and Laird method using a random-effects model. Study quality was assessed using an adapted TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) tool. Thirteen studies were included in the systematic review and 8 had data available for meta-analysis. Pooled sensitivity was 0.931 (95% CI, 0.881-0.962), specificity 0.984 (95% CI, 0.967-0.992), PPV 0.910 (95% CI, 0.865-0.941) and NPV 0.985 (95% CI, 0.977-0.990). All studies met at least 13 of the 21 NLP-modified TRIPOD items, demonstrating fair quality. The highest performing models used vectorization rather than bag-of-words and deep-learning techniques such as convolutional neural networks. There was significant heterogeneity in the studies, and only 4 validated their model on an external data set. Further standardization of ML studies can help progress this novel technology toward real-world implementation.
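The DerSimonian and Laird random-effects pooling mentioned above can be sketched as follows for sensitivity. Several details here are assumptions that the abstract does not state: proportions are pooled on the logit scale, a 0.5 continuity correction is applied only when a cell is zero, and the interval uses a normal approximation with 1.96. The function name and input format are illustrative.

```python
import math
from typing import List, Tuple


def pool_sensitivity(studies: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """DerSimonian-Laird pooled sensitivity from (true_positives, false_negatives) pairs.

    Returns (pooled estimate, 95% CI lower, 95% CI upper, tau squared).
    """
    effects, variances = [], []
    for tp, fn in studies:
        if tp == 0 or fn == 0:
            tp, fn = tp + 0.5, fn + 0.5  # assumed continuity correction
        effects.append(math.log(tp / fn))      # logit of tp / (tp + fn)
        variances.append(1.0 / tp + 1.0 / fn)  # approximate variance of the logit

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Method-of-moments estimate of the between-study variance.
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau squared to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))

    def expit(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-x))

    return expit(pooled), expit(pooled - 1.96 * se), expit(pooled + 1.96 * se), tau2
```

When studies disagree more than sampling error alone would explain, tau squared becomes positive and the weights even out across studies, which widens the pooled interval.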
Affiliation(s)
- Barbara D. Lam
  - Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
  - Division of Clinical Informatics, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Pavlina Chrysafi
  - Department of Medicine, Mount Auburn Hospital, Harvard Medical School, Boston, MA
- Thita Chiasakul
  - Center of Excellence in Translational Hematology, Division of Hematology, Department of Medicine, Faculty of Medicine, Chulalongkorn University and King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok, Thailand
- Harshit Khosla
  - Department of Medicine, Saint Vincent Hospital, Worcester, MA
- Dimitra Karagkouni
  - Department of Pathology, Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Megan McNichol
  - Library Sciences, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Alys Adamski
  - Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
- Nimia Reyes
  - Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
- Karon Abe
  - Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
- Simon Mantha
  - Division of Hematology, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
- Ioannis S. Vlachos
  - Department of Pathology, Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Jeffrey I. Zwicker
  - Division of Hematology, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
- Rushad Patell
  - Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
41
Patell R, Zwicker JI, Singh R, Mantha S. Machine learning in cancer-associated thrombosis: hype or hope in untangling the clot. Bleeding, Thrombosis and Vascular Biology 2024; 3:123. [PMID: 39323613 PMCID: PMC11423546 DOI: 10.4081/btvb.2024.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 03/22/2024] [Indexed: 09/27/2024]
Abstract
The goal of machine learning (ML) is to create informative signals and useful tasks by leveraging large datasets to derive computational algorithms. ML has the potential to revolutionize the healthcare industry by boosting productivity, enhancing safe and effective patient care, and lightening the load on clinicians. In addition to gaining mechanistic insights into cancer-associated thrombosis (CAT), ML can be used to improve patient outcomes, streamline healthcare delivery, and spur innovation. Our review paper delves into the present and potential applications of this cutting-edge technology, encompassing three areas: i) computer vision-assisted diagnosis of thromboembolism from radiology data; ii) case detection from electronic health records using natural language processing; iii) algorithms for CAT prediction and risk stratification. The availability of large, well-annotated, high-quality datasets, overfitting, limited generalizability, the risk of propagating inherent bias, and a lack of transparency among patients and clinicians are among the challenges that must be overcome in order to effectively develop ML in the health sector. To guarantee that this powerful instrument can be utilized to maximize innovation in CAT, clinicians can collaborate with stakeholders such as computer scientists, regulatory bodies, and patient groups.
Affiliation(s)
- Rushad Patell
  - Division of Medical Oncology and Hematology, Beth Israel Deaconess Medical Center, Boston, MA
  - Harvard Medical School, Boston, MA
- Jeffrey I. Zwicker
  - Department of Medicine, Hematology Service, Memorial Sloan Kettering Cancer Center, New York, NY
  - Weill Cornell Medical College, New York, NY
- Rohan Singh
  - Department of Digital Informatics & Technology Solutions, Memorial Sloan Kettering Cancer Center, New York, NY, United States
- Simon Mantha
  - Department of Medicine, Hematology Service, Memorial Sloan Kettering Cancer Center, New York, NY
42
Sindhu A, Jadhav U, Ghewade B, Bhanushali J, Yadav P. Revolutionizing Pulmonary Diagnostics: A Narrative Review of Artificial Intelligence Applications in Lung Imaging. Cureus 2024; 16:e57657. [PMID: 38707160 PMCID: PMC11070215 DOI: 10.7759/cureus.57657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 04/04/2024] [Indexed: 05/07/2024] Open
Abstract
Artificial intelligence (AI) has emerged as a transformative force in healthcare, particularly in pulmonary diagnostics. This comprehensive review explores the impact of AI on revolutionizing lung imaging, focusing on its applications in detecting abnormalities, diagnosing pulmonary conditions, and predicting disease prognosis. We provide an overview of traditional pulmonary diagnostic methods and highlight the importance of accurate and efficient lung imaging for early intervention and improved patient outcomes. Through the lens of AI, we examine machine learning algorithms, deep learning techniques, and natural language processing for analyzing radiology reports. Case studies and examples showcase the successful implementation of AI in pulmonary diagnostics, alongside challenges faced and lessons learned. Finally, we discuss future directions, including integrating AI into clinical workflows, ethical considerations, and the need for further research and collaboration in this rapidly evolving field. This review underscores the transformative potential of AI in enhancing the accuracy, efficiency, and accessibility of pulmonary healthcare.
Affiliation(s)
- Arman Sindhu
  - Respiratory Medicine, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
- Ulhas Jadhav
  - Respiratory Medicine, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
- Babaji Ghewade
  - Respiratory Medicine, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
- Jay Bhanushali
  - Respiratory Medicine, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
- Pallavi Yadav
  - Obstetrics and Gynecology, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
43
Crombé A, Lecomte JC, Seux M, Banaste N, Gorincour G. Using the Textual Content of Radiological Reports to Detect Emerging Diseases: A Proof-of-Concept Study of COVID-19. Journal of Imaging Informatics in Medicine 2024; 37:620-632. [PMID: 38343242 PMCID: PMC11031522 DOI: 10.1007/s10278-023-00949-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/02/2023] [Accepted: 10/04/2023] [Indexed: 04/20/2024]
Abstract
Changes in the content of radiological reports at population level could detect emerging diseases. Herein, we developed a method to quantify similarities in consecutive temporal groupings of radiological reports using natural language processing, and we investigated whether appearance of dissimilarities between consecutive periods correlated with the beginning of the COVID-19 pandemic in France. CT reports from 67,368 consecutive adults across 62 emergency departments throughout France between October 2019 and March 2020 were collected. Reports were vectorized using term frequency-inverse document frequency (TF-IDF) analysis on one-grams. For each successive 2-week period, we performed unsupervised clustering of the reports based on TF-IDF values and partition-around-medoids. Next, we assessed the similarities between this clustering and a clustering from two weeks before according to the average adjusted Rand index (AARI). Statistical analyses included (1) cross-correlation functions (CCFs) with the number of positive SARS-CoV-2 tests and advanced sanitary index for flu syndromes (ASI-flu, from an open-source dataset), and (2) linear regressions of time series at different lags to understand the variations of AARI over time. Overall, 13,235 chest CT reports were analyzed. AARI was correlated with ASI-flu at lag = +1, +5, and +6 weeks (P = 0.0454, 0.0121, and 0.0042, respectively) and with SARS-CoV-2 positive tests at lag = -1 and 0 weeks (P = 0.0057 and 0.0001, respectively). In the best fit, AARI correlated with the ASI-flu with a lag of 2 weeks (P = 0.0026), SARS-CoV-2-positive tests in the same week (P < 0.0001) and their interaction (P < 0.0001) (adjusted R² = 0.921). Thus, our method enables the automatic monitoring of changes in radiological reports and could help capture disease emergence.
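Two building blocks of the method described above, TF-IDF (term frequency-inverse document frequency) weighting on one-grams and the adjusted Rand index between two clusterings, can be sketched in plain Python. This is an illustrative sketch only: the TF-IDF variant (relative term frequency times log(N/df)) is an assumption, the clustering step (partition-around-medoids) is omitted, and the abstract does not describe how reports were aligned across periods, so the index is shown simply for two labelings of the same items.

```python
import math
from collections import Counter


def tfidf(tokenized_docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    total_pairs = math.comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of items placed together in both, in A only, and in B only.
    together_both = sum(math.comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(math.comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(math.comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = 0.5 * (together_a + together_b)
    if maximum == expected:  # both clusterings are trivial
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Hypothetical tokenized report snippets.
reports = [["ground", "glass", "opacities"], ["no", "pulmonary", "embolism"],
           ["bilateral", "ground", "glass"]]
print(tfidf(reports)[0])
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

An index near 1 means the two clusterings group the items almost identically, and values near 0 mean agreement is no better than chance. A drop in this index between consecutive periods is the signal the abstract describes.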
Affiliation(s)
- Amandine Crombé
  - IMADIS, Lyon, France
  - SARCOTARGET Team, University of Bordeaux, Inserm, UMR1312, BRIC, BoRdeaux Institute of Oncology, 146 Rue Léo Saignat, Bordeaux, F-33076, France
  - Department of Radiology, Pellegrin University Hospital, CHU Bordeaux, Place Amélie Raba-Léon, Bordeaux, F-33076, France
- Jean-Christophe Lecomte
  - IMADIS, Lyon, France
  - Centre Aquitain d'Imagerie médicale, Mérignac, France
  - Centre Hospitalier de Saintes, Saintes, France
  - Clinique Mutualiste Bordeaux Pessac, Pessac, France
- Nathan Banaste
  - IMADIS, Lyon, France
  - Clinique Convert, Ramsay, Bourg en Bresse, France
44
Martín-Noguerol T, López-Úbeda P, Luna A. Imagine there is no paperwork… it's easy if you try. Br J Radiol 2024; 97:744-746. [PMID: 38335929 PMCID: PMC11027242 DOI: 10.1093/bjr/tqae035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Revised: 01/11/2024] [Accepted: 02/05/2024] [Indexed: 02/12/2024] Open
Abstract
Artificial Intelligence (AI) applied to radiology is so vast that it provides applications ranging from becoming a complete replacement for radiologists (a potential threat) to an efficient paperwork-saving time assistant (an evident strength). Nowadays, there are AI applications developed to facilitate the diagnostic process of radiologists without directly influencing (or replacing) the proper diagnostic decision step. These tools may help to reduce administrative workload, in different scenarios ranging from assisting in scheduling, study prioritization, or report communication, to helping with patient follow-up, including recommending additional exams. These are just a few of the highly time-consuming tasks that radiologists have to deal with every day in their routine workflow. These tasks hinder the time that radiologists should spend evaluating images and caring for patients, which will have a direct and negative impact on the quality of reports and patient attention, increasing the delay and waiting list of studies pending to be performed and reported. These types of AI applications should help to partially face this worldwide shortage of radiologists.
Affiliation(s)
- Antonio Luna
  - MRI Unit, Radiology Department, HT medica, Jaén 23007, Spain
45
Cappello A, Murgia Y, Giacobbe DR, Mora S, Gazzarata R, Rosso N, Giacomini M, Bassetti M. Automated extraction of standardized antibiotic resistance and prescription data from laboratory information systems and electronic health records: a narrative review. Frontiers in Antibiotics 2024; 3:1380380. [PMID: 39816258 PMCID: PMC11731964 DOI: 10.3389/frabi.2024.1380380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 02/26/2024] [Indexed: 01/18/2025]
Abstract
Antimicrobial resistance in bacteria has been associated with significant morbidity and mortality in hospitalized patients. In the era of big data and of the consequent frequent need for large study populations, manual collection of data for research studies on antimicrobial resistance and antibiotic use has become extremely time-consuming and sometimes impossible for overwhelmed healthcare personnel to accomplish. In this review, we discuss relevant concepts pertaining to the automated extraction of antibiotic resistance and antibiotic prescription data from laboratory information systems and electronic health records to be used in clinical studies, starting from the currently available literature on the topic. Leveraging automatic extraction and standardization of antimicrobial resistance and antibiotic prescription data is a tremendous opportunity to improve the care of future patients with severe infections caused by multidrug-resistant organisms, and should not be missed.
Affiliation(s)
- Alice Cappello
  - Clinica Malattie Infettive, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Ylenia Murgia
  - Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, Genoa, Italy
- Daniele Roberto Giacobbe
  - Clinica Malattie Infettive, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
  - Department of Health Sciences (DISSAL), University of Genoa, Genoa, Italy
- Sara Mora
  - UO Information and Communication Technologies (ICT), IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Roberta Gazzarata
  - Healthropy, Savona, Italy
  - Health Level 7 (HL7) Europe, Brussels, Belgium
- Nicola Rosso
  - UO Information and Communication Technologies (ICT), IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Mauro Giacomini
  - Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, Genoa, Italy
- Matteo Bassetti
  - Clinica Malattie Infettive, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
  - Department of Health Sciences (DISSAL), University of Genoa, Genoa, Italy
46
Martín-Noguerol T, López-Úbeda P, Pons-Escoda A, Luna A. Natural language processing deep learning models for the differential between high-grade gliomas and metastasis: what if the key is how we report them? Eur Radiol 2024; 34:2113-2120. [PMID: 37665389 DOI: 10.1007/s00330-023-10202-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 07/10/2023] [Accepted: 07/20/2023] [Indexed: 09/05/2023]
Abstract
OBJECTIVES The differential between high-grade glioma (HGG) and metastasis remains challenging in common radiological practice. We compare different natural language processing (NLP)-based deep learning models to assist radiologists based on data contained in radiology reports. METHODS This retrospective study included 185 MRI reports between 2010 and 2022 from two different institutions. A total of 117 reports were used for the training and 21 were reserved for the validation set, while the rest were used as a test set. A comparison of the performance of different deep learning models for HGG and metastasis classification has been carried out. Specifically, Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), a hybrid version of BiLSTM and CNN, and a radiology-specific Bidirectional Encoder Representations from Transformers (RadBERT) model were used. RESULTS For the classification of MRI reports, the CNN network provided the best results among all tested, showing a macro-avg precision of 87.32%, a sensitivity of 87.45%, and an F1 score of 87.23%. In addition, our NLP algorithm detected keywords such as tumor, temporal, and lobe to positively classify a radiological report as HGG or metastasis group. CONCLUSIONS A deep learning model based on CNN enables radiologists to discriminate between HGG and metastasis based on MRI reports with high-precision values. This approach should be considered an additional tool in diagnosing these central nervous system lesions. CLINICAL RELEVANCE STATEMENT The use of our NLP model enables radiologists to differentiate between patients with high-grade glioma and metastasis based on their MRI reports and can be used as an additional tool to the conventional image-based approach for this challenging task. KEY POINTS • Differential between high-grade glioma and metastasis is still challenging in common radiological practice. 
• Natural language processing (NLP)-based deep learning models can assist radiologists based on data contained in radiology reports. • We have developed and tested a natural language processing model for discriminating between high-grade glioma and metastasis based on MRI reports that show high precision for this task.
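The macro-averaged precision, sensitivity, and F1 score quoted in the results give each class equal weight regardless of how many reports it contains. A minimal sketch of that averaging follows. The labels are toy values made up for illustration and do not reproduce the study's test set, and `macro_scores` is a hypothetical helper name.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) over all classes."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    # Unweighted mean across classes: this is what "macro" means.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels: HGG = high-grade glioma, MET = metastasis.
truth = ["HGG", "HGG", "HGG", "MET", "MET"]
preds = ["HGG", "HGG", "MET", "MET", "MET"]
print(macro_scores(truth, preds))
```

Because every class counts equally, a model that performs poorly on the smaller class is penalized more than it would be under plain accuracy.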
Affiliation(s)
- Albert Pons-Escoda
  - Radiology Department, Hospital Universitari de Bellvitge, Barcelona, Spain
- Antonio Luna
  - Radiology Department, MRI Unit, HT Medica, Carmelo Torres 2, 23007, Jaén, Spain
47
Liu W, Cai L, Li Y. Application of natural language processing to post-structuring of rectal cancer MRI reports. Clin Radiol 2024; 79:e204-e210. [PMID: 38042740 DOI: 10.1016/j.crad.2023.10.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 10/20/2023] [Accepted: 10/26/2023] [Indexed: 12/04/2023]
Abstract
AIM To evaluate a natural language processing (NLP) system for extracting structured information from the free-form text of rectal cancer magnetic resonance imaging (MRI) reports written in Chinese. MATERIALS AND METHODS A rule-based NLP model that could extract 11 key image features of rectal cancer was constructed using 358 MRI reports of rectal cancer written between 2015 and 2021. Fifty reports written before 2015 and 50 written after 2021 were used as test datasets, and the reference standard was determined by manual extraction of information by two radiologists. The length and reporting rate of image features in pre-2015 and post-2021 datasets, as well as the accuracy, precision, recall, and F1 score of feature extraction by the NLP system, were compared. The time required for the NLP to extract data was compared with that required by the radiologists. RESULTS Reports written after 2021 had longer diagnostic impression sections than reports written before 2015. The reporting rate of key imaging features of rectal cancer was 36.55% before 2015 and 79.82% after 2021. The accuracy, precision, recall, and F1 score of NLP for correct extraction of values from reports were 93.82%, 95.63%, 87.06%, and 91.15%, respectively, for pre-2015 reports, and 92.55%, 98.53%, 94.15%, and 96.29%, respectively, for post-2021 reports. NLP generated all the structured information in <1 second. CONCLUSIONS The NLP system with rule-based pattern matching achieved rapid and accurate structured processing of rectal cancer MRI reports. MRI reports with structured templates are more suitable for NLP-based extraction of information.
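The F1 score is the harmonic mean of precision and recall, so the figures in the results can be checked against one another. The sketch below recomputes F1 from the reported precision and recall. The function name is illustrative, and the inputs are the rounded percentages from the abstract expressed as fractions.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from the abstract.
reported = {
    "pre-2015": (0.9563, 0.8706, 0.9115),
    "post-2021": (0.9853, 0.9415, 0.9629),
}
for dataset, (precision, recall, f1_reported) in reported.items():
    recomputed = f1_score(precision, recall)
    print(f"{dataset}: recomputed F1 = {recomputed:.4f}, reported = {f1_reported:.4f}")
```

Both recomputed values agree with the reported F1 scores to within about 0.0001, which is consistent with rounding of the published precision and recall.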
Affiliation(s)
- W Liu
  - Department of Radiology, Aerospace Center Hospital, Beijing, 100049, China; Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, 100050, China
- L Cai
  - School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China
- Y Li
  - Department of General Surgery, Aerospace Center Hospital, Beijing, 100049, China
48
Nobel JM, Puts S, Krdzalic J, Zegers KML, Lobbes MBI, Robben SGF, Dekker ALAJ. Natural Language Processing Algorithm Used for Staging Pulmonary Oncology from Free-Text Radiological Reports: "Including PET-CT and Validation Towards Clinical Use". Journal of Imaging Informatics in Medicine 2024; 37:3-12. [PMID: 38343237 PMCID: PMC10976919 DOI: 10.1007/s10278-023-00913-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Revised: 08/26/2023] [Accepted: 09/03/2023] [Indexed: 03/02/2024]
Abstract
Natural language processing (NLP) can be used to process and structure free text, such as (free text) radiological reports. In radiology, it is important that reports are complete and accurate for clinical staging of, for instance, pulmonary oncology. A computed tomography (CT) or positron emission tomography (PET)-CT scan is of great importance in tumor staging, and NLP may be of additional value to the radiological report when used in the staging process as it may be able to extract the T and N stage of the 8th tumor-node-metastasis (TNM) classification system. The purpose of this study is to evaluate a new TN algorithm (TN-PET-CT) by adding a layer of metabolic activity to an already existing rule-based NLP algorithm (TN-CT). This new TN-PET-CT algorithm is capable of staging chest CT examinations as well as PET-CT scans. The study design made it possible to perform a subgroup analysis to test the external validation of the prior TN-CT algorithm. For information extraction and matching, pyContextNLP, SpaCy, and regular expressions were used. Overall TN accuracy score of the TN-PET-CT algorithm was 0.73 and 0.62 in the training and validation set (N = 63, N = 100). The external validation of the TN-CT classifier (N = 65) was 0.72. Overall, it is possible to adjust the TN-CT algorithm into a TN-PET-CT algorithm. However, outcomes highly depend on the accuracy of the report, the used vocabulary, and its context to express, for example, uncertainty. This is true for both the adjusted PET-CT algorithm and for the CT algorithm when applied in another hospital.
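The regular-expression part of such a rule-based pipeline can be sketched as follows. This is not the authors' TN-CT or TN-PET-CT algorithm: it only extracts measurements and maps the largest one to the size-based T category of the 8th TNM edition for lung cancer (up to 3 cm, above 3 to 5 cm, above 5 to 7 cm, above 7 cm). It ignores invasion, nodal status, metabolic activity, negation, and uncertainty, which the study handled with pyContextNLP and spaCy. The function names and the example sentence are hypothetical.

```python
import re

# A number (decimal point or comma) followed by a unit of mm or cm.
SIZE_PATTERN = re.compile(r"(\d+(?:[.,]\d+)?)\s*(mm|cm)\b", re.IGNORECASE)


def largest_size_cm(report_text):
    """Return the largest measurement found in the text, in cm, or None."""
    sizes = []
    for value, unit in SIZE_PATTERN.findall(report_text):
        number = float(value.replace(",", "."))
        sizes.append(number / 10 if unit.lower() == "mm" else number)
    return max(sizes) if sizes else None


def t_category_by_size(size_cm):
    """Map a tumor diameter in cm to a size-only T category (TNM 8, lung)."""
    if size_cm is None:
        return None  # no measurement found; size alone cannot decide
    if size_cm <= 3:
        return "T1"
    if size_cm <= 5:
        return "T2"
    if size_cm <= 7:
        return "T3"
    return "T4"


# Hypothetical report sentence.
text = "Spiculated mass in the right upper lobe measuring 42 mm."
print(t_category_by_size(largest_size_cm(text)))
```

Taking the largest number in the text is a deliberate simplification. A real report may mention lymph node or prior-study measurements, which is one reason the abstract notes that results depend heavily on report vocabulary and context.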
Affiliation(s)
- J Martijn Nobel
  - Department of Radiology and Nuclear Medicine, Maastricht University Medical Center+, Postbox 5800, 6202 AZ, Maastricht, Netherlands
  - School of Health Professions Education, Maastricht University, Maastricht, Netherlands
- Sander Puts
  - Department of Radiation Oncology (MAASTRO), Maastricht, Netherlands
  - GROW School for Oncology and Reproduction, Maastricht University, Maastricht, Netherlands
- Jasenko Krdzalic
  - Zuyderland Medical Center, Department of Medical Imaging, Sittard-Geleen, Netherlands
- Karen M L Zegers
  - Department of Radiation Oncology (MAASTRO), Maastricht, Netherlands
  - GROW School for Oncology and Reproduction, Maastricht University, Maastricht, Netherlands
- Marc B I Lobbes
  - Department of Radiology and Nuclear Medicine, Maastricht University Medical Center+, Postbox 5800, 6202 AZ, Maastricht, Netherlands
  - GROW School for Oncology and Reproduction, Maastricht University, Maastricht, Netherlands
  - Zuyderland Medical Center, Department of Medical Imaging, Sittard-Geleen, Netherlands
- Simon G F Robben
  - Department of Radiology and Nuclear Medicine, Maastricht University Medical Center+, Postbox 5800, 6202 AZ, Maastricht, Netherlands
  - School of Health Professions Education, Maastricht University, Maastricht, Netherlands
- André L A J Dekker
  - Department of Radiation Oncology (MAASTRO), Maastricht, Netherlands
  - GROW School for Oncology and Reproduction, Maastricht University, Maastricht, Netherlands
49
Reichenpfader D, Müller H, Denecke K. Large language model-based information extraction from free-text radiology reports: a scoping review protocol. BMJ Open 2023; 13:e076865. [PMID: 38070902 PMCID: PMC10729196 DOI: 10.1136/bmjopen-2023-076865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 11/21/2023] [Indexed: 12/18/2023] Open
Abstract
INTRODUCTION Radiological imaging is one of the most frequently performed diagnostic tests worldwide. The free-text contained in radiology reports is currently only rarely used for secondary use purposes, including research and predictive analysis. However, this data might be made available by means of information extraction (IE), based on natural language processing (NLP). Recently, a new approach to NLP, large language models (LLMs), has gained momentum and continues to improve performance of IE-related tasks. The objective of this scoping review is to show the state of research regarding IE from free-text radiology reports based on LLMs, to investigate applied methods and to guide future research by showing open challenges and limitations of current approaches. To our knowledge, no systematic or scoping review of IE from radiology reports based on LLMs has been published. Existing publications are outdated and do not comprise LLM-based methods. METHODS AND ANALYSIS This protocol is designed based on the JBI Manual for Evidence Synthesis, chapter 11.2: 'Development of a scoping review protocol'. Inclusion criteria and a search strategy comprising four databases (PubMed, IEEE Xplore, Web of Science Core Collection and ACM Digital Library) are defined. Furthermore, we describe the screening process, data charting, analysis and presentation of extracted data. ETHICS AND DISSEMINATION This protocol describes the methodology of a scoping literature review and does not comprise research on or with humans, animals or their data. Therefore, no ethical approval is required. After the publication of this protocol and the conduct of the review, its results are going to be published in an open access journal dedicated to biomedical informatics/digital health.
Affiliation(s)
- Daniel Reichenpfader
  - Institute for Patient-centered Digital Health, Bern University of Applied Sciences, Bern, Switzerland
- Henning Müller
  - Department of Radiology and Medical Informatics, Université de Genève, Genève, Switzerland
  - Informatics Institute, HES-SO Valais-Wallis, Sierre, Switzerland
- Kerstin Denecke
  - Institute for Patient-centered Digital Health, Bern University of Applied Sciences, Bern, Switzerland
50
Liu F, Zhu T, Wu X, Yang B, You C, Wang C, Lu L, Liu Z, Zheng Y, Sun X, Yang Y, Clifton L, Clifton DA. A medical multimodal large language model for future pandemics. NPJ Digit Med 2023; 6:226. [PMID: 38042919 PMCID: PMC10693607 DOI: 10.1038/s41746-023-00952-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/08/2023] [Accepted: 10/24/2023] [Indexed: 12/04/2023] Open
Abstract
Deep neural networks have been integrated into the whole clinical decision procedure, which can improve the efficiency of diagnosis and alleviate the heavy workload of physicians. Since most neural networks are supervised, their performance heavily depends on the volume and quality of available labels. However, few such labels exist for rare diseases (e.g., new pandemics). Here we report a medical multimodal large language model (Med-MLLM) for radiograph representation learning, which can learn broad medical knowledge (e.g., image understanding, text semantics, and clinical phenotypes) from unlabelled data. As a result, when encountering a rare disease, our Med-MLLM can be rapidly deployed and easily adapted to it with limited labels. Furthermore, our model supports medical data across visual modality (e.g., chest X-ray and CT) and textual modality (e.g., medical report and free-text clinical note); therefore, it can be used for clinical tasks that involve both visual and textual data. We demonstrate the effectiveness of our Med-MLLM by showing how it would perform using the COVID-19 pandemic "in replay". In the retrospective setting, we test the model on the early COVID-19 datasets; and in the prospective setting, we test the model on the new variant COVID-19-Omicron. The experiments are conducted on 1) three kinds of input data; 2) three kinds of downstream tasks, including disease reporting, diagnosis, and prognosis; 3) five COVID-19 datasets; and 4) three different languages, including English, Chinese, and Spanish. All experiments show that our model can make accurate and robust COVID-19 decision-support with little labelled data.
Affiliation(s)
- Fenglin Liu
  - Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
- Tingting Zhu
  - Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
- Xian Wu
  - Jarvis Research Center, Tencent YouTu Lab, Beijing, China
- Bang Yang
  - School of Computer Science, Peking University, Beijing, China
- Chenyang Wang
  - Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
- Lei Lu
  - Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
- Zhangdaihong Liu
  - Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
  - Oxford-Suzhou Centre for Advanced Research, Suzhou, China
- Yefeng Zheng
  - Jarvis Research Center, Tencent YouTu Lab, Beijing, China
- Xu Sun
  - School of Computer Science, Peking University, Beijing, China
- Yang Yang
  - School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Lei Clifton
  - Nuffield Department of Population Health, University of Oxford, Oxford, UK
- David A Clifton
  - Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
  - Oxford-Suzhou Centre for Advanced Research, Suzhou, China