Reference Citation Analysis

For: Casey A, Davidson E, Poon M, Dong H, Duma D, Grivas A, Grover C, Suárez-Paniagua V, Tobin R, Whiteley W, Wu H, Alex B. A systematic review of natural language processing applied to radiology reports. BMC Med Inform Decis Mak 2021;21:179. [PMID: 34082729 PMCID: PMC8176715 DOI: 10.1186/s12911-021-01533-7] [Citation(s) in RCA: 83] [Impact Index Per Article: 20.8] [Received: 02/09/2021] [Accepted: 05/17/2021] [Indexed: 11/10/2022]

Cited by Other Article(s)

Mahmoudi E, Vahdati S, Chao CJ, Khosravi B, Misra A, Lopez-Jimenez F, Erickson BJ. A comparative analysis of privacy-preserving large language models for automated echocardiography report analysis. J Am Med Inform Assoc 2025:ocaf056. [PMID: 40334045 DOI: 10.1093/jamia/ocaf056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2024] [Revised: 01/29/2025] [Accepted: 03/18/2025] [Indexed: 05/09/2025]

Abstract

BACKGROUND

Automated data extraction from echocardiography reports could facilitate large-scale registry creation and clinical surveillance of valvular heart diseases (VHD). We evaluated the performance of open-source large language models (LLMs) guided by prompt instructions and chain of thought (CoT) for this task.

METHODS

From consecutive transthoracic echocardiographies performed in our center, we utilized 200 random reports from 2019 for prompt optimization and 1000 from 2023 for evaluation. Five instruction-tuned LLMs (Qwen2.0-72B, Llama3.0-70B, Mixtral8-46.7B, Llama3.0-8B, and Phi3.0-3.8B) were guided by prompt instructions with and without CoT to classify prosthetic valve presence and VHD severity. Performance was evaluated using classification metrics against expert-labeled ground truth. Mean squared error (MSE) was also calculated for predicted severity's deviation from actual severity.

RESULTS

With CoT prompting, Llama3.0-70B and Qwen2.0 achieved the highest performance (accuracy: 99.1% and 98.9% for VHD severity; 100% and 99.9% for prosthetic valve; MSE: 0.02 and 0.05, respectively). Smaller models showed lower accuracy for VHD severity (54.1%-85.9%) but maintained high accuracy for prosthetic valve detection (>96%). Chain of thought reasoning yielded higher accuracy for larger models while increasing processing time from 2-25 to 67-154 seconds per report. Based on CoT reasonings, the wrong predictions were mainly due to model outputs being influenced by irrelevant information in the text or failure to follow the prompt instructions.

CONCLUSIONS

Our study demonstrates the near-perfect performance of open-source LLMs for automated echocardiography report interpretation with the purpose of registry formation and disease surveillance. While larger models achieved exceptional accuracy through prompt optimization, practical implementation requires balancing performance with computational efficiency.
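
The severity evaluation described in the methods above pairs a classification metric (accuracy) with mean squared error over severity grades. A minimal sketch of how the two could be computed follows. The ordinal encoding in `SEVERITY`, the function names, and the sample labels are illustrative assumptions, not taken from the study.

```python
# Hypothetical ordinal encoding of severity grades; the study's actual
# grading scheme and encoding are not stated in the abstract.
SEVERITY = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}


def accuracy(predicted, actual):
    """Fraction of reports whose predicted grade matches the expert label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


def severity_mse(predicted, actual):
    """Mean squared difference between predicted and actual ordinal grades."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(SEVERITY[p] - SEVERITY[a]) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared) / len(squared)


# Made-up labels for illustration only.
predicted = ["mild", "severe", "none", "moderate"]
actual = ["mild", "moderate", "none", "moderate"]

print(accuracy(predicted, actual))
print(severity_mse(predicted, actual))
```

Unlike accuracy, the squared error grows with the distance between grades, so a prediction that is off by two grades is penalized four times as heavily as one that is off by one.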

Huhtanen HJ, Nyman MJ, Karlsson A, Hirvonen J. Machine Learning and Deep Learning Models for Automated Protocoling of Emergency Brain MRI Using Text from Clinical Referrals. Radiol Artif Intell 2025;7:e230620. [PMID: 39969276 DOI: 10.1148/ryai.230620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2025]

Abstract

Purpose

To develop and evaluate machine learning and deep learning-based models for automated protocoling of emergency brain MRI scans based on clinical referral text.

Materials and Methods

In this single-institution, retrospective study of 1953 emergency brain MRI referrals from January 2016 to January 2019, two neuroradiologists labeled the imaging protocol and use of contrast agent as the reference standard. Three machine learning algorithms (naive Bayes, support vector machine, and XGBoost) and two pretrained deep learning models (Finnish bidirectional encoder representations from transformers [BERT] and generative pretrained transformer [GPT]-3.5 [GPT-3.5 Turbo; Open AI]) were developed to predict the MRI protocol and need for a contrast agent. Each model was trained with three datasets (100% of training data, 50% of training data, and 50% plus augmented training data). Prediction accuracy was assessed with a test set.

Results

The GPT-3.5 models trained with 100% of the training data performed best in both tasks, achieving an accuracy of 84% (95% CI: 80, 88) for the correct protocol and 91% (95% CI: 88, 94) for the contrast agent. BERT had an accuracy of 78% (95% CI: 74, 82) for the protocol and 89% (95% CI: 86, 92) for the contrast agent. The best machine learning model in the protocol task was XGBoost (accuracy, 78%; 95% CI: 73, 82), and the best machine learning models in the contrast agent task were support vector machine and XGBoost (accuracy, 88%; 95% CI: 84, 91 for both). The accuracies of two nonneuroradiologists were 80%-83% in the protocol task and 89%-91% in the contrast medium task.

Conclusion

Machine learning and deep learning models demonstrated high performance in automatic protocoling of emergency brain MRI scans based on text from clinical referrals.

Keywords: Natural Language Processing, Automatic Protocoling, Deep Learning, Machine Learning, Emergency Brain MRI

Supplemental material is available for this article. Published under a CC BY 4.0 license. See also commentary by Strotzer in this issue.

López-Úbeda P, Martín-Noguerol T, Escartín J, Cabrera-Zubizarreta A, Luna A. Automated MRI pituitary structured reporting from free-text using a fine-tuned Llama model: a feasibility study. Jpn J Radiol 2025;43:770-778. [PMID: 39730936 DOI: 10.1007/s11604-024-01721-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2024] [Accepted: 12/11/2024] [Indexed: 12/29/2024]

Yao J, Alabousi A, Mironov O. Evaluation of a BERT Natural Language Processing Model for Automating CT and MRI Triage and Protocol Selection. Can Assoc Radiol J 2025;76:265-272. [PMID: 38832645 DOI: 10.1177/08465371241255895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2024]

Mahyoub M, Dougherty K, Shukla A. Extracting Pulmonary Embolism Diagnoses From Radiology Impressions Using GPT-4o: Large Language Model Evaluation Study. JMIR Med Inform 2025;13:e67706. [PMID: 40203306 PMCID: PMC12018862 DOI: 10.2196/67706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Revised: 01/30/2025] [Accepted: 03/13/2025] [Indexed: 04/11/2025]

Abstract

BACKGROUND

Pulmonary embolism (PE) is a critical condition requiring rapid diagnosis to reduce mortality. Extracting PE diagnoses from radiology reports manually is time-consuming, highlighting the need for automated solutions. Advances in natural language processing, especially transformer models like GPT-4o, offer promising tools to improve diagnostic accuracy and workflow efficiency in clinical settings.

OBJECTIVE

This study aimed to develop an automatic extraction system using GPT-4o to extract PE diagnoses from radiology report impressions, enhancing clinical decision-making and workflow efficiency.

METHODS

In total, 2 approaches were developed and evaluated: a fine-tuned Clinical Longformer as a baseline model and a GPT-4o-based extractor. Clinical Longformer, an encoder-only model, was chosen for its robustness in text classification tasks, particularly on smaller scales. GPT-4o, a decoder-only instruction-following LLM, was selected for its advanced language understanding capabilities. The study aimed to evaluate GPT-4o's ability to perform text classification compared to the baseline Clinical Longformer. The Clinical Longformer was trained on a dataset of 1000 radiology report impressions and validated on a separate set of 200 samples, while the GPT-4o extractor was validated using the same 200-sample set. Postdeployment performance was further assessed on an additional 200 operational records to evaluate model efficacy in a real-world setting.

RESULTS

GPT-4o outperformed the Clinical Longformer in 2 of the metrics, achieving a sensitivity of 1.0 (95% CI 1.0-1.0; Wilcoxon test, P<.001) and an F1-score of 0.975 (95% CI 0.9495-0.9947; Wilcoxon test, P<.001) across the validation dataset. Postdeployment evaluations also showed strong performance of the deployed GPT-4o model with a sensitivity of 1.0 (95% CI 1.0-1.0), a specificity of 0.94 (95% CI 0.8913-0.9804), and an F1-score of 0.97 (95% CI 0.9479-0.9908). This high level of accuracy supports a reduction in manual review, streamlining clinical workflows and improving diagnostic precision.

CONCLUSIONS

The GPT-4o model provides an effective solution for the automatic extraction of PE diagnoses from radiology reports, offering a reliable tool that aids timely and accurate clinical decision-making. This approach has the potential to significantly improve patient outcomes by expediting diagnosis and treatment pathways for critical conditions like PE.
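
The sensitivity, specificity, and F1-score reported above all derive from the same four confusion-matrix counts. The sketch below shows one way to compute them for a binary PE label, assuming 1 means "PE present" and 0 means "PE absent". The function name and the sample labels are hypothetical and are not the study's data.

```python
def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity, and F1 for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against empty denominators rather than raising ZeroDivisionError.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else float("nan")
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


# Made-up labels: every true positive is found, with one false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy example no positive case is missed, so sensitivity is 1.0, while the single false positive pulls both specificity and F1 below 1. The same pattern appears in the reported results, where perfect sensitivity coexists with lower specificity and F1.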

Clunie DA, Flanders A, Taylor A, Erickson B, Bialecki B, Brundage D, Gutman D, Prior F, Seibert JA, Perry J, Gichoya JW, Kirby J, Andriole K, Geneslaw L, Moore S, Fitzgerald TJ, Tellis W, Xiao Y, Farahani K. Report of the Medical Image De-Identification (MIDI) Task Group -- Best Practices and Recommendations. ARXIV 2025:arXiv:2303.10473v3. [PMID: 37033463 PMCID: PMC10081345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/11/2023]

Shahid F, Hsu MH, Chang YC, Jian WS. Using Generative AI to Extract Structured Information from Free Text Pathology Reports. J Med Syst 2025;49:36. [PMID: 40080229 PMCID: PMC11906504 DOI: 10.1007/s10916-025-02167-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2024] [Accepted: 03/03/2025] [Indexed: 03/15/2025]

Bala W, Li H, Moon J, Trivedi H, Gichoya J, Balthazar P. Enhancing radiology training with GPT-4: Pilot analysis of automated feedback in trainee preliminary reports. Curr Probl Diagn Radiol 2025;54:151-158. [PMID: 39179466 PMCID: PMC11802295 DOI: 10.1067/j.cpradiol.2024.08.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 07/29/2024] [Accepted: 08/08/2024] [Indexed: 08/26/2024]

Omar M, Levkovich I. Exploring the efficacy and potential of large language models for depression: A systematic review. J Affect Disord 2025;371:234-244. [PMID: 39581383 DOI: 10.1016/j.jad.2024.11.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 10/21/2024] [Accepted: 11/15/2024] [Indexed: 11/26/2024]

Cruz-Gonzalez P, He AWJ, Lam EP, Ng IMC, Li MW, Hou R, Chan JNM, Sahni Y, Vinas Guasch N, Miller T, Lau BWM, Sánchez Vidaña DI. Artificial intelligence in mental health care: a systematic review of diagnosis, monitoring, and intervention applications. Psychol Med 2025;55:e18. [PMID: 39911020 PMCID: PMC12017374 DOI: 10.1017/s0033291724003295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 10/26/2024] [Accepted: 11/26/2024] [Indexed: 02/07/2025]

Cheng CT, Ooyang CH, Liao CH, Kang SC. Applications of deep learning in trauma radiology: A narrative review. Biomed J 2025;48:100743. [PMID: 38679199 PMCID: PMC11751421 DOI: 10.1016/j.bj.2024.100743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 03/26/2024] [Accepted: 04/24/2024] [Indexed: 05/01/2024]

Jorg T, Halfmann MC, Graafen D, Hobohm L, Düber C, Mildenberger P, Müller L. [Structured reporting for efficient epidemiological and in-hospital prevalence analysis of pulmonary embolisms]. ROFO-FORTSCHR RONTG 2025;197:186-195. [PMID: 38806150 DOI: 10.1055/a-2301-3349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2024]

Abstract

Structured reporting (SR) not only offers advantages regarding report quality but, as an IT-based method, also the opportunity to aggregate and analyze large, highly structured datasets (data mining). In this study, a data mining algorithm was used to calculate epidemiological data and in-hospital prevalence statistics of pulmonary embolism (PE) by analyzing structured CT reports.

All structured reports for PE CT scans from the last 5 years (n = 2790) were extracted from the SR database and analyzed. The prevalence of PE was calculated for the entire cohort and stratified by referral type and clinical referrer. Distributions of the manifestation of PEs (central, lobar, segmental, subsegmental, as well as left-sided, right-sided, bilateral) were calculated, and the occurrence of right heart strain was correlated with the manifestation.

The prevalence of PE in the entire cohort was 24% (n = 678). The median age of PE patients was 71 years (IQR 58-80), and the sex distribution was 1.2/1 (M/F). Outpatients showed a lower prevalence of 23% compared to patients from regular wards (27%) and intensive care units (30%). Surgically referred patients had a higher prevalence than patients from internal medicine (34% vs. 22%). Patients with central and bilateral PEs had a significantly higher occurrence of right heart strain compared to patients with peripheral and unilateral embolisms.

Data mining of structured reports is a simple method for obtaining prevalence statistics, epidemiological data, and the distribution of disease characteristics, as demonstrated by the PE use case. The generated data can be helpful for multiple purposes, such as for internal clinical quality assurance and scientific analyses. To benefit from this, consistent use of SR is required and is therefore recommended.

· SR-based data mining allows simple epidemiologic analyses for PE.
· The prevalence of PE differs between outpatients and inpatients.
· Central and bilateral PEs have an increased risk of right heart strain.

Jorg T, Halfmann MC, Graafen D et al. Structured reporting for efficient epidemiological and in-hospital prevalence analysis of pulmonary embolisms. Rofo 2025; 197: 186-195.

Sajjadi SM, Mohebbi A, Ehsani A, Marashi A, Azhdarimoghaddam A, Karami S, Karimi MA, Sadeghi M, Firoozi K, Mohammad Zamani A, Rigi A, Nayebagha M, Asadi Anar M, Eini P, Salehi S, Rostami Ghezeljeh M. Identifying abdominal aortic aneurysm size and presence using Natural Language Processing of radiology reports: a systematic review and meta-analysis. Abdom Radiol (NY) 2025:10.1007/s00261-025-04810-5. [PMID: 39883167 DOI: 10.1007/s00261-025-04810-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2024] [Revised: 01/10/2025] [Accepted: 01/16/2025] [Indexed: 01/31/2025]

Abstract

BACKGROUND AND AIM

Prior investigations of the natural history of abdominal aortic aneurysms (AAAs) have been constrained by small sample sizes or uneven assessments of aggregated data. Natural language processing (NLP) can significantly enhance the investigation and treatment of patients with AAAs by swiftly and effectively collecting imaging data from health records. This meta-analysis aimed to evaluate the efficacy of NLP techniques in reliably identifying the existence or absence of AAAs and measuring the maximal abdominal aortic diameter in extensive datasets of radiology study reports.

METHOD

The PubMed, Scopus, Web of Science, Embase, and Science Direct databases were searched until March 2024 to obtain pertinent papers. The RAYYAN intelligent tool for systematic reviews was utilized to screen the studies. The meta-analysis was conducted using STATA v18 software. Egger's test was employed to evaluate publication bias. The Newcastle Ottawa Scale was employed to assess the quality of the listed studies. A plot digitizer was employed to extract digital data.

RESULT

A total of 39,094 individuals with AAA were included in this analysis; 27,326 patients were male and 11,383 were female. The mean age of the participants was 73.1 ± 1.25 years. The pooled sensitivity, specificity, precision, and accuracy of the implemented NLP model were 0.89 (0.88-0.91), 0.88 (0.87-0.89), 0.92 (0.89-0.95), and 0.91 (0.89-0.93), respectively. The aneurysm diameter reported at follow-up before and after NLP implementation in the included studies showed a 0.05 cm reduction in size, which was statistically significant.

CONCLUSION

NLP holds great potential for automating the detection of AAA size and presence in radiology reports, enhancing efficiency and scalability over manual review. However, challenges persist. Variability in report formats, terminology, and unstructured data can compromise accuracy. Additionally, NLP models rely on high-quality, annotated training datasets, which may be incomplete or unrepresentative. While NLP aids in identifying AAA-related data, human oversight is essential to ensure decisions are informed by the patient's broader clinical context. Ongoing algorithm refinement and seamless integration into clinical workflows are key to improving NLP's utility and reliability in this field.

Omar M, Nassar S, Sharif K, Glicksberg BS, Nadkarni GN, Klang E. Emerging applications of NLP and large language models in gastroenterology and hepatology: a systematic review. Front Med (Lausanne) 2025;11:1512824. [PMID: 39917263 PMCID: PMC11799763 DOI: 10.3389/fmed.2024.1512824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2024] [Accepted: 12/09/2024] [Indexed: 02/09/2025]

Fu T, Berlin S, Gupta A, Sommer J. Automated Incidental Findings Notification Through the Electronic Health Record Utilizing Dictation Macros. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2025:10.1007/s10278-024-01357-7. [PMID: 39806184 DOI: 10.1007/s10278-024-01357-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Revised: 11/11/2024] [Accepted: 11/24/2024] [Indexed: 01/16/2025]

Fathi M, Vakili K, Hajibeygi R, Bahrami A, Behzad S, Tafazolimoghadam A, Aghabozorgi H, Eshraghi R, Bhatt V, Gholamrezanezhad A. Cultivating diagnostic clarity: The importance of reporting artificial intelligence confidence levels in radiologic diagnoses. Clin Imaging 2025;117:110356. [PMID: 39566394 DOI: 10.1016/j.clinimag.2024.110356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2024] [Revised: 11/01/2024] [Accepted: 11/09/2024] [Indexed: 11/22/2024]

Abstract

Accurate image interpretation is essential in the field of radiology to the healthcare team in order to provide optimal patient care. This article discusses the use of artificial intelligence (AI) confidence levels to enhance the accuracy and dependability of its radiological diagnoses. The current advances in AI technologies have changed how radiologists and clinicians make the diagnoses of pathological conditions such as aneurysms, hemorrhages, pneumothorax, pneumoperitoneum, and particularly fractures. To enhance the utility of these AI models, radiologists need a more comprehensive understanding of the model's levels of confidence and certainty behind the results they produce. This allows radiologists to make more informed decisions that have the potential to drastically change a patient's clinical management. Several AI models, especially those utilizing deep learning models (DL) with convolutional neural networks (CNNs), have demonstrated significant potential in identifying subtle findings in medical imaging that are often missed by radiologists. It is necessary to create standardized levels of confidence metrics in order for AI systems to be relevant and reliable in the clinical setting. Incorporating AI into clinical practice does have certain obstacles like the need for clinical validation, concerns regarding the interpretability of AI system results, and addressing confusion and misunderstandings within the medical community. This study emphasizes the importance of AI systems to clearly convey their level of confidence in radiological diagnosis. This paper highlights the importance of conducting research to establish AI confidence level metrics that are limited to a specific anatomical region or lesion type. 
KEY POINT OF THE VIEW: Accurate fracture diagnosis relies on radiologic certainty, where Artificial intelligence (AI), especially convolutional neural networks (CNNs) and deep learning (DL), shows promise in enhancing X-ray interpretation amidst a shortage of radiologists. Overcoming integration challenges through improved AI interpretability and education is crucial for widespread acceptance and better patient outcomes.

Breitwieser M, Moore V, Wiesner T, Wichlas F, Deininger C. NLP-Driven Analysis of Pneumothorax Incidence Following Central Venous Catheter Procedures: A Data-Driven Re-Evaluation of Routine Imaging in Value-Based Medicine. Diagnostics (Basel) 2024;14:2792. [PMID: 39767153 PMCID: PMC11674588 DOI: 10.3390/diagnostics14242792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2024] [Revised: 11/14/2024] [Accepted: 12/10/2024] [Indexed: 01/11/2025]

Abstract

Background

This study presents a systematic approach using a natural language processing (NLP) algorithm to assess the necessity of routine imaging after central venous catheter (CVC) placement and removal. With pneumothorax being a key complication of CVC procedures, this research aims to provide evidence-based recommendations for optimizing imaging protocols and minimizing unnecessary imaging risks.

Methods

We analyzed electronic health records from four university hospitals in Salzburg, Austria, focusing on X-rays performed between 2012 and 2021 following CVC procedures. A custom-built NLP algorithm identified cases of pneumothorax from radiologists' reports and clinician requests, while excluding cases with contraindications such as chest injuries, prior pneumothorax, or missing data. Chi-square tests were used to compare pneumothorax rates between CVC insertion and removal, and multivariate logistic regression identified risk factors, with a focus on age and gender.

Results

This study analyzed 17,175 cases of patients aged 18 and older, with 95.4% involving CVC insertion and 4.6% involving CVC removal. Pneumothorax was observed in 106 cases post-insertion (1.3%) and in 3 cases post-removal (0.02%), with no statistically significant difference between procedures (p = 0.5025). The NLP algorithm achieved an accuracy of 93%, with a sensitivity of 97.9%, a specificity of 87.9%, and an area under the ROC curve (AUC) of 0.9283.

Conclusions

The findings indicate no significant difference in pneumothorax incidence between CVC insertion and removal, supporting existing recommendations against routine imaging post-removal for asymptomatic patients and suggesting that routine imaging after CVC insertion may also be unnecessary in similar cases. This study demonstrates how advanced NLP techniques can support value-based medicine by enhancing clinical decision making and optimizing resources.

Guellil I, Wu J, Pradipta Gema A, Francis F, Berrachedi Y, Chenni N, Tobin R, Llewellyn C, Arakelyan S, Wu H, Guthrie B, Alex B. Natural language processing for detecting adverse drug events: A systematic review protocol. NIHR OPEN RESEARCH 2024;3:67. [PMID: 39931191 PMCID: PMC11808655 DOI: 10.3310/nihropenres.13504.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/07/2025] [Indexed: 02/13/2025]

Abstract

Background

Detecting Adverse Drug Events (ADEs) is an emerging research area, attracting great interest in the research community. Better anticipatory management of predisposing factors has considerable potential to improve outcomes. Automatic extraction of ADEs using Natural Language Processing (NLP) has a great potential to significantly facilitate efficient and effective distillation of such knowledge, to better understand and predict risk of adverse events.

Methods

This systematic review follows a six-stage approach and includes literature from 6 databases (Embase, Medline, Web Of Science Core Collection, ACM Guide to Computing Literature, IEEE Digital Library and Scopus). Following the title, abstract and full-text screenings, characteristics and main findings of the included studies and resources will be tabulated and summarized. The risk of bias and reporting quality was assessed using the PROBAST tool.

Results

We developed our search strategy and collected all relevant publications. As of December 2024, we have completed all the stages of the systematic review. We identified 178 studies for inclusion through the academic literature search (where data was extracted from all of the papers). Right now, we are writing up the systematic review paper where we are synthesising the different findings. Further refinement of the eligibility criteria and data extraction has been ongoing since August 2022.

Conclusion

In this systematic review, we will identify and consolidate information and evidence related to the use and effectiveness of existing NLP approaches and tools for automatically detecting ADEs from free text (discharge summaries, General Practitioner notes, social media, etc.). Our findings will improve the understanding of the current landscape of the use of NLP for extracting ADEs. It will lead to better anticipatory management of predisposing factors with the potential to improve outcomes considerably. Our results will also be valuable both to NLP researchers developing methods to extract ADEs and to translational/clinical researchers who use NLP for this purpose and in healthcare in general. For example, from our initial analysis of the studies, we can conclude that the majority of the proposed works are about the detection (extraction) of ADEs from text. An important portion of studies also focus on the binary classification of text (for highlighting if it includes or not ADEs). Different challenges related to the unbalanced dataset, abbreviations and acronyms but also to the lower results with rare ADEs were also mentioned by the studied papers.

Lastrucci A, Wandael Y, Barra A, Ricci R, Pirrera A, Lepri G, Gulino RA, Miele V, Giansanti D. Revolutionizing Radiology with Natural Language Processing and Chatbot Technologies: A Narrative Umbrella Review on Current Trends and Future Directions. J Clin Med 2024;13:7337. [PMID: 39685793 DOI: 10.3390/jcm13237337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2024] [Revised: 11/18/2024] [Accepted: 11/26/2024] [Indexed: 12/18/2024]

Abstract

The application of chatbots and NLP in radiology is an emerging field, currently characterized by a growing body of research. An umbrella review has been proposed utilizing a standardized checklist and quality control procedure for including scientific papers. This review explores the early developments and potential future impact of these technologies in radiology. The current literature, comprising 15 systematic reviews, highlights potentialities, opportunities, areas needing improvements, and recommendations. This umbrella review offers a comprehensive overview of the current landscape of natural language processing (NLP) and natural language models (NLMs), including chatbots, in healthcare. These technologies show potential for improving clinical decision-making, patient engagement, and communication across various medical fields. However, significant challenges remain, particularly the lack of standardized protocols, which raises concerns about the reliability and consistency of these tools in different clinical contexts. Without uniform guidelines, variability in outcomes may hinder the broader adoption of NLP/NLM technologies by healthcare providers. Moreover, the limited research on how these technologies intersect with medical devices (MDs) is a notable gap in the literature. Future research must address these challenges to fully realize the potential of NLP/NLM applications in healthcare. Key future research directions include the development of standardized protocols to ensure the consistent and safe deployment of NLP/NLM tools, particularly in high-stake areas like radiology. Investigating the integration of these technologies with MD workflows will be crucial to enhance clinical decision-making and patient care. Ethical concerns, such as data privacy, informed consent, and algorithmic bias, must also be explored to ensure responsible use in clinical settings. 
Longitudinal studies are needed to evaluate the long-term impact of these technologies on patient outcomes, while interdisciplinary collaboration between healthcare professionals, data scientists, and ethicists is essential for driving innovation in an ethically sound manner. Addressing these areas will advance the application of NLP/NLM technologies and improve patient care in this emerging field.

Lee JJ, Zepeda A, Arbour G, Isaac KV, Ng RT, Nichol AM. Automated Identification of Breast Cancer Relapse in Computed Tomography Reports Using Natural Language Processing. JCO Clin Cancer Inform 2024;8:e2400107. [PMID: 39705642 DOI: 10.1200/cci.24.00107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Revised: 08/15/2024] [Accepted: 10/18/2024] [Indexed: 12/22/2024] Open

Xiang RF. Use of n-grams and K-means clustering to classify data from free text bone marrow reports. J Pathol Inform 2024;15:100358. [PMID: 38292072 PMCID: PMC10825612 DOI: 10.1016/j.jpi.2023.100358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 12/10/2023] [Accepted: 12/23/2023] [Indexed: 02/01/2024] Open

Chen LC, Zack T, Demirci A, Sushil M, Miao B, Kasap C, Butte A, Collisson EA, Hong JC. Assessing Large Language Models for Oncology Data Inference From Radiology Reports. JCO Clin Cancer Inform 2024;8:e2400126. [PMID: 39661914 DOI: 10.1200/cci.24.00126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Revised: 08/14/2024] [Accepted: 09/23/2024] [Indexed: 12/13/2024] Open

Abstract

PURPOSE

We examined the effectiveness of proprietary and open large language models (LLMs) in detecting disease presence, location, and treatment response in pancreatic cancer from radiology reports.

METHODS

We analyzed 203 deidentified radiology reports, manually annotated for disease status, location, and indeterminate nodules needing follow-up. Using generative pre-trained transformer (GPT)-4, GPT-3.5-turbo, and open models such as Gemma-7B and Llama3-8B, we employed strategies such as ablation and prompt engineering to boost accuracy. Discrepancies between human and model interpretations were reviewed by a secondary oncologist.

RESULTS

Among 164 patients with pancreatic tumor, GPT-4 showed the highest accuracy in inferring disease status, achieving 75.5% correctness (F1-micro). Open models Mistral-7B and Llama3-8B performed comparably, with accuracies of 68.6% and 61.4%, respectively. Mistral-7B excelled in deriving correct inferences directly from objective findings. Most tested models demonstrated proficiency in identifying disease-containing anatomic locations from a list of choices, with GPT-4 and Llama3-8B showing near-parity in precision and recall for disease site identification. However, open models struggled with differentiating benign from malignant postsurgical changes, affecting their precision in identifying findings indeterminate for cancer. A secondary review occasionally favored GPT-3.5's interpretations, indicating the variability in human judgment.

CONCLUSION

LLMs, especially GPT-4, are proficient in deriving oncologic insights from radiology reports. Their performance is enhanced by effective summarization strategies, demonstrating their potential in clinical support and health care analytics. This study also underscores the possibility of zero-shot open model utility in environments where proprietary models are restricted. Finally, by providing a set of annotated radiology reports, this paper presents a valuable data set for further LLM research in oncology.

Valente AS, Trunfio TA, Aiello M, Baldi D, Baldi M, Imbò S, Russo MA, Cavaliere C, Franzese M. Text mining approach for feature extraction and cartilage disease grade classification using knee MRI radiology reports. Comput Struct Biotechnol J 2024;24:622-629. [PMID: 39963548 PMCID: PMC11832019 DOI: 10.1016/j.csbj.2024.10.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2024] [Revised: 10/01/2024] [Accepted: 10/01/2024] [Indexed: 02/20/2025] Open

Cho HN, Jun TJ, Kim YH, Kang H, Ahn I, Gwon H, Kim Y, Seo J, Choi H, Kim M, Han J, Kee G, Park S, Ko S. Task-Specific Transformer-Based Language Models in Health Care: Scoping Review. JMIR Med Inform 2024;12:e49724. [PMID: 39556827 PMCID: PMC11612605 DOI: 10.2196/49724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Revised: 07/10/2023] [Accepted: 10/21/2024] [Indexed: 11/20/2024] Open

Abstract

BACKGROUND

Transformer-based language models have shown great potential to revolutionize health care by advancing clinical decision support, patient interaction, and disease prediction. However, despite their rapid development, the implementation of transformer-based language models in health care settings remains limited. This is partly due to the lack of a comprehensive review, which hinders a systematic understanding of their applications and limitations. Without clear guidelines and consolidated information, both researchers and physicians face difficulties in using these models effectively, resulting in inefficient research efforts and slow integration into clinical workflows.

OBJECTIVE

This scoping review addresses this gap by examining studies on medical transformer-based language models and categorizing them into 6 tasks: dialogue generation, question answering, summarization, text classification, sentiment analysis, and named entity recognition.

METHODS

We conducted a scoping review following the Cochrane scoping review protocol. A comprehensive literature search was performed across databases, including Google Scholar and PubMed, covering publications from January 2017 to September 2024. Studies involving transformer-derived models in medical tasks were included. Data were categorized into 6 key tasks.

RESULTS

Our key findings revealed both advancements and critical challenges in applying transformer-based models to health care tasks. For example, models like MedPIR involving dialogue generation show promise but face privacy and ethical concerns, while question-answering models like BioBERT improve accuracy but struggle with the complexity of medical terminology. The BioBERTSum summarization model aids clinicians by condensing medical texts but needs better handling of long sequences.

CONCLUSIONS

This review attempted to provide a consolidated understanding of the role of transformer-based language models in health care and to guide future research directions. By addressing current challenges and exploring the potential for real-world applications, we envision significant improvements in health care informatics. Addressing the identified challenges and implementing proposed solutions can enable transformer-based language models to significantly improve health care delivery and patient outcomes. Our review provides valuable insights for future research and practical applications, setting the stage for transformative advancements in medical informatics.

Affiliation(s)

Ha Na Cho Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
Tae Joon Jun Big Data Research Center, Asan Institute for Life Sciences, Asan Medical Center, Seoul, Republic of Korea
Young-Hak Kim Division of Cardiology, Department of Information Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
Heejun Kang Division of Cardiology, Asan Medical Center, Seoul, Republic of Korea
Imjin Ahn Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
Hansle Gwon Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
Yunha Kim Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
Jiahn Seo Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
Heejung Choi Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
Minkyoung Kim Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
Jiye Han Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
Gaeun Kee Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
Seohyun Park Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
Soyoung Ko Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea

Lotfian G, Parekh K, Abdul Sami M, Suthar PP. Evaluation of ChatGPT 4.0 in Thoracic Imaging and Diagnostics. Cureus 2024;16:e73741. [PMID: 39677135 PMCID: PMC11646414 DOI: 10.7759/cureus.73741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/15/2024] [Indexed: 12/17/2024] Open

Abstract

Recent advancements in natural language processing (NLP) have profoundly transformed the medical industry, enhancing large cohort data analysis, improving diagnostic capabilities, and streamlining clinical workflows. Among the leading tools in this domain is ChatGPT 4.0 (OpenAI, San Francisco, California, US), a commercial NLP model widely used across various applications. This study evaluates the diagnostic performance of ChatGPT 4.0 specifically in thoracic imaging by assessing its ability to answer diagnostic questions related to this field. We utilized the model to respond to multiple-choice questions derived from thoracic imaging scenarios, followed by rigorous statistical analysis to assess its accuracy and variability across different subgroups. Our analysis revealed significant variability across different subgroups. Overall, the model achieved an impressive accuracy of 84.9% in diagnosing thoracic radiology questions. It excelled in terminology and diagnostic signs, achieving perfect scores, and demonstrated strong performance in the intensive care and normal anatomy categories, with accuracies of 90% and 80%, respectively. In pathology subgroups, ChatGPT achieved an average accuracy of 89.1%, particularly excelling in diagnosing infectious pneumonia and atelectasis, though it scored lower in diffuse alveolar disease (66.7%). For disease-related questions, the mean accuracy was 79.1%, with perfect scores in several specific subcategories. However, accuracy was notably lower for vascular disease (50%) and lung cancer (66.7%). In conclusion, while ChatGPT 4.0 shows strong potential in diagnosing thoracic conditions, the variability identified underscores the necessity for ongoing research and refinement of its transformer architecture. This will enhance its reliability and applicability in broader clinical and patient care settings.

Martín-Noguerol T, López-Úbeda P, Paulano-Godino F, Luna A. Natural language processing-based analysis of the level of adoption by expert radiologists of the ASSR, ASNR and NASS version 2.0 of lumbar disc nomenclature: an eight-year survey. Quant Imaging Med Surg 2024;14:7780-7790. [PMID: 39544464 PMCID: PMC11558493 DOI: 10.21037/qims-23-1294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Accepted: 12/26/2023] [Indexed: 11/17/2024]

Yao J, Chu LC, Patlas M. Applications of Artificial Intelligence in Acute Abdominal Imaging. Can Assoc Radiol J 2024;75:761-770. [PMID: 38715249 DOI: 10.1177/08465371241250197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2024] Open

Shankar SV, Dhingra LS, Aminorroaya A, Adejumo P, Nadkarni GN, Xu H, Brandt C, Oikonomou EK, Pedroso AF, Khera R. Automated Transformation of Unstructured Cardiovascular Diagnostic Reports into Structured Datasets Using Sequentially Deployed Large Language Models. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.10.08.24315035. [PMID: 39417094 PMCID: PMC11482995 DOI: 10.1101/2024.10.08.24315035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2024]

Abstract

Background

Rich data in cardiovascular diagnostic testing are often sequestered in unstructured reports, with the necessity of manual abstraction limiting their use in real-time applications in patient care and research.

Methods

We developed a two-step process that sequentially deploys generative and interpretative large language models (LLMs; Llama2 70b and Llama2 13b). Using a Llama2 70b model, we generated varying formats of transthoracic echocardiogram (TTE) reports from 3,000 real-world echo reports with paired structured elements, leveraging temporal changes in reporting formats to define the variations. Subsequently, we fine-tuned Llama2 13b using sequentially larger batches of generated echo reports as inputs, to extract data from free-text narratives across 18 clinically relevant echocardiographic fields. This was set up as a prompt-based supervised training task. We evaluated the fine-tuned Llama2 13b model, HeartDx-LM, on several distinct echocardiographic datasets: (i) reports across the different time periods and formats at Yale New Haven Health System (YNHHS), (ii) the Medical Information Mart for Intensive Care (MIMIC) III dataset, and (iii) the MIMIC IV dataset. We used the accuracy of extracted fields and Cohen's Kappa as the metrics and have publicly released the HeartDX-LM model.
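The two metrics named above, field-level accuracy and Cohen's kappa, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names and the toy labels are hypothetical, and the kappa shown is the standard unweighted two-rater form.

```python
from collections import Counter

def accuracy(reference, predicted):
    """Fraction of positions where the extracted value matches the reference."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)

def cohens_kappa(reference, predicted):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(reference)
    observed = accuracy(reference, predicted)
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    labels = set(ref_counts) | set(pred_counts)
    # Chance agreement from the product of the two marginal label frequencies.
    expected = sum(ref_counts[label] * pred_counts[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical values for one categorical field (not data from the study).
reference = ["normal", "mild", "severe", "normal", "mild", "normal"]
predicted = ["normal", "mild", "mild", "normal", "mild", "normal"]

field_accuracy = accuracy(reference, predicted)
field_kappa = cohens_kappa(reference, predicted)
print(f"accuracy={field_accuracy:.3f} kappa={field_kappa:.3f}")
```

Kappa is lower than raw accuracy here because part of the agreement would be expected by chance given how often each label occurs.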

Results

The HeartDX-LM model was trained on 2,000 randomly selected synthetic echo reports with varying formats and paired structured labels, with a wide range of clinical findings. We identified a lower threshold of 500 annotated reports required for fine-tuning Llama2 13b to achieve stable and consistent performance. At YNHHS, the HeartDx-LM model accurately extracted 69,144 out of 70,032 values (98.7%) across 18 clinical fields from unstructured reports in the test set from contemporary records where paired structured data were also available. In older echo reports where only unstructured reports were available, the model achieved 87.1% accuracy against expert annotations for the same 18 fields for a random sample of 100 reports. Similarly, in expert-annotated external validation sets from MIMIC-IV and MIMIC-III, HeartDx-LM correctly extracted 201 out of 220 available values (91.3%) and 615 out of 707 available values (87.9%), respectively, from 100 randomly chosen and expert annotated echo reports from each set.

Conclusion

We developed a novel method using paired large and moderate-sized LLMs to automate the extraction of unstructured echocardiographic reports into tabular datasets. Our approach represents a scalable strategy that transforms unstructured reports into computable elements that can be leveraged to improve cardiovascular care quality and enable research.

Cai W. Uncovering Demographic Bias in Natural Language Processing Tools for Radiology. Radiology 2024;313:e242723. [PMID: 39436296 DOI: 10.1148/radiol.242723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2024]

Mittermeier A, Aßenmacher M, Schachtner B, Grosu S, Dakovic V, Kandratovich V, Sabel B, Ingrisch M. [Automatic ICD-10 coding : Natural language processing for German MRI reports]. RADIOLOGIE (HEIDELBERG, GERMANY) 2024;64:793-800. [PMID: 39120724 DOI: 10.1007/s00117-024-01349-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/27/2024] [Indexed: 08/10/2024]

Abstract

BACKGROUND

The medical coding of radiology reports is essential for a good quality of care and correct billing, but at the same time a complex and error-prone task.

OBJECTIVE

To assess the performance of natural language processing (NLP) for ICD-10 coding of German radiology reports using fine tuning of suitable language models.

MATERIAL AND METHODS

This retrospective study included all magnetic resonance imaging (MRI) radiology reports acquired at our institution between 2010 and 2020. The ICD-10 discharge codes were matched to the corresponding reports to construct a dataset for multiclass classification. Fine tuning of GermanBERT and flanT5 was carried out on the total dataset (dstotal) containing 1035 different ICD-10 codes and 2 reduced subsets containing the 100 (ds100) and 50 (ds50) most frequent codes. The performance of the model was assessed using top‑k accuracy for k = 1, 3 and 5. In an ablation study both models were trained on the accompanying metadata and the radiology report alone.
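Top-k accuracy, the metric named above, counts a report as correct when its true code appears among the model's k highest-ranked predictions. The following is a minimal sketch of that definition, not the study's evaluation code; the function name, the code strings, and the rankings are illustrative placeholders.

```python
def top_k_accuracy(true_codes, ranked_predictions, k):
    """Share of reports whose true code is among the first k ranked predictions."""
    if len(true_codes) != len(ranked_predictions):
        raise ValueError("need one ranked prediction list per report")
    hits = sum(
        true_code in ranked[:k]
        for true_code, ranked in zip(true_codes, ranked_predictions)
    )
    return hits / len(true_codes)

# Placeholder labels and model rankings (most likely code first); not study data.
true_codes = ["M51.1", "S83.2", "G35", "M23.2"]
ranked_predictions = [
    ["M51.1", "M54.5", "M48.0"],
    ["M23.2", "S83.2", "M17.1"],
    ["G37.9", "G36.0", "I63.9"],
    ["S83.5", "M17.1", "M23.2"],
]

top1 = top_k_accuracy(true_codes, ranked_predictions, k=1)
top3 = top_k_accuracy(true_codes, ranked_predictions, k=3)
print(f"top-1={top1:.2f} top-3={top3:.2f}")
```

Because a larger k can only add hits, top-k accuracy never decreases as k grows, which matches the reported pattern of better scores when several of the best predictions are considered.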

RESULTS

The total dataset consisted of 100,672 radiology reports, the reduced subsets ds100 of 68,103 and ds50 of 52,293 reports. The performance of the model increased when several of the best predictions of the model were taken into consideration, when the number of target classes was reduced, and when the metadata were combined with the report. The flanT5 outperformed GermanBERT across all datasets and metrics and is suited as a medical coding assistant, achieving a top 3 accuracy of nearly 70% in the real-world dataset dstotal.

CONCLUSION

Finely tuned language models can reliably predict ICD-10 codes of German magnetic resonance imaging (MRI) radiology reports across various settings. As a coding assistant flanT5 can guide medical coders to make informed decisions and potentially reduce the workload.

Su Y, Babore YB, Kahn CE. A Large Language Model to Detect Negated Expressions in Radiology Reports. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024:10.1007/s10278-024-01274-9. [PMID: 39322813 DOI: 10.1007/s10278-024-01274-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Revised: 08/28/2024] [Accepted: 09/12/2024] [Indexed: 09/27/2024]

Abstract

Natural language processing (NLP) is crucial to extract information accurately from unstructured text to provide insights for clinical decision-making, quality improvement, and medical research. This study compared the performance of a rule-based NLP system and a medical-domain transformer-based model to detect negated concepts in radiology reports. Using a corpus of 984 de-identified radiology reports from a large U.S.-based academic health system (1000 consecutive reports, excluding 16 duplicates), the investigators compared the rule-based medspaCy system and the Clinical Assertion and Negation Classification Bidirectional Encoder Representations from Transformers (CAN-BERT) system to detect negated expressions of terms from RadLex, the Unified Medical Language System Metathesaurus, and the Radiology Gamuts Ontology. Power analysis determined a sample size of 382 terms to achieve α = 0.05 and β = 0.8 for McNemar's test; based on an estimate of 15% negated terms, 2800 randomly selected terms were annotated manually as negated or not negated. Precision, recall, and F1 of the two models were compared using McNemar's test. Of the 2800 terms, 387 (13.8%) were negated. For negation detection, medspaCy attained a recall of 0.795, precision of 0.356, and F1 of 0.492. CAN-BERT achieved a recall of 0.785, precision of 0.768, and F1 of 0.777. Although recall was not significantly different, CAN-BERT had significantly better precision (χ² = 304.64; p < 0.001). The transformer-based CAN-BERT model detected negated terms in radiology reports with high precision and recall; its precision significantly exceeded that of the rule-based medspaCy system. Use of this system will improve data extraction from textual reports to support information retrieval, AI model training, and discovery of causal relationships.
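The F1 values reported above follow from the stated precision and recall, since F1 is their harmonic mean. The sketch below recomputes them from the rounded figures in the abstract; the function name is an assumption, and because the inputs are already rounded to three decimals, the recomputed values can differ from the published ones in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract (rounded to three decimals).
medspacy_f1 = f1_score(precision=0.356, recall=0.795)
canbert_f1 = f1_score(precision=0.768, recall=0.785)

print(f"medspaCy F1 ~ {medspacy_f1:.3f}")
print(f"CAN-BERT F1 ~ {canbert_f1:.3f}")
```

The harmonic mean is dominated by the smaller of the two inputs, which is why medspaCy's low precision pulls its F1 well below its recall.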

Omar M, Naffaa ME, Glicksberg BS, Reuveni H, Nadkarni GN, Klang E. Advancing rheumatology with natural language processing: insights and prospects from a systematic review. Rheumatol Adv Pract 2024;8:rkae120. [PMID: 39399162 PMCID: PMC11467191 DOI: 10.1093/rap/rkae120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Accepted: 08/14/2024] [Indexed: 10/15/2024] Open

Zhao Y, Coppola A, Karamchandani U, Amiras D, Gupte CM. Artificial intelligence applied to magnetic resonance imaging reliably detects the presence, but not the location, of meniscus tears: a systematic review and meta-analysis. Eur Radiol 2024;34:5954-5964. [PMID: 38386028 PMCID: PMC11364796 DOI: 10.1007/s00330-024-10625-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 12/24/2023] [Accepted: 01/13/2024] [Indexed: 02/23/2024]

Abstract

OBJECTIVES

To review and compare the accuracy of convolutional neural networks (CNN) for the diagnosis of meniscal tears in the current literature and analyze the decision-making processes utilized by these CNN algorithms.

MATERIALS AND METHODS

PubMed, MEDLINE, EMBASE, and Cochrane databases up to December 2022 were searched in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA) statement. Risk of analysis was used for all identified articles. Predictive performance values, including sensitivity and specificity, were extracted for quantitative analysis. The meta-analysis was divided between AI prediction models identifying the presence of meniscus tears and the location of meniscus tears.

RESULTS

Eleven articles were included in the final review, with a total of 13,467 patients and 57,551 images. Heterogeneity was statistically significant and large for the sensitivity of the tear identification analysis (I² = 79%). A higher level of accuracy was observed in identifying the presence of a meniscal tear than in locating tears in specific regions of the meniscus (AUC, 0.939 vs 0.905). Pooled sensitivity and specificity were 0.87 (95% confidence interval (CI) 0.80-0.91) and 0.89 (95% CI 0.83-0.93) for meniscus tear identification and 0.88 (95% CI 0.82-0.91) and 0.84 (95% CI 0.81-0.85) for locating the tears.

CONCLUSIONS

AI prediction models achieved favorable performance in the diagnosis, but not location, of meniscus tears. Further studies on the clinical utilities of deep learning should include standardized reporting, external validation, and full reports of the predictive performances of these models, with a view to localizing tears more accurately.

CLINICAL RELEVANCE STATEMENT

Meniscus tears are hard to diagnose in the knee magnetic resonance images. AI prediction models may play an important role in improving the diagnostic accuracy of clinicians and radiologists.

KEY POINTS

• Artificial intelligence (AI) provides great potential in improving the diagnosis of meniscus tears.
• The pooled diagnostic performance for artificial intelligence (AI) in identifying meniscus tears was better (sensitivity 87%, specificity 89%) than locating the tears (sensitivity 88%, specificity 84%).
• AI is good at confirming the diagnosis of meniscus tears, but future work is required to guide the management of the disease.

Mostafa E, Hui A, Aasman B, Chowdary K, Mani K, Mardakhaev E, Zampolin R, Blumfield E, Berman J, Ramos RDLG, Fourman M, Yassari R, Eleswarapu A, Mirhaji P. Development of a natural language processing algorithm for the detection of spinal metastasis based on magnetic resonance imaging reports. NORTH AMERICAN SPINE SOCIETY JOURNAL 2024;19:100513. [PMID: 39149563 PMCID: PMC11325227 DOI: 10.1016/j.xnsj.2024.100513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Accepted: 06/25/2024] [Indexed: 08/17/2024]

Abstract

Background

Metastasis to the spinal column is a common complication of malignancy, potentially causing pain and neurologic injury. An automated system to identify and refer patients with spinal metastases can help overcome barriers to timely treatment. We describe the training, optimization and validation of a natural language processing algorithm to identify the presence of vertebral metastasis and metastatic epidural cord compression (MECC) from radiology reports of spinal MRIs.

Methods

Reports from patients with spine MRI studies performed between January 1, 2008 and April 14, 2019 were reviewed by a team of radiologists to assess for the presence of cancer and generate a labeled dataset for model training. Using regular expressions, impression sections were extracted from the reports and converted to all lower-case letters with all nonalphabetic characters removed. The reports were then tokenized and vectorized using the doc2vec algorithm. These were then used to train a neural network to predict the likelihood of spinal tumor or MECC. For each report, the model provided a number from 0 to 1 corresponding to its impression. We then obtained 111 MRI reports from outside the test set, 92 manually labeled negative and 19 with MECC to test the model's performance.
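The text-preparation steps described above (regular-expression extraction of the impression section, lower-casing, removal of nonalphabetic characters, tokenization) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the "IMPRESSION:" header pattern, the choice to replace nonalphabetic characters with spaces rather than delete them outright, whitespace tokenization, and the sample report are all hypothetical. The doc2vec vectorization and neural network steps are not shown.

```python
import re

# Assumed header format; real reports may label the section differently.
IMPRESSION_RE = re.compile(r"impression\s*:\s*(.*)", re.IGNORECASE | re.DOTALL)

def extract_impression(report):
    """Return the text following the impression header, or '' if none is found."""
    match = IMPRESSION_RE.search(report)
    return match.group(1).strip() if match else ""

def normalize_and_tokenize(text):
    """Lower-case, replace nonalphabetic characters with spaces, split on whitespace."""
    lowered = text.lower()
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    return letters_only.split()

# Hypothetical report text, written for this example only.
report = (
    "FINDINGS: Degenerative changes at L4-L5.\n"
    "IMPRESSION: No evidence of metastatic disease. Mild stenosis at L4-L5."
)

tokens = normalize_and_tokenize(extract_impression(report))
print(tokens)
```

The resulting token lists are the kind of input a document-embedding model such as doc2vec would consume in the next stage.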

Results

A total of 37,579 radiology reports were reviewed; 36,676 were labeled negative and 903 were labeled with MECC. We chose a cutoff of 0.02 as a positive result to optimize for a low false negative rate. At this threshold we found a 100% sensitivity rate with a low false positive rate of 2.2%.
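The relationship between the cutoff and the reported rates can be made concrete with a short sketch. The class sizes (19 MECC, 92 negative) come from the Methods above; the count of 2 false positives is an assumption inferred from the reported 2.2% rate, not a figure stated in the abstract. Treating a score equal to the cutoff as positive is also an assumption.

```python
def is_positive(score, cutoff=0.02):
    """Flag a report as positive when the model score reaches the cutoff (assumed >=)."""
    return score >= cutoff

def sensitivity_and_fpr(tp, fn, fp, tn):
    """Sensitivity = TP / (TP + FN); false positive rate = FP / (FP + TN)."""
    return tp / (tp + fn), fp / (fp + tn)

# 19 MECC and 92 negative test reports; fp=2 is inferred, not reported directly.
sensitivity, false_positive_rate = sensitivity_and_fpr(tp=19, fn=0, fp=2, tn=90)

print(f"sensitivity={sensitivity:.1%} false positive rate={false_positive_rate:.1%}")
```

A very low cutoff such as 0.02 trades specificity for sensitivity: almost any nonzero suspicion is flagged, which suits a screening use where missed cases are the costlier error.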

Conclusions

The NLP model described predicts the presence of spinal tumor and MECC in spine MRI reports with high accuracy. We plan to implement the algorithm into our EMR to allow for faster referral of these patients to appropriate specialists, allowing for reduced morbidity and increased survival.

Reichenpfader D, Müller H, Denecke K. A scoping review of large language model based approaches for information extraction from radiology reports. NPJ Digit Med 2024;7:222. [PMID: 39182008 PMCID: PMC11344824 DOI: 10.1038/s41746-024-01219-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 08/09/2024] [Indexed: 08/27/2024] Open

Bergomi L, Buonocore TM, Antonazzo P, Alberghi L, Bellazzi R, Preda L, Bortolotto C, Parimbelli E. Reshaping free-text radiology notes into structured reports with generative question answering transformers. Artif Intell Med 2024;154:102924. [PMID: 38964194 DOI: 10.1016/j.artmed.2024.102924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 06/22/2024] [Accepted: 06/25/2024] [Indexed: 07/06/2024]

Abstract

BACKGROUND

Radiology reports are typically written in a free-text format, making clinical information difficult to extract and use. Recently, the adoption of structured reporting (SR) has been recommended by various medical societies thanks to the advantages it offers, e.g. standardization, completeness, and information retrieval. We propose a pipeline to extract information from Italian free-text radiology reports that fits with the items of the reference SR registry proposed by a national society of interventional and medical radiology, focusing on CT staging of patients with lymphoma.

METHODS

Our work aims to leverage the potential of Natural Language Processing and Transformer-based models to deal with automatic SR registry filling. With the availability of 174 Italian radiology reports, we investigate a rule-free generative Question Answering approach based on the Italian-specific version of T5: IT5. To address information content discrepancies, we focus on the six most frequently filled items in the annotations made on the reports: three categorical (multichoice), one free-text (free-text), and two continuous numerical (factual). In the preprocessing phase, we also encode information that is not supposed to be entered. Two strategies (batch-truncation and ex-post combination) are implemented to comply with the IT5 context length limitations. Performance is evaluated in terms of strict accuracy, f1, and format accuracy, and compared with the widely used GPT-3.5 Large Language Model. Unlike multichoice and factual, free-text answers do not have 1-to-1 correspondence with their reference annotations. For this reason, we collect human-expert feedback on the similarity between medical annotations and generated free-text answers, using a 5-point Likert scale questionnaire (evaluating the criteria of correctness and completeness).

RESULTS

The combination of fine-tuning and batch splitting allows IT5 ex-post combination to achieve notable results in terms of information extraction of different types of structured data, performing on par with GPT-3.5. Human-based assessment scores of free-text answers show a high correlation with the AI performance metrics f1 (Spearman's correlation coefficients > 0.5, p-values < 0.001) for both IT5 ex-post combination and GPT-3.5. The latter is better at generating plausible human-like statements, even if it systematically provides answers even when they are not supposed to be given.

CONCLUSIONS

In our experimental setting, a fine-tuned Transformer-based model with a modest number of parameters (i.e., IT5, 220 M) performs well as a clinical information extraction system for the automatic SR registry filling task. It can extract information from more than one place in the report, elaborating it in a manner that complies with the response specifications provided by the SR registry (for multichoice and factual items), or that closely approximates the work of a human expert (free-text items), with the ability to discern whether or not an answer is supposed to be given to a user query.

Kim S, Kim SS, Kim E, Cecchini M, Park MS, Choi JA, Kim SH, Hwang HK, Kang CM, Choi HJ, Shin SJ, Kang J, Lee CK. Deep-Transfer-Learning-Based Natural Language Processing of Serial Free-Text Computed Tomography Reports for Predicting Survival of Patients With Pancreatic Cancer. JCO Clin Cancer Inform 2024;8:e2400021. [PMID: 39151114 DOI: 10.1200/cci.24.00021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 04/22/2024] [Accepted: 07/10/2024] [Indexed: 08/18/2024] Open

Abstract

PURPOSE

To explore the predictive potential of serial computed tomography (CT) radiology reports for pancreatic cancer survival using natural language processing (NLP).

METHODS

Deep-transfer-learning-based NLP models were retrospectively trained and tested with serial, free-text CT reports and survival information extracted for consecutive patients diagnosed with pancreatic cancer in a Korean tertiary hospital. Randomly selected patients with pancreatic cancer and their serial CT reports from an independent tertiary hospital in the United States were included in the external testing data set. The concordance index (c-index) between predicted and actual survival and the area under the receiver operating characteristic curve (AUROC) for predicting 1-year survival were calculated.

RESULTS

Between January 2004 and June 2021, 2,677 patients with 12,255 CT reports and 670 patients with 3,058 CT reports were allocated to training and internal testing data sets, respectively. A ClinicalBERT (Bidirectional Encoder Representations from Transformers) model trained on only the first CT report of each patient showed a c-index of 0.653 and AUROC of 0.722 in predicting the overall survival of patients with pancreatic cancer. ClinicalBERT trained on up to 15 consecutive reports from the initial report showed an improved c-index of 0.811 and AUROC of 0.911. On the external testing set with 273 patients with 1,947 CT reports, the AUROC was 0.888, indicating the generalizability of our model. Further analyses showed our model's contextual interpretation beyond specific phrases.

CONCLUSION

A deep-transfer-learning-based NLP model of serial CT reports can predict the survival of patients with pancreatic cancer. Clinical decisions can be supported by the developed model, with survival information extracted solely from serial radiology reports.
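
For readers unfamiliar with the c-index reported in this abstract, the following is a minimal sketch of Harrell's concordance index. It is an illustration only, not the study's evaluation code: the function name, the convention that a higher score means shorter predicted survival, the decision to skip pairs with tied follow-up times, and the example data are all assumptions.

```python
from itertools import combinations

def concordance_index(times, events, risk_scores):
    """Harrell's c-index: share of comparable patient pairs ranked correctly.

    times       -- observed follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; higher is assumed to mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied follow-up times are skipped in this sketch
        # 'first' is the patient with the shorter follow-up time.
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Made-up example: four patients, the third one censored at 15 months.
example_c = concordance_index(
    times=[5, 10, 15, 20],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.4, 0.5, 0.1],
)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported improvement from 0.653 to 0.811 should be read.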

Affiliation(s)

Sunkyu Kim Department of Computer Science and Engineering, Korea University, Seoul, Korea
Seung-Seob Kim Department of Radiology and Research Institute of Radiological Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea
Eejung Kim Department of Internal Medicine (Medical Oncology), Yale University School of Medicine, New Haven, CT Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
Michael Cecchini Department of Internal Medicine (Medical Oncology), Yale University School of Medicine, New Haven, CT
Mi-Suk Park Department of Radiology and Research Institute of Radiological Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea
Ji A Choi Song-dang Institute for Cancer Research, Yonsei University College of Medicine, Seoul, Korea
Sung Hyun Kim Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea Department of Surgery, Yonsei University College of Medicine, Seoul, Korea
Ho Kyoung Hwang Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea Department of Surgery, Yonsei University College of Medicine, Seoul, Korea
Chang Moo Kang Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea Department of Surgery, Yonsei University College of Medicine, Seoul, Korea
Hye Jin Choi Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea Division of Medical Oncology, Department of Internal Medicine, Yonsei University College of Medicine, Seoul, Korea
Sang Joon Shin Song-dang Institute for Cancer Research, Yonsei University College of Medicine, Seoul, Korea Division of Medical Oncology, Department of Internal Medicine, Yonsei University College of Medicine, Seoul, Korea
Jaewoo Kang Department of Computer Science and Engineering, Korea University, Seoul, Korea AIGEN Sciences Inc, Seoul, Korea
Choong-Kun Lee Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea Song-dang Institute for Cancer Research, Yonsei University College of Medicine, Seoul, Korea Division of Medical Oncology, Department of Internal Medicine, Yonsei University College of Medicine, Seoul, Korea

Tejani AS, Bialecki B, O’Donnell K, Sippel Schmidt T, Kohli MD, Alkasab T. Standardizing imaging findings representation: harnessing Common Data Elements semantics and Fast Healthcare Interoperability Resources structures. J Am Med Inform Assoc 2024;31:1735-1742. [PMID: 38900188 PMCID: PMC11258419 DOI: 10.1093/jamia/ocae134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 05/22/2024] [Accepted: 05/24/2024] [Indexed: 06/21/2024]

Abstract

OBJECTIVES

To design a framework that represents radiology results in a standards-based data structure, using joint Radiological Society of North America/American College of Radiology Common Data Elements (CDEs) as the semantic labels on standard structures. This allows radiologist-created report data to integrate with artificial intelligence-generated results for use throughout downstream systems.

MATERIALS AND METHODS

We developed a framework modeling radiology findings as Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) observations using CDE set/element identifiers as standardized semantic labels. This framework deploys CDE identifiers to specify radiology findings and attributes, providing consistent labels for radiology report concepts (diagnoses, recommendations, tabular/quantitative data) with built-in integration with RadLex, SNOMED CT, LOINC, and other ontologies. Observation structures fit within larger HL7 FHIR DiagnosticReport resources, providing output including both nuanced text and structured data.

RESULTS

Labeling radiology findings as discrete data for interchange between systems requires two components: structure and semantics. CDE definitions provide semantic identifiers for findings and their component values. The FHIR observation resource specifies a structure for associating identifiers with radiology findings in the context of reports, with CDE-encoded observations referring to definitions for CDE identifiers in a central repository. The discussion includes an example of encoding pulmonary nodules on a chest CT as CDE-labeled observations, demonstrating the application of this framework to exchange findings throughout the imaging workflow, making imaging data available to downstream clinical systems.

DISCUSSION

CDE-labeled observations establish a lingua franca for encoding, exchanging, and consuming radiology data at the level of individual findings, facilitating use throughout healthcare systems.

IMPORTANCE

CDE-labeled FHIR observation objects can increase the value of radiology results by facilitating their use throughout patient care.
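
The structure described in this abstract can be sketched as plain Python dictionaries shaped like FHIR R4 Observation and DiagnosticReport resources. This is a rough illustration, not the framework from the paper: the coding system URL, the CDE set and element identifiers, and the helper names are placeholders invented here, and real identifiers would come from the CDE repository.

```python
import json

# Placeholder coding system; NOT the real CDE repository URL.
CDE_SYSTEM = "https://example.org/cde"

def cde_observation(obs_id, set_id, display, components):
    """Build an Observation whose code is a CDE set and whose components
    are CDE elements carrying quantitative values."""
    return {
        "resourceType": "Observation",
        "id": obs_id,
        "status": "final",
        "code": {
            "coding": [{"system": CDE_SYSTEM, "code": set_id, "display": display}]
        },
        "component": [
            {
                "code": {
                    "coding": [
                        {"system": CDE_SYSTEM, "code": element_id, "display": label}
                    ]
                },
                "valueQuantity": {"value": value, "unit": unit},
            }
            for element_id, label, value, unit in components
        ],
    }

def diagnostic_report(report_id, narrative, observations):
    """Wrap observations in a DiagnosticReport that keeps the free text."""
    return {
        "resourceType": "DiagnosticReport",
        "id": report_id,
        "status": "final",
        "code": {"text": "Chest CT"},
        "conclusion": narrative,
        "result": [{"reference": f"Observation/{o['id']}"} for o in observations],
    }

# Hypothetical pulmonary nodule finding; identifiers are illustrative only.
nodule = cde_observation(
    "nodule-1",
    "CDE-SET-HYPOTHETICAL",
    "Pulmonary nodule",
    [("CDE-ELEMENT-HYPOTHETICAL-SIZE", "Nodule size", 6.0, "mm")],
)
report = diagnostic_report("report-1", "6 mm pulmonary nodule.", [nodule])
print(json.dumps(report, indent=2))
```

The split mirrors the two components named in the results: the Observation and DiagnosticReport shapes supply the structure, while the CDE codes supply the semantics.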

Wieland-Jorna Y, van Kooten D, Verheij RA, de Man Y, Francke AL, Oosterveld-Vlug MG. Natural language processing systems for extracting information from electronic health records about activities of daily living. A systematic review. JAMIA Open 2024;7:ooae044. [PMID: 38798774 PMCID: PMC11126158 DOI: 10.1093/jamiaopen/ooae044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 03/21/2024] [Accepted: 05/07/2024] [Indexed: 05/29/2024]

Lam BD, Chrysafi P, Chiasakul T, Khosla H, Karagkouni D, McNichol M, Adamski A, Reyes N, Abe K, Mantha S, Vlachos IS, Zwicker JI, Patell R. Machine learning natural language processing for identifying venous thromboembolism: systematic review and meta-analysis. Blood Adv 2024;8:2991-3000. [PMID: 38522096 PMCID: PMC11215191 DOI: 10.1182/bloodadvances.2023012200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Revised: 02/22/2024] [Accepted: 02/22/2024] [Indexed: 03/26/2024]

Abstract

Venous thromboembolism (VTE) is a leading cause of preventable in-hospital mortality. Monitoring VTE cases is limited by the challenges of manual medical record review and diagnosis code interpretation. Natural language processing (NLP) can automate the process. Rule-based NLP methods are effective but time-consuming. Machine learning (ML)-NLP methods present a promising solution. We conducted a systematic review and meta-analysis of studies published before May 2023 that use ML-NLP to identify VTE diagnoses in electronic health records. Four reviewers screened all manuscripts, excluding studies that only used a rule-based method. A meta-analysis evaluated the pooled performance of each study's best-performing model that evaluated for pulmonary embolism and/or deep vein thrombosis. Pooled sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) with confidence intervals (CIs) were calculated by the DerSimonian and Laird method using a random-effects model. Study quality was assessed using an adapted TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) tool. Thirteen studies were included in the systematic review and 8 had data available for meta-analysis. Pooled sensitivity was 0.931 (95% CI, 0.881-0.962), specificity 0.984 (95% CI, 0.967-0.992), PPV 0.910 (95% CI, 0.865-0.941), and NPV 0.985 (95% CI, 0.977-0.990). All studies met at least 13 of the 21 NLP-modified TRIPOD items, demonstrating fair quality. The highest performing models used vectorization rather than bag-of-words and deep-learning techniques such as convolutional neural networks. There was significant heterogeneity in the studies, and only 4 validated their model on an external data set. Further standardization of ML studies can help progress this novel technology toward real-world implementation.
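
The pooling step named in this abstract can be sketched as follows. This is a minimal illustration of the DerSimonian and Laird random-effects estimator, not the review's analysis code. Pooling proportions on the logit scale is an assumption made here (the abstract does not state the transformation), the function names are hypothetical, and the sketch assumes every study has nonzero counts in both cells.

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate, its standard error, and tau^2."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q measures dispersion around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    k = len(effects)
    # Between-study variance, truncated at zero (and zero for one study).
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, se, tau2

def pooled_proportion(successes, failures):
    """Pool per-study proportions (e.g. sensitivity = TP / (TP + FN)) on the
    logit scale and return (estimate, lower 95% CI, upper 95% CI)."""
    effects = [math.log(s / f) for s, f in zip(successes, failures)]
    variances = [1.0 / s + 1.0 / f for s, f in zip(successes, failures)]
    pooled, se, _ = dersimonian_laird(effects, variances)

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return expit(pooled), expit(pooled - 1.96 * se), expit(pooled + 1.96 * se)

# Made-up true-positive and false-negative counts for three studies.
sensitivity, ci_low, ci_high = pooled_proportion([90, 45, 180], [10, 8, 12])
```

When the studies disagree more than their sampling error explains, tau^2 becomes positive, the weights flatten toward equality, and the confidence interval widens, which is how the reported heterogeneity feeds into the pooled intervals.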

Affiliation(s)

Barbara D. Lam Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA Division of Clinical Informatics, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
Pavlina Chrysafi Department of Medicine, Mount Auburn Hospital, Harvard Medical School, Boston, MA
Thita Chiasakul Center of Excellence in Translational Hematology, Division of Hematology, Department of Medicine, Faculty of Medicine, Chulalongkorn University and King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok, Thailand
Harshit Khosla Department of Medicine, Saint Vincent Hospital, Worcester, MA
Dimitra Karagkouni Department of Pathology, Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
Megan McNichol Library Sciences, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
Alys Adamski Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
Nimia Reyes Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
Karon Abe Division of Blood Disorders, National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
Simon Mantha Division of Hematology, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
Ioannis S. Vlachos Department of Pathology, Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
Jeffrey I. Zwicker Division of Hematology, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
Rushad Patell Division of Hematology, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA

Patell R, Zwicker JI, Singh R, Mantha S. Machine learning in cancer-associated thrombosis: hype or hope in untangling the clot. BLEEDING, THROMBOSIS AND VASCULAR BIOLOGY 2024;3:123. [PMID: 39323613 PMCID: PMC11423546 DOI: 10.4081/btvb.2024.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 03/22/2024] [Indexed: 09/27/2024]

Sindhu A, Jadhav U, Ghewade B, Bhanushali J, Yadav P. Revolutionizing Pulmonary Diagnostics: A Narrative Review of Artificial Intelligence Applications in Lung Imaging. Cureus 2024;16:e57657. [PMID: 38707160 PMCID: PMC11070215 DOI: 10.7759/cureus.57657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 04/04/2024] [Indexed: 05/07/2024]

Crombé A, Lecomte JC, Seux M, Banaste N, Gorincour G. Using the Textual Content of Radiological Reports to Detect Emerging Diseases: A Proof-of-Concept Study of COVID-19. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024;37:620-632. [PMID: 38343242 PMCID: PMC11031522 DOI: 10.1007/s10278-023-00949-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/02/2023] [Accepted: 10/04/2023] [Indexed: 04/20/2024]

Abstract

Changes in the content of radiological reports at population level could detect emerging diseases. Herein, we developed a method to quantify similarities in consecutive temporal groupings of radiological reports using natural language processing, and we investigated whether the appearance of dissimilarities between consecutive periods correlated with the beginning of the COVID-19 pandemic in France. CT reports from 67,368 consecutive adults across 62 emergency departments throughout France between October 2019 and March 2020 were collected. Reports were vectorized using term frequency-inverse document frequency (TF-IDF) analysis on unigrams. For each successive 2-week period, we performed unsupervised clustering of the reports based on TF-IDF values and partition-around-medoids. Next, we assessed the similarities between this clustering and a clustering from two weeks before according to the average adjusted Rand index (AARI). Statistical analyses included (1) cross-correlation functions (CCFs) with the number of positive SARS-CoV-2 tests and advanced sanitary index for flu syndromes (ASI-flu, from an open-source dataset), and (2) linear regressions of time series at different lags to understand the variations of AARI over time. Overall, 13,235 chest CT reports were analyzed. AARI was correlated with ASI-flu at lags of +1, +5, and +6 weeks (P = 0.0454, 0.0121, and 0.0042, respectively) and with SARS-CoV-2-positive tests at lags of -1 and 0 weeks (P = 0.0057 and 0.0001, respectively). In the best fit, AARI correlated with the ASI-flu with a lag of 2 weeks (P = 0.0026), SARS-CoV-2-positive tests in the same week (P < 0.0001), and their interaction (P < 0.0001) (adjusted R² = 0.921). Thus, our method enables the automatic monitoring of changes in radiological reports and could help capture disease emergence.
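
The similarity measure underlying the AARI can be sketched with a small adjusted Rand index function. This is only an illustration of the index itself: it compares two cluster labelings of the same items, and the abstract does not say how clusterings from different 2-week periods were aligned or averaged, so that part is left out. The function name and the example labels are made up.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items.

    1.0 means identical partitions (up to renaming of clusters); values
    near 0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    n = len(labels_a)
    # Contingency counts: items sharing cluster i in A and cluster j in B.
    pair_counts = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    expected = sum_a * sum_b / total_pairs if total_pairs else 0.0
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings use one cluster
    return (sum_cells - expected) / (maximum - expected)

# Made-up cluster assignments for six reports under two clusterings.
same_structure = adjusted_rand_index([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1])
shifted = adjusted_rand_index([0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1])
```

A drop in this index between consecutive periods is the kind of dissimilarity signal that the study correlates with the epidemiological time series.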

Martín-Noguerol T, López-Úbeda P, Luna A. Imagine there is no paperwork… it's easy if you try. Br J Radiol 2024;97:744-746. [PMID: 38335929 PMCID: PMC11027242 DOI: 10.1093/bjr/tqae035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Revised: 01/11/2024] [Accepted: 02/05/2024] [Indexed: 02/12/2024]

Cappello A, Murgia Y, Giacobbe DR, Mora S, Gazzarata R, Rosso N, Giacomini M, Bassetti M. Automated extraction of standardized antibiotic resistance and prescription data from laboratory information systems and electronic health records: a narrative review. FRONTIERS IN ANTIBIOTICS 2024;3:1380380. [PMID: 39816258 PMCID: PMC11731964 DOI: 10.3389/frabi.2024.1380380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 02/26/2024] [Indexed: 01/18/2025]

Martín-Noguerol T, López-Úbeda P, Pons-Escoda A, Luna A. Natural language processing deep learning models for the differential between high-grade gliomas and metastasis: what if the key is how we report them? Eur Radiol 2024;34:2113-2120. [PMID: 37665389 DOI: 10.1007/s00330-023-10202-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 07/10/2023] [Accepted: 07/20/2023] [Indexed: 09/05/2023]

Abstract

OBJECTIVES

The differential between high-grade glioma (HGG) and metastasis remains challenging in common radiological practice. We compare different natural language processing (NLP)-based deep learning models to assist radiologists based on data contained in radiology reports.

METHODS

This retrospective study included 185 MRI reports between 2010 and 2022 from two different institutions. A total of 117 reports were used for the training and 21 were reserved for the validation set, while the rest were used as a test set. A comparison of the performance of different deep learning models for HGG and metastasis classification has been carried out. Specifically, Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), a hybrid version of BiLSTM and CNN, and a radiology-specific Bidirectional Encoder Representations from Transformers (RadBERT) model were used.

RESULTS

For the classification of MRI reports, the CNN network provided the best results among all tested, showing a macro-avg precision of 87.32%, a sensitivity of 87.45%, and an F1 score of 87.23%. In addition, our NLP algorithm detected keywords such as tumor, temporal, and lobe to positively classify a radiological report as HGG or metastasis group.

CONCLUSIONS

A deep learning model based on CNN enables radiologists to discriminate between HGG and metastasis based on MRI reports with high-precision values. This approach should be considered an additional tool in diagnosing these central nervous system lesions.

CLINICAL RELEVANCE STATEMENT

The use of our NLP model enables radiologists to differentiate between patients with high-grade glioma and metastasis based on their MRI reports and can be used as an additional tool to the conventional image-based approach for this challenging task.

KEY POINTS

• Differential between high-grade glioma and metastasis is still challenging in common radiological practice.
• Natural language processing (NLP)-based deep learning models can assist radiologists based on data contained in radiology reports.
• We have developed and tested a natural language processing model for discriminating between high-grade glioma and metastasis based on MRI reports that show high precision for this task.
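
The macro-averaged precision, sensitivity, and F1 score reported in this abstract can be illustrated with a short sketch. It is not the authors' evaluation code: the function name and the example labels are invented, and the convention of averaging per-class F1 scores (rather than taking the harmonic mean of the averaged precision and sensitivity) is an assumption, although it is a common one.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Undefined ratios (no predictions or no true cases) are set to 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Made-up report labels: HGG = high-grade glioma, MET = metastasis.
truth = ["HGG", "HGG", "HGG", "MET", "MET"]
predicted = ["HGG", "HGG", "MET", "MET", "MET"]
macro_p, macro_r, macro_f1 = macro_scores(truth, predicted)
```

Under this convention the macro F1 need not equal the harmonic mean of the macro precision and macro sensitivity, since each class contributes its own F1 before averaging.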

Liu W, Cai L, Li Y. Application of natural language processing to post-structuring of rectal cancer MRI reports. Clin Radiol 2024;79:e204-e210. [PMID: 38042740 DOI: 10.1016/j.crad.2023.10.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 10/20/2023] [Accepted: 10/26/2023] [Indexed: 12/04/2023]

Nobel JM, Puts S, Krdzalic J, Zegers KML, Lobbes MBI, F Robben SG, Dekker ALAJ. Natural Language Processing Algorithm Used for Staging Pulmonary Oncology from Free-Text Radiological Reports: "Including PET-CT and Validation Towards Clinical Use". JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024;37:3-12. [PMID: 38343237 PMCID: PMC10976919 DOI: 10.1007/s10278-023-00913-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Revised: 08/26/2023] [Accepted: 09/03/2023] [Indexed: 03/02/2024]

Reichenpfader D, Müller H, Denecke K. Large language model-based information extraction from free-text radiology reports: a scoping review protocol. BMJ Open 2023;13:e076865. [PMID: 38070902 PMCID: PMC10729196 DOI: 10.1136/bmjopen-2023-076865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 11/21/2023] [Indexed: 12/18/2023]

Liu F, Zhu T, Wu X, Yang B, You C, Wang C, Lu L, Liu Z, Zheng Y, Sun X, Yang Y, Clifton L, Clifton DA. A medical multimodal large language model for future pandemics. NPJ Digit Med 2023;6:226. [PMID: 38042919 PMCID: PMC10693607 DOI: 10.1038/s41746-023-00952-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/08/2023] [Accepted: 10/24/2023] [Indexed: 12/04/2023]