1. Li T, Zhang Y, Su D, Liu M, Ge M, Chen L, Li C, Tang J. Knowledge Graph-Based Few-Shot Learning for Label of Medical Imaging Reports. Acad Radiol 2025:S1076-6332(25)00189-8. PMID: 40140273. DOI: 10.1016/j.acra.2025.02.045.
Abstract
BACKGROUND The application of artificial intelligence (AI) to automatic labeling of imaging reports faces the challenge of manually labeling large datasets. PURPOSE To propose a data augmentation method using a knowledge graph (KG) and few-shot learning. METHODS A KG of lumbar spine X-ray images was constructed, and 2000 reports were annotated based on the KG and divided into training, validation, and test sets in a ratio of 7:2:1. The training set was augmented based on the synonym/replacement attributes of the KG, and the augmented data were input into the BERT (Bidirectional Encoder Representations from Transformers) model for automatic annotation training. The performance of the model under different augmentation ratios (1:10, 1:100, 1:1000) and augmentation methods (synonyms only, replacements only, and a combination of synonyms and replacements) was evaluated using precision and F1 scores. In addition, with the augmentation ratio fixed, iterative experiments were performed by supplementing the data of nodes that performed poorly on the validation set to further improve the model's performance. RESULTS Prior to data augmentation, the precision was 0.728 and the F1 score was 0.666. By adjusting the augmentation ratio, precision increased from 0.912 at a 1:10 ratio to 0.932 at a 1:100 ratio (P<.05), while the F1 score improved from 0.853 at a 1:10 ratio to 0.881 at a 1:100 ratio (P<.05). The effectiveness of the augmentation methods was also compared at a 1:100 ratio: the combination of synonyms and replacements (F1=0.881) was superior to synonyms only (F1=0.815) and replacements only (F1=0.753) (P<.05). For nodes with suboptimal performance on the validation set, supplementing the training set with targeted data improved model performance, increasing the average F1 score to 0.979 (P<.05). CONCLUSION Based on the KG, this study trained an automatic labeling model for radiology reports using a few-shot dataset. This method effectively reduces the workload of manual labeling, improves the efficiency and accuracy of image data labeling, and provides a useful strategy for applying AI to the automatic labeling of imaging reports.
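Editor's note: the augmentation step described here lends itself to a short illustration. Below is a minimal Python sketch of KG-driven synonym/replacement augmentation; the finding terms and synonym table are invented placeholders, not the paper's actual lumbar spine knowledge graph.

```python
import random

# Hypothetical KG nodes: canonical finding -> synonym/replacement attributes.
# Terms are illustrative only; the paper's KG covers lumbar spine X-ray findings.
KG_SYNONYMS = {
    "osteophyte formation": ["bony spur formation", "marginal osteophytes"],
    "disc space narrowing": ["intervertebral disc space narrowing"],
    "lumbar": ["lumbosacral"],
}

def augment(report: str, n_variants: int = 10) -> list[str]:
    """Create label-preserving report variants by swapping in KG synonyms."""
    variants = set()
    for _ in range(n_variants * 10):  # oversample attempts, keep unique hits
        text = report
        for term, alternatives in KG_SYNONYMS.items():
            if term in text and random.random() < 0.5:
                text = text.replace(term, random.choice(alternatives))
        if text != report:
            variants.add(text)
        if len(variants) >= n_variants:
            break
    return sorted(variants)

if __name__ == "__main__":
    source = "Mild lumbar disc space narrowing with osteophyte formation at L4-L5."
    for variant in augment(source, n_variants=3):
        print(variant)
```

Because every variant keeps the original labels, augmentation ratios such as 1:10 or 1:100 are cheap to produce from a small annotated set.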
Affiliation(s)
- Tiancheng Li, Deyu Su, Jin Tang
- The First Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei 230032, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- Yuxuan Zhang, Mingxin Ge, Linyu Chen
- College of Medical Information Engineering, Anhui University of Traditional Chinese Medicine, Hefei, China
- Ming Liu
- College of Artificial Intelligence, Anhui University, Hefei, China
- Chuanfu Li
- College of Medical Information Engineering, Anhui University of Traditional Chinese Medicine, Hefei, China; First Clinical Medical College, Anhui University of Traditional Chinese Medicine, Hefei, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China

2. Breitwieser M, Moore V, Wiesner T, Wichlas F, Deininger C. NLP-Driven Analysis of Pneumothorax Incidence Following Central Venous Catheter Procedures: A Data-Driven Re-Evaluation of Routine Imaging in Value-Based Medicine. Diagnostics (Basel) 2024; 14:2792. PMID: 39767153. PMCID: PMC11674588. DOI: 10.3390/diagnostics14242792.
Abstract
Background: This study presents a systematic approach using a natural language processing (NLP) algorithm to assess the necessity of routine imaging after central venous catheter (CVC) placement and removal. With pneumothorax being a key complication of CVC procedures, this research aims to provide evidence-based recommendations for optimizing imaging protocols and minimizing unnecessary imaging risks. Methods: We analyzed electronic health records from four university hospitals in Salzburg, Austria, focusing on X-rays performed between 2012 and 2021 following CVC procedures. A custom-built NLP algorithm identified cases of pneumothorax from radiologists' reports and clinician requests, while excluding cases with contraindications such as chest injuries, prior pneumothorax, or missing data. Chi-square tests were used to compare pneumothorax rates between CVC insertion and removal, and multivariate logistic regression identified risk factors, with a focus on age and gender. Results: This study analyzed 17,175 cases of patients aged 18 and older, with 95.4% involving CVC insertion and 4.6% involving CVC removal. Pneumothorax was observed in 106 cases post-insertion (1.3%) and in 3 cases post-removal (0.02%), with no statistically significant difference between procedures (p = 0.5025). The NLP algorithm achieved an accuracy of 93%, with a sensitivity of 97.9%, a specificity of 87.9%, and an area under the ROC curve (AUC) of 0.9283. Conclusions: The findings indicate no significant difference in pneumothorax incidence between CVC insertion and removal, supporting existing recommendations against routine imaging post-removal for asymptomatic patients and suggesting that routine imaging after CVC insertion may also be unnecessary in similar cases. This study demonstrates how advanced NLP techniques can support value-based medicine by enhancing clinical decision making and optimizing resources.
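Editor's note: the two core steps can be sketched briefly. The snippet below pairs a negation-aware keyword rule (a simplified stand-in for the custom-built NLP algorithm, whose actual rule set is not published in the abstract) with SciPy's chi-square test on a 2x2 table; the counts are back-calculated approximately from the abstract's percentages.

```python
import re
from scipy.stats import chi2_contingency

# Negation-aware keyword rule: a crude stand-in for the study's NLP algorithm.
MENTION = re.compile(r"pneumothorax", re.IGNORECASE)
NEGATED = re.compile(r"\b(no|without|negative for|excluded?)\b[^.]{0,60}pneumothorax",
                     re.IGNORECASE)

def has_pneumothorax(report: str) -> bool:
    """Positive if pneumothorax is mentioned and not negated in the same span."""
    return bool(MENTION.search(report)) and not NEGATED.search(report)

print(has_pneumothorax("Small apical pneumothorax on the right."))   # True
print(has_pneumothorax("No pneumothorax after CVC placement."))      # False

# 2x2 table back-calculated (approximately) from the abstract: 95.4% of
# 17,175 cases were insertions (106 pneumothoraces) vs removals (3).
table = [[106, 16279],   # insertion: [pneumothorax, none]
         [3, 787]]       # removal:   [pneumothorax, none]
chi2, p, dof, expected = chi2_contingency(table)  # Yates-corrected for 2x2
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")          # p close to the reported 0.5025
```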
Affiliation(s)
- Martin Breitwieser
- Department for Orthopedic Surgery and Traumatology, Paracelsus Medical University, 5020 Salzburg, Austria; (V.M.); (F.W.); (C.D.)

3. Zhang L, Liu M, Wang L, Zhang Y, Xu X, Pan Z, Feng Y, Zhao J, Zhang L, Yao G, Chen X, Xie X. Constructing a Large Language Model to Generate Impressions from Findings in Radiology Reports. Radiology 2024; 312:e240885. PMID: 39287525. DOI: 10.1148/radiol.240885.
Abstract
Background The specialization and complexity of radiology make the automatic generation of radiologic impressions (ie, a diagnosis with differential diagnosis and management recommendations) challenging. Purpose To develop a large language model (LLM) that generates impressions based on imaging findings and to evaluate its performance in professional and linguistic dimensions. Materials and Methods Six radiologists recorded imaging examination findings from August 2 to 31, 2023, at Shanghai General Hospital and used the developed LLM before routinely writing report impressions for multiple radiologic modalities (CT, MRI, radiography, mammography) and anatomic sites (cranium and face, neck, chest, upper abdomen, lower abdomen, vessels, bone and joint, spine, breast), making necessary corrections and completing the radiologic impression. A subset was defined to investigate cases where the LLM-generated impressions differed from the final radiologist impressions by excluding identical and highly similar cases. An expert panel scored the LLM-generated impressions on a five-point Likert scale (5 = strongly agree) based on scientific terminology, coherence, specific diagnosis, differential diagnosis, management recommendations, correctness, comprehensiveness, harmlessness, and lack of bias. Results In this retrospective study, an LLM was pretrained using 20 GB of medical and general-purpose text data. The fine-tuning data set comprised 1.5 GB of data, including 800 radiology reports with paired instructions (describing the output task in natural language) and outputs. Test set 2 included data from 3988 patients (median age, 56 years [IQR, 40-68 years]; 2159 male). The median recall, precision, and F1 score of LLM-generated impressions were 0.775 (IQR, 0.56-1), 0.84 (IQR, 0.611-1), and 0.772 (IQR, 0.578-0.957), respectively, using the final impressions as the reference standard. In a subset of 1014 patients (median age, 57 years [IQR, 42-69 years]; 528 male), the overall median expert panel score for LLM-generated impressions was 5 (IQR, 5-5), ranging from 4 (IQR, 3-5) to 5 (IQR, 5-5). Conclusion The developed LLM generated radiologic impressions that were professionally and linguistically appropriate for a full spectrum of radiology examinations. © RSNA, 2024. Supplemental material is available for this article.
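Editor's note: the per-report recall/precision/F1 against the final radiologist impression can be pictured with a simple bag-of-tokens overlap. This is one plausible reading of the metric; the abstract does not specify the tokenizer or matching scheme, so the whitespace tokenization below is an assumption.

```python
def token_prf(generated: str, reference: str) -> tuple[float, float, float]:
    """Bag-of-tokens precision/recall/F1 between a generated impression and
    the radiologist's final impression (whitespace tokens, multiset overlap)."""
    gen, ref = generated.lower().split(), reference.lower().split()
    overlap = sum(min(gen.count(t), ref.count(t)) for t in set(gen))
    precision = overlap / len(gen) if gen else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f = token_prf(
    "small left pleural effusion, recommend follow-up ct",
    "small left pleural effusion; follow-up ct recommended",
)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```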
Affiliation(s)
- Lu Zhang
- From the Department of Radiology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China (Lu Zhang, L.W., Y.Z., Y.F., J.Z., Lin Zhang, G.Y., X. Xie); Winning Health Technology, Shanghai, China (M.L., X. Xu, Z.P., X.C.); and Department of Radiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Yan Chang Zhong Rd 301, Shanghai 200040, China (X. Xie)
| | - Mingqian Liu
- From the Department of Radiology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China (Lu Zhang, L.W., Y.Z., Y.F., J.Z., Lin Zhang, G.Y., X. Xie); Winning Health Technology, Shanghai, China (M.L., X. Xu, Z.P., X.C.); and Department of Radiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Yan Chang Zhong Rd 301, Shanghai 200040, China (X. Xie)
| | - Lingyun Wang
- From the Department of Radiology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China (Lu Zhang, L.W., Y.Z., Y.F., J.Z., Lin Zhang, G.Y., X. Xie); Winning Health Technology, Shanghai, China (M.L., X. Xu, Z.P., X.C.); and Department of Radiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Yan Chang Zhong Rd 301, Shanghai 200040, China (X. Xie)
| | - Yaping Zhang
- From the Department of Radiology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China (Lu Zhang, L.W., Y.Z., Y.F., J.Z., Lin Zhang, G.Y., X. Xie); Winning Health Technology, Shanghai, China (M.L., X. Xu, Z.P., X.C.); and Department of Radiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Yan Chang Zhong Rd 301, Shanghai 200040, China (X. Xie)
| | - Xiangjun Xu
- From the Department of Radiology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China (Lu Zhang, L.W., Y.Z., Y.F., J.Z., Lin Zhang, G.Y., X. Xie); Winning Health Technology, Shanghai, China (M.L., X. Xu, Z.P., X.C.); and Department of Radiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Yan Chang Zhong Rd 301, Shanghai 200040, China (X. Xie)
| | - Zhijun Pan
- From the Department of Radiology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China (Lu Zhang, L.W., Y.Z., Y.F., J.Z., Lin Zhang, G.Y., X. Xie); Winning Health Technology, Shanghai, China (M.L., X. Xu, Z.P., X.C.); and Department of Radiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Yan Chang Zhong Rd 301, Shanghai 200040, China (X. Xie)
| | - Yan Feng
- From the Department of Radiology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China (Lu Zhang, L.W., Y.Z., Y.F., J.Z., Lin Zhang, G.Y., X. Xie); Winning Health Technology, Shanghai, China (M.L., X. Xu, Z.P., X.C.); and Department of Radiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Yan Chang Zhong Rd 301, Shanghai 200040, China (X. Xie)
| | - Jue Zhao
- From the Department of Radiology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China (Lu Zhang, L.W., Y.Z., Y.F., J.Z., Lin Zhang, G.Y., X. Xie); Winning Health Technology, Shanghai, China (M.L., X. Xu, Z.P., X.C.); and Department of Radiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Yan Chang Zhong Rd 301, Shanghai 200040, China (X. Xie)
| | - Lin Zhang
- From the Department of Radiology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China (Lu Zhang, L.W., Y.Z., Y.F., J.Z., Lin Zhang, G.Y., X. Xie); Winning Health Technology, Shanghai, China (M.L., X. Xu, Z.P., X.C.); and Department of Radiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Yan Chang Zhong Rd 301, Shanghai 200040, China (X. Xie)
| | - Gehong Yao
- From the Department of Radiology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China (Lu Zhang, L.W., Y.Z., Y.F., J.Z., Lin Zhang, G.Y., X. Xie); Winning Health Technology, Shanghai, China (M.L., X. Xu, Z.P., X.C.); and Department of Radiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Yan Chang Zhong Rd 301, Shanghai 200040, China (X. Xie)
| | - Xu Chen
- From the Department of Radiology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China (Lu Zhang, L.W., Y.Z., Y.F., J.Z., Lin Zhang, G.Y., X. Xie); Winning Health Technology, Shanghai, China (M.L., X. Xu, Z.P., X.C.); and Department of Radiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Yan Chang Zhong Rd 301, Shanghai 200040, China (X. Xie)
| | - Xueqian Xie
- From the Department of Radiology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China (Lu Zhang, L.W., Y.Z., Y.F., J.Z., Lin Zhang, G.Y., X. Xie); Winning Health Technology, Shanghai, China (M.L., X. Xu, Z.P., X.C.); and Department of Radiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Yan Chang Zhong Rd 301, Shanghai 200040, China (X. Xie)
| |
Collapse
|

4. Zhang Y, Feng Y, Sun J, Zhang L, Ding Z, Wang L, Zhao K, Pan Z, Li Q, Guo N, Xie X. Fully automated artificial intelligence-based coronary CT angiography image processing: efficiency, diagnostic capability, and risk stratification. Eur Radiol 2024; 34:4909-4919. PMID: 38193925. DOI: 10.1007/s00330-023-10494-6.
Abstract
OBJECTIVES To prospectively investigate whether fully automated artificial intelligence (FAAI)-based coronary CT angiography (CCTA) image processing is non-inferior to the semi-automated mode in efficiency, diagnostic ability, and risk stratification of coronary artery disease (CAD). MATERIALS AND METHODS Adults with indications for CCTA were prospectively and consecutively enrolled at two hospitals and randomly assigned to either FAAI-based or semi-automated image processing using equipment workstations. Outcome measures were workflow efficiency, diagnostic accuracy for obstructive CAD (≥ 50% stenosis), and cardiovascular events at 2-year follow-up. The endpoints included major adverse cardiovascular events, hospitalization for unstable angina, and recurrence of cardiac symptoms. The non-inferiority margin was a 3-percentage-point difference in diagnostic accuracy and C-index. RESULTS In total, 1801 subjects (62.7 ± 11.1 years) were included, of whom 893 and 908 were assigned to the FAAI-based and semi-automated modes, respectively. Image processing times were 121.0 ± 18.6 and 433.5 ± 68.4 s, respectively (p < 0.001). Scan-to-report release times were 6.4 ± 2.7 and 10.5 ± 3.8 h, respectively (p < 0.001). Of all subjects, 152 and 159 in the FAAI-based and semi-automated modes, respectively, subsequently underwent invasive coronary angiography. The diagnostic accuracies for obstructive CAD were 94.7% (89.9-97.7%) and 94.3% (89.5-97.4%), respectively (difference 0.4%). Of all subjects, 779 and 784 in the FAAI-based and semi-automated modes were followed for 589 ± 182 days, respectively, and the C-statistics for cardiovascular events were 0.75 (0.67 to 0.83) and 0.74 (0.66 to 0.82) (difference 1%). CONCLUSIONS FAAI-based CCTA image processing significantly improves workflow efficiency compared with the semi-automated mode and is non-inferior in diagnosing obstructive CAD and in risk stratification for cardiovascular events. CLINICAL RELEVANCE STATEMENT Conventional coronary CT angiography image processing is semi-automated. This study shows that fully automated artificial intelligence-based image processing greatly improves efficiency while maintaining high diagnostic accuracy and effectiveness in stratifying patients for cardiovascular events. KEY POINTS: • Coronary CT angiography (CCTA) relies heavily on high-quality and fast image processing. • Fully automated CCTA image processing is clinically non-inferior to the semi-automated mode. • Full automation can facilitate the application of CCTA in early detection of coronary artery disease.
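Editor's note: the non-inferiority logic reduces to a small worked check. The sketch below compares the reported point differences against the 3-percentage-point margin; the trial's full statistical analysis (e.g., confidence-interval bounds) is not specified in the abstract, so this is a deliberately simplified reading.

```python
def noninferior(point_diff: float, margin: float = 0.03) -> bool:
    """Point-estimate check against the 3-percentage-point margin.
    (A full trial analysis would bound the difference with a confidence
    interval; the abstract reports only the point differences.)"""
    return point_diff > -margin

# Differences reported in the abstract (FAAI minus semi-automated).
accuracy_diff = 0.947 - 0.943   # obstructive CAD diagnostic accuracy
c_index_diff = 0.75 - 0.74      # cardiovascular-event risk stratification
for name, d in [("accuracy", accuracy_diff), ("C-index", c_index_diff)]:
    print(f"{name}: diff = {d:+.3f}, non-inferior = {noninferior(d)}")
```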
Affiliation(s)
- Yaping Zhang, Yan Feng, Lu Zhang, Lingyun Wang, Keke Zhao, Zhijie Pan, Xueqian Xie
- Radiology Department, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Haining Rd. 100, Shanghai 200080, China
- Jianqing Sun, Zhenhong Ding, Ning Guo
- Shukun (Beijing) Technology Co, Ltd, Jinhui Bd, Qiyang Rd, Beijing 100102, China
- Qingyao Li
- Radiology Department, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Haining Rd. 100, Shanghai 200080, China; Radiology Department, Shanghai General Hospital, University of Shanghai for Science and Technology, Haining Rd. 100, Shanghai 200080, China

5. Gorenstein L, Konen E, Green M, Klang E. Bidirectional Encoder Representations from Transformers in Radiology: A Systematic Review of Natural Language Processing Applications. J Am Coll Radiol 2024; 21:914-941. PMID: 38302036. DOI: 10.1016/j.jacr.2024.01.012.
Abstract
INTRODUCTION Bidirectional Encoder Representations from Transformers (BERT), introduced in 2018, has revolutionized natural language processing. Its bidirectional understanding of word context has enabled innovative applications, notably in radiology. This study aimed to assess BERT's influence and applications within the radiologic domain. METHODS Adhering to Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, we conducted a systematic review, searching PubMed for literature on BERT-based models and natural language processing in radiology from January 1, 2018, to February 12, 2023. The search encompassed keywords related to generative models, transformer architecture, and various imaging techniques. RESULTS Of 597 results, 30 met our inclusion criteria; the remainder were unrelated to radiology or did not use BERT-based models. All included studies were retrospective, and 14 were published in 2022. The primary focus was classification and information extraction from radiology reports, with x-rays the most prevalent imaging modality. Specific investigations included automatic CT protocol assignment and deep learning applications in chest x-ray interpretation. CONCLUSION This review underscores the primary application of BERT in radiology: report classification. It also reveals emerging BERT applications for protocol assignment and report generation. As BERT technology advances, we foresee further innovative applications. Its implementation in radiology holds potential for enhancing diagnostic precision, expediting report generation, and optimizing patient care.
Affiliation(s)
- Larisa Gorenstein, Michael Green
- Department of Diagnostic Imaging, Sheba Medical Center, Ramat-Gan, Israel; Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eli Konen
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel; Department of Diagnostic Imaging, Sheba Medical Center, Ramat-Gan, Israel
- Eyal Klang
- Icahn School of Medicine at Mount Sinai, New York, New York; Innovation Center, Sheba Medical Center, affiliated with Tel Aviv University, Tel Aviv, Israel

6. Dong T, Sunderland N, Nightingale A, Fudulu DP, Chan J, Zhai B, Freitas A, Caputo M, Dimagli A, Mires S, Wyatt M, Benedetto U, Angelini GD. Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database. Bioengineering (Basel) 2023; 10:1307. PMID: 38002431. PMCID: PMC10669818. DOI: 10.3390/bioengineering10111307.
Abstract
BACKGROUND Although electronic health records (EHRs) provide useful insights into disease patterns and patient treatment optimisation, their reliance on unstructured data presents a difficulty. Echocardiography reports, which provide extensive pathology information for cardiovascular patients, are particularly challenging to extract and analyse because of their narrative structure. Although natural language processing (NLP) has been utilised successfully in a variety of medical fields, it is not commonly used in echocardiography analysis. OBJECTIVES To develop an NLP-based approach for extracting and categorising data from echocardiography reports by accurately converting continuous (e.g., LVOT VTI, AV VTI and TR Vmax) and discrete (e.g., regurgitation severity) outcomes in a semi-structured narrative format into a structured and categorised format, allowing for future research or clinical use. METHODS 135,062 Trans-Thoracic Echocardiogram (TTE) reports were derived from 146,967 baseline echocardiogram reports and split into three cohorts: Training and Validation (n = 1075), Test (n = 98) and Application (n = 133,889). The NLP system was developed and iteratively refined using medical expert knowledge, then used to curate a moderate-fidelity database from extractions of the 133,889 application reports. A hold-out validation set of 98 reports was blindly annotated and extracted by two clinicians for comparison with the NLP extraction. Agreement, discrimination, accuracy and calibration of outcome measure extractions were evaluated. RESULTS Continuous outcomes including LVOT VTI, AV VTI and TR Vmax exhibited perfect inter-rater reliability using intra-class correlation scores (ICC = 1.00, p < 0.05) alongside high R2 values, demonstrating ideal alignment between the NLP system and clinicians. A good level of inter-rater reliability (ICC = 0.75-0.9, p < 0.05) was observed for outcomes such as LVOT Diam, Lateral MAPSE, Peak E Velocity, Lateral E' Velocity, PV Vmax, Sinuses of Valsalva and Ascending Aorta diameters. Furthermore, the accuracy rate for discrete outcome measures was 91.38% in the confusion matrix analysis, indicating effective performance. CONCLUSIONS The NLP-based technique yielded good results in extracting and categorising data from echocardiography reports, demonstrating a high degree of agreement and concordance with clinician extractions. This study contributes to the effective use of semi-structured data by providing a tool for converting semi-structured text to a structured echo report that can be used for data management. Additional validation and implementation in healthcare settings can improve data availability and support research and clinical decision-making.
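Editor's note: extraction of continuous outcomes from narrative text typically reduces to pattern matching around measurement names and units. The sketch below shows that idea for a few outcomes named in the abstract; the regular expressions are illustrative assumptions, not the study's clinician-refined rules.

```python
import re

# Illustrative patterns for a few continuous outcomes named in the abstract.
PATTERNS = {
    "LVOT_VTI_cm": re.compile(r"LVOT\s*VTI[:\s=]*([\d.]+)\s*cm", re.I),
    "AV_VTI_cm":   re.compile(r"AV\s*VTI[:\s=]*([\d.]+)\s*cm", re.I),
    "TR_Vmax_m_s": re.compile(r"TR\s*V\s*max[:\s=]*([\d.]+)\s*m/s", re.I),
}

def extract(report: str) -> dict[str, float]:
    """Pull named numeric measurements out of a narrative echo report."""
    out = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(report)
        if match:
            out[name] = float(match.group(1))
    return out

print(extract("Normal LV. LVOT VTI: 21.4 cm. TR Vmax 2.6 m/s, trivial TR."))
# -> {'LVOT_VTI_cm': 21.4, 'TR_Vmax_m_s': 2.6}
```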
Affiliation(s)
- Tim Dong, Nicholas Sunderland, Angus Nightingale, Daniel P. Fudulu, Jeremy Chan, Massimo Caputo, Arnaldo Dimagli, Stuart Mires, Umberto Benedetto, Gianni D. Angelini
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Ben Zhai
- School of Computing Science, Northumbria University, Newcastle upon Tyne NE1 8ST, UK
- Alberto Freitas
- Faculty of Medicine, University of Porto, 4100 Porto, Portugal
- Mike Wyatt
- University Hospitals Bristol and Weston, Marlborough St, Bristol BS1 3NU, UK

7. Chng SY, Tern PJW, Kan MRX, Cheng LTE. Automated labelling of radiology reports using natural language processing: Comparison of traditional and newer methods. Health Care Sci 2023; 2:120-128. PMID: 38938764. PMCID: PMC11080679. DOI: 10.1002/hcs2.40.
Abstract
Automated labelling of radiology reports using natural language processing allows for the labelling of ground truth for large datasets of radiological studies that are required for training of computer vision models. This paper explains the necessary data preprocessing steps, reviews the main methods for automated labelling and compares their performance. There are four main methods of automated labelling, namely: (1) rules-based text-matching algorithms, (2) conventional machine learning models, (3) neural network models and (4) Bidirectional Encoder Representations from Transformers (BERT) models. Rules-based labellers perform a brute force search against manually curated keywords and are able to achieve high F1 scores. However, they require proper handling of negative words. Machine learning models require preprocessing that involves tokenization and vectorization of text into numerical vectors. Multilabel classification approaches are required in labelling radiology reports and conventional models can achieve good performance if they have large enough training sets. Deep learning models make use of connected neural networks, often a long short-term memory network, and are similarly able to achieve good performance if trained on a large data set. BERT is a transformer-based model that utilizes attention. Pretrained BERT models only require fine-tuning with small data sets. In particular, domain-specific BERT models can achieve superior performance compared with the other methods for automated labelling.
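Editor's note: the BERT route this review favours can be sketched compactly: load a domain-specific checkpoint and attach a multilabel classification head. The checkpoint name and label set below are placeholders chosen for illustration, and the head is randomly initialized here, so real use requires fine-tuning on labelled reports first.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Any domain-specific checkpoint could be swapped in; this one is an assumption,
# not a model the review prescribes. The labels are likewise illustrative.
MODEL = "emilyalsentzer/Bio_ClinicalBERT"
LABELS = ["pneumothorax", "effusion", "consolidation"]

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL,
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss
)

report = "Large right pleural effusion. No pneumothorax."
inputs = tokenizer(report, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits)[0]
# Probabilities are meaningless until the head is fine-tuned on labelled reports.
print({label: round(float(p), 3) for label, p in zip(LABELS, probs)})
```

Multilabel classification (one sigmoid per finding, rather than a single softmax) matches the review's point that a report can carry several abnormal findings at once.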
Affiliation(s)
- Seo Yi Chng
- Department of Paediatrics, National University of Singapore, Singapore
- Paul J. W. Tern
- Department of Cardiology, National Heart Centre Singapore, Singapore
- Lionel T. E. Cheng
- Department of Diagnostic Radiology, Singapore General Hospital, Singapore

8. Implementation of artificial intelligence in thoracic imaging-a what, how, and why guide from the European Society of Thoracic Imaging (ESTI). Eur Radiol 2023. PMID: 36729173. PMCID: PMC9892666. DOI: 10.1007/s00330-023-09409-2.
Abstract
This statement from the European Society of Thoracic Imaging (ESTI) explains and summarises the essentials for understanding and implementing artificial intelligence (AI) in clinical practice in thoracic radiology departments. The document discusses the current AI scientific evidence in thoracic imaging, its potential clinical utility, implementation and costs, training requirements and validation, its effect on the training of new radiologists, post-implementation issues, and medico-legal and ethical issues. All these issues must be addressed and overcome for AI to be implemented clinically in thoracic radiology. KEY POINTS: • Assessing the datasets used for training and validation of the AI system is essential. • A departmental strategy and business plan, including continuing quality assurance of the AI system and a sustainable financial plan, is important for successful implementation. • Awareness of the negative effect on the training of new radiologists is vital.

9. Zhang Y, Liu M, Zhang L, Wang L, Zhao K, Hu S, Chen X, Xie X. Comparison of Chest Radiograph Captions Based on Natural Language Processing vs Completed by Radiologists. JAMA Netw Open 2023; 6:e2255113. PMID: 36753278. PMCID: PMC9909497. DOI: 10.1001/jamanetworkopen.2022.55113.
Abstract
IMPORTANCE Artificial intelligence (AI) can interpret abnormal signs in chest radiography (CXR) and generate captions, but a prospective study is needed to examine its practical value. OBJECTIVE To prospectively compare natural language processing (NLP)-generated CXR captions and the diagnostic findings of radiologists. DESIGN, SETTING, AND PARTICIPANTS A multicenter diagnostic study was conducted. The training data set included CXR images and reports retrospectively collected from February 1, 2014, to February 28, 2018. The retrospective test data set included consecutive images and reports from April 1 to July 31, 2019. The prospective test data set included consecutive images and reports from May 1 to September 30, 2021. EXPOSURES A Bidirectional Encoder Representations from Transformers (BERT) model was used to extract language entities and relationships from unstructured CXR reports to establish 23 labels of abnormal signs to train convolutional neural networks. The participants in the prospective test group were randomly assigned to 1 of 3 different caption generation models: a normal template, NLP-generated captions, and rule-based captions based on convolutional neural networks. For each case, a resident drafted the report based on the randomly assigned captions, and an experienced radiologist finalized the report blinded to the original captions. A total of 21 residents and 19 radiologists were involved. MAIN OUTCOMES AND MEASURES Time to write reports based on different caption generation models. RESULTS The training data set consisted of 74,082 cases (39,254 [53.0%] women; mean [SD] age, 50.0 [17.1] years). In the retrospective (n = 8126; 4345 [53.5%] women; mean [SD] age, 47.9 [15.9] years) and prospective (n = 5091; 2416 [47.5%] women; mean [SD] age, 45.1 [15.6] years) test data sets, the mean (SD) area under the curve for abnormal signs was 0.87 (0.11) in the retrospective data set and 0.84 (0.09) in the prospective data set. The residents' mean (SD) reporting time using the NLP-generated model was 283 (37) seconds, significantly shorter than with the normal template (347 [58] seconds; P < .001) and the rule-based model (296 [46] seconds; P < .001). The NLP-generated captions showed the highest similarity to the final reports, with a mean (SD) bilingual evaluation understudy (BLEU) score of 0.69 (0.24), significantly higher than the normal template (0.37 [0.09]; P < .001) and the rule-based model (0.57 [0.19]; P < .001). CONCLUSIONS AND RELEVANCE In this diagnostic study of NLP-generated CXR captions, prior information provided by NLP was associated with greater efficiency in the reporting process while maintaining good consistency with the findings of radiologists.
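Editor's note: the similarity measure named in the abstract (bilingual evaluation understudy, BLEU, between a proposed caption and the radiologist's final report) can be computed with NLTK. A minimal sketch follows, using default n-gram weights and smoothing, since the paper's exact BLEU configuration is not stated.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Reference = radiologist's final text; candidate = model-proposed caption.
reference = "cardiomegaly with mild pulmonary congestion".split()
candidate = "cardiomegaly with pulmonary congestion".split()

# Smoothing avoids zero scores when higher-order n-grams have no overlap,
# which is common for short clinical sentences.
score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU = {score:.2f}")
```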
Affiliation(s)
- Yaping Zhang, Lu Zhang, Lingyun Wang, Keke Zhao, Xueqian Xie
- Radiology Department, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shundong Hu
- Radiology Department, Shanghai Sixth People Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Xu Chen
- Winning Health Technology Ltd, Shanghai, China

10. Torres-Lopez VM, Rovenolt GE, Olcese AJ, Garcia GE, Chacko SM, Robinson A, Gaiser E, Acosta J, Herman AL, Kuohn LR, Leary M, Soto AL, Zhang Q, Fatima S, Falcone GJ, Payabvash MS, Sharma R, Struck AF, Sheth KN, Westover MB, Kim JA. Development and Validation of a Model to Identify Critical Brain Injuries Using Natural Language Processing of Text Computed Tomography Reports. JAMA Netw Open 2022; 5:e2227109. PMID: 35972739. PMCID: PMC9382443. DOI: 10.1001/jamanetworkopen.2022.27109.
Abstract
Importance Clinical text reports from head computed tomography (CT) represent rich, incompletely utilized information regarding acute brain injuries and neurologic outcomes. CT reports are unstructured; thus, extracting information at scale requires automated natural language processing (NLP). However, designing new NLP algorithms for each individual injury category is an unwieldy proposition. An NLP tool that summarizes all injuries in head CT reports would facilitate exploration of large data sets for the clinical significance of neuroradiological findings. Objective To automatically extract acute brain pathological data and their features from head CT reports. Design, Setting, and Participants This diagnostic study developed a 2-part named entity recognition (NER) NLP model to extract and summarize data on acute brain injuries from head CT reports. The model, termed BrainNERD, extracts and summarizes detailed brain injury information for research applications. Model development included building and comparing 2 NER models using a custom dictionary of terms, including lesion type, location, size, and age, then designing a rule-based decoder that uses the NER outputs to evaluate the presence or absence of injury subtypes. BrainNERD was evaluated against independent test data sets of manually classified reports, including 2 external validation sets. The model was trained on head CT reports from 1152 patients generated by neuroradiologists at the Yale Acute Brain Injury Biorepository. External validation was conducted using reports from 2 outside institutions. Analyses were conducted from May 2020 to December 2021. Main Outcomes and Measures Performance of the BrainNERD model was evaluated using precision, recall, and F1 scores based on manually labeled independent test data sets. Results A total of 1152 patients (mean [SD] age, 67.6 [16.1] years; 586 [52%] men) were included in the training set. NER training using a transformer architecture with Bidirectional Encoder Representations from Transformers (BERT) was significantly faster than with spaCy. For all metrics, the 10-fold cross-validation performance was 93% to 99%. The final test performance metrics for the NER test data set were 98.82% (95% CI, 98.37%-98.93%) for precision, 98.81% (95% CI, 98.46%-99.06%) for recall, and 98.81% (95% CI, 98.40%-98.94%) for the F score. The expert review comparison metrics were 99.06% (95% CI, 97.89%-99.13%) for precision, 98.10% (95% CI, 97.93%-98.77%) for recall, and 98.57% (95% CI, 97.78%-99.10%) for the F score. The decoder test set metrics were 96.06% (95% CI, 95.01%-97.16%) for precision, 96.42% (95% CI, 94.50%-97.87%) for recall, and 96.18% (95% CI, 95.15%-97.16%) for the F score. Performance in external institution report validation, including 1053 head CT reports, was greater than 96%. Conclusions and Relevance These findings suggest that the BrainNERD model accurately extracted acute brain injury terms and their properties from head CT text reports. This freely available new tool could advance clinical research by integrating information from easily gathered head CT reports to expand knowledge of acute brain injury radiographic phenotypes.
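Editor's note: the two-stage design (NER over lesion type/location/size/age terms, then a rule-based decoder for injury subtypes) can be miniaturized with spaCy's rule-based EntityRuler. The patterns, subtype rules, and example text below are illustrative assumptions; the paper additionally trained statistical and transformer NER models, and its decoder handles negation, which is omitted here for brevity.

```python
import spacy

# Stage 1: a tiny rule-based NER over a custom dictionary of terms.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "LESION", "pattern": [{"LOWER": "subdural"}, {"LOWER": "hematoma"}]},
    {"label": "LESION", "pattern": [{"LOWER": "subarachnoid"}, {"LOWER": "hemorrhage"}]},
    {"label": "AGE", "pattern": [{"LOWER": "acute"}]},
    {"label": "SIZE", "pattern": [{"LIKE_NUM": True}, {"LOWER": "mm"}]},
])

def decode(text: str) -> dict:
    """Stage 2: collapse entity mentions into presence/absence of injury
    subtypes (negation handling omitted for brevity)."""
    doc = nlp(text)
    ents = {(e.label_, e.text.lower()) for e in doc.ents}
    return {
        "acute_subdural_hematoma": ("LESION", "subdural hematoma") in ents
                                   and ("AGE", "acute") in ents,
        "subarachnoid_hemorrhage": ("LESION", "subarachnoid hemorrhage") in ents,
    }

print(decode("Acute subdural hematoma measuring 8 mm along the left convexity."))
# -> {'acute_subdural_hematoma': True, 'subarachnoid_hemorrhage': False}
```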
Affiliation(s)
- Angelo J. Olcese, Sarah M. Chacko, Amber Robinson, Edward Gaiser, Julian Acosta, Alison L. Herman, Lindsey R. Kuohn, Megan Leary, Qiang Zhang, Guido J. Falcone, Richa Sharma, Kevin N. Sheth, Jennifer A. Kim
- Department of Neurology, Yale University, New Haven, Connecticut
- Safoora Fatima
- Department of Neurology, University of Wisconsin, Madison
- Aaron F. Struck
- Department of Neurology, University of Wisconsin, Madison; William S Middleton Veterans Hospital, Madison, Wisconsin