1
Das A, Talati IA, Chaves JMZ, Rubin D, Banerjee I. Weakly supervised language models for automated extraction of critical findings from radiology reports. NPJ Digit Med 2025; 8:257. [PMID: 40341617] [PMCID: PMC12062347] [DOI: 10.1038/s41746-025-01522-4]
Abstract
Critical findings in radiology reports are life-threatening conditions that need to be communicated promptly to physicians for timely patient management. Although the task is challenging, advances in natural language processing (NLP), particularly large language models (LLMs), now enable the automated identification of key findings in verbose reports. Given the scarcity of labeled critical-findings data, we implemented a two-phase, weakly supervised fine-tuning approach on 15,000 unlabeled Mayo Clinic reports. The fine-tuned model then automatically extracted critical terms from internal (Mayo Clinic, n = 80) and external (MIMIC-III, n = 123) test datasets, validated against expert annotations. Model performance was further assessed on 5000 MIMIC-IV reports using the LLM-aided metrics G-eval and Prometheus. Both manual and LLM-based evaluations showed improved task alignment with weak supervision. The pipeline and model, publicly available under an academic license, can aid critical-finding extraction for research and clinical use (https://github.com/dasavisha/CriticalFindings_Extract).
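The released pipeline is built around prompting a fine-tuned model with the report text. A minimal, hypothetical sketch of that usage pattern is shown below; the model identifier, prompt format, and example report are placeholders rather than the authors' released artifacts (see their repository for the actual interface).

```python
# Hypothetical sketch: load a fine-tuned extraction model and prompt it to
# pull critical findings from a report. MODEL_ID and the prompt wording are
# placeholders, not the authors' released model or template.
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "path/to/critical-findings-model"  # placeholder; see the authors' repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

report = (
    "FINDINGS: Large right-sided pneumothorax with mediastinal shift. "
    "IMPRESSION: Tension pneumothorax."
)
prompt = (
    "Extract the critical findings from the report below.\n"
    f"Report: {report}\nCritical findings:"
)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens after the prompt.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```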
Affiliation(s)
- Avisha Das
- Arizona Advanced AI & Innovation (A3I) Hub, Mayo Clinic Arizona, Phoenix, AZ, USA
- Ish A Talati
- Department of Radiology, Stanford University, Stanford, CA, USA
- Daniel Rubin
- Department of Radiology, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Imon Banerjee
- Arizona Advanced AI & Innovation (A3I) Hub, Mayo Clinic Arizona, Phoenix, AZ, USA
- Department of Radiology, Mayo Clinic Arizona, Phoenix, AZ, USA
- School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ, USA
2
Guo Y, Shi H, Book WM, Ivey LC, Rodriguez FH, Sameni R, Raskind-Hood C, Robichaux C, Downing KF, Sarker A. Machine Learning and Natural Language Processing to Improve Classification of Atrial Septal Defects in Electronic Health Records. Birth Defects Res 2025; 117:e2451. [PMID: 40035168] [PMCID: PMC11955907] [DOI: 10.1002/bdr2.2451]
Abstract
BACKGROUND International Classification of Diseases (ICD) codes can accurately identify patients with certain congenital heart defects (CHDs). In ICD-defined CHD data sets, the code for secundum atrial septal defect (ASD) is the most common, but it has a low positive predictive value for CHD, potentially leading to erroneous conclusions being drawn from such data sets. Methods with reduced false-positive rates for CHD among individuals captured by the ASD ICD code are needed for public health surveillance. METHODS We propose a two-level classification system, comprising a CHD and an ASD classification model, to categorize cases with an ASD ICD code into three groups: ASD, other CHD, or no CHD (including patent foramen ovale). In the proposed approach, a machine learning model that leverages structured data is combined with a text classification system. We compare the performance of three text classification strategies: support vector machines (SVMs) using text-based features, a robustly optimized Transformer-based model (RoBERTa), and a scalable tree boosting system using non-text-based features (XGBoost). RESULTS Using SVM for both CHD and ASD classification yielded the best performance for the ASD and no CHD groups, achieving F1 scores of 0.53 (±0.05) and 0.78 (±0.02), respectively. XGBoost for CHD and SVM for ASD classification performed best for the other CHD group (F1 score: 0.39 [±0.03]). CONCLUSIONS This study demonstrates that it is feasible to use patients' clinical notes and machine learning to perform more fine-grained classification than ICD codes allow, particularly with a higher positive predictive value for CHD. The proposed approach can improve CHD surveillance.
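As an illustration of the SVM-with-text-features strategy compared in this study, the following sketch pairs TF-IDF n-grams with a linear SVM on toy notes; it is not the authors' code, and the example notes, labels, and feature choices are invented.

```python
# Illustrative sketch (not the study's pipeline): TF-IDF features feeding a
# linear SVM for the three-way ASD / other CHD / no CHD grouping.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for clinical notes and chart-review labels.
notes = [
    "secundum atrial septal defect with left-to-right shunt",
    "patent foramen ovale, no other structural heart disease",
    "tetralogy of fallot repaired in infancy",
    "normal intracardiac anatomy, small PFO noted",
]
labels = ["ASD", "no CHD", "other CHD", "no CHD"]

# Word n-gram TF-IDF + linear SVM; the study's feature engineering is richer.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(notes, labels)

print(clf.predict(["echo shows a secundum ASD, no other defects"]))
```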
Affiliation(s)
- Yuting Guo
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia, USA
- Haoming Shi
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
- Wendy M. Book
- Department of Cardiology, School of Medicine, Emory University, Atlanta, Georgia, USA
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA
- Lindsey Carrie Ivey
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA
- Fred H. Rodriguez
- Department of Cardiology, School of Medicine, Emory University, Atlanta, Georgia, USA
- Reza Sameni
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia, USA
- Cheryl Raskind-Hood
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA
- Chad Robichaux
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia, USA
- Karrie F. Downing
- National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Abeed Sarker
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia, USA
3
Sugimoto K, Wada S, Konishi S, Sato J, Okada K, Kido S, Tomiyama N, Matsumura Y, Takeda T. Automated Detection of Cancer-Suspicious Findings in Japanese Radiology Reports with Natural Language Processing: A Multicenter Study. J Imaging Inform Med 2025. [PMID: 39843717] [DOI: 10.1007/s10278-024-01338-w]
Abstract
Missed critical imaging findings, particularly those indicating cancer, are a common issue that can delay patient follow-up and treatment. To address this, we developed a rule-based natural language processing (NLP) algorithm to detect cancer-suspicious findings in Japanese radiology reports. The dataset consisted of chest and abdomen CT reports from six institutions. Reports from our institution were used for algorithm development and internal evaluation, while reports from the other five institutions were used for external evaluation. To create the gold standard, reports were annotated by two experienced physicians. Performance was assessed using precision, recall, and F1 score with 1000 bootstrap iterations. BERT was used as a baseline deep learning model, and its performance was compared with that of the proposed rule-based method. At the report level, the overall precision, recall, and F1 score for the rule-based algorithm were 0.886, 0.886, and 0.883, respectively, higher than those of the deep learning algorithm (0.851, 0.679, and 0.733). The overall results include both internal and external validation data. For the internal validation set, the precision, recall, and F1 score were 0.929, 0.929, and 0.927, respectively. For the external validation set, they were 0.875, 0.879, and 0.873, demonstrating generalizability. In conclusion, the rule-based NLP algorithm exhibited high performance in detecting cancer-suspicious findings in multi-institutional CT reports.
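The bootstrap evaluation described here amounts to resampling the report-level predictions and recomputing the metrics each time. The snippet below is illustrative only, with toy labels and predictions standing in for the annotated reports.

```python
# Sketch of a 1000-iteration bootstrap over report-level predictions,
# yielding an F1 estimate with a 95% confidence interval. Toy data only.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(seed=0)

# 1 = cancer-suspicious report, 0 = not suspicious (toy gold labels/predictions).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 1, 1, 0])

f1_samples = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), size=len(y_true))  # resample with replacement
    _, _, f1, _ = precision_recall_fscore_support(
        y_true[idx], y_pred[idx], average="binary", zero_division=0
    )
    f1_samples.append(f1)

lo, hi = np.percentile(f1_samples, [2.5, 97.5])
print(f"F1 = {np.mean(f1_samples):.3f} (95% CI {lo:.3f}-{hi:.3f})")
```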
Affiliation(s)
- Kento Sugimoto
- Department of Medical Informatics, Osaka University Graduate School of Medicine, 2-2 Yamadaoka, Suita, 565-0871, Osaka, Japan
- Shoya Wada
- Department of Medical Informatics, Osaka University Graduate School of Medicine, 2-2 Yamadaoka, Suita, 565-0871, Osaka, Japan
- Department of Transformative System for Medical Information, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, 565-0871, Osaka, Japan
- Shozo Konishi
- Department of Medical Informatics, Osaka University Graduate School of Medicine, 2-2 Yamadaoka, Suita, 565-0871, Osaka, Japan
- Junya Sato
- Department of Artificial Intelligence in Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, 565-0871, Osaka, Japan
- Katsuki Okada
- Department of Medical Informatics, Osaka University Graduate School of Medicine, 2-2 Yamadaoka, Suita, 565-0871, Osaka, Japan
- Shoji Kido
- Osaka University Institute for Radiation Sciences, 2-2, Yamadaoka, Suita, 565-0871, Osaka, Japan
- Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, 565-0871, Osaka, Japan
- Noriyuki Tomiyama
- Department of Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, 565-0871, Osaka, Japan
- Yasushi Matsumura
- Department of Medical Informatics, Osaka University Graduate School of Medicine, 2-2 Yamadaoka, Suita, 565-0871, Osaka, Japan
- National Hospital Organization Osaka National Hospital, 2-1-14 Hoenzaka Chuo-ku, 540-0006, Osaka, Japan
- Toshihiro Takeda
- Department of Medical Informatics, Osaka University Graduate School of Medicine, 2-2 Yamadaoka, Suita, 565-0871, Osaka, Japan
4
Fathi M, Vakili K, Hajibeygi R, Bahrami A, Behzad S, Tafazolimoghadam A, Aghabozorgi H, Eshraghi R, Bhatt V, Gholamrezanezhad A. Cultivating diagnostic clarity: The importance of reporting artificial intelligence confidence levels in radiologic diagnoses. Clin Imaging 2025; 117:110356. [PMID: 39566394] [DOI: 10.1016/j.clinimag.2024.110356]
Abstract
Accurate image interpretation in radiology is essential for the healthcare team to provide optimal patient care. This article discusses the use of artificial intelligence (AI) confidence levels to enhance the accuracy and dependability of radiological diagnoses. Recent advances in AI have changed how radiologists and clinicians diagnose pathological conditions such as aneurysms, hemorrhages, pneumothorax, pneumoperitoneum, and particularly fractures. To enhance the utility of these AI models, radiologists need a more comprehensive understanding of the confidence and certainty behind the results the models produce. This allows radiologists to make more informed decisions that have the potential to drastically change a patient's clinical management. Several AI models, especially deep learning (DL) models built on convolutional neural networks (CNNs), have demonstrated significant potential in identifying subtle findings in medical imaging that are often missed by radiologists. Standardized confidence metrics are necessary for AI systems to be relevant and reliable in the clinical setting. Incorporating AI into clinical practice faces obstacles such as the need for clinical validation, concerns about the interpretability of AI results, and confusion and misunderstanding within the medical community. This study emphasizes the importance of AI systems clearly conveying their level of confidence in radiological diagnoses and highlights the need for research to establish AI confidence metrics specific to a given anatomical region or lesion type. KEY POINT: Accurate fracture diagnosis relies on radiologic certainty, and AI, especially CNN-based deep learning, shows promise in enhancing X-ray interpretation amid a shortage of radiologists. Overcoming integration challenges through improved AI interpretability and education is crucial for widespread acceptance and better patient outcomes.
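One concrete form such confidence reporting can take is exposing a calibrated class probability next to the suggested finding. The sketch below is a generic illustration rather than a method from the article; the labels, logits, and temperature value are assumptions.

```python
# Minimal sketch of confidence reporting: convert a classifier's raw logits
# into an explicit probability shown alongside the suggested finding.
# The temperature T is assumed to have been fitted on a validation set
# (temperature scaling); all numbers here are illustrative.
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.3, 0.4, -1.1])   # e.g. fracture / no fracture / indeterminate
T = 1.5                               # calibration temperature (assumed)
probs = softmax(logits / T)

labels = ["fracture", "no fracture", "indeterminate"]
best = int(np.argmax(probs))
print(f"AI suggestion: {labels[best]} (confidence {probs[best]:.0%})")
```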
Affiliation(s)
- Mobina Fathi
- Advanced Diagnostic and Interventional Radiology Research Center (ADIR), Tehran University of Medical Science, Tehran, Iran; School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Kimia Vakili
- School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ramtin Hajibeygi
- Advanced Diagnostic and Interventional Radiology Research Center (ADIR), Tehran University of Medical Science, Tehran, Iran; Tehran University of Medical Science (TUMS), School of Medicine, Tehran, Iran
- Ashkan Bahrami
- Faculty of Medicine, Kashan University of Medical Science, Kashan, Iran
- Shima Behzad
- Advanced Diagnostic and Interventional Radiology Research Center (ADIR), Tehran University of Medical Science, Tehran, Iran
- Hadiseh Aghabozorgi
- Student Research Committee, Shahrekord University of Medical Sciences, Shahrekord, Iran
- Reza Eshraghi
- Faculty of Medicine, Kashan University of Medical Science, Kashan, Iran
- Vivek Bhatt
- University of California, Riverside, School of Medicine, Riverside, CA, United States of America
- Ali Gholamrezanezhad
- Keck School of Medicine of University of Southern California, Los Angeles, CA, United States of America; Department of Radiology, Cedars Sinai Hospital, Los Angeles, CA, United States of America
5
Wataya T, Miura A, Sakisuka T, Fujiwara M, Tanaka H, Hiraoka Y, Sato J, Tomiyama M, Nishigaki D, Kita K, Suzuki Y, Kido S, Tomiyama N. Comparison of natural language processing algorithms in assessing the importance of head computed tomography reports written in Japanese. Jpn J Radiol 2024; 42:697-708. [PMID: 38551771] [PMCID: PMC11217108] [DOI: 10.1007/s11604-024-01549-9]
Abstract
PURPOSE To propose a five-point scale for radiology report importance, the Report Importance Category (RIC), and to compare the performance of natural language processing (NLP) algorithms in assessing RIC using head computed tomography (CT) reports written in Japanese. MATERIALS AND METHODS A total of 3728 Japanese head CT reports performed at Osaka University Hospital in 2020 were included. RIC (category 0: no findings; category 1: minor findings; category 2: routine follow-up; category 3: careful follow-up; category 4: examination or therapy) was established based not only on patient severity but also on the novelty of the information. RIC was manually assigned to each report by consensus of two of four neuroradiologists. The performance of four NLP models for classifying RIC was compared using fivefold cross-validation: logistic regression, bidirectional long short-term memory (BiLSTM), general Bidirectional Encoder Representations from Transformers (general BERT), and domain-specific BERT (a BERT model for the medical domain). RESULTS The proportions of the RIC categories in the whole data set were 15.0%, 26.7%, 44.2%, 7.7%, and 6.4%, respectively. Domain-specific BERT showed the highest accuracy (0.8434 ± 0.0063) in assessing RIC and significantly higher AUCs in categories 1 (0.9813 ± 0.0011), 2 (0.9492 ± 0.0045), 3 (0.9637 ± 0.0050), and 4 (0.9548 ± 0.0074) than the other models (p < .05). Analysis using layer-integrated gradients showed that the domain-specific BERT model could detect important words, such as disease names, in the reports. CONCLUSIONS Domain-specific BERT outperformed the other models in assessing our newly proposed RIC for head CT radiology reports. The accumulation of further studies of this kind has the potential to contribute to medical safety by preventing clinicians from missing important findings.
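The per-category AUCs reported here correspond to one-vs-rest evaluation of the predicted RIC probabilities. A small illustrative sketch follows; the arrays are toy values, not the study's data or model outputs.

```python
# Sketch of per-category (one-vs-rest) AUC for a five-level RIC label,
# computed from predicted class probabilities. Toy data only; the study
# used fivefold cross-validation over 3728 reports.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

categories = [0, 1, 2, 3, 4]  # RIC 0-4
y_true = np.array([0, 1, 2, 2, 3, 4, 2, 1, 0, 2])
# Predicted probabilities for each of the five categories (rows sum to 1).
y_prob = np.random.default_rng(0).dirichlet(np.ones(5), size=len(y_true))

y_bin = label_binarize(y_true, classes=categories)
for c in categories:
    auc = roc_auc_score(y_bin[:, c], y_prob[:, c])
    print(f"RIC {c}: one-vs-rest AUC = {auc:.3f}")
```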
Affiliation(s)
- Tomohiro Wataya
- Department of Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Department of Artificial Intelligence Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Azusa Miura
- Department of Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Takahisa Sakisuka
- Department of Diagnostic Imaging, Osaka General Medical Center, 3-1-56, Mandai Higashi, Sumiyoshi, Osaka, 558-8558, Japan
- Masahiro Fujiwara
- Department of Diagnostic Radiology, Sakai City Medical Center, 1-1-1, Ebaracho, Sakai, Osaka, 593-8304, Japan
- Hisashi Tanaka
- Division of Health Science, Osaka University Graduate School of Medicine, 1-7, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Yu Hiraoka
- Department of Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Department of Artificial Intelligence Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Junya Sato
- Department of Artificial Intelligence Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Miyuki Tomiyama
- Department of Artificial Intelligence Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Daiki Nishigaki
- Department of Artificial Intelligence Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Kosuke Kita
- Department of Artificial Intelligence Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Yuki Suzuki
- Department of Artificial Intelligence Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Shoji Kido
- Department of Artificial Intelligence Diagnostic Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
- Noriyuki Tomiyama
- Department of Radiology, Osaka University Graduate School of Medicine, 2-2, Yamadaoka, Suita, Osaka, 565-0871, Japan
6
Muizelaar H, Haas M, van Dortmont K, van der Putten P, Spruit M. Extracting patient lifestyle characteristics from Dutch clinical text with BERT models. BMC Med Inform Decis Mak 2024; 24:151. [PMID: 38831420] [PMCID: PMC11149227] [DOI: 10.1186/s12911-024-02557-5]
Abstract
BACKGROUND BERT models have seen widespread use on unstructured text within the clinical domain. However, little to no research has been conducted into classifying unstructured clinical notes on the basis of patient lifestyle indicators, especially in Dutch. This article aims to test the feasibility of deep BERT models on the task of patient lifestyle classification and to introduce an experimental framework that is easily reproducible in future research. METHODS This study uses unstructured general patient text data from HagaZiekenhuis, a large hospital in The Netherlands. Over 148,000 notes were provided to us, each automatically labelled on the basis of the respective patient's smoking, alcohol usage, and drug usage status. We test the feasibility of automatically assigning labels and validate it against hand-labelled input. Ultimately, we compare macro F1 scores of string matching, a stochastic gradient descent (SGD) classifier, and several BERT models on the task of classifying smoking, alcohol, and drug usage. We test Dutch BERT models as well as English models with translated input. RESULTS We find that our further pre-trained MedRoBERTa.nl-HAGA model outperformed every other model on smoking (0.93) and drug usage (0.77). Interestingly, our ClinicalBERT model, which was merely fine-tuned on translated text, performed best on the alcohol task (0.80). In t-SNE visualisations, we show that the MedRoBERTa.nl-HAGA model best differentiates between classes in the embedding space, explaining its superior classification performance. CONCLUSIONS We suggest using MedRoBERTa.nl-HAGA as a baseline in future research on Dutch free-text patient lifestyle classification. We furthermore strongly suggest exploring the translation of input text in non-English clinical BERT research, as we translated only a subset of the full set and yet achieved very promising results.
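The string-matching baseline the authors compare against can be pictured as simple keyword rules scored with macro F1, as in the toy sketch below; the Dutch keywords, labels, and example notes are invented and do not reflect the HagaZiekenhuis rules.

```python
# Illustrative keyword-rule baseline for smoking status, scored with macro F1.
# Rules and data are toy examples, not the study's string-matching system.
from sklearn.metrics import f1_score

def smoking_status(note: str) -> str:
    text = note.lower()
    # Check negated/quit phrases before the generic "rookt"/"roken" match.
    if "rookt niet" in text or "gestopt met roken" in text:
        return "non-smoker"
    if "rookt" in text or "roken" in text:
        return "smoker"
    return "unknown"

notes = [
    "Patient rookt 10 sigaretten per dag.",
    "Patient rookt niet, drinkt zelden alcohol.",
    "Geen bijzonderheden vermeld.",
]
gold = ["smoker", "non-smoker", "unknown"]
pred = [smoking_status(n) for n in notes]

print("macro F1:", f1_score(gold, pred, average="macro"))
```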
Affiliation(s)
- Hielke Muizelaar
- LIACS, Leiden University, P.O. Box 9512, Leiden, 2300RA, The Netherlands
- Department of Public Health and Primary Care, Leiden University Medical Center, Albinusdreef 2, Leiden, 2333ZA, The Netherlands
- Marcel Haas
- Department of Public Health and Primary Care, Leiden University Medical Center, Albinusdreef 2, Leiden, 2333ZA, The Netherlands
- Koert van Dortmont
- Department of Business Intelligence, HagaZiekenhuis, Els Borst-Eilersplein 275, Den Haag, 2545AA, The Netherlands
- Marco Spruit
- LIACS, Leiden University, P.O. Box 9512, Leiden, 2300RA, The Netherlands
- Department of Public Health and Primary Care, Leiden University Medical Center, Albinusdreef 2, Leiden, 2333ZA, The Netherlands
7
Gorenstein L, Konen E, Green M, Klang E. Bidirectional Encoder Representations from Transformers in Radiology: A Systematic Review of Natural Language Processing Applications. J Am Coll Radiol 2024; 21:914-941. [PMID: 38302036] [DOI: 10.1016/j.jacr.2024.01.012]
Abstract
INTRODUCTION Bidirectional Encoder Representations from Transformers (BERT), introduced in 2018, has revolutionized natural language processing. Its bidirectional understanding of word context has enabled innovative applications, notably in radiology. This study aimed to assess BERT's influence and applications within the radiologic domain. METHODS Adhering to Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, we conducted a systematic review, searching PubMed for literature on BERT-based models and natural language processing in radiology from January 1, 2018, to February 12, 2023. The search encompassed keywords related to generative models, transformer architecture, and various imaging techniques. RESULTS Of 597 results, 30 met our inclusion criteria; the remainder were unrelated to radiology or did not use BERT-based models. The included studies were retrospective, with 14 published in 2022. The primary focus was on classification and information extraction from radiology reports, with x-rays as the most prevalent imaging modality. Specific investigations included automatic CT protocol assignment and deep learning applications in chest x-ray interpretation. CONCLUSION This review underscores the primary application of BERT in radiology for report classification. It also reveals emerging BERT applications in protocol assignment and report generation. As BERT technology advances, we foresee further innovative applications. Its implementation in radiology holds potential for enhancing diagnostic precision, expediting report generation, and optimizing patient care.
Affiliation(s)
- Larisa Gorenstein
- Department of Diagnostic Imaging, Sheba Medical Center, Ramat-Gan, Israel; Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eli Konen
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel; Chair, Department of Diagnostic Imaging, Sheba Medical Center, Ramat-Gan, Israel
- Michael Green
- Department of Diagnostic Imaging, Sheba Medical Center, Ramat-Gan, Israel; Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eyal Klang
- Icahn School of Medicine at Mount Sinai, New York, New York; Associate Professor of Radiology, Innovation Center, Sheba Medical Center, Affiliated with Tel Aviv University, Tel Aviv, Israel
8
Jiang Z, Cai X, Yang L, Gao D, Zhao W, Han J, Liu J, Shen D, Liu T. Learning to Summarize Chinese Radiology Findings With a Pre-Trained Encoder. IEEE Trans Biomed Eng 2023; 70:3277-3287. [PMID: 37314905] [DOI: 10.1109/tbme.2023.3280987]
Abstract
Automatic radiology report summarization has become an attractive research problem in computer-aided diagnosis, with the aim of alleviating physicians' workload. However, existing deep learning methods for English radiology report summarization cannot be directly applied to Chinese radiology reports because of the limitations of the related corpora. In response, we propose an abstractive summarization approach for Chinese chest radiology reports. Our approach involves constructing a pre-training corpus from a Chinese medical pre-training dataset and collecting Chinese chest radiology reports from the Department of Radiology at the Second Xiangya Hospital as the fine-tuning corpus. To improve the initialization of the encoder, we introduce a new task-oriented pre-training objective, the Pseudo Summary Objective, applied to the pre-training corpus. We then develop a Chinese pre-trained language model, Chinese medical BERT (CMBERT), which is used to initialize the encoder and is fine-tuned on the abstractive summarization task. Testing on a real, large-scale hospital dataset, we observe that our proposed approach achieves substantial improvements over other abstractive summarization models, highlighting its effectiveness in addressing the limitations of previous methods for Chinese radiology report summarization. Overall, the proposed approach demonstrates a promising direction for automatic summarization of Chinese chest radiology reports and offers a viable way to alleviate physicians' workload in computer-aided diagnosis.
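A rough analogue of this setup in the Hugging Face transformers library is warm-starting an encoder-decoder model from a BERT checkpoint and fine-tuning it on findings-to-impression pairs. The sketch below uses bert-base-chinese as a placeholder for CMBERT and only shows the generation call; the cross-attention weights are untrained here, so the output is meaningless until the model is fine-tuned.

```python
# BERT2BERT-style stand-in for the paper's encoder initialization idea.
# "bert-base-chinese" is a placeholder, not the authors' CMBERT checkpoint.
from transformers import BertTokenizerFast, EncoderDecoderModel

CKPT = "bert-base-chinese"  # placeholder checkpoint
tokenizer = BertTokenizerFast.from_pretrained(CKPT)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(CKPT, CKPT)

findings = "双肺纹理增多，右下肺见斑片状高密度影，边界模糊。"  # toy findings text
inputs = tokenizer(findings, return_tensors="pt")
summary_ids = model.generate(
    **inputs,
    decoder_start_token_id=tokenizer.cls_token_id,
    pad_token_id=tokenizer.pad_token_id,
    max_length=32,
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

Actual training would pair this model with findings/impression examples in a sequence-to-sequence fine-tuning loop before any generated summaries are usable.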