1
Diab KM, Deng J, Wu Y, Yesha Y, Collado-Mesa F, Nguyen P. Natural Language Processing for Breast Imaging: A Systematic Review. Diagnostics (Basel) 2023; 13:1420. [PMID: 37189521] [DOI: 10.3390/diagnostics13081420]
Abstract
Natural Language Processing (NLP) has gained prominence in diagnostic radiology, offering a promising tool for improving breast imaging triage, diagnosis, lesion characterization, and treatment management in breast cancer and other breast diseases. This review provides a comprehensive overview of recent advances in NLP for breast imaging, covering the main techniques and applications in this field. Specifically, we discuss various NLP methods used to extract relevant information from clinical notes, radiology reports, and pathology reports, and their potential impact on the accuracy and efficiency of breast imaging. In addition, we review the state of the art in NLP-based decision support systems for breast imaging, highlighting the challenges and opportunities of NLP applications for breast imaging in the future. Overall, this review underscores the potential of NLP in enhancing breast imaging care and offers insights for clinicians and researchers interested in this exciting and rapidly evolving field.
Affiliation(s)
- Kareem Mahmoud Diab
  - Institute for Data Science and Computing, University of Miami, Miami, FL 33146, USA
- Jamie Deng
  - Department of Computer Science, University of Miami, Miami, FL 33146, USA
- Yusen Wu
  - Institute for Data Science and Computing, University of Miami, Miami, FL 33146, USA
- Yelena Yesha
  - Institute for Data Science and Computing, University of Miami, Miami, FL 33146, USA
  - Department of Computer Science, University of Miami, Miami, FL 33146, USA
  - Department of Radiology, Miller School of Medicine, University of Miami, Miami, FL 33146, USA
- Fernando Collado-Mesa
  - Department of Radiology, Miller School of Medicine, University of Miami, Miami, FL 33146, USA
- Phuong Nguyen
  - Institute for Data Science and Computing, University of Miami, Miami, FL 33146, USA
  - Department of Computer Science, University of Miami, Miami, FL 33146, USA
  - OpenKnect Inc., Halethorpe, MD 21227, USA
2
Yang R, Zhu D, Howard LE, De Hoedt A, Williams SB, Freedland SJ, Klaassen Z. Identification of Patients With Metastatic Prostate Cancer With Natural Language Processing and Machine Learning. JCO Clin Cancer Inform 2022; 6:e2100071. [PMID: 36215673] [DOI: 10.1200/cci.21.00071]
Abstract
PURPOSE Understanding treatment patterns and effectiveness for patients with metastatic prostate cancer (mPCa) is dependent on accurate assessment of metastatic status. The objective was to develop a natural language processing (NLP) model for identifying patients with mPCa and evaluate the model's performance against chart-reviewed data and an International Classification of Diseases (ICD) 9/10 code-based method. METHODS In total, 139,057 radiology reports on 6,211 unique patients from the Department of Veterans Affairs were used. The gold standard was metastases identified by detailed chart review of radiology reports. NLP performance was assessed by sensitivity, specificity, positive predictive value, negative predictive value, and date of metastasis detection. Receiver operating characteristic curves were used to assess model performance. RESULTS When compared with chart review, the NLP model had high sensitivity and specificity (85% and 96%, respectively). The NLP model was able to predict patient-level metastasis status with a sensitivity of 91% and specificity of 81%, whereas sensitivity and specificity using ICD9/10 billing codes were 73% and 86%, respectively. For the NLP model, the date of metastasis detection was exactly concordant in 55% of patients and concordant within 1 week in 58%, compared with 8% and 17%, respectively, using the ICD9/10 billing codes method. The area under the curve for the NLP model was 0.911. A limitation is that the NLP model was developed on the basis of a subset of patients with mPCa and may not be generalizable to all patients with mPCa. CONCLUSION This population-level NLP model for identifying patients with mPCa was more accurate than ICD9/10 billing codes when compared with chart-reviewed data. Upon further validation, this model may allow for efficient population-level identification of patients with mPCa.
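The patient-level metrics this abstract reports all derive from a 2x2 confusion matrix against the chart-review gold standard. A minimal sketch of those formulas (the counts below are illustrative, not the study's data):

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV, and NPV from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # recall on truly metastatic patients
        "specificity": tn / (tn + fp),  # recall on non-metastatic patients
        "ppv": tp / (tp + fp),          # precision of a positive NLP call
        "npv": tn / (tn + fn),          # precision of a negative NLP call
    }

# Illustrative counts only (not taken from the study):
m = diagnostic_metrics(tp=91, fp=19, fn=9, tn=81)
print({k: round(v, 2) for k, v in m.items()})
```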
Affiliation(s)
- Ruixin Yang
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
- Di Zhu
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
- Lauren E Howard
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
  - Duke Cancer Institute, Duke University School of Medicine, Durham, NC
- Amanda De Hoedt
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
- Stephen B Williams
  - Division of Urology, Department of Surgery, The University of Texas Medical Branch, Galveston, TX
- Stephen J Freedland
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
  - Division of Urology, Department of Surgery, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA
  - Center for Integrated Research in Cancer and Lifestyle, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA
- Zachary Klaassen
  - Division of Urology, Medical College of Georgia at Augusta University, Augusta, GA
  - Georgia Cancer Center, Augusta, GA
3
Khan MS, Landman BA, Deppen SA, Matheny ME. Intrinsic Evaluation of Contextual and Non-contextual Word Embeddings using Radiology Reports. AMIA Annu Symp Proc 2022; 2021:631-640. [PMID: 35308988] [PMCID: PMC8861761]
Abstract
Many clinical natural language processing methods rely on non-contextual word embedding (NCWE) or contextual word embedding (CWE) models. Yet few, if any, intrinsic evaluation benchmarks exist that compare embedding representations against clinician judgment. We developed intrinsic evaluation tasks for embedding models using a corpus of radiology reports: term pair similarity for NCWEs and cloze task accuracy for CWEs. Using surveys, we quantified the agreement between clinician judgment and embedding model representations. We compared embedding models trained on a custom radiology report corpus (RRC), a general corpus, and PubMed and MIMIC-III corpora (P&MC). Cloze task accuracy was equivalent for RRC and P&MC models. For term pair similarity, P&MC-trained NCWEs outperformed all other NCWE models (Spearman's ρ = 0.61 vs. 0.27-0.44). Among models trained on RRC, fastText models often outperformed other NCWE models, and spherical embeddings provided overly optimistic representations of term pair similarity.
Affiliation(s)
- Mirza S Khan
  - US Dept. of Veterans Affairs, Nashville, TN
  - Vanderbilt University, Nashville, TN
  - Vanderbilt University Medical Center, Nashville, TN
- Bennett A Landman
  - Vanderbilt University, Nashville, TN
  - Vanderbilt University Medical Center, Nashville, TN
- Michael E Matheny
  - US Dept. of Veterans Affairs, Nashville, TN
  - Vanderbilt University Medical Center, Nashville, TN
4
Filice RW, Kahn CE. Biomedical Ontologies to Guide AI Development in Radiology. J Digit Imaging 2021; 34:1331-1341. [PMID: 34724143] [PMCID: PMC8669056] [DOI: 10.1007/s10278-021-00527-1]
Abstract
The advent of deep learning has engendered renewed and rapidly growing interest in artificial intelligence (AI) in radiology to analyze images, manipulate textual reports, and plan interventions. Applications of deep learning and other AI approaches must be guided by sound medical knowledge to assure that they are developed successfully and that they address important problems in biomedical research or patient care. To date, AI has been applied to a limited number of real-world radiology applications. As AI systems become more pervasive and are applied more broadly, they will benefit from medical knowledge on a larger scale, such as that available through computer-based approaches. A key approach to representing computer-based knowledge in a particular domain is an ontology. As defined in informatics, an ontology defines a domain's terms through their relationships with other terms in the ontology. Those relationships, then, define the terms' semantics, or "meaning." Biomedical ontologies commonly define the relationships between terms and more general terms, and can express causal, part-whole, and anatomic relationships. Ontologies express knowledge in a form that is both human-readable and machine-computable. Some ontologies, such as RSNA's RadLex radiology lexicon, have already been applied in clinical practice and research, and may be familiar to many radiologists. This article describes how ontologies can support research and guide emerging applications of AI in radiology, including natural language processing, image-based machine learning, radiomics, and planning.
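The relationship structure described here, terms defined through their links to other terms, can be held in a simple directed graph. A toy sketch with invented terms (not actual RadLex entries), walking is-a links to recover a term's more general ancestors:

```python
# Toy is-a hierarchy in the spirit of a radiology ontology; the terms and
# relations are illustrative, not real RadLex identifiers.
IS_A = {
    "lobar pneumonia": "pneumonia",
    "pneumonia": "lung disease",
    "lung disease": "disease",
    "pulmonary nodule": "imaging finding",
}

def ancestors(term):
    """Walk is-a links to the root; the chain helps define the term's semantics."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain

print(ancestors("lobar pneumonia"))  # pneumonia -> lung disease -> disease
```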
Affiliation(s)
- Ross W Filice
  - Department of Radiology, MedStar Georgetown University Hospital, Washington, DC, USA
- Charles E Kahn
  - Department of Radiology and Institute for Biomedical Informatics, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, USA
5
Sarker A. LexExp: a system for automatically expanding concept lexicons for noisy biomedical texts. Bioinformatics 2021; 37:2499-2501. [PMID: 33244602] [PMCID: PMC8388038] [DOI: 10.1093/bioinformatics/btaa995]
Abstract
SUMMARY LexExp is an open-source, data-centric lexicon expansion system that generates spelling variants of lexical expressions in a lexicon using a phrase embedding model, lexical similarity-based natural language processing methods, and a set of tunable threshold decay functions. The system is customizable, can be optimized for recall or precision, and can generate variants for multi-word expressions. AVAILABILITY AND IMPLEMENTATION Code available at: https://bitbucket.org/asarker/lexexp; data and resources available at: https://sarkerlab.org/lexexp. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
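LexExp itself combines phrase embeddings with tunable similarity thresholds; a much simpler flavor of the same idea is to attach candidate spelling variants to lexicon entries by plain edit distance. A sketch under that simplification (the terms below are invented for illustration, and this is not LexExp's algorithm):

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def expand_lexicon(lexicon, candidates, max_dist=1):
    """Attach candidate strings to lexicon entries within a small edit distance."""
    variants = {term: [] for term in lexicon}
    for cand in candidates:
        for term in lexicon:
            if cand != term and edit_distance(cand, term) <= max_dist:
                variants[term].append(cand)
    return variants

print(expand_lexicon(["ibuprofen"], ["ibuprofin", "ibuprophen", "aspirin"]))
```

With `max_dist=1` only the single-substitution misspelling "ibuprofin" is attached; raising the threshold trades precision for recall, which is the knob the abstract's decay functions tune.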
Affiliation(s)
- Abeed Sarker
  - Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA 30322, USA
6
Jung E, Jain H, Sinha AP, Gaudioso C. Building a specialized lexicon for breast cancer clinical trial subject eligibility analysis. Health Informatics J 2021; 27:1460458221989392. [PMID: 33535885] [DOI: 10.1177/1460458221989392]
Abstract
A natural language processing (NLP) application requires sophisticated lexical resources to support its processing goals. Different solutions, such as dictionary lookup and MetaMap, have been proposed in the healthcare informatics literature to identify disease terms comprising more than one word (multi-gram disease named entities). Although much work has been done on the identification of protein- and gene-named entities in the biomedical field, little research has addressed the recognition and resolution of terminologies in clinical trial subject eligibility analysis. In this study, we develop a specialized lexicon for improving NLP and text mining analysis in the breast cancer domain and evaluate it by comparing it with the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). We use a hybrid methodology, which combines the knowledge of domain experts, terms from multiple online dictionaries, and the mining of text from sample clinical trials. Our methodology introduces 4243 unique lexicon items, which increase bigram entity matches by 38.6% and trigram entity matches by 41%. Our lexicon, which adds a significant number of new terms, is very useful for automatically matching patients to clinical trials based on eligibility criteria. Beyond clinical trial matching, the specialized lexicon developed in this study could serve as a foundation for future healthcare text mining applications.
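The bigram/trigram entity matching this abstract measures can be sketched as a longest-match n-gram lookup against the lexicon; the toy lexicon below is invented for illustration and is not the authors' resource:

```python
def ngrams(tokens, n):
    """All contiguous n-token spans, joined back into strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def match_entities(text, lexicon, max_n=3):
    """Greedy longest-match lookup of multi-word terms in a lexicon."""
    tokens = text.lower().split()
    found = []
    for n in range(max_n, 0, -1):  # prefer trigrams, then bigrams, then unigrams
        for gram in ngrams(tokens, n):
            # skip grams already covered by a longer matched term
            if gram in lexicon and not any(gram in f for f in found):
                found.append(gram)
    return found

lex = {"invasive ductal carcinoma", "ductal carcinoma", "carcinoma"}
print(match_entities("Biopsy confirmed invasive ductal carcinoma", lex))
```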
Affiliation(s)
- Euisung Jung
  - Information Operations and Technology Management, John B. and Lillian E. Neff College of Business and Innovation, The University of Toledo, USA
- Hemant Jain
  - Gary W. Rollins College of Business, The University of Tennessee at Chattanooga, USA
- Atish P Sinha
  - Lubar School of Business, University of Wisconsin-Milwaukee, USA
7
Abstract
Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g., physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, this review describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation in health systems and in industry.
Affiliation(s)
- Bethany Percha
  - Department of Medicine and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10025, USA
8
Casey A, Davidson E, Poon M, Dong H, Duma D, Grivas A, Grover C, Suárez-Paniagua V, Tobin R, Whiteley W, Wu H, Alex B. A systematic review of natural language processing applied to radiology reports. BMC Med Inform Decis Mak 2021; 21:179. [PMID: 34082729] [PMCID: PMC8176715] [DOI: 10.1186/s12911-021-01533-7]
Abstract
BACKGROUND Natural language processing (NLP) has a significant role in advancing healthcare and has been found to be key in extracting structured information from radiology reports. Understanding recent developments in the application of NLP to radiology is important, but recent reviews are limited. This study systematically assesses and quantifies the recent literature on NLP applied to radiology reports. METHODS We conducted an automated literature search yielding 4836 results, using automated filtering, metadata-enrichment steps, and citation search combined with manual review. Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics. RESULTS We present a comprehensive analysis of the 164 publications retrieved, with publications in 2019 almost triple those in 2015. Each publication is categorised into one of 6 clinical application categories. Deep learning use increased over the period, but conventional machine learning approaches were still prevalent. Deep learning remains challenged when data are scarce, and there is little evidence of adoption into clinical practice. Although 17% of studies reported F1 scores greater than 0.85, it is hard to evaluate these approaches comparatively because most of them use different datasets. Only 14 studies made their data available and 15 their code, with 10 externally validating their results. CONCLUSIONS Automated understanding of the clinical narratives of radiology reports has the potential to enhance the healthcare process, and we show that research in this field continues to grow. Reproducibility and explainability of models are important if the domain is to move applications into clinical use. More could be done to share code enabling validation of methods on different institutional data and to reduce heterogeneity in the reporting of study properties, allowing inter-study comparisons. Our results are significant for researchers in the field, providing a systematic synthesis of existing work to build on, helping to identify gaps and opportunities for collaboration, and avoiding duplication.
Affiliation(s)
- Arlene Casey
  - School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland
- Emma Davidson
  - Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland
- Michael Poon
  - Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland
- Hang Dong
  - Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland
  - Health Data Research UK, London, UK
- Daniel Duma
  - School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland
- Andreas Grivas
  - Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, Scotland
- Claire Grover
  - Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, Scotland
- Víctor Suárez-Paniagua
  - Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland
  - Health Data Research UK, London, UK
- Richard Tobin
  - Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, Scotland
- William Whiteley
  - Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland
  - Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Honghan Wu
  - Health Data Research UK, London, UK
  - Institute of Health Informatics, University College London, London, UK
- Beatrice Alex
  - School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland
  - Edinburgh Futures Institute, University of Edinburgh, Edinburgh, Scotland
9
Maros ME, Cho CG, Junge AG, Kämpgen B, Saase V, Siegel F, Trinkmann F, Ganslandt T, Groden C, Wenz H. Comparative analysis of machine learning algorithms for computer-assisted reporting based on fully automated cross-lingual RadLex mappings. Sci Rep 2021; 11:5529. [PMID: 33750857] [PMCID: PMC7970897] [DOI: 10.1038/s41598-021-85016-9]
Abstract
Computer-assisted reporting (CAR) tools have been suggested to improve radiology report quality by context-sensitively recommending key imaging biomarkers. However, studies evaluating machine learning (ML) algorithms on cross-lingual ontological (RadLex) mappings for developing embedded CAR algorithms are lacking. Therefore, we compared ML algorithms developed on human expert-annotated features against those developed on fully automated cross-lingual (German to English) RadLex mappings using 206 CT reports of suspected stroke. The target label was whether the Alberta Stroke Programme Early CT Score (ASPECTS) should have been provided (yes/no: 154/52). We focused on the probabilistic outputs of ML algorithms including tree-based methods, elastic net, support vector machines (SVMs), and fastText (a linear classifier), which were evaluated in the same 5 × 5-fold nested cross-validation framework. This allowed for model stacking and classifier rankings. Performance was evaluated using calibration metrics (AUC, Brier score, log loss) and calibration plots. Contextual ML-based assistance recommending ASPECTS was feasible. SVMs showed the highest accuracies both on human-extracted features (87%) and on RadLex features (findings: 82.5%; impressions: 85.4%). FastText achieved the highest accuracy (89.3%) and AUC (92%) on impressions. Boosted trees fitted on findings had the best calibration profile. Our approach provides guidance for choosing ML classifiers for CAR tools in a fully automated and language-agnostic fashion using bag-of-RadLex terms on limited expert-labelled training data.
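The Brier score and log loss used to compare the classifiers' probabilistic outputs are straightforward to compute from predicted probabilities. A self-contained sketch with invented labels and predictions (not the study's data):

```python
import math

def brier_score(y_true, p_pred):
    """Mean squared error between 0/1 outcomes and predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-15):
    """Negative mean log-likelihood; eps clips probabilities away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y = [1, 0, 1, 1]            # hypothetical ASPECTS-needed labels
p = [0.9, 0.2, 0.8, 0.6]    # hypothetical classifier probabilities
print(round(brier_score(y, p), 4), round(log_loss(y, p), 4))
```

Both metrics reward well-calibrated probabilities rather than just correct hard labels, which is why they complement accuracy and AUC in the comparison above.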
Affiliation(s)
- Máté E Maros
  - Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
  - Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Chang Gyu Cho
  - Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
  - Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Andreas G Junge
  - Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
- Victor Saase
  - Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
- Fabian Siegel
  - Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Frederik Trinkmann
  - Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Thomas Ganslandt
  - Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Christoph Groden
  - Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
- Holger Wenz
  - Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
10
Bozkurt S, Alkim E, Banerjee I, Rubin DL. Automated Detection of Measurements and Their Descriptors in Radiology Reports Using a Hybrid Natural Language Processing Algorithm. J Digit Imaging 2020; 32:544-553. [PMID: 31222557] [PMCID: PMC6646482] [DOI: 10.1007/s10278-019-00237-9]
Abstract
Radiological measurements are reported in free-text reports, and it is challenging to extract such measures for treatment planning tasks such as lesion summarization and cancer response assessment. The purpose of this work is to develop and evaluate a natural language processing (NLP) pipeline that can extract measurements and their core descriptors, such as temporality, anatomical entity, imaging observation, RadLex descriptors, series number, image number, and segment, from a wide variety of radiology reports (MR, CT, and mammogram). We created a hybrid NLP pipeline that integrates rule-based feature extraction modules with a conditional random field (CRF) model to extract measurements from radiology reports and link them with clinically relevant features such as anatomical entities or imaging observations. The pipeline was trained on 1117 CT/MR reports, and the performance of the system was evaluated on an independent set of 100 expert-annotated CT/MR reports and also tested on 25 mammography reports. The system detected 813 measurements against the 806 annotated in the CT/MR reports; 784 were true positives, 29 were false positives, and 0 were false negatives. Similarly, from the mammography reports, 96% of the measurements with their modifiers were extracted correctly. Our approach could enable the development of computerized applications that utilize summarized lesion measurements from radiology reports of varying modalities and improve practice by tracking the same lesions across multiple radiologic encounters.
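The rule-based half of such a hybrid pipeline can be approximated with a regular expression over common size patterns. This sketch handles only one- and two-dimensional sizes in cm/mm and is not the authors' pipeline; the report text is invented:

```python
import re

# Matches sizes like "1.2 cm", "3 mm", and 2-D forms like "1.5 x 2.3 cm".
MEASUREMENT = re.compile(
    r"(\d+(?:\.\d+)?)(?:\s*x\s*(\d+(?:\.\d+)?))?\s*(cm|mm)",
    re.IGNORECASE,
)

def extract_measurements(report: str):
    """Return each matched measurement as its numeric dimensions plus unit."""
    out = []
    for m in MEASUREMENT.finditer(report):
        dims = [float(m.group(1))] + ([float(m.group(2))] if m.group(2) else [])
        out.append({"dims": dims, "unit": m.group(3).lower()})
    return out

text = "Stable 1.5 x 2.3 cm mass in the right lobe; new 4 mm nodule."
print(extract_measurements(text))
```

Linking each match to its anatomical entity and temporality descriptors is the harder part, which is where the CRF model in the abstract comes in.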
Affiliation(s)
- Selen Bozkurt
  - Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), Room X-335, MC 5464, 1265 Welch Road, Stanford, CA, 94305-5479, USA
- Emel Alkim
  - Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), Room X-335, MC 5464, 1265 Welch Road, Stanford, CA, 94305-5479, USA
- Imon Banerjee
  - Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), Room X-335, MC 5464, 1265 Welch Road, Stanford, CA, 94305-5479, USA
  - Department of Radiology, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Daniel L Rubin
  - Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), Room X-335, MC 5464, 1265 Welch Road, Stanford, CA, 94305-5479, USA
  - Department of Radiology, Stanford University School of Medicine, Stanford, CA, 94305, USA
11
Deshpande P, Rasin A, Son J, Kim S, Brown E, Furst J, Raicu DS, Montner SM, Armato SG. Ontology-Based Radiology Teaching File Summarization, Coverage, and Integration. J Digit Imaging 2020; 33:797-813. [PMID: 32253657] [PMCID: PMC7256159] [DOI: 10.1007/s10278-020-00331-3]
Abstract
Radiology teaching file repositories contain a large amount of information about patient health and radiologist interpretation of medical findings. Although valuable for radiology education, the use of teaching file repositories has been hindered by the difficulty of performing advanced searches on them, given the unstructured format of the data and the sparseness of the different repositories. Our term coverage analysis of two major medical ontologies, Radiology Lexicon (RadLex) and Unified Medical Language System (UMLS) Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and two teaching file repositories, Medical Imaging Resource Community (MIRC) and MyPacs, showed that both ontologies combined cover 56.3% of terms in MIRC and only 17.9% of terms in MyPacs. Furthermore, the overlap between the two ontologies (i.e., terms included by both RadLex and UMLS SNOMED CT) was a mere 5.6% for MIRC and 2% for MyPacs. Clustering the content of the teaching file repositories showed that they focus on different diagnostic areas within radiology. The MIRC teaching file covers mostly pediatric cases; a few cases are female patients with heart-, chest-, and bone-related diseases. MyPacs contains a range of different diseases with no focus on a particular disease category, gender, or age group. MyPacs also provides a wide variety of cases related to the neck, face, heart, chest, and breast. These findings provide valuable insights into what new cases should be added and how existing cases may be integrated to provide more comprehensive data repositories. Similarly, the low term coverage by the ontologies shows the need to expand ontologies with new terminology, such as terms learned from these teaching file repositories and validated by experts. While our methodology of organizing and indexing data using clustering approaches and medical ontologies is applied here to teaching file repositories, it can be applied to any other medical clinical data.
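Term coverage of the kind analyzed here reduces to set intersection over normalized term sets. A sketch with hypothetical vocabularies (not the actual MIRC or RadLex term lists):

```python
def coverage(repository_terms: set, ontology_terms: set) -> float:
    """Percentage of repository terms that also appear in the ontology."""
    if not repository_terms:
        return 0.0
    return 100 * len(repository_terms & ontology_terms) / len(repository_terms)

# Hypothetical term sets for illustration only.
repo = {"pneumothorax", "effusion", "nodule", "cardiomegaly"}
onto = {"pneumothorax", "nodule", "fracture"}
print(coverage(repo, onto))  # 50.0
```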
Affiliation(s)
- Jun Son
  - DePaul University, Chicago, IL, USA
12
Fan Y, Pakhomov S, McEwan R, Zhao W, Lindemann E, Zhang R. Using word embeddings to expand terminology of dietary supplements on clinical notes. JAMIA Open 2019; 2:246-253. [PMID: 31825016] [PMCID: PMC6904105] [DOI: 10.1093/jamiaopen/ooz007]
Abstract
Objective The objective of this study is to demonstrate the feasibility of applying word embeddings to expand the terminology of dietary supplements (DS) using over 26 million clinical notes. Methods Word embedding models (i.e., word2vec and GloVe) trained on clinical notes were used to generate a list of the top 40 semantically related terms for each of 14 commonly used DS. Each list was further evaluated by experts to identify semantically similar terms. We investigated the effect of corpus size and other settings (i.e., vector size and window size), as well as the 2 word embedding models, on performance for DS term expansion. We compared the number of clinical notes (and the patients they represent) retrieved using the word-embedding-expanded terms with the numbers retrieved using both the baseline terms and terms expanded from external DS sources. Results Using the word embedding models trained on clinical notes, we could identify 1-12 semantically similar terms for each DS. Using the word-embedding-expanded terms, we retrieved on average 8.39% more clinical notes and 11.68% more patients for each DS compared with the other 2 term sets. Increasing the corpus size yielded more misspellings, but not more semantic variants or brand names. The word2vec model was also found to be more capable of detecting semantically similar terms than GloVe. Conclusion Our study demonstrates the utility of word embeddings trained on clinical notes for terminology expansion for 14 DS. We propose that this method can potentially be applied to create a DS vocabulary for downstream applications, such as information extraction.
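Term expansion by embedding similarity amounts to a nearest-neighbor query under cosine similarity over the vocabulary. A sketch using tiny hand-made vectors standing in for word2vec output (the terms and vectors are invented):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def expand_term(term, embeddings, k=2):
    """Return the k terms whose vectors are closest to `term` by cosine."""
    target = embeddings[term]
    scored = [(cosine(target, vec), other)
              for other, vec in embeddings.items() if other != term]
    return [other for _, other in sorted(scored, reverse=True)[:k]]

# Tiny invented vectors standing in for a trained word2vec model.
emb = {
    "fish oil":  [0.9, 0.1, 0.0],
    "fish oils": [0.88, 0.12, 0.01],
    "omega-3":   [0.8, 0.2, 0.1],
    "vitamin d": [0.1, 0.9, 0.2],
}
print(expand_term("fish oil", emb))
```

In practice the candidate list is then filtered by experts, as in the study, since nearest neighbors mix true variants with merely related terms.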
Collapse
Affiliation(s)
- Yadan Fan
- Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Serguei Pakhomov
- Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- College of Pharmacy, University of Minnesota, Minneapolis, Minnesota, USA
- Reed McEwan
- Academic Health Center-Information Systems, University of Minnesota, Minneapolis, Minnesota, USA
- Wendi Zhao
- Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Rui Zhang
- Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- College of Pharmacy, University of Minnesota, Minneapolis, Minnesota, USA
13

14
Tsuji S, Yagahara A, Fukuda A, Nishimoto N, Tanikawa T, Kawamata M, Uchida K, Ogasawara K. [Toward Launching Electronic Terminology Services in Radiological Technology-The History and Transition of Activities for Building Standard Vocabularies in JSRT]. Nihon Hoshasen Gijutsu Gakkai Zasshi 2019; 75:854-860. [PMID: 31434859 DOI: 10.6009/jjrt.2019_jsrt_75.8.854]
Affiliation(s)
- Ayako Yagahara
- Faculty of Health Sciences, Hokkaido University
- Faculty of Health Sciences, Hokkaido University of Science
- Naoki Nishimoto
- Clinical Research and Medical Innovation Center, Hokkaido University Hospital
- Takumi Tanikawa
- Faculty of Health Sciences, Hokkaido University
- Faculty of Health Sciences, Hokkaido University of Science
- Minoru Kawamata
- Department of Radiology, Osaka International Cancer Institute
15
Bozkurt S, Kan KM, Ferrari MK, Rubin DL, Blayney DW, Hernandez-Boussard T, Brooks JD. Is it possible to automatically assess pretreatment digital rectal examination documentation using natural language processing? A single-centre retrospective study. BMJ Open 2019; 9:e027182. [PMID: 31324681 PMCID: PMC6661600 DOI: 10.1136/bmjopen-2018-027182]
Abstract
OBJECTIVES To develop and test a method for automatic assessment of a quality metric, provider-documented pretreatment digital rectal examination (DRE), using the outputs of a natural language processing (NLP) framework.
SETTING An electronic health record (EHR)-based prostate cancer data warehouse was used to identify patients and associated clinical notes from 1 January 2005 to 31 December 2017. Using a previously developed NLP pipeline, we classified DRE assessment as documented (currently or historically performed), deferred (or suggested as a future examination) or refused.
PRIMARY AND SECONDARY OUTCOME MEASURES We investigated quality metric performance, documentation within 6 months before treatment, and patient and clinical factors associated with metric performance.
RESULTS The cohort included 7215 patients with prostate cancer and 426 227 unique clinical notes associated with pretreatment encounters. DREs were documented for 5958 (82.6%) patients, while 1257 (17.4%) patients had no DRE documented in the EHR. A total of 3742 (51.9%) patients had a DRE documented within 6 months prior to treatment, meeting the quality metric. Patients with private insurance had a higher rate of DRE in the 6 months prior to starting treatment than those with Medicaid-based or Medicare-based payors (77.3% vs 69.5%, p=0.001). Patients undergoing chemotherapy, radiation therapy or surgery as the first line of treatment were more likely to have a documented DRE within 6 months prior to treatment.
CONCLUSION EHRs contain valuable unstructured information, and with NLP it is feasible to accurately and efficiently identify quality metrics within the current clinical documentation workflow.
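Once the NLP pipeline has produced a DRE documentation date per patient, the quality metric itself reduces to a date-window check. A minimal sketch follows; the 183-day window is an assumption (the paper states only "6 months"), and `meets_dre_metric` is a hypothetical helper name:

```python
from datetime import date

def meets_dre_metric(dre_date, treatment_date, window_days=183):
    """True if a documented DRE falls within ~6 months (here 183 days)
    before the treatment start date; DREs after treatment start, or no
    documented DRE at all, fail the metric."""
    if dre_date is None:
        return False  # no DRE documented in the EHR
    delta = (treatment_date - dre_date).days
    return 0 <= delta <= window_days
```

Applied over a cohort, the fraction of patients for which this returns True corresponds to the 51.9% metric-performance figure reported above.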
Affiliation(s)
- Selen Bozkurt
- Biomedical Data Science, Stanford University, Stanford, CA, USA
- Medicine (Biomedical Informatics), Stanford University, Stanford, CA, USA
- Kathleen M Kan
- Urology, Stanford Lucile Salter Packard Children's Hospital, Stanford, CA, USA
- Daniel L Rubin
- Biomedical Data Science, Stanford University, Stanford, CA, USA
- Radiology, Stanford University, Stanford, CA, USA
- Tina Hernandez-Boussard
- Biomedical Data Science, Stanford University, Stanford, CA, USA
- Medicine (Biomedical Informatics), Stanford University, Stanford, CA, USA
- Surgery, Stanford University, Stanford, CA, USA
16
Coquet J, Bozkurt S, Kan KM, Ferrari MK, Blayney DW, Brooks JD, Hernandez-Boussard T. Comparison of orthogonal NLP methods for clinical phenotyping and assessment of bone scan utilization among prostate cancer patients. J Biomed Inform 2019; 94:103184. [PMID: 31014980 PMCID: PMC6584041 DOI: 10.1016/j.jbi.2019.103184]
Abstract
OBJECTIVE Clinical care guidelines recommend that newly diagnosed prostate cancer patients at high risk for metastatic spread receive a bone scan prior to treatment and that low-risk patients not receive one. The objective was to develop an automated pipeline that interrogates heterogeneous data to evaluate the use of bone scans using two different Natural Language Processing (NLP) approaches.
MATERIALS AND METHODS Our cohort was divided into risk groups based on Electronic Health Records (EHR). Information on bone scan utilization was identified in both structured data and free text from clinical notes. Our pipeline annotated sentences with a combination of a rule-based method using the ConText algorithm (a generalization of NegEx) and a Convolutional Neural Network (CNN) method using word2vec to produce word embeddings.
RESULTS A total of 5500 patients and 369,764 notes were included in the study. Overall, 39% of patients were high risk, and 73% of these received a bone scan; of the 18% of patients who were low risk, 10% received one. The CNN model outperformed the rule-based model (F-measure = 0.918 vs 0.897). We demonstrate that a combination of both models can maximize precision or recall, depending on the study question.
CONCLUSION Using structured data, we accurately classified patients' cancer risk group, identified bone scan documentation with two NLP methods, and evaluated guideline adherence. Our pipeline can be used to provide concrete feedback to clinicians and guide treatment decisions.
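The rule-based arm of the pipeline relies on ConText/NegEx-style trigger terms to decide whether a bone scan mention is asserted or negated. The sketch below covers only forward negation with an illustrative trigger list and token scope; the real ConText algorithm also handles backward triggers, scope-terminating terms, temporality, and experiencer:

```python
import re

# Illustrative trigger list; NegEx/ConText ship much larger lexicons.
NEGATION_TRIGGERS = {"no", "not", "without", "denies", "denied"}

def is_negated(sentence, concept="bone scan", scope=5):
    """Flag `concept` as negated if a trigger token appears within
    `scope` tokens before its mention (forward negation only)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    ctoks = concept.lower().split()
    n = len(ctoks)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == ctoks:
            window = tokens[max(0, i - scope):i]
            return any(t in NEGATION_TRIGGERS for t in window)
    return False  # concept absent or asserted
```

A sentence-level label like this is what the rule-based model contributes; the CNN arm instead classifies the sentence from word2vec embeddings, and the paper combines the two depending on whether precision or recall is prioritized.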
Affiliation(s)
- Jean Coquet
- Department of Medicine, Stanford University, Stanford, CA, USA
- Selen Bozkurt
- Department of Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, USA
- Kathleen M Kan
- Department of Urology, Stanford University School of Medicine, Stanford, USA
- Michelle K Ferrari
- Department of Urology, Stanford University School of Medicine, Stanford, USA
- Douglas W Blayney
- Department of Medicine, Stanford University, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, USA
- James D Brooks
- Department of Urology, Stanford University School of Medicine, Stanford, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, USA
- Tina Hernandez-Boussard
- Department of Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, USA
- Department of Surgery, Stanford University School of Medicine, Stanford, USA
17

18
Bozkurt S, Park JI, Kan KM, Ferrari M, Rubin DL, Brooks JD, Hernandez-Boussard T. An Automated Feature Engineering for Digital Rectal Examination Documentation using Natural Language Processing. AMIA Annu Symp Proc 2018; 2018:288-294. [PMID: 30815067 PMCID: PMC6371344]
Abstract
Digital rectal examination (DRE) is considered a quality metric for prostate cancer care. However, much of the rich DRE-related information is documented as free text in clinical narratives. We therefore aimed to develop a natural language processing (NLP) pipeline to automatically identify DRE documentation in clinical notes, using a domain-specific dictionary created by clinical experts together with an extended version of that dictionary learned from clinical notes using distributional semantics algorithms. The proposed pipeline was compared with a baseline NLP algorithm and was found superior in terms of precision (0.95) and recall (0.90) for identifying DRE documentation. We believe a rule-based NLP pipeline enriched with terms learned from the whole corpus can provide accurate and efficient identification of this quality metric.
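The expert-dictionary-plus-learned-terms matching described above might be sketched as follows. The paper's actual dictionaries are not reproduced here, so every term in both lists is a hypothetical placeholder; the point is only the mechanic of toggling the corpus-learned extension on and off:

```python
import re

# Seed terms such as clinical experts might provide (hypothetical).
EXPERT_TERMS = ["digital rectal examination", "digital rectal exam"]
# Variants of the kind a distributional-semantics model might surface
# from the corpus (also hypothetical).
LEARNED_TERMS = ["rectal exam", "prostate exam"]

def mentions_dre(note, use_learned=True):
    """Return True if the note matches any dictionary term
    (whole-word, case-insensitive)."""
    terms = EXPERT_TERMS + (LEARNED_TERMS if use_learned else [])
    return any(re.search(r"\b" + re.escape(t) + r"\b", note, re.IGNORECASE)
               for t in terms)
```

Comparing hit counts with `use_learned` on and off mirrors how the extended dictionary improves recall over the expert seed list alone; the paper's full pipeline adds rules on top of this matching step.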
Affiliation(s)
- Selen Bozkurt
- Department of Medicine, Center for Biomedical Informatics Research, Stanford University, Stanford, CA
- Department of Biomedical Data Science, Stanford University, Stanford, CA
- Jung In Park
- Department of Medicine, Center for Biomedical Informatics Research, Stanford University, Stanford, CA
- Kathleen Mary Kan
- Department of Urology, Stanford University School of Medicine, Stanford, CA
- Michelle Ferrari
- Department of Urology, Stanford University School of Medicine, Stanford, CA
- Daniel L Rubin
- Department of Medicine, Center for Biomedical Informatics Research, Stanford University, Stanford, CA
- Department of Biomedical Data Science, Stanford University, Stanford, CA
- Department of Radiology, Stanford University School of Medicine, Stanford, CA
- James D Brooks
- Department of Urology, Stanford University School of Medicine, Stanford, CA
- Tina Hernandez-Boussard
- Department of Medicine, Center for Biomedical Informatics Research, Stanford University, Stanford, CA
- Department of Biomedical Data Science, Stanford University, Stanford, CA