1
Diab KM, Deng J, Wu Y, Yesha Y, Collado-Mesa F, Nguyen P. Natural Language Processing for Breast Imaging: A Systematic Review. Diagnostics (Basel) 2023; 13:1420. [PMID: 37189521] [DOI: 10.3390/diagnostics13081420]
Abstract
Natural Language Processing (NLP) has gained prominence in diagnostic radiology, offering a promising tool for improving breast imaging triage, diagnosis, lesion characterization, and treatment management in breast cancer and other breast diseases. This review provides a comprehensive overview of recent advances in NLP for breast imaging, covering the main techniques and applications in this field. Specifically, we discuss various NLP methods used to extract relevant information from clinical notes, radiology reports, and pathology reports, and their potential impact on the accuracy and efficiency of breast imaging. In addition, we review the state of the art in NLP-based decision support systems for breast imaging, highlighting the challenges and opportunities of NLP applications for breast imaging in the future. Overall, this review underscores the potential of NLP in enhancing breast imaging care and offers insights for clinicians and researchers interested in this exciting and rapidly evolving field.
Affiliation(s)
- Kareem Mahmoud Diab
  - Institute for Data Science and Computing, University of Miami, Miami, FL 33146, USA
- Jamie Deng
  - Department of Computer Science, University of Miami, Miami, FL 33146, USA
- Yusen Wu
  - Institute for Data Science and Computing, University of Miami, Miami, FL 33146, USA
- Yelena Yesha
  - Institute for Data Science and Computing, University of Miami, Miami, FL 33146, USA
  - Department of Computer Science, University of Miami, Miami, FL 33146, USA
  - Department of Radiology, Miller School of Medicine, University of Miami, Miami, FL 33146, USA
- Fernando Collado-Mesa
  - Department of Radiology, Miller School of Medicine, University of Miami, Miami, FL 33146, USA
- Phuong Nguyen
  - Institute for Data Science and Computing, University of Miami, Miami, FL 33146, USA
  - Department of Computer Science, University of Miami, Miami, FL 33146, USA
  - OpenKnect Inc., Halethorpe, MD 21227, USA
2
Yang R, Zhu D, Howard LE, De Hoedt A, Williams SB, Freedland SJ, Klaassen Z. Identification of Patients With Metastatic Prostate Cancer With Natural Language Processing and Machine Learning. JCO Clin Cancer Inform 2022; 6:e2100071. [PMID: 36215673] [DOI: 10.1200/cci.21.00071]
Abstract
PURPOSE Understanding treatment patterns and effectiveness for patients with metastatic prostate cancer (mPCa) is dependent on accurate assessment of metastatic status. The objective was to develop a natural language processing (NLP) model for identifying patients with mPCa and evaluate the model's performance against chart-reviewed data and an International Classification of Diseases (ICD) 9/10 code-based method. METHODS In total, 139,057 radiology reports on 6,211 unique patients from the Department of Veterans Affairs were used. The gold standard was metastases identified by detailed chart review of radiology reports. NLP performance was assessed by sensitivity, specificity, positive predictive value, negative predictive value, and date of metastasis detection. Receiver operating characteristic curves were used to assess model performance. RESULTS When compared with chart review, the NLP model had high sensitivity and specificity (85% and 96%, respectively). The NLP model was able to predict patient-level metastasis status with a sensitivity of 91% and specificity of 81%, whereas sensitivity and specificity using ICD9/10 billing codes were 73% and 86%, respectively. For the NLP model, the date of metastasis detection was exactly concordant in 55% of patients and concordant within 1 week in 58%, compared with 8% and 17%, respectively, using the ICD9/10 billing codes method. The area under the curve for the NLP model was 0.911. A limitation is that the NLP model was developed on the basis of a subset of patients with mPCa and may not be generalizable to all patients with mPCa. CONCLUSION This population-level NLP model for identifying patients with mPCa was more accurate than ICD9/10 billing codes when compared with chart-reviewed data. Upon further validation, this model may allow for efficient population-level identification of patients with mPCa.
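The patient-level metrics this abstract reports all derive from a 2x2 confusion matrix against the chart-review gold standard. A minimal sketch of those formulas (the counts below are illustrative, not the study's data):

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV, and NPV from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # recall on truly metastatic patients
        "specificity": tn / (tn + fp),  # recall on non-metastatic patients
        "ppv": tp / (tp + fp),          # precision of a positive NLP call
        "npv": tn / (tn + fn),          # precision of a negative NLP call
    }

# Illustrative counts only (not taken from the study):
m = diagnostic_metrics(tp=91, fp=19, fn=9, tn=81)
print({k: round(v, 2) for k, v in m.items()})
```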
Affiliation(s)
- Ruixin Yang
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
- Di Zhu
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
- Lauren E Howard
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
  - Duke Cancer Institute, Duke University School of Medicine, Durham, NC
- Amanda De Hoedt
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
- Stephen B Williams
  - Division of Urology, Department of Surgery, The University of Texas Medical Branch, Galveston, TX
- Stephen J Freedland
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
  - Division of Urology, Department of Surgery, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA
  - Center for Integrated Research in Cancer and Lifestyle, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA
- Zachary Klaassen
  - Division of Urology, Medical College of Georgia at Augusta University, Augusta, GA
  - Georgia Cancer Center, Augusta, GA
3
Khan MS, Landman BA, Deppen SA, Matheny ME. Intrinsic Evaluation of Contextual and Non-contextual Word Embeddings using Radiology Reports. AMIA Annu Symp Proc 2022; 2021:631-640. [PMID: 35308988] [PMCID: PMC8861761]
Abstract
Many clinical natural language processing methods rely on non-contextual word embedding (NCWE) or contextual word embedding (CWE) models. Yet few, if any, intrinsic evaluation benchmarks exist that compare embedding representations against clinician judgment. We developed intrinsic evaluation tasks for embedding models using a corpus of radiology reports: term pair similarity for NCWEs and cloze task accuracy for CWEs. Using surveys, we quantified the agreement between clinician judgment and embedding model representations. We compared embedding models trained on a custom radiology report corpus (RRC), a general corpus, and PubMed and MIMIC-III corpora (P&MC). Cloze task accuracy was equivalent for RRC and P&MC models. For term pair similarity, P&MC-trained NCWEs outperformed all other NCWE models (Spearman's ρ = 0.61 vs. 0.27-0.44). Among models trained on RRC, fastText models often outperformed other NCWE models, and spherical embeddings provided overly optimistic representations of term pair similarity.
Affiliation(s)
- Mirza S Khan
  - US Dept. of Veterans Affairs, Nashville, TN
  - Vanderbilt University, Nashville, TN
  - Vanderbilt University Medical Center, Nashville, TN
- Bennett A Landman
  - Vanderbilt University, Nashville, TN
  - Vanderbilt University Medical Center, Nashville, TN
- Michael E Matheny
  - US Dept. of Veterans Affairs, Nashville, TN
  - Vanderbilt University Medical Center, Nashville, TN
4
Filice RW, Kahn CE. Biomedical Ontologies to Guide AI Development in Radiology. J Digit Imaging 2021; 34:1331-1341. [PMID: 34724143] [PMCID: PMC8669056] [DOI: 10.1007/s10278-021-00527-1]
Abstract
The advent of deep learning has engendered renewed and rapidly growing interest in artificial intelligence (AI) in radiology to analyze images, manipulate textual reports, and plan interventions. Applications of deep learning and other AI approaches must be guided by sound medical knowledge to assure that they are developed successfully and that they address important problems in biomedical research or patient care. To date, AI has been applied to a limited number of real-world radiology applications. As AI systems become more pervasive and are applied more broadly, they will benefit from medical knowledge on a larger scale, such as that available through computer-based approaches. A key approach to representing computer-based knowledge in a particular domain is an ontology. As defined in informatics, an ontology defines a domain's terms through their relationships with other terms in the ontology. Those relationships, then, define the terms' semantics, or "meaning." Biomedical ontologies commonly define the relationships between terms and more general terms, and can express causal, part-whole, and anatomic relationships. Ontologies express knowledge in a form that is both human-readable and machine-computable. Some ontologies, such as RSNA's RadLex radiology lexicon, have already been applied in clinical practice and research, and may be familiar to many radiologists. This article describes how ontologies can support research and guide emerging applications of AI in radiology, including natural language processing, image-based machine learning, radiomics, and planning.
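The relationship structure described here, terms defined through their links to other terms, can be held in a simple directed graph. A toy sketch with invented terms (not actual RadLex entries), walking is-a links to recover a term's more general ancestors:

```python
# Toy is-a hierarchy in the spirit of a radiology ontology; the terms and
# relations are illustrative, not real RadLex identifiers.
IS_A = {
    "lobar pneumonia": "pneumonia",
    "pneumonia": "lung disease",
    "lung disease": "disease",
    "pulmonary nodule": "imaging finding",
}

def ancestors(term):
    """Walk is-a links to the root; the chain helps define the term's semantics."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain

print(ancestors("lobar pneumonia"))  # pneumonia -> lung disease -> disease
```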
Affiliation(s)
- Ross W Filice
  - Department of Radiology, MedStar Georgetown University Hospital, Washington, DC, USA
- Charles E Kahn
  - Department of Radiology and Institute for Biomedical Informatics, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, USA
5
Sarker A. LexExp: a system for automatically expanding concept lexicons for noisy biomedical texts. Bioinformatics 2021; 37:2499-2501. [PMID: 33244602] [PMCID: PMC8388038] [DOI: 10.1093/bioinformatics/btaa995]
Abstract
SUMMARY LexExp is an open-source, data-centric lexicon expansion system that generates spelling variants of lexical expressions in a lexicon using a phrase embedding model, lexical similarity-based natural language processing methods, and a set of tunable threshold decay functions. The system is customizable, can be optimized for recall or precision, and can generate variants for multi-word expressions. AVAILABILITY AND IMPLEMENTATION Code available at: https://bitbucket.org/asarker/lexexp; data and resources available at: https://sarkerlab.org/lexexp. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
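LexExp itself combines phrase embeddings with tunable similarity thresholds; a much simpler flavor of the same idea is to attach candidate spelling variants to lexicon entries by plain edit distance. A sketch under that simplification (the terms below are invented for illustration, and this is not LexExp's algorithm):

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def expand_lexicon(lexicon, candidates, max_dist=1):
    """Attach candidate strings to lexicon entries within a small edit distance."""
    variants = {term: [] for term in lexicon}
    for cand in candidates:
        for term in lexicon:
            if cand != term and edit_distance(cand, term) <= max_dist:
                variants[term].append(cand)
    return variants

print(expand_lexicon(["ibuprofen"], ["ibuprofin", "ibuprophen", "aspirin"]))
```

With `max_dist=1` only the single-substitution misspelling "ibuprofin" is attached; raising the threshold trades precision for recall, which is the knob the abstract's decay functions tune.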
Affiliation(s)
- Abeed Sarker
  - Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA 30322, USA
6
Jung E, Jain H, Sinha AP, Gaudioso C. Building a specialized lexicon for breast cancer clinical trial subject eligibility analysis. Health Informatics J 2021; 27:1460458221989392. [PMID: 33535885] [DOI: 10.1177/1460458221989392]
Abstract
A natural language processing (NLP) application requires sophisticated lexical resources to support its processing goals. Different solutions, such as dictionary lookup and MetaMap, have been proposed in the healthcare informatics literature to identify disease terms comprising more than one word (multi-gram disease named entities). Although much work has been done on the identification of protein- and gene-named entities in the biomedical field, little research has addressed the recognition and resolution of terminologies in clinical trial subject eligibility analysis. In this study, we develop a specialized lexicon for improving NLP and text mining analysis in the breast cancer domain and evaluate it by comparing it with the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). We use a hybrid methodology, which combines the knowledge of domain experts, terms from multiple online dictionaries, and the mining of text from sample clinical trials. Our methodology introduces 4243 unique lexicon items, which increase bigram entity matches by 38.6% and trigram entity matches by 41%. Our lexicon, which adds a significant number of new terms, is very useful for automatically matching patients to clinical trials based on eligibility criteria. Beyond clinical trial matching, the specialized lexicon developed in this study could serve as a foundation for future healthcare text mining applications.
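The bigram/trigram entity matching this abstract measures can be sketched as a longest-match n-gram lookup against the lexicon; the toy lexicon below is invented for illustration and is not the authors' resource:

```python
def ngrams(tokens, n):
    """All contiguous n-token spans, joined back into strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def match_entities(text, lexicon, max_n=3):
    """Greedy longest-match lookup of multi-word terms in a lexicon."""
    tokens = text.lower().split()
    found = []
    for n in range(max_n, 0, -1):  # prefer trigrams, then bigrams, then unigrams
        for gram in ngrams(tokens, n):
            # skip grams already covered by a longer matched term
            if gram in lexicon and not any(gram in f for f in found):
                found.append(gram)
    return found

lex = {"invasive ductal carcinoma", "ductal carcinoma", "carcinoma"}
print(match_entities("Biopsy confirmed invasive ductal carcinoma", lex))
```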
Affiliation(s)
- Euisung Jung
  - Information Operations and Technology Management, John B. and Lillian E. Neff College of Business and Innovation, The University of Toledo, USA
- Hemant Jain
  - Gary W. Rollins College of Business, The University of Tennessee at Chattanooga, USA
- Atish P Sinha
  - Lubar School of Business, University of Wisconsin-Milwaukee, USA
7
Abstract
Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g., physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, this review describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation in health systems and in industry.
Affiliation(s)
- Bethany Percha
  - Department of Medicine and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10025, USA
8
Casey A, Davidson E, Poon M, Dong H, Duma D, Grivas A, Grover C, Suárez-Paniagua V, Tobin R, Whiteley W, Wu H, Alex B. A systematic review of natural language processing applied to radiology reports. BMC Med Inform Decis Mak 2021; 21:179. [PMID: 34082729] [PMCID: PMC8176715] [DOI: 10.1186/s12911-021-01533-7]
Abstract
BACKGROUND Natural language processing (NLP) has a significant role in advancing healthcare and has been found to be key in extracting structured information from radiology reports. Understanding recent developments in the application of NLP to radiology is important, but recent reviews are limited. This study systematically assesses and quantifies the recent literature on NLP applied to radiology reports. METHODS We conducted an automated literature search yielding 4836 results, using automated filtering, metadata-enrichment steps, and citation search combined with manual review. Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics. RESULTS We present a comprehensive analysis of the 164 publications retrieved, with publications in 2019 almost triple those in 2015. Each publication is categorised into one of 6 clinical application categories. Deep learning use increased over the period, but conventional machine learning approaches were still prevalent. Deep learning remains challenged when data are scarce, and there is little evidence of adoption into clinical practice. Although 17% of studies reported F1 scores greater than 0.85, it is hard to evaluate these approaches comparatively because most of them use different datasets. Only 14 studies made their data available and 15 their code, with 10 externally validating their results. CONCLUSIONS Automated understanding of the clinical narratives of radiology reports has the potential to enhance the healthcare process, and we show that research in this field continues to grow. Reproducibility and explainability of models are important if the domain is to move applications into clinical use. More could be done to share code enabling validation of methods on different institutional data and to reduce heterogeneity in the reporting of study properties, allowing inter-study comparisons. Our results are significant for researchers in the field, providing a systematic synthesis of existing work to build on, helping to identify gaps and opportunities for collaboration, and avoiding duplication.
Affiliation(s)
- Arlene Casey
  - School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland
- Emma Davidson
  - Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland
- Michael Poon
  - Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland
- Hang Dong
  - Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland
  - Health Data Research UK, London, UK
- Daniel Duma
  - School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland
- Andreas Grivas
  - Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, Scotland
- Claire Grover
  - Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, Scotland
- Víctor Suárez-Paniagua
  - Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland
  - Health Data Research UK, London, UK
- Richard Tobin
  - Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, Scotland
- William Whiteley
  - Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland
  - Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Honghan Wu
  - Health Data Research UK, London, UK
  - Institute of Health Informatics, University College London, London, UK
- Beatrice Alex
  - School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland
  - Edinburgh Futures Institute, University of Edinburgh, Edinburgh, Scotland
9
Maros ME, Cho CG, Junge AG, Kämpgen B, Saase V, Siegel F, Trinkmann F, Ganslandt T, Groden C, Wenz H. Comparative analysis of machine learning algorithms for computer-assisted reporting based on fully automated cross-lingual RadLex mappings. Sci Rep 2021; 11:5529. [PMID: 33750857] [PMCID: PMC7970897] [DOI: 10.1038/s41598-021-85016-9]
Abstract
Computer-assisted reporting (CAR) tools have been suggested to improve radiology report quality by context-sensitively recommending key imaging biomarkers. However, studies evaluating machine learning (ML) algorithms on cross-lingual ontological (RadLex) mappings for developing embedded CAR algorithms are lacking. Therefore, we compared ML algorithms developed on human expert-annotated features against those developed on fully automated cross-lingual (German to English) RadLex mappings using 206 CT reports of suspected stroke. The target label was whether the Alberta Stroke Programme Early CT Score (ASPECTS) should have been provided (yes/no: 154/52). We focused on the probabilistic outputs of ML algorithms including tree-based methods, elastic net, support vector machines (SVMs), and fastText (a linear classifier), which were evaluated in the same 5 × 5-fold nested cross-validation framework. This allowed for model stacking and classifier rankings. Performance was evaluated using calibration metrics (AUC, Brier score, log loss) and calibration plots. Contextual ML-based assistance recommending ASPECTS was feasible. SVMs showed the highest accuracies both on human-extracted features (87%) and on RadLex features (findings: 82.5%; impressions: 85.4%). FastText achieved the highest accuracy (89.3%) and AUC (92%) on impressions. Boosted trees fitted on findings had the best calibration profile. Our approach provides guidance for choosing ML classifiers for CAR tools in a fully automated and language-agnostic fashion using bag-of-RadLex terms on limited expert-labelled training data.
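The Brier score and log loss used to compare the classifiers' probabilistic outputs are straightforward to compute from predicted probabilities. A self-contained sketch with invented labels and predictions (not the study's data):

```python
import math

def brier_score(y_true, p_pred):
    """Mean squared error between 0/1 outcomes and predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-15):
    """Negative mean log-likelihood; eps clips probabilities away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y = [1, 0, 1, 1]            # hypothetical ASPECTS-needed labels
p = [0.9, 0.2, 0.8, 0.6]    # hypothetical classifier probabilities
print(round(brier_score(y, p), 4), round(log_loss(y, p), 4))
```

Both metrics reward well-calibrated probabilities rather than just correct hard labels, which is why they complement accuracy and AUC in the comparison above.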
Affiliation(s)
- Máté E Maros
  - Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
  - Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Chang Gyu Cho
  - Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
  - Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Andreas G Junge
  - Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
- Victor Saase
  - Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
- Fabian Siegel
  - Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Frederik Trinkmann
  - Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Thomas Ganslandt
  - Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Christoph Groden
  - Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
- Holger Wenz
  - Department of Neuroradiology, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68137, Mannheim, Germany
10
Bozkurt S, Alkim E, Banerjee I, Rubin DL. Automated Detection of Measurements and Their Descriptors in Radiology Reports Using a Hybrid Natural Language Processing Algorithm. J Digit Imaging 2020; 32:544-553. [PMID: 31222557] [PMCID: PMC6646482] [DOI: 10.1007/s10278-019-00237-9]
Abstract
Radiological measurements are reported in free-text reports, and it is challenging to extract such measures for treatment planning tasks such as lesion summarization and cancer response assessment. The purpose of this work is to develop and evaluate a natural language processing (NLP) pipeline that can extract measurements and their core descriptors, such as temporality, anatomical entity, imaging observation, RadLex descriptors, series number, image number, and segment, from a wide variety of radiology reports (MR, CT, and mammogram). We created a hybrid NLP pipeline that integrates rule-based feature extraction modules with a conditional random field (CRF) model to extract measurements from radiology reports and link them with clinically relevant features such as anatomical entities or imaging observations. The pipeline was trained on 1117 CT/MR reports, and the performance of the system was evaluated on an independent set of 100 expert-annotated CT/MR reports and also tested on 25 mammography reports. The system detected 813 measurements against the 806 annotated in the CT/MR reports; 784 were true positives, 29 were false positives, and 0 were false negatives. Similarly, from the mammography reports, 96% of the measurements with their modifiers were extracted correctly. Our approach could enable the development of computerized applications that utilize summarized lesion measurements from radiology reports of varying modalities and improve practice by tracking the same lesions across multiple radiologic encounters.
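The rule-based half of such a hybrid pipeline can be approximated with a regular expression over common size patterns. This sketch handles only one- and two-dimensional sizes in cm/mm and is not the authors' pipeline; the report text is invented:

```python
import re

# Matches sizes like "1.2 cm", "3 mm", and 2-D forms like "1.5 x 2.3 cm".
MEASUREMENT = re.compile(
    r"(\d+(?:\.\d+)?)(?:\s*x\s*(\d+(?:\.\d+)?))?\s*(cm|mm)",
    re.IGNORECASE,
)

def extract_measurements(report: str):
    """Return each matched measurement as its numeric dimensions plus unit."""
    out = []
    for m in MEASUREMENT.finditer(report):
        dims = [float(m.group(1))] + ([float(m.group(2))] if m.group(2) else [])
        out.append({"dims": dims, "unit": m.group(3).lower()})
    return out

text = "Stable 1.5 x 2.3 cm mass in the right lobe; new 4 mm nodule."
print(extract_measurements(text))
```

Linking each match to its anatomical entity and temporality descriptors is the harder part, which is where the CRF model in the abstract comes in.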
Affiliation(s)
- Selen Bozkurt
  - Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), Room X-335, MC 5464, 1265 Welch Road, Stanford, CA, 94305-5479, USA
- Emel Alkim
  - Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), Room X-335, MC 5464, 1265 Welch Road, Stanford, CA, 94305-5479, USA
- Imon Banerjee
  - Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), Room X-335, MC 5464, 1265 Welch Road, Stanford, CA, 94305-5479, USA
  - Department of Radiology, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Daniel L Rubin
  - Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), Room X-335, MC 5464, 1265 Welch Road, Stanford, CA, 94305-5479, USA
  - Department of Radiology, Stanford University School of Medicine, Stanford, CA, 94305, USA
11
Deshpande P, Rasin A, Son J, Kim S, Brown E, Furst J, Raicu DS, Montner SM, Armato SG. Ontology-Based Radiology Teaching File Summarization, Coverage, and Integration. J Digit Imaging 2020; 33:797-813. [PMID: 32253657] [PMCID: PMC7256159] [DOI: 10.1007/s10278-020-00331-3]
Abstract
Radiology teaching file repositories contain a large amount of information about patient health and radiologist interpretation of medical findings. Although valuable for radiology education, the use of teaching file repositories has been hindered by the difficulty of performing advanced searches on them, given the unstructured format of the data and the sparseness of the different repositories. Our term coverage analysis of two major medical ontologies, Radiology Lexicon (RadLex) and Unified Medical Language System (UMLS) Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and two teaching file repositories, Medical Imaging Resource Community (MIRC) and MyPacs, showed that both ontologies combined cover 56.3% of terms in MIRC and only 17.9% of terms in MyPacs. Furthermore, the overlap between the two ontologies (i.e., terms included by both RadLex and UMLS SNOMED CT) was a mere 5.6% for MIRC and 2% for MyPacs. Clustering the content of the teaching file repositories showed that they focus on different diagnostic areas within radiology. The MIRC teaching file covers mostly pediatric cases; a few cases are female patients with heart-, chest-, and bone-related diseases. MyPacs contains a range of different diseases with no focus on a particular disease category, gender, or age group. MyPacs also provides a wide variety of cases related to the neck, face, heart, chest, and breast. These findings provide valuable insights into what new cases should be added and how existing cases may be integrated to provide more comprehensive data repositories. Similarly, the low term coverage by the ontologies shows the need to expand ontologies with new terminology, such as terms learned from these teaching file repositories and validated by experts. While our methodology of organizing and indexing data using clustering approaches and medical ontologies is applied here to teaching file repositories, it can be applied to any other medical clinical data.
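Term coverage of the kind analyzed here reduces to set intersection over normalized term sets. A sketch with hypothetical vocabularies (not the actual MIRC or RadLex term lists):

```python
def coverage(repository_terms: set, ontology_terms: set) -> float:
    """Percentage of repository terms that also appear in the ontology."""
    if not repository_terms:
        return 0.0
    return 100 * len(repository_terms & ontology_terms) / len(repository_terms)

# Hypothetical term sets for illustration only.
repo = {"pneumothorax", "effusion", "nodule", "cardiomegaly"}
onto = {"pneumothorax", "nodule", "fracture"}
print(coverage(repo, onto))  # 50.0
```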
Affiliation(s)
- Jun Son
  - DePaul University, Chicago, IL, USA
12
Fan Y, Pakhomov S, McEwan R, Zhao W, Lindemann E, Zhang R. Using word embeddings to expand terminology of dietary supplements on clinical notes. JAMIA Open 2019; 2:246-253. [PMID: 31825016] [PMCID: PMC6904105] [DOI: 10.1093/jamiaopen/ooz007]
Abstract
Objective The objective of this study is to demonstrate the feasibility of applying word embeddings to expand the terminology of dietary supplements (DS) using over 26 million clinical notes. Methods Word embedding models (i.e., word2vec and GloVe) trained on clinical notes were used to generate a list of the top 40 semantically related terms for each of 14 commonly used DS. Each list was further evaluated by experts to identify semantically similar terms. We investigated the effect of corpus size and other settings (i.e., vector size and window size), as well as the 2 word embedding models, on performance for DS term expansion. We compared the number of clinical notes (and the patients they represent) retrieved using the word-embedding-expanded terms with the numbers retrieved using both the baseline terms and terms expanded from external DS sources. Results Using the word embedding models trained on clinical notes, we could identify 1-12 semantically similar terms for each DS. Using the word-embedding-expanded terms, we retrieved on average 8.39% more clinical notes and 11.68% more patients for each DS compared with the other 2 term sets. Increasing the corpus size yielded more misspellings, but not more semantic variants or brand names. The word2vec model was also found to be more capable of detecting semantically similar terms than GloVe. Conclusion Our study demonstrates the utility of word embeddings trained on clinical notes for terminology expansion for 14 DS. We propose that this method can potentially be applied to create a DS vocabulary for downstream applications, such as information extraction.
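Term expansion by embedding similarity amounts to a nearest-neighbor query under cosine similarity over the vocabulary. A sketch using tiny hand-made vectors standing in for word2vec output (the terms and vectors are invented):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def expand_term(term, embeddings, k=2):
    """Return the k terms whose vectors are closest to `term` by cosine."""
    target = embeddings[term]
    scored = [(cosine(target, vec), other)
              for other, vec in embeddings.items() if other != term]
    return [other for _, other in sorted(scored, reverse=True)[:k]]

# Tiny invented vectors standing in for a trained word2vec model.
emb = {
    "fish oil":  [0.9, 0.1, 0.0],
    "fish oils": [0.88, 0.12, 0.01],
    "omega-3":   [0.8, 0.2, 0.1],
    "vitamin d": [0.1, 0.9, 0.2],
}
print(expand_term("fish oil", emb))
```

In practice the candidate list is then filtered by experts, as in the study, since nearest neighbors mix true variants with merely related terms.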
Collapse
Affiliation(s)
- Yadan Fan
- Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Serguei Pakhomov
- Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- College of Pharmacy, University of Minnesota, Minneapolis, Minnesota, USA
- Reed McEwan
- Academic Health Center-Information Systems, University of Minnesota, Minneapolis, Minnesota, USA
- Wendi Zhao
- Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Rui Zhang
- Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- College of Pharmacy, University of Minnesota, Minneapolis, Minnesota, USA
13

14
Tsuji S, Yagahara A, Fukuda A, Nishimoto N, Tanikawa T, Kawamata M, Uchida K, Ogasawara K. [Toward Launching Electronic Terminology Services in Radiological Technology-The History and Transition of Activities for Building Standard Vocabularies in JSRT]. Nihon Hoshasen Gijutsu Gakkai Zasshi 2019; 75:854-860. [PMID: 31434859 DOI: 10.6009/jjrt.2019_jsrt_75.8.854]
Affiliation(s)
- Ayako Yagahara
- Faculty of Health Sciences, Hokkaido University
- Faculty of Health Sciences, Hokkaido University of Science
- Naoki Nishimoto
- Clinical Research and Medical Innovation Center, Hokkaido University Hospital
- Takumi Tanikawa
- Faculty of Health Sciences, Hokkaido University
- Faculty of Health Sciences, Hokkaido University of Science
- Minoru Kawamata
- Department of Radiology, Osaka International Cancer Institute
15
Bozkurt S, Kan KM, Ferrari MK, Rubin DL, Blayney DW, Hernandez-Boussard T, Brooks JD. Is it possible to automatically assess pretreatment digital rectal examination documentation using natural language processing? A single-centre retrospective study. BMJ Open 2019; 9:e027182. [PMID: 31324681 PMCID: PMC6661600 DOI: 10.1136/bmjopen-2018-027182]
Abstract
OBJECTIVES To develop and test a method for automatic assessment of a quality metric, provider-documented pretreatment digital rectal examination (DRE), using the outputs of a natural language processing (NLP) framework.
SETTING An electronic health record (EHR)-based prostate cancer data warehouse was used to identify patients and associated clinical notes from 1 January 2005 to 31 December 2017. Using a previously developed NLP pipeline, we classified DRE assessment as documented (currently or historically performed), deferred (or suggested as a future examination) or refused.
PRIMARY AND SECONDARY OUTCOME MEASURES We investigated quality metric performance, documentation within 6 months before treatment, and patient and clinical factors associated with metric performance.
RESULTS The cohort included 7215 patients with prostate cancer and 426 227 unique clinical notes associated with pretreatment encounters. DREs were documented for 5958 (82.6%) patients, while 1257 (17.4%) patients had no DRE documented in the EHR. A total of 3742 (51.9%) patients had a DRE documented within 6 months prior to treatment, meeting the quality metric. Patients with private insurance had a higher rate of DRE in the 6 months prior to starting treatment than those with Medicaid-based or Medicare-based payors (77.3% vs 69.5%, p=0.001). Patients undergoing chemotherapy, radiation therapy or surgery as the first line of treatment were more likely to have a documented DRE within 6 months prior to treatment.
CONCLUSION EHRs contain valuable unstructured information, and with NLP it is feasible to accurately and efficiently identify quality metrics within the current clinical documentation workflow.
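Once the NLP pipeline has produced a DRE documentation date per patient, the quality metric itself reduces to a date-window check. A minimal sketch follows; the 183-day window is an assumption (the paper states only "6 months"), and `meets_dre_metric` is a hypothetical helper name:

```python
from datetime import date

def meets_dre_metric(dre_date, treatment_date, window_days=183):
    """True if a documented DRE falls within ~6 months (here 183 days)
    before the treatment start date; DREs after treatment start, or no
    documented DRE at all, fail the metric."""
    if dre_date is None:
        return False  # no DRE documented in the EHR
    delta = (treatment_date - dre_date).days
    return 0 <= delta <= window_days
```

Applied over a cohort, the fraction of patients for which this returns True corresponds to the 51.9% metric-performance figure reported above.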
Affiliation(s)
- Selen Bozkurt
- Biomedical Data Science, Stanford University, Stanford, CA, USA
- Medicine (Biomedical Informatics), Stanford University, Stanford, CA, USA
- Kathleen M Kan
- Urology, Stanford Lucile Salter Packard Children's Hospital, Stanford, CA, USA
- Daniel L Rubin
- Biomedical Data Science, Stanford University, Stanford, CA, USA
- Radiology, Stanford University, Stanford, CA, USA
- Tina Hernandez-Boussard
- Biomedical Data Science, Stanford University, Stanford, CA, USA
- Medicine (Biomedical Informatics), Stanford University, Stanford, CA, USA
- Surgery, Stanford University, Stanford, CA, USA
16
Coquet J, Bozkurt S, Kan KM, Ferrari MK, Blayney DW, Brooks JD, Hernandez-Boussard T. Comparison of orthogonal NLP methods for clinical phenotyping and assessment of bone scan utilization among prostate cancer patients. J Biomed Inform 2019; 94:103184. [PMID: 31014980 PMCID: PMC6584041 DOI: 10.1016/j.jbi.2019.103184]
Abstract
OBJECTIVE Clinical care guidelines recommend that newly diagnosed prostate cancer patients at high risk for metastatic spread receive a bone scan prior to treatment and that low-risk patients not receive one. The objective was to develop an automated pipeline that interrogates heterogeneous data to evaluate the use of bone scans using two different Natural Language Processing (NLP) approaches.
MATERIALS AND METHODS Our cohort was divided into risk groups based on Electronic Health Records (EHR). Information on bone scan utilization was identified in both structured data and free text from clinical notes. Our pipeline annotated sentences with a combination of a rule-based method using the ConText algorithm (a generalization of NegEx) and a Convolutional Neural Network (CNN) method using word2vec to produce word embeddings.
RESULTS A total of 5500 patients and 369,764 notes were included in the study. Overall, 39% of patients were high risk, and 73% of these received a bone scan; of the 18% of patients who were low risk, 10% received one. The CNN model outperformed the rule-based model (F-measure = 0.918 vs 0.897). We demonstrate that a combination of both models can maximize precision or recall, depending on the study question.
CONCLUSION Using structured data, we accurately classified patients' cancer risk group, identified bone scan documentation with two NLP methods, and evaluated guideline adherence. Our pipeline can be used to provide concrete feedback to clinicians and guide treatment decisions.
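The rule-based arm of the pipeline relies on ConText/NegEx-style trigger terms to decide whether a bone scan mention is asserted or negated. The sketch below covers only forward negation with an illustrative trigger list and token scope; the real ConText algorithm also handles backward triggers, scope-terminating terms, temporality, and experiencer:

```python
import re

# Illustrative trigger list; NegEx/ConText ship much larger lexicons.
NEGATION_TRIGGERS = {"no", "not", "without", "denies", "denied"}

def is_negated(sentence, concept="bone scan", scope=5):
    """Flag `concept` as negated if a trigger token appears within
    `scope` tokens before its mention (forward negation only)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    ctoks = concept.lower().split()
    n = len(ctoks)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == ctoks:
            window = tokens[max(0, i - scope):i]
            return any(t in NEGATION_TRIGGERS for t in window)
    return False  # concept absent or asserted
```

A sentence-level label like this is what the rule-based model contributes; the CNN arm instead classifies the sentence from word2vec embeddings, and the paper combines the two depending on whether precision or recall is prioritized.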
Affiliation(s)
- Jean Coquet
- Department of Medicine, Stanford University, Stanford, CA, USA
- Selen Bozkurt
- Department of Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, USA
- Kathleen M Kan
- Department of Urology, Stanford University School of Medicine, Stanford, USA
- Michelle K Ferrari
- Department of Urology, Stanford University School of Medicine, Stanford, USA
- Douglas W Blayney
- Department of Medicine, Stanford University, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, USA
- James D Brooks
- Department of Urology, Stanford University School of Medicine, Stanford, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, USA
- Tina Hernandez-Boussard
- Department of Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, USA
- Department of Surgery, Stanford University School of Medicine, Stanford, USA
17

18
Bozkurt S, Park JI, Kan KM, Ferrari M, Rubin DL, Brooks JD, Hernandez-Boussard T. An Automated Feature Engineering for Digital Rectal Examination Documentation using Natural Language Processing. AMIA Annu Symp Proc 2018; 2018:288-294. [PMID: 30815067 PMCID: PMC6371344]
Abstract
Digital rectal examination (DRE) is considered a quality metric for prostate cancer care. However, much of the rich DRE-related information is documented as free text in clinical narratives. We therefore aimed to develop a natural language processing (NLP) pipeline to automatically identify DRE documentation in clinical notes, using a domain-specific dictionary created by clinical experts together with an extended version of that dictionary learned from clinical notes using distributional semantics algorithms. The proposed pipeline was compared with a baseline NLP algorithm and was found superior in terms of precision (0.95) and recall (0.90) for identifying DRE documentation. We believe a rule-based NLP pipeline enriched with terms learned from the whole corpus can provide accurate and efficient identification of this quality metric.
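The expert-dictionary-plus-learned-terms matching described above might be sketched as follows. The paper's actual dictionaries are not reproduced here, so every term in both lists is a hypothetical placeholder; the point is only the mechanic of toggling the corpus-learned extension on and off:

```python
import re

# Seed terms such as clinical experts might provide (hypothetical).
EXPERT_TERMS = ["digital rectal examination", "digital rectal exam"]
# Variants of the kind a distributional-semantics model might surface
# from the corpus (also hypothetical).
LEARNED_TERMS = ["rectal exam", "prostate exam"]

def mentions_dre(note, use_learned=True):
    """Return True if the note matches any dictionary term
    (whole-word, case-insensitive)."""
    terms = EXPERT_TERMS + (LEARNED_TERMS if use_learned else [])
    return any(re.search(r"\b" + re.escape(t) + r"\b", note, re.IGNORECASE)
               for t in terms)
```

Comparing hit counts with `use_learned` on and off mirrors how the extended dictionary improves recall over the expert seed list alone; the paper's full pipeline adds rules on top of this matching step.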
Affiliation(s)
- Selen Bozkurt
- Department of Medicine, Center for Biomedical Informatics Research, Stanford University, Stanford, CA
- Department of Biomedical Data Science, Stanford University, Stanford, CA
- Jung In Park
- Department of Medicine, Center for Biomedical Informatics Research, Stanford University, Stanford, CA
- Kathleen Mary Kan
- Department of Urology, Stanford University School of Medicine, Stanford, CA
- Michelle Ferrari
- Department of Urology, Stanford University School of Medicine, Stanford, CA
- Daniel L Rubin
- Department of Medicine, Center for Biomedical Informatics Research, Stanford University, Stanford, CA
- Department of Biomedical Data Science, Stanford University, Stanford, CA
- Department of Radiology, Stanford University School of Medicine, Stanford, CA
- James D Brooks
- Department of Urology, Stanford University School of Medicine, Stanford, CA
- Tina Hernandez-Boussard
- Department of Medicine, Center for Biomedical Informatics Research, Stanford University, Stanford, CA
- Department of Biomedical Data Science, Stanford University, Stanford, CA