1
Wang H, Wu Y, Sun M, Cui X. Enhancing diagnosis of benign lesions and lung cancer through ensemble text and breath analysis: a retrospective cohort study. Sci Rep 2024; 14:8731. PMID: 38627587; PMCID: PMC11021445; DOI: 10.1038/s41598-024-59474-w.
Abstract
Early diagnosis of lung cancer (LC) can significantly reduce its mortality rate. Considering the limitations of computed tomography (CT)-based diagnosis, namely its high false positive rate and reliance on radiologists' experience, a multi-modal early LC screening model that combines radiology with other non-invasive, rapid detection methods is warranted. A high-resolution, multi-modal, and low-differentiation LC screening strategy named ensemble text and breath analysis (ETBA) is proposed, which ensembles radiology report text analysis (TA) and breath analysis (BA). In total, 231 samples (140 LC patients and 91 benign lesion [BL] patients) were screened using proton transfer reaction-time of flight-mass spectrometry and CT screening. Participants were randomly assigned to a training set and a validation set (4:1) with stratification. The report section of the radiology reports was used to train a TA model with a natural language processing algorithm. Twenty-two volatile organic compounds (VOCs) in the exhaled breath, together with the prediction of the TA model, were used as predictors to develop the ETBA model using an extreme gradient boosting (XGBoost) algorithm. A BA model was developed based on the 22 VOCs alone, and both the BA and TA models were compared with the ETBA model. On the validation set, the ETBA model achieved a sensitivity of 94.3%, a specificity of 77.3%, and an accuracy of 87.7%, whereas radiologist diagnosis achieved a sensitivity of 74.3%, a specificity of 59.1%, and an accuracy of 68.1%. The ETBA model thus outperformed radiologist diagnosis in both sensitivity and specificity and has the potential to improve the sensitivity and specificity of CT-based LC screening. This approach is rapid, non-invasive, multi-dimensional, and accurate for LC and BL diagnosis.
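The ensembling step this abstract describes (22 VOC intensities plus the text model's prediction fed into a gradient-boosted classifier) can be sketched roughly as follows. All data below are synthetic placeholders, and scikit-learn's GradientBoostingClassifier stands in for the XGBoost implementation the authors used:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins: 22 VOC intensities per breath sample, plus the
# text-analysis (TA) model's predicted LC probability for the same patient.
n = 231  # cohort size reported in the abstract
vocs = rng.normal(size=(n, 22))
ta_prob = rng.uniform(size=(n, 1))
y = rng.integers(0, 2, size=n)  # 1 = lung cancer, 0 = benign lesion

# Ensemble design: the TA probability is appended as a 23rd predictor.
X = np.hstack([vocs, ta_prob])
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The paper used XGBoost; scikit-learn's gradient boosting is a stand-in.
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
proba = model.predict_proba(X_va)[:, 1]  # predicted P(lung cancer) per case
```

The key design point is stacking: the text model's output probability becomes an additional feature, letting the booster learn when to trust the breath signal versus the report text.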
Affiliation(s)
- Hao Wang
- Institute of Biomedical Engineering, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China
- Yinghua Wu
- Institute of Biomedical Engineering, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China
- Meixiu Sun
- Institute of Biomedical Engineering, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China.
- Engineering Research Center of Pulmonary and Critical Care Medicine Technology and Device Ministry of Education, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China.
- Xiaonan Cui
- Department of Radiology, Key Laboratory of Cancer Prevention and Therapy, National Clinical Research Centre of Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin, 300060, China.
2
Lee YM, Bacchi S, Macri C, Tan Y, Casson RJ, Chan WO. Ophthalmology Operation Note Encoding with Open-Source Machine Learning and Natural Language Processing. Ophthalmic Res 2023; 66:928-939. PMID: 37231984; PMCID: PMC10308528; DOI: 10.1159/000530954.
Abstract
INTRODUCTION Accurate assignment of procedural codes serves important medico-legal, academic, and economic purposes for healthcare providers. Procedural coding requires accurate documentation and exhaustive manual labour to interpret complex operation notes. Ophthalmology operation notes are highly specialised, making the process time-consuming and challenging to implement. This study aimed to develop natural language processing (NLP) models, trained by medical professionals, to assign procedural codes based on the surgical report. The automation and accuracy of these models can reduce the burden on healthcare providers and generate reimbursements that reflect the operation performed. METHODS A retrospective analysis of ophthalmological operation notes from two metropolitan hospitals over a 12-month period was conducted. Procedural codes according to the Medicare Benefits Schedule (MBS) were applied. XGBoost, decision tree, Bidirectional Encoder Representations from Transformers (BERT), and logistic regression models were developed for classification experiments. Experiments involved both multi-label and binary classification, and the best-performing model was applied to the holdout test dataset. RESULTS There were 1,000 operation notes included in the study. Following manual review, the five most common procedures were cataract surgery (374 cases), vitrectomy (298 cases), laser therapy (149 cases), trabeculectomy (56 cases), and intravitreal injections (49 cases). Across the entire dataset, existing coding was correct in 53.9% of cases. The BERT model had the highest classification accuracy (88.0%) in multi-label classification of these five procedures. The total reimbursement achieved by the machine learning algorithm was $184,689.45 ($923.45 per case) compared with the gold standard of $214,527.50 ($1,072.64 per case). CONCLUSION Our study demonstrates accurate classification of ophthalmic operation notes into MBS coding categories with NLP technology. A combined human and machine-led approach uses NLP to screen and code operation notes, with human review for further scrutiny. This technology can allow the assignment of correct MBS codes with greater accuracy. Further research and application in this area can facilitate accurate logging of unit activity, leading to appropriate reimbursements for healthcare providers. Increased accuracy of procedural coding can also support training and education, the study of disease epidemiology, and research into optimising patient outcomes.
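The multi-label setup the abstract describes (one label per MBS procedure, where a single note can carry several) can be sketched with a bag-of-words baseline. The notes and label names below are toy examples, and logistic regression over TF-IDF stands in for the BERT model that performed best in the study:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy operation notes with MBS-style procedure labels (both hypothetical);
# note that one note can carry more than one label.
notes = [
    "phacoemulsification with intraocular lens insertion",
    "pars plana vitrectomy with endolaser",
    "phacoemulsification combined with pars plana vitrectomy",
    "trabeculectomy with mitomycin C",
]
labels = [
    ["cataract"],
    ["vitrectomy"],
    ["cataract", "vitrectomy"],
    ["trabeculectomy"],
]

# One binary classifier per procedure label (multi-label classification).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # 4 notes x 3 label columns
X = TfidfVectorizer().fit_transform(notes)
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)

# Predicted label sets for each note.
pred = mlb.inverse_transform(clf.predict(X))
```

The one-vs-rest decomposition mirrors the paper's experiment design, where each MBS code is a separate binary decision that can fire independently on the same note.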
Affiliation(s)
- Yong Min Lee
- Royal Adelaide Hospital, Adelaide, SA, Australia
- Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide, Adelaide, SA, Australia
- Stephen Bacchi
- Royal Adelaide Hospital, Adelaide, SA, Australia
- Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide, Adelaide, SA, Australia
- Carmelo Macri
- Royal Adelaide Hospital, Adelaide, SA, Australia
- Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide, Adelaide, SA, Australia
- Yiran Tan
- Royal Adelaide Hospital, Adelaide, SA, Australia
- Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide, Adelaide, SA, Australia
- Robert J. Casson
- Royal Adelaide Hospital, Adelaide, SA, Australia
- Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide, Adelaide, SA, Australia
- Weng Onn Chan
- Royal Adelaide Hospital, Adelaide, SA, Australia
- Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide, Adelaide, SA, Australia
3
Natural Language Processing Applications for Computer-Aided Diagnosis in Oncology. Diagnostics (Basel) 2023; 13:286. PMID: 36673096; PMCID: PMC9857980; DOI: 10.3390/diagnostics13020286.
Abstract
In the era of big data, text-based medical data, such as electronic health records (EHRs) and electronic medical records (EMRs), are growing rapidly. EHRs and EMRs are collected from patients to record their basic information, lab tests, vital signs, clinical notes, and reports, and they contain information that can assist oncologists in computer-aided diagnosis and decision making. However, it is time-consuming for doctors to extract and analyze the valuable information they need from EHR and EMR data. Recently, a growing number of studies have applied natural language processing (NLP) techniques, i.e., rule-based, machine learning-based, and deep learning-based techniques, to EHR and EMR data for computer-aided diagnosis in oncology. The objective of this review is to narratively summarize recent progress in NLP applications for computer-aided diagnosis in oncology. Moreover, we intend to reduce the research gap between artificial intelligence (AI) experts and clinical specialists to enable the design of better NLP applications. We originally identified 295 articles from three electronic databases: PubMed, Google Scholar, and the ACL Anthology; we then removed duplicated papers and manually screened irrelevant papers based on the content of the abstract; finally, we included 23 articles after this screening process. We provide an in-depth analysis and categorize these studies into seven cancer types: breast cancer, lung cancer, liver cancer, prostate cancer, pancreatic cancer, colorectal cancer, and brain tumors. Additionally, we identify the current limitations of NLP applications in supporting clinical practice and suggest promising directions for future research.
4
Gupta AK, Kasthurirathne SN, Xu H, Li X, Ruppert MM, Harle CA, Grannis SJ. A framework for a consistent and reproducible evaluation of manual review for patient matching algorithms. J Am Med Inform Assoc 2022; 29:2105-2109. DOI: 10.1093/jamia/ocac175.
Abstract
Healthcare systems are hampered by incomplete and fragmented patient health records. Record linkage is widely accepted as a solution to improve the quality and completeness of patient records. However, no systematic approach exists for manually reviewing patient records to create gold-standard record linkage data sets. We propose a robust framework for creating and evaluating manually reviewed gold-standard data sets for measuring the performance of patient matching algorithms. Our 8-point approach covers data preprocessing, blocking, record adjudication, linkage evaluation, and reviewer characteristics. This framework can help record linkage method developers provide the necessary transparency when creating and validating gold-standard reference matching data sets. In turn, this transparency will support both the internal and external validity of record linkage studies and improve the robustness of new record linkage strategies.
Affiliation(s)
- Suranga N Kasthurirathne
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, Indiana, USA
- Department of Family Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Black Dog Institute, University of New South Wales, Sydney, New South Wales, Australia
- Huiping Xu
- Department of Biostatistics, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Xiaochun Li
- Department of Biostatistics, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Matthew M Ruppert
- Department of Medicine, University of Florida Health, Gainesville, Florida, USA
- Precision and Intelligent Systems in Medicine (PrismaP), University of Florida, Gainesville, Florida, USA
- Christopher A Harle
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
- Shaun J Grannis
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, Indiana, USA
- Department of Family Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA
5
Choi S, Joo HJ, Kim Y, Kim JH, Seok J. Conversion of Automated 12-Lead Electrocardiogram Interpretations to OMOP CDM Vocabulary. Appl Clin Inform 2022; 13:880-890. PMID: 36130711; PMCID: PMC9492322; DOI: 10.1055/s-0042-1756427.
Abstract
Background
A computerized 12-lead electrocardiogram (ECG) can automatically generate diagnostic statements, which are helpful for clinical purposes. Standardization is required for big data analysis when using ECG data generated by different interpretation algorithms. The common data model (CDM) is a standard schema designed to overcome heterogeneity between medical data. Diagnostic statements usually contain multiple CDM concepts and also include non-essential noise information, which should be removed during CDM conversion. Existing CDM conversion tools have several limitations, such as the requirement for manual validation, inability to extract multiple CDM concepts, and inadequate noise removal.
Objectives
We aim to develop a fully automated text data conversion algorithm that overcomes limitations of existing tools and manual conversion.
Methods
We used interpretations printed by 12-lead resting ECG tests from three different vendors: GE Medical Systems, Philips Medical Systems, and Nihon Kohden. For automatic mapping, we first constructed an ontology-lexicon of ECG interpretations. After clinical coding, an optimized tool for converting ECG interpretations to CDM terminology was developed using term-based text processing.
Results
Using the ontology-lexicon, the cosine similarity-based algorithm and the rule-based hierarchical algorithm showed comparable conversion accuracy (97.8% and 99.6%, respectively), while an integrated algorithm based on a heuristic approach, ECG2CDM, demonstrated superior performance (99.9%) on datasets from the three major vendors.
Conclusion
We developed user-friendly software that runs the ECG2CDM algorithm and is easy to use even for users unfamiliar with CDM or medical terminology. We propose that automated algorithms can support further big data analysis with an integrated and standardized ECG dataset.
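The cosine-similarity component of the mapping described above can be sketched as follows. The CDM-style target terms and vendor statements below are hypothetical, and character n-gram TF-IDF is one simple way to make noisy, abbreviated vendor text comparable to lexicon entries:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini ontology-lexicon of CDM-style target terms.
cdm_terms = ["atrial fibrillation", "sinus bradycardia", "left bundle branch block"]

# Vendor-printed interpretation statements, including noise tokens
# that should not block the match.
statements = [
    "ATRIAL FIB WITH RAPID VENTRICULAR RESPONSE",
    "SINUS BRADYCARDIA *** UNCONFIRMED ***",
]

# Character n-grams tolerate abbreviations ("FIB") and vendor formatting.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit(cdm_terms)
sims = cosine_similarity(vec.transform(statements), vec.transform(cdm_terms))

# Map each statement to its closest lexicon term.
best = [cdm_terms[row.argmax()] for row in sims]
```

A real converter would also apply the rule-based and heuristic layers the abstract mentions; this fragment only illustrates the similarity lookup.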
Affiliation(s)
- Sunho Choi
- School of Electrical Engineering, Korea University, Seoul, South Korea
- Hyung Joon Joo
- Korea University Research Institute for Medical Bigdata Science, Korea University, Seoul, South Korea
- Department of Cardiology, Cardiovascular Center, Korea University College of Medicine, Seoul, South Korea
- Yoojoong Kim
- School of Computer Science and Information Engineering, The Catholic University of Korea, Seoul, South Korea
- Jong-Ho Kim
- Korea University Research Institute for Medical Bigdata Science, Korea University, Seoul, South Korea
- Department of Cardiology, Cardiovascular Center, Korea University College of Medicine, Seoul, South Korea
- Junhee Seok
- School of Electrical Engineering, Korea University, Seoul, South Korea
6
Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz IB, Wang N, Yang P, Xu H, Warner JL, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clin Cancer Inform 2022; 6:e2200006. PMID: 35917480; PMCID: PMC9470142; DOI: 10.1200/cci.22.00006.
Abstract
PURPOSE The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and facilitate patient care. In this review, we aim to assess EHRs for cancer research and patient care using the Minimal Common Oncology Data Elements (mCODE), a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we assess the alignment of NLP-extracted data elements with mCODE and review existing NLP methodologies for extracting those data elements. METHODS Main literature databases were searched for cancer-related NLP articles written in English and published between January 2010 and September 2020. After retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for relevant study analysis and used to categorize data across four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS A total of 123 publications were ultimately included in our analysis. As expected, we found that cancer research and patient care require data elements beyond mCODE. Transparency and reproducibility of the NLP methods were often insufficient, and NLP evaluation practices were inconsistent. CONCLUSION We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers to wide adoption of cancer NLP were identified and discussed.
Affiliation(s)
- Liwei Wang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sunyang Fu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Andrew Wen
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Xiaoyang Ruan
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Huan He
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sijia Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sungrim Moon
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Michelle Mai
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Irbaz B. Riaz
- Department of Hematology/Oncology, Mayo Clinic, Scottsdale, AZ
- Nan Wang
- Department of Computer Science and Engineering, College of Science and Engineering, University of Minnesota, Minneapolis, MN
- Ping Yang
- Department of Quantitative Health Sciences, Mayo Clinic, Scottsdale, AZ
- Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX
- Jeremy L. Warner
- Departments of Medicine (Hematology/Oncology), Vanderbilt University, Nashville, TN
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
- Hongfang Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
7
Byrd C, Ajawara U, Laundry R, Radin J, Bhandari P, Leung A, Han S, Asch SM, Zeliadt S, Harris AHS, Backhus L. Performance of a rule-based semi-automated method to optimize chart abstraction for surveillance imaging among patients treated for non-small cell lung cancer. BMC Med Inform Decis Mak 2022; 22:148. PMID: 35659230; PMCID: PMC9166440; DOI: 10.1186/s12911-022-01863-0.
Abstract
Background
We aimed to develop and test the performance of a semi-automated method (computerized query combined with manual review) for chart abstraction to identify and characterize surveillance radiology imaging among patients treated for non-small cell lung cancer.
Methods
A gold standard dataset consisting of 3011 radiology reports from 361 lung cancer patients treated at the Veterans Health Administration from 2008 to 2016 was manually created by an abstractor coding image type, image indication, and image findings. Computerized queries using a text search tool were performed to code reports. The primary endpoint of query performance was evaluated by sensitivity, positive predictive value (PPV), and F1 score. The secondary endpoint of efficiency compared semi-automated abstraction time to manual abstraction time using a separate dataset and the Wilcoxon rank-sum test.
Results
The query for image type demonstrated the best performance, with sensitivity 85%, PPV 95%, and F1 score 0.90. The query for image indication demonstrated sensitivity 72%, PPV 70%, and F1 score 0.71. The image-findings queries ranged from sensitivity 75–85%, PPV 23–25%, and F1 score 0.36–0.37. Semi-automated abstraction with our best-performing query (image type) reduced abstraction time by 68% per patient compared with manual abstraction alone (from a median of 21.5 min (interquartile range 16.0) to 6.9 min (interquartile range 9.5), p < 0.005).
Conclusions
Semi-automated abstraction using the best performing query of image type improved abstraction efficiency while preserving data accuracy. The computerized query acts as a pre-processing tool for manual abstraction by restricting effort to relevant images. Determining image indication and findings requires the addition of manual review for a semi-automatic abstraction approach in order to ensure data accuracy.
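A computerized query of this kind is essentially keyword search used as a pre-processing filter before manual review. A minimal sketch, with hypothetical query terms rather than the study's actual text-search tool:

```python
import re

# Hypothetical computerized queries, one regex per abstraction target.
QUERIES = {
    "image_type": re.compile(r"\b(CT|computed tomography|PET)\b", re.IGNORECASE),
    "indication": re.compile(r"\b(surveillance|follow[- ]?up)\b", re.IGNORECASE),
}

def flag_report(text: str) -> dict:
    """Flag which queries hit, so only relevant reports go to manual review."""
    return {name: bool(rx.search(text)) for name, rx in QUERIES.items()}

flags = flag_report("CT chest for surveillance following lobectomy.")
```

Reports with no hits are skipped entirely, which is where the per-patient time saving comes from; flagged reports still receive manual review, matching the semi-automated design.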
8
Automating Access to Real World Evidence. JTO Clin Res Rep 2022; 3:100340. PMID: 35719866; PMCID: PMC9201015; DOI: 10.1016/j.jtocrr.2022.100340.
Abstract
Introduction Real-world evidence is important in regulatory and funding decisions. Manual data extraction from electronic health records (EHRs) is time-consuming and challenging to maintain. Automated extraction using natural language processing (NLP) and artificial intelligence may facilitate this process. Although NLP offers a faster solution than manual extraction, the validity of the extracted data remains in question. The current study compared manual and automated data extraction from the EHRs of patients with advanced lung cancer. Methods Previously, we extracted EHRs from 1209 patients diagnosed with advanced lung cancer (stage IIIB or IV) between January 2015 and December 2017 at Princess Margaret Cancer Centre (Toronto, Canada) using the commercially available artificial intelligence engine DARWEN (Pentavere, Ontario, Canada). For comparison, 100 of 333 patients who received systemic therapy were randomly selected, and clinical data were manually extracted by two trained abstractors using the same accepted gold-standard feature definitions, including patient and disease characteristics and treatment data. All cases were re-reviewed by an expert adjudicator. Accuracy and concordance between the automated and manual methods are reported. Results Automated extraction required considerably less time (<1 day) than manual extraction (approximately 225 person-hours). The collection of demographic data (age, sex, diagnosis) was highly accurate and concordant with both methods (96%–100%). Accuracy (for either extraction approach) and concordance were lower for unstructured data elements in the EHR, such as performance status, date of diagnosis, and smoking status (NLP accuracy: 88%–94%; manual accuracy: 78%–94%; concordance: 71%–82%). Concurrent medications (86%–100%) and comorbid conditions (96%–100%) were reported with high accuracy and concordance. Treatment details were also accurately captured with both methods (84%–100%) and highly concordant (83%–99%). Detection of whether biomarker testing was performed was highly accurate and concordant (96%–98%), although detection of biomarker test results was more variable (accuracy 84%–100%, concordance 84%–99%). Features with syntactic or semantic variation requiring clinical interpretation were extracted with slightly lower accuracy by both NLP and manual review. For example, metastatic sites were more accurately identified through NLP extraction (NLP: 88%–99%; manual: 71%–100%; concordance: 70%–99%), with the exception of lung and lymph node metastases (NLP: 66%–71%; manual: 87%–92%; concordance: 58%), owing to analogous terms used in radiology reports not being included in the accepted gold-standard definition. Conclusions Automated data abstraction from EHRs is highly accurate and faster than manual abstraction. Key challenges include poorly structured EHRs and the use of analogous terms beyond the accepted gold-standard definition. The application of NLP can facilitate real-world evidence studies at a greater scale than could be achieved with manual data extraction.
9
Kersloot MG, van Putten FJP, Abu-Hanna A, Cornet R, Arts DL. Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies. J Biomed Semantics 2020; 11:14. PMID: 33198814; PMCID: PMC7670625; DOI: 10.1186/s13326-020-00231-z.
Abstract
BACKGROUND Free-text descriptions in electronic health records (EHRs) can be of interest for clinical research and care optimization. However, free text cannot be readily interpreted by a computer and, therefore, has limited value. Natural Language Processing (NLP) algorithms can make free text machine-interpretable by attaching ontology concepts to it. However, implementations of NLP algorithms are not evaluated consistently. Therefore, the objective of this study was to review the current methods used for developing and evaluating NLP algorithms that map clinical text fragments onto ontology concepts. To standardize the evaluation of algorithms and reduce heterogeneity between studies, we propose a list of recommendations. METHODS Two reviewers examined publications indexed by Scopus, IEEE, MEDLINE, EMBASE, the ACM Digital Library, and the ACL Anthology. Publications reporting on NLP for mapping clinical text from EHRs to ontology concepts were included. Year, country, setting, objective, evaluation and validation methods, NLP algorithms, terminology systems, dataset size and language, performance measures, reference standard, generalizability, operational use, and source code availability were extracted. The studies' objectives were categorized by way of induction. These results were used to define recommendations. RESULTS In total, 2,355 unique studies were identified. Of these, 256 reported on the development of NLP algorithms for mapping free text to ontology concepts, and 77 described both development and evaluation. Twenty-two studies did not perform validation on unseen data, and 68 studies did not perform external validation. Of the 23 studies that claimed their algorithm was generalizable, 5 tested this by external validation. A list of sixteen recommendations regarding the usage of NLP systems and algorithms, usage of data, evaluation and validation, presentation of results, and generalizability of results was developed. CONCLUSION We found many heterogeneous approaches to reporting on the development and evaluation of NLP algorithms that map clinical text to ontology concepts. Over one-fourth of the identified publications did not perform an evaluation. In addition, over one-fourth of the included studies did not perform a validation, and 88% did not perform external validation. We believe that our recommendations, alongside an existing reporting standard, will increase the reproducibility and reusability of future studies and NLP algorithms in medicine.
Affiliation(s)
- Martijn G. Kersloot
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Castor EDC, Amsterdam, The Netherlands
- Florentien J. P. van Putten
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Ameen Abu-Hanna
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Ronald Cornet
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Derk L. Arts
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Castor EDC, Amsterdam, The Netherlands
10
Senders JT, Karhade AV, Cote DJ, Mehrtash A, Lamba N, DiRisio A, Muskens IS, Gormley WB, Smith TR, Broekman MLD, Arnaout O. Natural Language Processing for Automated Quantification of Brain Metastases Reported in Free-Text Radiology Reports. JCO Clin Cancer Inform 2020; 3:1-9. PMID: 31002562; DOI: 10.1200/cci.18.00138.
Abstract
PURPOSE Although the volume of patient-generated health data is increasing exponentially, their use is impeded because most data come in unstructured form, namely as free-text clinical reports. A variety of natural language processing (NLP) methods have emerged to automate the processing of free text, ranging from statistical to deep learning-based models; however, the optimal approach for medical text analysis remains to be determined. The aim of this study was to provide a head-to-head comparison of novel NLP techniques and inform future studies about their utility for automated medical text analysis. PATIENTS AND METHODS Magnetic resonance imaging reports of patients with brain metastases treated in two tertiary centers were retrieved and manually annotated using a binary classification (single metastasis v two or more metastases). Multiple bag-of-words and sequence-based NLP models were developed and compared after randomly splitting the annotated reports into training and test sets in an 80:20 ratio. RESULTS A total of 1,479 radiology reports of patients diagnosed with brain metastases were retrieved. The least absolute shrinkage and selection operator (LASSO) regression model demonstrated the best overall performance on the hold-out test set, with an area under the receiver operating characteristic curve of 0.92 (95% CI, 0.89 to 0.94), accuracy of 83% (95% CI, 80% to 87%), calibration intercept of -0.06 (95% CI, -0.14 to 0.01), and calibration slope of 1.06 (95% CI, 0.95 to 1.17). CONCLUSION Among the various NLP techniques, the bag-of-words approach combined with a LASSO regression model demonstrated the best overall performance in extracting binary outcomes from free-text clinical reports. This study provides a framework for the development of machine learning-based NLP models as well as a clinical vignette of patients diagnosed with brain metastases.
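The best-performing pipeline reported here, bag-of-words features with a LASSO classifier, can be sketched with L1-penalised logistic regression (the classification analogue of LASSO). The report snippets and labels below are invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented report snippets; label 1 = two or more metastases, 0 = single.
reports = [
    "single enhancing lesion in the right frontal lobe",
    "two enhancing lesions consistent with metastases",
    "solitary metastasis in the cerebellum",
    "numerous enhancing lesions throughout both hemispheres",
]
y = [0, 1, 0, 1]

# Bag-of-words counts feeding an L1-penalised (LASSO-style) logistic model;
# the L1 penalty drives most word weights to zero, keeping the model sparse
# and interpretable.
model = make_pipeline(
    CountVectorizer(),
    LogisticRegression(penalty="l1", solver="liblinear"),
).fit(reports, y)
pred = list(model.predict(reports))
```

Sparsity is the practical appeal of this approach over sequence models: the surviving non-zero coefficients point directly at the words driving each prediction.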
Affiliation(s)
- Joeky T Senders
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Aditya V Karhade
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- David J Cote
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Alireza Mehrtash
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Nayan Lamba
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Aislyn DiRisio
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Ivo S Muskens
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Timothy R Smith
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Omar Arnaout
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA

11
Odisho AY, Bridge M, Webb M, Ameli N, Eapen RS, Stauf F, Cowan JE, Washington SL, Herlemann A, Carroll PR, Cooperberg MR. Automating the Capture of Structured Pathology Data for Prostate Cancer Clinical Care and Research. JCO Clin Cancer Inform 2019; 3:1-8. [PMID: 31314550 PMCID: PMC6874052 DOI: 10.1200/cci.18.00084] Open
Abstract
PURPOSE Cancer pathology findings are critical for many aspects of care but are often locked away as unstructured free text. Our objective was to develop a natural language processing (NLP) system to extract prostate pathology details from postoperative pathology reports and a parallel structured data entry process for use by urologists during routine documentation, and to compare the accuracy of each approach against manual abstraction and the concordance between the NLP and clinician-entered approaches. MATERIALS AND METHODS From February 2016, clinicians used note templates with custom structured data elements (SDEs) during routine clinical care for men with prostate cancer. We also developed an NLP algorithm to parse radical prostatectomy pathology reports and extract structured data. We compared accuracy of clinician-entered SDEs and NLP-parsed data to manual abstraction as a gold standard and compared concordance (Cohen's κ) between approaches assuming no gold standard. RESULTS There were 523 patients with NLP-extracted data, 319 with SDE data, and 555 with manually abstracted data. For Gleason scores, NLP and clinician SDE accuracy was 95.6% and 95.8%, respectively, compared with manual abstraction, with concordance of 0.93 (95% CI, 0.89 to 0.98). For margin status, extracapsular extension, seminal vesicle invasion, stage, and lymph node status, NLP accuracy was 94.8% to 100%, SDE accuracy was 87.7% to 100%, and concordance between NLP and SDE ranged from 0.92 to 1.0. CONCLUSION We show that a real-world deployment of an NLP algorithm to extract pathology data and structured data entry by clinicians during routine clinical care in a busy clinical practice can generate accurate data when compared with manual abstraction for some, but not all, components of a prostate pathology report.
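The concordance statistic used in this study, Cohen's κ, discounts the agreement two annotators would reach by chance alone. A short sketch of the comparison between NLP-parsed and clinician-entered values for one field; the Gleason scores below are invented for illustration, not the study's data.

```python
# Cohen's kappa between NLP-parsed and clinician-entered values for the
# same pathology field, alongside raw agreement. Toy values only.
from sklearn.metrics import cohen_kappa_score

nlp_parsed    = ["3+4", "4+3", "3+3", "4+4", "3+4", "4+3", "3+3", "3+4"]
clinician_sde = ["3+4", "4+3", "3+3", "4+4", "3+4", "3+4", "3+3", "3+4"]

kappa = cohen_kappa_score(nlp_parsed, clinician_sde)
raw = sum(a == b for a, b in zip(nlp_parsed, clinician_sde)) / len(nlp_parsed)
# kappa is below raw agreement because chance agreement is subtracted out
print(f"raw agreement = {raw:.3f}, kappa = {kappa:.3f}")
```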
Affiliation(s)
- Mark Bridge
- University of California, San Francisco, San Francisco, CA
- Mitchell Webb
- University of California, San Francisco Medical Center, San Francisco, CA
- Niloufar Ameli
- University of California, San Francisco, San Francisco, CA
- Renu S Eapen
- University of California, San Francisco, San Francisco, CA
- Frank Stauf
- University of California, San Francisco, San Francisco, CA
- Janet E Cowan
- University of California, San Francisco, San Francisco, CA

12
Kang SK, Garry K, Chung R, Moore WH, Iturrate E, Swartz JL, Kim DC, Horwitz LI, Blecker S. Natural Language Processing for Identification of Incidental Pulmonary Nodules in Radiology Reports. J Am Coll Radiol 2019; 16:1587-1594. [PMID: 31132331 DOI: 10.1016/j.jacr.2019.04.026]
Abstract
PURPOSE To develop natural language processing (NLP) to identify incidental lung nodules (ILNs) in radiology reports for assessment of management recommendations. METHODS AND MATERIALS We searched the electronic health records for patients who underwent chest CT in 2014 and 2017, before and after implementation of a department-wide dictation macro of the Fleischner Society recommendations. We randomly selected 950 unstructured chest CT reports and reviewed them manually for ILNs. An NLP tool was trained and validated against the manually reviewed set, for the task of automated detection of ILNs with exclusion of previously known or definitively benign nodules. For ILNs found in the training and validation sets, we assessed whether reported management recommendations agreed with Fleischner Society guidelines. The guideline concordance of management recommendations was compared between 2014 and 2017. RESULTS The NLP tool identified ILNs with sensitivity and specificity of 91.1% and 82.2%, respectively, in the validation set. Positive and negative predictive values were 59.7% and 97.0%. In reports of ILNs in the training and validation sets before versus after introduction of a Fleischner reporting macro, there was no difference in the proportion of reports with ILNs (108 of 500 [21.6%] versus 101 of 450 [22.4%]; P = .8), or in the proportion of reports with ILNs containing follow-up recommendations (75 of 108 [69.4%] versus 80 of 101 [79.2%]; P = .2). Rates of recommendation guideline concordance were not significantly different before and after implementation of the standardized macro (52 of 75 [69.3%] versus 60 of 80 [75.0%]; P = .43). CONCLUSION NLP reliably automates identification of ILNs in unstructured reports, pertinent to quality improvement efforts for ILN management.
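The four validation metrics quoted above all derive from a single 2x2 confusion table. A sketch of the arithmetic; the counts below are invented to roughly reproduce the reported percentages and are not the study's actual table.

```python
# Sensitivity, specificity, PPV, and NPV from a 2x2 confusion table.
# The counts are invented for illustration and only approximate the
# percentages reported in the abstract.
def screening_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, PPV, NPV) from confusion counts."""
    sensitivity = tp / (tp + fn)  # detected ILNs / all true ILNs
    specificity = tn / (tn + fp)  # correctly cleared reports / all negatives
    ppv = tp / (tp + fp)          # how often a flagged report is a true ILN
    npv = tn / (tn + fn)          # how often a cleared report is truly clear
    return sensitivity, specificity, ppv, npv

sens, spec, ppv, npv = screening_metrics(tp=41, fp=28, fn=4, tn=130)
print(f"sens={sens:.1%} spec={spec:.1%} ppv={ppv:.1%} npv={npv:.1%}")
```

Note how PPV can be far lower than sensitivity when the condition is uncommon: even a sensitive detector flags many false positives relative to the small number of true ILNs.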
Affiliation(s)
- Stella K Kang
- Department of Radiology, NYU Langone Health, New York, New York; Department of Population Health, NYU Langone Health, New York, New York.
- Kira Garry
- Department of Population Health, NYU Langone Health, New York, New York
- Ryan Chung
- Department of Radiology, NYU Langone Health, New York, New York
- William H Moore
- Department of Radiology, NYU Langone Health, New York, New York
- Jordan L Swartz
- Department of Emergency Medicine, NYU Langone Health, New York, New York
- Danny C Kim
- Department of Radiology, NYU Langone Health, New York, New York
- Leora I Horwitz
- Department of Population Health, NYU Langone Health, New York, New York; Department of Medicine, NYU Langone Health, New York, New York
- Saul Blecker
- Department of Population Health, NYU Langone Health, New York, New York; Department of Medicine, NYU Langone Health, New York, New York

13
Goulart BHL, Silgard ET, Baik CS, Bansal A, Sun Q, Durbin EB, Hands I, Shah D, Arnold SM, Ramsey SD, Kavuluru R, Schwartz SM. Validity of Natural Language Processing for Ascertainment of EGFR and ALK Test Results in SEER Cases of Stage IV Non-Small-Cell Lung Cancer. JCO Clin Cancer Inform 2019; 3:1-15. [PMID: 31058542 PMCID: PMC6874053 DOI: 10.1200/cci.18.00098] Open
Abstract
PURPOSE SEER registries do not report results of epidermal growth factor receptor (EGFR) and anaplastic lymphoma kinase (ALK) mutation tests. To facilitate population-based research in molecularly defined subgroups of non-small-cell lung cancer (NSCLC), we assessed the validity of natural language processing (NLP) for the ascertainment of EGFR and ALK testing from electronic pathology (e-path) reports of NSCLC cases included in two SEER registries: the Cancer Surveillance System (CSS) and the Kentucky Cancer Registry (KCR). METHODS We obtained 4,278 e-path reports from 1,634 patients who were diagnosed with stage IV nonsquamous NSCLC from September 1, 2011, to December 31, 2013, included in CSS. We used 855 CSS reports to train NLP systems for the ascertainment of EGFR and ALK test status (reported v not reported) and test results (positive v negative). We assessed sensitivity, specificity, and positive and negative predictive values in an internal validation sample of 3,423 CSS e-path reports and repeated the analysis in an external sample of 1,041 e-path reports from 565 KCR patients. Two oncologists manually reviewed all e-path reports to generate gold-standard data sets. RESULTS NLP systems yielded internal validity metrics that ranged from 0.95 to 1.00 for EGFR and ALK test status and results in CSS e-path reports. NLP showed high internal accuracy for the ascertainment of EGFR and ALK in CSS patients-F scores of 0.95 and 0.96, respectively. In the external validation analysis, NLP yielded metrics that ranged from 0.02 to 0.96 in KCR reports and F scores of 0.70 and 0.72, respectively, in KCR patients. CONCLUSION NLP is an internally valid method for the ascertainment of EGFR and ALK test information from e-path reports available in SEER registries, but future work is necessary to increase NLP external validity.
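The F scores quoted above summarize ascertainment performance as the harmonic mean of precision (PPV) and recall (sensitivity), which is why the external-validation drop in sensitivity pulls the F score down sharply. A one-line sketch; the input values are illustrative, not the study's.

```python
# F1 score: harmonic mean of precision (PPV) and recall (sensitivity).
# Input values are illustrative only.
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(f"{f1(0.94, 0.96):.2f}")   # internally consistent precision/recall
print(f"{f1(0.94, 0.02):.2f}")   # recall collapse drags F1 toward zero
```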
Affiliation(s)
- Christina S. Baik
- Fred Hutchinson Cancer Research Center, Seattle, WA
- University of Washington, Seattle, WA
- Qin Sun
- Fred Hutchinson Cancer Research Center, Seattle, WA
- Scott D. Ramsey
- Fred Hutchinson Cancer Research Center, Seattle, WA
- University of Washington, Seattle, WA
- Stephen M. Schwartz
- Fred Hutchinson Cancer Research Center, Seattle, WA
- University of Washington, Seattle, WA