1
Wang H, Wu Y, Sun M, Cui X. Enhancing diagnosis of benign lesions and lung cancer through ensemble text and breath analysis: a retrospective cohort study. Sci Rep 2024; 14:8731. PMID: 38627587; PMCID: PMC11021445; DOI: 10.1038/s41598-024-59474-w.
Abstract
Early diagnosis of lung cancer (LC) can significantly reduce its mortality rate. Considering the limitations of computed tomography (CT)-based diagnosis, namely its high false positive rate and reliance on radiologists' experience, a multi-modal early LC screening model that combines radiology with other non-invasive, rapid detection methods is warranted. A high-resolution, multi-modal, and low-differentiation LC screening strategy named ensemble text and breath analysis (ETBA) is proposed, which ensembles radiology report text analysis (TA) and breath analysis (BA). In total, 231 samples (140 LC patients and 91 benign lesion [BL] patients) were screened using proton transfer reaction-time of flight-mass spectrometry and CT screening. Participants were randomly assigned to a training set and a validation set (4:1) with stratification. The report section of the radiology reports was used to train a TA model with a natural language processing algorithm. Twenty-two volatile organic compounds (VOCs) in the exhaled breath, together with the prediction of the TA model, were used as predictors to develop the ETBA model using an extreme gradient boosting (XGBoost) algorithm. A BA model was developed based on the 22 VOCs alone, and both the BA and TA models were compared with the ETBA model. On the validation set, the ETBA model achieved a sensitivity of 94.3%, a specificity of 77.3%, and an accuracy of 87.7%, whereas radiologist diagnosis achieved a sensitivity of 74.3%, a specificity of 59.1%, and an accuracy of 68.1%. The ETBA model thus outperformed radiologist diagnosis in both sensitivity and specificity and has the potential to improve the sensitivity and specificity of CT-based LC screening. This approach is rapid, non-invasive, multi-dimensional, and accurate for LC and BL diagnosis.
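The ensembling step this abstract describes (22 VOC intensities plus the text model's prediction fed into a gradient-boosted classifier) can be sketched roughly as follows. All data below are synthetic placeholders, and scikit-learn's GradientBoostingClassifier stands in for the XGBoost implementation the authors used:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins: 22 VOC intensities per breath sample, plus the
# text-analysis (TA) model's predicted LC probability for the same patient.
n = 231  # cohort size reported in the abstract
vocs = rng.normal(size=(n, 22))
ta_prob = rng.uniform(size=(n, 1))
y = rng.integers(0, 2, size=n)  # 1 = lung cancer, 0 = benign lesion

# Ensemble design: the TA probability is appended as a 23rd predictor.
X = np.hstack([vocs, ta_prob])
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The paper used XGBoost; scikit-learn's gradient boosting is a stand-in.
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
proba = model.predict_proba(X_va)[:, 1]  # predicted P(lung cancer) per case
```

The key design point is stacking: the text model's output probability becomes an additional feature, letting the booster learn when to trust the breath signal versus the report text.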
Affiliation(s)
- Hao Wang
- Institute of Biomedical Engineering, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China
- Yinghua Wu
- Institute of Biomedical Engineering, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China
- Meixiu Sun
- Institute of Biomedical Engineering, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China.
- Engineering Research Center of Pulmonary and Critical Care Medicine Technology and Device Ministry of Education, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China.
- Xiaonan Cui
- Department of Radiology, Key Laboratory of Cancer Prevention and Therapy, National Clinical Research Centre of Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin, 300060, China.
2
Lee YM, Bacchi S, Macri C, Tan Y, Casson RJ, Chan WO. Ophthalmology Operation Note Encoding with Open-Source Machine Learning and Natural Language Processing. Ophthalmic Res 2023; 66:928-939. PMID: 37231984; PMCID: PMC10308528; DOI: 10.1159/000530954.
Abstract
INTRODUCTION Accurate assignment of procedural codes serves important medico-legal, academic, and economic purposes for healthcare providers. Procedural coding requires accurate documentation and exhaustive manual labour to interpret complex operation notes. Ophthalmology operation notes are highly specialised, making the process time-consuming and challenging to implement. This study aimed to develop natural language processing (NLP) models, trained by medical professionals, to assign procedural codes based on the surgical report. The automation and accuracy of these models can reduce the burden on healthcare providers and generate reimbursements that reflect the operation performed. METHODS A retrospective analysis of ophthalmological operation notes from two metropolitan hospitals over a 12-month period was conducted. Procedural codes according to the Medicare Benefits Schedule (MBS) were applied. XGBoost, decision tree, Bidirectional Encoder Representations from Transformers (BERT), and logistic regression models were developed for classification experiments. Experiments involved both multi-label and binary classification, and the best-performing model was applied to the holdout test dataset. RESULTS There were 1,000 operation notes included in the study. Following manual review, the five most common procedures were cataract surgery (374 cases), vitrectomy (298 cases), laser therapy (149 cases), trabeculectomy (56 cases), and intravitreal injections (49 cases). Across the entire dataset, existing coding was correct in 53.9% of cases. The BERT model had the highest classification accuracy (88.0%) in multi-label classification of these five procedures. The total reimbursement achieved by the machine learning algorithm was $184,689.45 ($923.45 per case) compared with the gold standard of $214,527.50 ($1,072.64 per case). CONCLUSION Our study demonstrates accurate classification of ophthalmic operation notes into MBS coding categories with NLP technology. A combined human and machine-led approach uses NLP to screen and code operation notes, with human review for further scrutiny. This technology can allow the assignment of correct MBS codes with greater accuracy. Further research and application in this area can facilitate accurate logging of unit activity, leading to appropriate reimbursements for healthcare providers. Increased accuracy of procedural coding can also support training and education, the study of disease epidemiology, and research into optimising patient outcomes.
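The multi-label setup the abstract describes (one label per MBS procedure, where a single note can carry several) can be sketched with a bag-of-words baseline. The notes and label names below are toy examples, and logistic regression over TF-IDF stands in for the BERT model that performed best in the study:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy operation notes with MBS-style procedure labels (both hypothetical);
# note that one note can carry more than one label.
notes = [
    "phacoemulsification with intraocular lens insertion",
    "pars plana vitrectomy with endolaser",
    "phacoemulsification combined with pars plana vitrectomy",
    "trabeculectomy with mitomycin C",
]
labels = [
    ["cataract"],
    ["vitrectomy"],
    ["cataract", "vitrectomy"],
    ["trabeculectomy"],
]

# One binary classifier per procedure label (multi-label classification).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # 4 notes x 3 label columns
X = TfidfVectorizer().fit_transform(notes)
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)

# Predicted label sets for each note.
pred = mlb.inverse_transform(clf.predict(X))
```

The one-vs-rest decomposition mirrors the paper's experiment design, where each MBS code is a separate binary decision that can fire independently on the same note.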
Affiliation(s)
- Yong Min Lee
- Royal Adelaide Hospital, Adelaide, SA, Australia
- Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide, Adelaide, SA, Australia
- Stephen Bacchi
- Royal Adelaide Hospital, Adelaide, SA, Australia
- Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide, Adelaide, SA, Australia
- Carmelo Macri
- Royal Adelaide Hospital, Adelaide, SA, Australia
- Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide, Adelaide, SA, Australia
- Yiran Tan
- Royal Adelaide Hospital, Adelaide, SA, Australia
- Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide, Adelaide, SA, Australia
- Robert J. Casson
- Royal Adelaide Hospital, Adelaide, SA, Australia
- Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide, Adelaide, SA, Australia
- Weng Onn Chan
- Royal Adelaide Hospital, Adelaide, SA, Australia
- Machine Learning Division, Ophthalmic Research Laboratory, University of Adelaide, Adelaide, SA, Australia
3
Natural Language Processing Applications for Computer-Aided Diagnosis in Oncology. Diagnostics (Basel) 2023; 13:286. PMID: 36673096; PMCID: PMC9857980; DOI: 10.3390/diagnostics13020286.
Abstract
In the era of big data, text-based medical data, such as electronic health records (EHRs) and electronic medical records (EMRs), are growing rapidly. EHRs and EMRs are collected from patients to record their basic information, lab tests, vital signs, clinical notes, and reports, and they contain information that can assist oncologists in computer-aided diagnosis and decision making. However, it is time-consuming for doctors to extract and analyze the valuable information they need from EHR and EMR data. Recently, a growing number of studies have applied natural language processing (NLP) techniques, i.e., rule-based, machine learning-based, and deep learning-based techniques, to EHR and EMR data for computer-aided diagnosis in oncology. The objective of this review is to narratively summarize recent progress in NLP applications for computer-aided diagnosis in oncology. Moreover, we intend to reduce the research gap between artificial intelligence (AI) experts and clinical specialists to enable the design of better NLP applications. We originally identified 295 articles from three electronic databases: PubMed, Google Scholar, and the ACL Anthology; we then removed duplicated papers and manually screened irrelevant papers based on the content of the abstract; finally, we included 23 articles after this screening process. We provide an in-depth analysis and categorize these studies into seven cancer types: breast cancer, lung cancer, liver cancer, prostate cancer, pancreatic cancer, colorectal cancer, and brain tumors. Additionally, we identify the current limitations of NLP applications in supporting clinical practice and suggest promising directions for future research.
4
Gupta AK, Kasthurirathne SN, Xu H, Li X, Ruppert MM, Harle CA, Grannis SJ. A framework for a consistent and reproducible evaluation of manual review for patient matching algorithms. J Am Med Inform Assoc 2022; 29:2105-2109. DOI: 10.1093/jamia/ocac175.
Abstract
Healthcare systems are hampered by incomplete and fragmented patient health records. Record linkage is widely accepted as a solution to improve the quality and completeness of patient records. However, no systematic approach exists for manually reviewing patient records to create gold-standard record linkage data sets. We propose a robust framework for creating and evaluating manually reviewed gold-standard data sets for measuring the performance of patient matching algorithms. Our 8-point approach covers data preprocessing, blocking, record adjudication, linkage evaluation, and reviewer characteristics. This framework can help record linkage method developers provide the necessary transparency when creating and validating gold-standard reference matching data sets. In turn, this transparency will support both the internal and external validity of record linkage studies and improve the robustness of new record linkage strategies.
Affiliation(s)
- Suranga N Kasthurirathne
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, Indiana, USA
- Department of Family Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Black Dog Institute, University of New South Wales, Sydney, New South Wales, Australia
- Huiping Xu
- Department of Biostatistics, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Xiaochun Li
- Department of Biostatistics, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Matthew M Ruppert
- Department of Medicine, University of Florida Health, Gainesville, Florida, USA
- Precision and Intelligent Systems in Medicine (PrismaP), University of Florida, Gainesville, Florida, USA
- Christopher A Harle
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
- Shaun J Grannis
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, Indiana, USA
- Department of Family Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA
5
Choi S, Joo HJ, Kim Y, Kim JH, Seok J. Conversion of Automated 12-Lead Electrocardiogram Interpretations to OMOP CDM Vocabulary. Appl Clin Inform 2022; 13:880-890. PMID: 36130711; PMCID: PMC9492322; DOI: 10.1055/s-0042-1756427.
Abstract
Background
A computerized 12-lead electrocardiogram (ECG) can automatically generate diagnostic statements, which are helpful for clinical purposes. Standardization is required for big data analysis when using ECG data generated by different interpretation algorithms. The common data model (CDM) is a standard schema designed to overcome heterogeneity between medical data. Diagnostic statements usually contain multiple CDM concepts and also include non-essential noise information, which should be removed during CDM conversion. Existing CDM conversion tools have several limitations, such as the requirement for manual validation, inability to extract multiple CDM concepts, and inadequate noise removal.
Objectives
We aim to develop a fully automated text data conversion algorithm that overcomes limitations of existing tools and manual conversion.
Methods
We used interpretations printed by 12-lead resting ECG tests from three different vendors: GE Medical Systems, Philips Medical Systems, and Nihon Kohden. For automatic mapping, we first constructed an ontology-lexicon of ECG interpretations. After clinical coding, an optimized tool for converting ECG interpretations to CDM terminology was developed using term-based text processing.
Results
Using the ontology-lexicon, the cosine similarity-based algorithm and the rule-based hierarchical algorithm showed comparable conversion accuracy (97.8% and 99.6%, respectively), while an integrated algorithm based on a heuristic approach, ECG2CDM, demonstrated superior performance (99.9%) on datasets from the three major vendors.
Conclusion
We developed user-friendly software that runs the ECG2CDM algorithm and is easy to use even for users unfamiliar with CDM or medical terminology. We propose that automated algorithms can support further big data analysis with an integrated and standardized ECG dataset.
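The cosine-similarity component of the mapping described above can be sketched as follows. The CDM-style target terms and vendor statements below are hypothetical, and character n-gram TF-IDF is one simple way to make noisy, abbreviated vendor text comparable to lexicon entries:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini ontology-lexicon of CDM-style target terms.
cdm_terms = ["atrial fibrillation", "sinus bradycardia", "left bundle branch block"]

# Vendor-printed interpretation statements, including noise tokens
# that should not block the match.
statements = [
    "ATRIAL FIB WITH RAPID VENTRICULAR RESPONSE",
    "SINUS BRADYCARDIA *** UNCONFIRMED ***",
]

# Character n-grams tolerate abbreviations ("FIB") and vendor formatting.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit(cdm_terms)
sims = cosine_similarity(vec.transform(statements), vec.transform(cdm_terms))

# Map each statement to its closest lexicon term.
best = [cdm_terms[row.argmax()] for row in sims]
```

A real converter would also apply the rule-based and heuristic layers the abstract mentions; this fragment only illustrates the similarity lookup.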
Affiliation(s)
- Sunho Choi
- School of Electrical Engineering, Korea University, Seoul, South Korea
- Hyung Joon Joo
- Korea University Research Institute for Medical Bigdata Science, Korea University, Seoul, South Korea
- Department of Cardiology, Cardiovascular Center, Korea University College of Medicine, Seoul, South Korea
- Yoojoong Kim
- School of Computer Science and Information Engineering, The Catholic University of Korea, Seoul, South Korea
- Jong-Ho Kim
- Korea University Research Institute for Medical Bigdata Science, Korea University, Seoul, South Korea
- Department of Cardiology, Cardiovascular Center, Korea University College of Medicine, Seoul, South Korea
- Junhee Seok
- School of Electrical Engineering, Korea University, Seoul, South Korea
6
Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz IB, Wang N, Yang P, Xu H, Warner JL, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clin Cancer Inform 2022; 6:e2200006. PMID: 35917480; PMCID: PMC9470142; DOI: 10.1200/cci.22.00006.
Abstract
PURPOSE The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and facilitate patient care. In this review, we aim to assess EHRs for cancer research and patient care using the Minimal Common Oncology Data Elements (mCODE), a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we assess the alignment of NLP-extracted data elements with mCODE and review existing NLP methodologies for extracting those data elements. METHODS Main literature databases were searched for cancer-related NLP articles written in English and published between January 2010 and September 2020. After retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for relevant study analysis and used to categorize data across four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS A total of 123 publications were ultimately included in our analysis. As expected, we found that cancer research and patient care require data elements beyond mCODE. Transparency and reproducibility of the NLP methods were often insufficient, and NLP evaluation practices were inconsistent. CONCLUSION We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers to wide adoption of cancer NLP were identified and discussed.
Affiliation(s)
- Liwei Wang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sunyang Fu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Andrew Wen
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Xiaoyang Ruan
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Huan He
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sijia Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sungrim Moon
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Michelle Mai
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Irbaz B. Riaz
- Department of Hematology/Oncology, Mayo Clinic, Scottsdale, AZ
- Nan Wang
- Department of Computer Science and Engineering, College of Science and Engineering, University of Minnesota, Minneapolis, MN
- Ping Yang
- Department of Quantitative Health Sciences, Mayo Clinic, Scottsdale, AZ
- Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX
- Jeremy L. Warner
- Departments of Medicine (Hematology/Oncology), Vanderbilt University, Nashville, TN
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
- Hongfang Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
7
Byrd C, Ajawara U, Laundry R, Radin J, Bhandari P, Leung A, Han S, Asch SM, Zeliadt S, Harris AHS, Backhus L. Performance of a rule-based semi-automated method to optimize chart abstraction for surveillance imaging among patients treated for non-small cell lung cancer. BMC Med Inform Decis Mak 2022; 22:148. PMID: 35659230; PMCID: PMC9166440; DOI: 10.1186/s12911-022-01863-0.
Abstract
Background
We aimed to develop and test the performance of a semi-automated method (computerized query combined with manual review) for chart abstraction to identify and characterize surveillance radiology imaging among patients treated for non-small cell lung cancer.
Methods
A gold standard dataset consisting of 3011 radiology reports from 361 lung cancer patients treated at the Veterans Health Administration from 2008 to 2016 was manually created by an abstractor coding image type, image indication, and image findings. Computerized queries using a text search tool were performed to code reports. The primary endpoint of query performance was evaluated by sensitivity, positive predictive value (PPV), and F1 score. The secondary endpoint of efficiency compared semi-automated abstraction time to manual abstraction time using a separate dataset and the Wilcoxon rank-sum test.
Results
The query for image type demonstrated the best performance, with sensitivity 85%, PPV 95%, and F1 score 0.90. The query for image indication demonstrated sensitivity 72%, PPV 70%, and F1 score 0.71. The image-findings queries ranged from sensitivity 75–85%, PPV 23–25%, and F1 score 0.36–0.37. Semi-automated abstraction with our best-performing query (image type) reduced abstraction time by 68% per patient compared with manual abstraction alone (from a median of 21.5 min (interquartile range 16.0) to 6.9 min (interquartile range 9.5), p < 0.005).
Conclusions
Semi-automated abstraction using the best performing query of image type improved abstraction efficiency while preserving data accuracy. The computerized query acts as a pre-processing tool for manual abstraction by restricting effort to relevant images. Determining image indication and findings requires the addition of manual review for a semi-automatic abstraction approach in order to ensure data accuracy.
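A computerized query of this kind is essentially keyword search used as a pre-processing filter before manual review. A minimal sketch, with hypothetical query terms rather than the study's actual text-search tool:

```python
import re

# Hypothetical computerized queries, one regex per abstraction target.
QUERIES = {
    "image_type": re.compile(r"\b(CT|computed tomography|PET)\b", re.IGNORECASE),
    "indication": re.compile(r"\b(surveillance|follow[- ]?up)\b", re.IGNORECASE),
}

def flag_report(text: str) -> dict:
    """Flag which queries hit, so only relevant reports go to manual review."""
    return {name: bool(rx.search(text)) for name, rx in QUERIES.items()}

flags = flag_report("CT chest for surveillance following lobectomy.")
```

Reports with no hits are skipped entirely, which is where the per-patient time saving comes from; flagged reports still receive manual review, matching the semi-automated design.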
8
Automating Access to Real World Evidence. JTO Clin Res Rep 2022; 3:100340. PMID: 35719866; PMCID: PMC9201015; DOI: 10.1016/j.jtocrr.2022.100340.
Abstract
Introduction Real-world evidence is important in regulatory and funding decisions. Manual data extraction from electronic health records (EHRs) is time-consuming and challenging to maintain. Automated extraction using natural language processing (NLP) and artificial intelligence may facilitate this process. Although NLP offers a faster solution than manual extraction, the validity of the extracted data remains in question. The current study compared manual and automated data extraction from the EHRs of patients with advanced lung cancer. Methods Previously, we extracted EHRs from 1209 patients diagnosed with advanced lung cancer (stage IIIB or IV) between January 2015 and December 2017 at Princess Margaret Cancer Centre (Toronto, Canada) using the commercially available artificial intelligence engine DARWEN (Pentavere, Ontario, Canada). For comparison, 100 of 333 patients who received systemic therapy were randomly selected, and clinical data were manually extracted by two trained abstractors using the same accepted gold-standard feature definitions, including patient and disease characteristics and treatment data. All cases were re-reviewed by an expert adjudicator. Accuracy and concordance between the automated and manual methods are reported. Results Automated extraction required considerably less time (<1 day) than manual extraction (approximately 225 person-hours). The collection of demographic data (age, sex, diagnosis) was highly accurate and concordant with both methods (96%–100%). Accuracy (for either extraction approach) and concordance were lower for unstructured data elements in the EHR, such as performance status, date of diagnosis, and smoking status (NLP accuracy: 88%–94%; manual accuracy: 78%–94%; concordance: 71%–82%). Concurrent medications (86%–100%) and comorbid conditions (96%–100%) were reported with high accuracy and concordance. Treatment details were also accurately captured with both methods (84%–100%) and highly concordant (83%–99%). Detection of whether biomarker testing was performed was highly accurate and concordant (96%–98%), although detection of biomarker test results was more variable (accuracy 84%–100%, concordance 84%–99%). Features with syntactic or semantic variation requiring clinical interpretation were extracted with slightly lower accuracy by both NLP and manual review. For example, metastatic sites were more accurately identified through NLP extraction (NLP: 88%–99%; manual: 71%–100%; concordance: 70%–99%), with the exception of lung and lymph node metastases (NLP: 66%–71%; manual: 87%–92%; concordance: 58%), owing to analogous terms used in radiology reports not being included in the accepted gold-standard definition. Conclusions Automated data abstraction from EHRs is highly accurate and faster than manual abstraction. Key challenges include poorly structured EHRs and the use of analogous terms beyond the accepted gold-standard definition. The application of NLP can facilitate real-world evidence studies at a greater scale than could be achieved with manual data extraction.
9
Kersloot MG, van Putten FJP, Abu-Hanna A, Cornet R, Arts DL. Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies. J Biomed Semantics 2020; 11:14. PMID: 33198814; PMCID: PMC7670625; DOI: 10.1186/s13326-020-00231-z.
Abstract
BACKGROUND Free-text descriptions in electronic health records (EHRs) can be of interest for clinical research and care optimization. However, free text cannot be readily interpreted by a computer and, therefore, has limited value. Natural Language Processing (NLP) algorithms can make free text machine-interpretable by attaching ontology concepts to it. However, implementations of NLP algorithms are not evaluated consistently. Therefore, the objective of this study was to review the current methods used for developing and evaluating NLP algorithms that map clinical text fragments onto ontology concepts. To standardize the evaluation of algorithms and reduce heterogeneity between studies, we propose a list of recommendations. METHODS Two reviewers examined publications indexed by Scopus, IEEE, MEDLINE, EMBASE, the ACM Digital Library, and the ACL Anthology. Publications reporting on NLP for mapping clinical text from EHRs to ontology concepts were included. Year, country, setting, objective, evaluation and validation methods, NLP algorithms, terminology systems, dataset size and language, performance measures, reference standard, generalizability, operational use, and source code availability were extracted. The studies' objectives were categorized by way of induction. These results were used to define recommendations. RESULTS In total, 2,355 unique studies were identified. Of these, 256 reported on the development of NLP algorithms for mapping free text to ontology concepts, and 77 described both development and evaluation. Twenty-two studies did not perform validation on unseen data, and 68 studies did not perform external validation. Of the 23 studies that claimed their algorithm was generalizable, 5 tested this by external validation. A list of sixteen recommendations regarding the usage of NLP systems and algorithms, usage of data, evaluation and validation, presentation of results, and generalizability of results was developed. CONCLUSION We found many heterogeneous approaches to reporting on the development and evaluation of NLP algorithms that map clinical text to ontology concepts. Over one-fourth of the identified publications did not perform an evaluation. In addition, over one-fourth of the included studies did not perform a validation, and 88% did not perform external validation. We believe that our recommendations, alongside an existing reporting standard, will increase the reproducibility and reusability of future studies and NLP algorithms in medicine.
Affiliation(s)
- Martijn G. Kersloot
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Castor EDC, Amsterdam, The Netherlands
- Florentien J. P. van Putten
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Ameen Abu-Hanna
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Ronald Cornet
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Derk L. Arts
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Castor EDC, Amsterdam, The Netherlands
10
Senders JT, Karhade AV, Cote DJ, Mehrtash A, Lamba N, DiRisio A, Muskens IS, Gormley WB, Smith TR, Broekman MLD, Arnaout O. Natural Language Processing for Automated Quantification of Brain Metastases Reported in Free-Text Radiology Reports. JCO Clin Cancer Inform 2020; 3:1-9. PMID: 31002562; DOI: 10.1200/cci.18.00138.
Abstract
PURPOSE Although the volume of patient-generated health data is increasing exponentially, their use is impeded because most data come in unstructured form, namely as free-text clinical reports. A variety of natural language processing (NLP) methods have emerged to automate the processing of free text, ranging from statistical to deep learning-based models; however, the optimal approach for medical text analysis remains to be determined. The aim of this study was to provide a head-to-head comparison of novel NLP techniques and inform future studies about their utility for automated medical text analysis. PATIENTS AND METHODS Magnetic resonance imaging reports of patients with brain metastases treated in two tertiary centers were retrieved and manually annotated using a binary classification (single metastasis v two or more metastases). Multiple bag-of-words and sequence-based NLP models were developed and compared after randomly splitting the annotated reports into training and test sets in an 80:20 ratio. RESULTS A total of 1,479 radiology reports of patients diagnosed with brain metastases were retrieved. The least absolute shrinkage and selection operator (LASSO) regression model demonstrated the best overall performance on the hold-out test set, with an area under the receiver operating characteristic curve of 0.92 (95% CI, 0.89 to 0.94), accuracy of 83% (95% CI, 80% to 87%), calibration intercept of -0.06 (95% CI, -0.14 to 0.01), and calibration slope of 1.06 (95% CI, 0.95 to 1.17). CONCLUSION Among the various NLP techniques, the bag-of-words approach combined with a LASSO regression model demonstrated the best overall performance in extracting binary outcomes from free-text clinical reports. This study provides a framework for the development of machine learning-based NLP models as well as a clinical vignette of patients diagnosed with brain metastases.
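The best-performing pipeline reported here, bag-of-words features with a LASSO classifier, can be sketched with L1-penalised logistic regression (the classification analogue of LASSO). The report snippets and labels below are invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented report snippets; label 1 = two or more metastases, 0 = single.
reports = [
    "single enhancing lesion in the right frontal lobe",
    "two enhancing lesions consistent with metastases",
    "solitary metastasis in the cerebellum",
    "numerous enhancing lesions throughout both hemispheres",
]
y = [0, 1, 0, 1]

# Bag-of-words counts feeding an L1-penalised (LASSO-style) logistic model;
# the L1 penalty drives most word weights to zero, keeping the model sparse
# and interpretable.
model = make_pipeline(
    CountVectorizer(),
    LogisticRegression(penalty="l1", solver="liblinear"),
).fit(reports, y)
pred = list(model.predict(reports))
```

Sparsity is the practical appeal of this approach over sequence models: the surviving non-zero coefficients point directly at the words driving each prediction.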
Affiliation(s)
- Joeky T Senders
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Aditya V Karhade
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- David J Cote
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Alireza Mehrtash
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Nayan Lamba
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Aislyn DiRisio
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Ivo S Muskens
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Timothy R Smith
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Omar Arnaout
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA

11
Odisho AY, Bridge M, Webb M, Ameli N, Eapen RS, Stauf F, Cowan JE, Washington SL, Herlemann A, Carroll PR, Cooperberg MR. Automating the Capture of Structured Pathology Data for Prostate Cancer Clinical Care and Research. JCO Clin Cancer Inform 2019; 3:1-8. [PMID: 31314550 PMCID: PMC6874052 DOI: 10.1200/cci.18.00084] Open
Abstract
PURPOSE Cancer pathology findings are critical for many aspects of care but are often locked away as unstructured free text. Our objective was to develop a natural language processing (NLP) system to extract prostate pathology details from postoperative pathology reports and a parallel structured data entry process for use by urologists during routine documentation, and to compare the accuracy of each approach against manual abstraction and the concordance between the NLP and clinician-entered approaches. MATERIALS AND METHODS From February 2016, clinicians used note templates with custom structured data elements (SDEs) during routine clinical care for men with prostate cancer. We also developed an NLP algorithm to parse radical prostatectomy pathology reports and extract structured data. We compared accuracy of clinician-entered SDEs and NLP-parsed data to manual abstraction as a gold standard and compared concordance (Cohen's κ) between approaches assuming no gold standard. RESULTS There were 523 patients with NLP-extracted data, 319 with SDE data, and 555 with manually abstracted data. For Gleason scores, NLP and clinician SDE accuracy was 95.6% and 95.8%, respectively, compared with manual abstraction, with concordance of 0.93 (95% CI, 0.89 to 0.98). For margin status, extracapsular extension, seminal vesicle invasion, stage, and lymph node status, NLP accuracy was 94.8% to 100%, SDE accuracy was 87.7% to 100%, and concordance between NLP and SDE ranged from 0.92 to 1.0. CONCLUSION We show that a real-world deployment of an NLP algorithm to extract pathology data and structured data entry by clinicians during routine clinical care in a busy clinical practice can generate accurate data when compared with manual abstraction for some, but not all, components of a prostate pathology report.
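The concordance statistic used in this study, Cohen's κ, discounts the agreement two annotators would reach by chance alone. A short sketch of the comparison between NLP-parsed and clinician-entered values for one field; the Gleason scores below are invented for illustration, not the study's data.

```python
# Cohen's kappa between NLP-parsed and clinician-entered values for the
# same pathology field, alongside raw agreement. Toy values only.
from sklearn.metrics import cohen_kappa_score

nlp_parsed    = ["3+4", "4+3", "3+3", "4+4", "3+4", "4+3", "3+3", "3+4"]
clinician_sde = ["3+4", "4+3", "3+3", "4+4", "3+4", "3+4", "3+3", "3+4"]

kappa = cohen_kappa_score(nlp_parsed, clinician_sde)
raw = sum(a == b for a, b in zip(nlp_parsed, clinician_sde)) / len(nlp_parsed)
# kappa is below raw agreement because chance agreement is subtracted out
print(f"raw agreement = {raw:.3f}, kappa = {kappa:.3f}")
```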
Affiliation(s)
- Mark Bridge
- University of California, San Francisco, San Francisco, CA
- Mitchell Webb
- University of California, San Francisco Medical Center, San Francisco, CA
- Niloufar Ameli
- University of California, San Francisco, San Francisco, CA
- Renu S Eapen
- University of California, San Francisco, San Francisco, CA
- Frank Stauf
- University of California, San Francisco, San Francisco, CA
- Janet E Cowan
- University of California, San Francisco, San Francisco, CA

12
Kang SK, Garry K, Chung R, Moore WH, Iturrate E, Swartz JL, Kim DC, Horwitz LI, Blecker S. Natural Language Processing for Identification of Incidental Pulmonary Nodules in Radiology Reports. J Am Coll Radiol 2019; 16:1587-1594. [PMID: 31132331 DOI: 10.1016/j.jacr.2019.04.026]
Abstract
PURPOSE To develop natural language processing (NLP) to identify incidental lung nodules (ILNs) in radiology reports for assessment of management recommendations. METHODS AND MATERIALS We searched the electronic health records for patients who underwent chest CT in 2014 and 2017, before and after implementation of a department-wide dictation macro of the Fleischner Society recommendations. We randomly selected 950 unstructured chest CT reports and reviewed them manually for ILNs. An NLP tool was trained and validated against the manually reviewed set, for the task of automated detection of ILNs with exclusion of previously known or definitively benign nodules. For ILNs found in the training and validation sets, we assessed whether reported management recommendations agreed with Fleischner Society guidelines. The guideline concordance of management recommendations was compared between 2014 and 2017. RESULTS The NLP tool identified ILNs with sensitivity and specificity of 91.1% and 82.2%, respectively, in the validation set. Positive and negative predictive values were 59.7% and 97.0%. In reports of ILNs in the training and validation sets before versus after introduction of a Fleischner reporting macro, there was no difference in the proportion of reports with ILNs (108 of 500 [21.6%] versus 101 of 450 [22.4%]; P = .8), or in the proportion of reports with ILNs containing follow-up recommendations (75 of 108 [69.4%] versus 80 of 101 [79.2%]; P = .2). Rates of recommendation guideline concordance were not significantly different before and after implementation of the standardized macro (52 of 75 [69.3%] versus 60 of 80 [75.0%]; P = .43). CONCLUSION NLP reliably automates identification of ILNs in unstructured reports, pertinent to quality improvement efforts for ILN management.
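The four validation metrics quoted above all derive from a single 2x2 confusion table. A sketch of the arithmetic; the counts below are invented to roughly reproduce the reported percentages and are not the study's actual table.

```python
# Sensitivity, specificity, PPV, and NPV from a 2x2 confusion table.
# The counts are invented for illustration and only approximate the
# percentages reported in the abstract.
def screening_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, PPV, NPV) from confusion counts."""
    sensitivity = tp / (tp + fn)  # detected ILNs / all true ILNs
    specificity = tn / (tn + fp)  # correctly cleared reports / all negatives
    ppv = tp / (tp + fp)          # how often a flagged report is a true ILN
    npv = tn / (tn + fn)          # how often a cleared report is truly clear
    return sensitivity, specificity, ppv, npv

sens, spec, ppv, npv = screening_metrics(tp=41, fp=28, fn=4, tn=130)
print(f"sens={sens:.1%} spec={spec:.1%} ppv={ppv:.1%} npv={npv:.1%}")
```

Note how PPV can be far lower than sensitivity when the condition is uncommon: even a sensitive detector flags many false positives relative to the small number of true ILNs.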
Affiliation(s)
- Stella K Kang
- Department of Radiology, NYU Langone Health, New York, New York; Department of Population Health, NYU Langone Health, New York, New York.
- Kira Garry
- Department of Population Health, NYU Langone Health, New York, New York
- Ryan Chung
- Department of Radiology, NYU Langone Health, New York, New York
- William H Moore
- Department of Radiology, NYU Langone Health, New York, New York
- Jordan L Swartz
- Department of Emergency Medicine, NYU Langone Health, New York, New York
- Danny C Kim
- Department of Radiology, NYU Langone Health, New York, New York
- Leora I Horwitz
- Department of Population Health, NYU Langone Health, New York, New York; Department of Medicine, NYU Langone Health, New York, New York
- Saul Blecker
- Department of Population Health, NYU Langone Health, New York, New York; Department of Medicine, NYU Langone Health, New York, New York

13
Goulart BHL, Silgard ET, Baik CS, Bansal A, Sun Q, Durbin EB, Hands I, Shah D, Arnold SM, Ramsey SD, Kavuluru R, Schwartz SM. Validity of Natural Language Processing for Ascertainment of EGFR and ALK Test Results in SEER Cases of Stage IV Non-Small-Cell Lung Cancer. JCO Clin Cancer Inform 2019; 3:1-15. [PMID: 31058542 PMCID: PMC6874053 DOI: 10.1200/cci.18.00098] Open
Abstract
PURPOSE SEER registries do not report results of epidermal growth factor receptor (EGFR) and anaplastic lymphoma kinase (ALK) mutation tests. To facilitate population-based research in molecularly defined subgroups of non-small-cell lung cancer (NSCLC), we assessed the validity of natural language processing (NLP) for the ascertainment of EGFR and ALK testing from electronic pathology (e-path) reports of NSCLC cases included in two SEER registries: the Cancer Surveillance System (CSS) and the Kentucky Cancer Registry (KCR). METHODS We obtained 4,278 e-path reports from 1,634 patients who were diagnosed with stage IV nonsquamous NSCLC from September 1, 2011, to December 31, 2013, included in CSS. We used 855 CSS reports to train NLP systems for the ascertainment of EGFR and ALK test status (reported v not reported) and test results (positive v negative). We assessed sensitivity, specificity, and positive and negative predictive values in an internal validation sample of 3,423 CSS e-path reports and repeated the analysis in an external sample of 1,041 e-path reports from 565 KCR patients. Two oncologists manually reviewed all e-path reports to generate gold-standard data sets. RESULTS NLP systems yielded internal validity metrics that ranged from 0.95 to 1.00 for EGFR and ALK test status and results in CSS e-path reports. NLP showed high internal accuracy for the ascertainment of EGFR and ALK in CSS patients-F scores of 0.95 and 0.96, respectively. In the external validation analysis, NLP yielded metrics that ranged from 0.02 to 0.96 in KCR reports and F scores of 0.70 and 0.72, respectively, in KCR patients. CONCLUSION NLP is an internally valid method for the ascertainment of EGFR and ALK test information from e-path reports available in SEER registries, but future work is necessary to increase NLP external validity.
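The F scores quoted above summarize ascertainment performance as the harmonic mean of precision (PPV) and recall (sensitivity), which is why the external-validation drop in sensitivity pulls the F score down sharply. A one-line sketch; the input values are illustrative, not the study's.

```python
# F1 score: harmonic mean of precision (PPV) and recall (sensitivity).
# Input values are illustrative only.
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(f"{f1(0.94, 0.96):.2f}")   # internally consistent precision/recall
print(f"{f1(0.94, 0.02):.2f}")   # recall collapse drags F1 toward zero
```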
Affiliation(s)
- Christina S. Baik
- Fred Hutchinson Cancer Research Center, Seattle, WA
- University of Washington, Seattle, WA
- Qin Sun
- Fred Hutchinson Cancer Research Center, Seattle, WA
- Scott D. Ramsey
- Fred Hutchinson Cancer Research Center, Seattle, WA
- University of Washington, Seattle, WA
- Stephen M. Schwartz
- Fred Hutchinson Cancer Research Center, Seattle, WA
- University of Washington, Seattle, WA