1
Levy J, Vattikonda N, Haudenschild C, Christensen B, Vaickus L. Comparison of Machine-Learning Algorithms for the Prediction of Current Procedural Terminology (CPT) Codes from Pathology Reports. J Pathol Inform 2022; 13:3. [PMID: 35127232 PMCID: PMC8802304 DOI: 10.4103/jpi.jpi_52_21]
Abstract
BACKGROUND Pathology reports serve as an auditable trail of a patient's clinical narrative, containing text pertaining to diagnosis, prognosis, and specimen processing. Recent works have utilized natural language processing (NLP) pipelines, which include rule-based or machine-learning analytics, to uncover textual patterns that inform clinical endpoints and biomarker information. Although deep learning methods have come to the forefront of NLP, there have been limited comparisons with the performance of other machine-learning methods in extracting key insights for the prediction of medical procedure information, which is used to inform reimbursement for pathology departments. In addition, the utility of combining and ranking information from multiple report subfields as compared with exclusively using the diagnostic field for the prediction of Current Procedural Terminology (CPT) codes and signing pathologists remains unclear. METHODS After preprocessing pathology reports, we utilized advanced topic modeling to identify topics that characterize a cohort of 93,039 pathology reports at the Dartmouth-Hitchcock Department of Pathology and Laboratory Medicine (DPLM). We separately compared XGBoost, SVM, and BERT (Bidirectional Encoder Representations from Transformers) methodologies for the prediction of primary CPT codes (CPT 88302, 88304, 88305, 88307, 88309) as well as 38 ancillary CPT codes, using both the diagnostic text alone and text from all subfields. We performed similar analyses for characterizing text from a group of the 20 pathologists with the most pathology report sign-outs. Finally, we uncovered important report subcomponents by using model explanation techniques. RESULTS We identified 20 topics that pertained to diagnostic and procedural information. Operating on diagnostic text alone, BERT outperformed XGBoost for the prediction of primary CPT codes. When utilizing all report subfields, XGBoost outperformed BERT for the prediction of primary CPT codes. Utilizing additional subfields of the pathology report increased prediction accuracy across ancillary CPT codes, and performance gains from using additional report subfields were high for the XGBoost model for primary CPT codes. Misclassifications of CPT codes were between codes of similar complexity, and misclassifications between pathologists were subspecialty related. CONCLUSIONS Our approach generated CPT code predictions with an accuracy higher than previously reported. Although diagnostic text is an important source of information, additional insights may be extracted from other report subfields. Although BERT approaches performed comparably to the XGBoost approaches, they may lend valuable information to pipelines that combine image, text, and -omics information. Future resource-saving opportunities exist to help hospitals detect mis-billing, standardize report text, and estimate productivity metrics that pertain to pathologist compensation (RVUs).
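The diagnosis-only versus all-subfields comparison in this study comes down to which report fields feed the text featurizer. A minimal illustrative sketch of that choice, assuming a hypothetical subfield schema and whitespace tokenization (not the study's actual pipeline):

```python
from collections import Counter

def featurize(report: dict, fields=("diagnosis",)) -> Counter:
    """Bag-of-words features from selected report subfields.
    `report` maps subfield name -> free text (hypothetical schema)."""
    tokens = []
    for field in fields:
        tokens.extend(report.get(field, "").lower().split())
    return Counter(tokens)

report = {
    "diagnosis": "Skin punch biopsy basal cell carcinoma",
    "clinical_history": "Lesion on nose biopsy requested",
}

# Diagnosis field only vs. all available subfields.
diag_only = featurize(report)
all_fields = featurize(report, ("diagnosis", "clinical_history"))
```

Either feature set could then be fed to a classifier such as XGBoost; the point is that the additional subfields contribute tokens (here, "lesion", "nose") that the diagnosis-only features never see.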
Affiliation(s)
- Joshua Levy
- Emerging Diagnostic and Investigative Technologies, Clinical Genomics and Advanced Technologies, Department of Pathology and Laboratory Medicine, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Corresponding author: Dartmouth Hitchcock Medical Center, 1 Medical Center Drive, Borwell Building 4th Floor, Lebanon, NH 03766, USA
- Nishitha Vattikonda
- Thomas Jefferson High School for Science and Technology, Alexandria, VA, USA
- Brock Christensen
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Department of Community and Family Medicine, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Louis Vaickus
- Emerging Diagnostic and Investigative Technologies, Clinical Genomics and Advanced Technologies, Department of Pathology and Laboratory Medicine, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA
2
Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz IB, Wang N, Yang P, Xu H, Warner JL, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clin Cancer Inform 2022; 6:e2200006. [PMID: 35917480 PMCID: PMC9470142 DOI: 10.1200/cci.22.00006]
Abstract
PURPOSE The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and to facilitate patient care. In this review, we aim to assess EHRs for cancer research and patient care by using the Minimal Common Oncology Data Elements (mCODE), a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and to review existing NLP methodologies for extracting those data elements. METHODS Main literature databases were searched to retrieve cancer-related NLP articles written in English and published between January 2010 and September 2020. After retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for relevant study analysis and used to categorize data across four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS A total of 123 publications were ultimately selected and included in our analysis. We found that, as expected, cancer research and patient care require some data elements beyond mCODE. Transparency and reproducibility are insufficient in NLP methods, and NLP evaluation is inconsistent. CONCLUSION We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers to wide adoption of cancer NLP were identified and discussed.
Affiliation(s)
- Liwei Wang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sunyang Fu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Andrew Wen
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Xiaoyang Ruan
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Huan He
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sijia Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sungrim Moon
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Michelle Mai
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Irbaz B. Riaz
- Department of Hematology/Oncology, Mayo Clinic, Scottsdale, AZ
- Nan Wang
- Department of Computer Science and Engineering, College of Science and Engineering, University of Minnesota, Minneapolis, MN
- Ping Yang
- Department of Quantitative Health Sciences, Mayo Clinic, Scottsdale, AZ
- Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX
- Jeremy L. Warner
- Departments of Medicine (Hematology/Oncology), Vanderbilt University, Nashville, TN
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
- Hongfang Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
3
Bitterman DS, Miller TA, Mak RH, Savova GK. Clinical Natural Language Processing for Radiation Oncology: A Review and Practical Primer. Int J Radiat Oncol Biol Phys 2021; 110:641-655. [PMID: 33545300 DOI: 10.1016/j.ijrobp.2021.01.044]
Abstract
Natural language processing (NLP), which aims to convert human language into expressions that can be analyzed by computers, is one of the most rapidly developing and widely used technologies in the field of artificial intelligence. Natural language processing algorithms convert unstructured free text data into structured data that can be extracted and analyzed at scale. In medicine, this unlocking of the rich, expressive data within clinical free text in electronic medical records will help realize the full potential of big data for research and clinical purposes. Recent major NLP algorithmic advances have significantly improved the performance of these algorithms, leading to a surge in academic and industry interest in developing tools to automate information extraction and phenotyping from clinical texts. Thus, these technologies are poised to transform medical research and alter clinical practices in the future. Radiation oncology stands to benefit from NLP algorithms if they are appropriately developed and deployed, as they may enable advances such as automated inclusion of radiation therapy details into cancer registries, discovery of novel insights about cancer care, and improved patient data curation and presentation at the point of care. However, challenges remain before the full value of NLP is realized, such as the plethora of jargon specific to radiation oncology, nonstandard nomenclature, a lack of publicly available labeled data for model development, and interoperability limitations between radiation oncology data silos. Successful development and implementation of high-quality and high-value NLP models for radiation oncology will require close collaboration between computer scientists and the radiation oncology community. Here, we present a primer on artificial intelligence algorithms in general and NLP algorithms in particular; provide guidance on how to assess the performance of such algorithms; review prior research on NLP algorithms for oncology; and describe future avenues for NLP in radiation oncology research and clinics.
Affiliation(s)
- Danielle S Bitterman
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, Massachusetts; Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts; Artificial Intelligence in Medicine Program, Brigham and Women's Hospital, Boston, Massachusetts.
- Timothy A Miller
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts
- Raymond H Mak
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, Massachusetts; Artificial Intelligence in Medicine Program, Brigham and Women's Hospital, Boston, Massachusetts
- Guergana K Savova
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts
4
Mowery DL, Kawamoto K, Bradshaw R, Kohlmann W, Schiffman JD, Weir C, Borbolla D, Chapman WW, Del Fiol G. Determining Onset for Familial Breast and Colorectal Cancer from Family History Comments in the Electronic Health Record. AMIA Jt Summits Transl Sci Proc 2019; 2019:173-181. [PMID: 31258969 PMCID: PMC6568127]
Abstract
Background. Family health history (FHH) can be used to identify individuals at elevated risk for familial cancers. Risk criteria for common cancers rely on age of onset, which is documented inconsistently as structured and unstructured data in electronic health records (EHRs). Objective. To investigate a natural language processing (NLP) approach to extract age of onset and age of death from free-text EHR fields. Methods. Using 474,651 FHH entries from 89,814 patients, we investigated two methods: frequent patterns (baseline) and an NLP classifier. Results. For age of onset, the NLP classifier outperformed the baseline in precision (96% vs. 83%; 95% CI [94, 97] and [80, 86]) with equivalent recall (both 93%; 95% CI [91, 95]). When applied to the full dataset, the NLP approach increased the percentage of FHH entries for which cancer risk criteria could be applied from 10% to 15%. Conclusion. NLP combined with structured data may improve the computation of familial cancer risk criteria for various use cases.
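A rule-based baseline of the kind this study compares against can be sketched with regular expressions over free-text family history comments. The patterns below are hypothetical examples for illustration, not the study's actual patterns (its better-performing method was a learned classifier):

```python
import re

# Hypothetical onset-age patterns for free-text family history comments.
AGE_PATTERNS = [
    re.compile(r"(?:dx|diagnosed|onset)\D{0,15}?(?:at\s+age|at|age)\s*(\d{1,3})", re.I),
]

def extract_onset_age(comment):
    """Return the first age of onset matched in the comment, or None."""
    for pat in AGE_PATTERNS:
        m = pat.search(comment)
        if m:
            return int(m.group(1))
    return None
```

Such patterns are precise on phrasings they anticipate but brittle elsewhere, which is consistent with the abstract's finding that the NLP classifier improved precision over a pattern-based baseline.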
Affiliation(s)
- Danielle L Mowery
- Biomedical Informatics
- Informatics, Decision-Enhancement, and Analytic Sciences (IDEAS) Center, Veterans Affairs Salt Lake City Health Care System, Salt Lake City, UT
- Biostatistics, Epidemiology, & Informatics
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA
- Wendy W Chapman
- Biomedical Informatics
- Informatics, Decision-Enhancement, and Analytic Sciences (IDEAS) Center, Veterans Affairs Salt Lake City Health Care System, Salt Lake City, UT
5
Si Y, Roberts K. A Frame-Based NLP System for Cancer-Related Information Extraction. AMIA Annu Symp Proc 2018; 2018:1524-1533. [PMID: 30815198 PMCID: PMC6371330]
Abstract
We propose a frame-based natural language processing (NLP) method that extracts cancer-related information from clinical narratives. We focus on three frames: cancer diagnosis, cancer therapeutic procedure, and tumor description. We utilize a deep learning-based approach, bidirectional Long Short-term Memory (LSTM) Conditional Random Field (CRF), which uses both character and word embeddings. The system consists of two constituent sequence classifiers: a frame identification (lexical unit) classifier and a frame element classifier. The classifier achieves an F1 of 93.70 for cancer diagnosis, 96.33 for therapeutic procedure, and 87.18 for tumor description. These represent improvements of 10.72, 0.85, and 8.04 over a baseline heuristic, respectively. Additionally, we demonstrate that the combination of both GloVe and MIMIC-III embeddings has the best representational effect. Overall, this study demonstrates the effectiveness of deep learning methods to extract frame semantic information from clinical narratives.
Affiliation(s)
- Yuqi Si
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Kirk Roberts
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
6
Chapman AB, Mowery DL, Swords DS, Chapman WW, Bucher BT. Detecting Evidence of Intra-abdominal Surgical Site Infections from Radiology Reports Using Natural Language Processing. AMIA Annu Symp Proc 2018; 2017:515-524. [PMID: 29854116 PMCID: PMC5977582]
Abstract
Free-text reports in electronic health records (EHRs) contain medically significant information - signs, symptoms, findings, diagnoses - recorded by clinicians during patient encounters. These reports contain rich clinical information which can be leveraged for surveillance of disease and occurrence of adverse events. In order to gain meaningful knowledge from these text reports to support surveillance efforts, information must first be converted into a structured, computable format. Traditional methods rely on manual review of charts, which can be costly and inefficient. Natural language processing (NLP) methods offer an efficient, alternative approach to extracting the information and can achieve a similar level of accuracy. We developed an NLP system to automatically identify mentions of surgical site infections in radiology reports and classify reports containing evidence of surgical site infections leveraging these mentions. We evaluated our system using a reference standard of reports annotated by domain experts, administrative data generated for each patient encounter, and a machine learning-based approach.
Affiliation(s)
- Alec B Chapman
- Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT
- Danielle L Mowery
- Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT
- IDEAS Center, George E. Wahlen Veterans Affairs Medical Center, Salt Lake City, UT
- Douglas S Swords
- Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT
- Wendy W Chapman
- Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT
- IDEAS Center, George E. Wahlen Veterans Affairs Medical Center, Salt Lake City, UT
- Brian T Bucher
- Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT
- Pediatric Surgery, University of Utah School of Medicine, Salt Lake City, UT
7
Conway M, Khojoyan A, Fana F, Scuba W, Castine M, Mowery D, Chapman W, Jupp S. Developing a web-based SKOS editor. J Biomed Semantics 2016; 7:5. [PMID: 27047653 PMCID: PMC4819276 DOI: 10.1186/s13326-015-0043-z]
Abstract
Background The Simple Knowledge Organization System (SKOS) was introduced to the wider research community by a 2005 World Wide Web Consortium (W3C) working draft, and further developed and refined in a 2009 W3C recommendation. Since then, SKOS has become the de facto standard for representing and sharing thesauri, lexicons, vocabularies, taxonomies, and classification schemes. In this paper, we describe the development of a web-based, free, open-source SKOS editor built for the development, curation, and management of small to medium-sized lexicons for health-related Natural Language Processing (NLP). Results The web-based SKOS editor allows users to create, curate, version, manage, and visualise SKOS resources. We tested the system against five widely-used, publicly-available SKOS vocabularies of various sizes and found that the editor is suitable for the development and management of small to medium-size lexicons. Qualitative testing has focussed on using the editor to develop lexical resources to drive NLP applications in two domains. First, developing a lexicon to support an Electronic Health Record-based NLP system for the automatic identification of pneumonia symptoms. Second, creating a taxonomy of lexical cues associated with Diagnostic and Statistical Manual of Mental Disorders (DSM-5) diagnoses with the goal of facilitating the automatic identification of symptoms associated with depression from short, informal texts. Conclusions The SKOS editor we have developed is — to the best of our knowledge — the first free, open-source, web-based, SKOS editor capable of creating, curating, versioning, managing, and visualising SKOS lexicons. Electronic supplementary material The online version of this article (doi:10.1186/s13326-015-0043-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Mike Conway
- Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, UT 84108, USA
- Fariba Fana
- CALIT2, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- William Scuba
- Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, UT 84108, USA
- Melissa Castine
- Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, UT 84108, USA
- Danielle Mowery
- Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, UT 84108, USA
- Wendy Chapman
- Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, UT 84108, USA
- Simon Jupp
- European Bioinformatics Institute, Hinxton, Cambridgeshire CB10 1SD, United Kingdom
8
LaFleur J, DuVall SL, Willson T, Ginter T, Patterson O, Cheng Y, Knippenberg K, Haroldsen C, Adler RA, Curtis JR, Agodoa I, Nelson RE. Analysis of osteoporosis treatment patterns with bisphosphonates and outcomes among postmenopausal veterans. Bone 2015; 78:174-85. [PMID: 25896952 DOI: 10.1016/j.bone.2015.04.022]
Abstract
PURPOSE Adherence and persistence with bisphosphonates are frequently poor, and stopping, restarting, or switching bisphosphonates is common. We evaluated bisphosphonate change behaviors (switching, discontinuing, or reinitiating) over time, as well as fractures and costs, among a large, national cohort of postmenopausal veterans. METHODS Female veterans aged 50+ treated with bisphosphonates during 2003-2011 were identified in Veterans Health Administration (VHA) datasets. Bisphosphonate change behaviors were characterized using pharmacy refill records. Patients' baseline disease severity was characterized based on age, T-score, and prior fracture. Cox Proportional Hazard analysis was used to evaluate characteristics associated with discontinuation and the relationship between change behaviors and fracture outcomes. Generalized estimating equations were used to evaluate the relationship between change behaviors and cost outcomes. RESULTS A total of 35,650 patients met eligibility criteria. Over 6800 patients (19.1%) were non-switchers. The remaining patients were in the change cohort; at least half displayed more than one change behavior over time. A strong, significant predictor of discontinuation was ≥5 healthcare visits in the prior year (11-23% more likely to discontinue), and discontinuation risk decreased with increasing age. No change behaviors were associated with increased fracture risk. Total costs were significantly higher in patients with change behaviors (4.7-19.7% higher). Change-behavior patients mostly had significantly lower osteoporosis-related costs than non-switchers (22%-118% lower). CONCLUSIONS Most bisphosphonate patients discontinue treatment at some point, which did not significantly increase the risk of fracture in this majority non-high risk population. Bisphosphonate change behaviors were associated with significantly lower osteoporosis costs, but significantly higher total costs.
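Characterizing discontinuation from pharmacy refill records, as described above, is typically operationalized as a permissible gap after a fill's supply runs out. A minimal sketch, where the 90-day grace period and the (fill_date, days_supply) record layout are illustrative assumptions rather than the study's definition:

```python
from datetime import date, timedelta

def discontinued(fills, grace_days=90):
    """Flag discontinuation when the gap between exhausting one fill
    and picking up the next exceeds `grace_days`.
    `fills` is a list of (fill_date, days_supply) tuples."""
    fills = sorted(fills)
    for (d, supply), (next_d, _) in zip(fills, fills[1:]):
        runout = d + timedelta(days=supply)
        if (next_d - runout).days > grace_days:
            return True
    return False

# Toy refill history: the third fill comes ~5 months after the second runs out.
fills = [(date(2010, 1, 1), 30), (date(2010, 2, 1), 30), (date(2010, 8, 1), 30)]
```

Switching and reinitiation behaviors can be detected with the same gap logic applied per drug, comparing the drug dispensed before and after each gap.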
Affiliation(s)
- J LaFleur
- Pharmacotherapy Outcomes Research Center, University of Utah, 30 South 2000 East, Salt Lake City, UT 84112, USA; VA Salt Lake City Health Care System, 500 Foothill Drive, Salt Lake City, UT 84148, USA
- S L DuVall
- Pharmacotherapy Outcomes Research Center, University of Utah, 30 South 2000 East, Salt Lake City, UT 84112, USA; VA Salt Lake City Health Care System, 500 Foothill Drive, Salt Lake City, UT 84148, USA
- T Willson
- Pharmacotherapy Outcomes Research Center, University of Utah, 30 South 2000 East, Salt Lake City, UT 84112, USA; VA Salt Lake City Health Care System, 500 Foothill Drive, Salt Lake City, UT 84148, USA
- T Ginter
- VA Salt Lake City Health Care System, 500 Foothill Drive, Salt Lake City, UT 84148, USA
- O Patterson
- VA Salt Lake City Health Care System, 500 Foothill Drive, Salt Lake City, UT 84148, USA
- Y Cheng
- Pharmacotherapy Outcomes Research Center, University of Utah, 30 South 2000 East, Salt Lake City, UT 84112, USA; VA Salt Lake City Health Care System, 500 Foothill Drive, Salt Lake City, UT 84148, USA
- K Knippenberg
- Pharmacotherapy Outcomes Research Center, University of Utah, 30 South 2000 East, Salt Lake City, UT 84112, USA
- C Haroldsen
- VA Salt Lake City Health Care System, 500 Foothill Drive, Salt Lake City, UT 84148, USA; Department of Internal Medicine, University of Utah, 30 North 1900 East, Salt Lake City, UT 84132, USA
- R A Adler
- Hunter Holmes McGuire Veterans Affairs Medical Center, 1201 Broad Rock Boulevard, Richmond, VA 23224, USA
- J R Curtis
- Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, 1825 University Boulevard, Birmingham, AL 35294-2182, USA
- I Agodoa
- Amgen, Inc., 1 Amgen Center Drive, Thousand Oaks, CA 91320, USA
- R E Nelson
- VA Salt Lake City Health Care System, 500 Foothill Drive, Salt Lake City, UT 84148, USA; Department of Internal Medicine, University of Utah, 30 North 1900 East, Salt Lake City, UT 84132, USA
9
Ping XO, Tseng YJ, Chung Y, Wu YL, Hsu CW, Yang PM, Huang GT, Lai F, Liang JD. Information extraction for tracking liver cancer patients' statuses: from mixture of clinical narrative report types. Telemed J E Health 2013; 19:704-10. [PMID: 23869395 DOI: 10.1089/tmj.2012.0241]
Abstract
OBJECTIVE To provide an efficient way of tracking patients' condition over long periods of time and to facilitate the collection of clinical data from different types of narrative reports, it is critical to develop an efficient method for analyzing the clinical data accumulated in narrative reports. MATERIALS AND METHODS To facilitate liver cancer clinical research, a method was developed for extracting clinical factors from various types of narrative clinical reports, including ultrasound reports, radiology reports, pathology reports, operation notes, admission notes, and discharge summaries. An information extraction (IE) module was developed for tracking disease progression in liver cancer patients over time, and a rule-based classifier was developed for answering whether patients met the clinical research eligibility criteria. The classifier provided the answers and direct/indirect evidence (evidence sentences) for the clinical questions. To evaluate the implemented IE module and the classifier, gold-standard annotations and answers were developed manually, and the results of the implemented system were compared with the gold standard. RESULTS The IE module achieved an F-score from 92.40% to 99.59%, and the classifier achieved accuracy from 96.15% to 100%. CONCLUSIONS The application was successfully applied to the various types of narrative clinical reports and might also be applied to key information extraction for other types of cancer patients.
Affiliation(s)
- Xiao-Ou Ping
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
10
Womack JA, Scotch M, Leung SN, Skanderson M, Bathulapalli H, Haskell SG, Brandt CA. Use of structured and unstructured data to identify contraceptive use in women veterans. Perspect Health Inf Manag 2013; 10:1e. [PMID: 23861675 PMCID: PMC3709878]
Abstract
Contraceptive use among women Veterans may not be adequately captured using administrative and pharmacy codes. Clinical progress notes may provide a useful alternative. The objectives of this study were to validate the use of administrative and pharmacy codes to identify contraceptive use in Veterans Health Administration data, and to determine the feasibility and validity of identifying contraceptive use in clinical progress notes. The study included women Veterans who participated in the Women Veterans Cohort Study, enrolled in the Veterans Affairs Connecticut Health Care System, completed a baseline survey, and had clinical progress notes from one year prior to survey completion. Contraceptive ICD-9-CM codes, V-codes, CPT codes, and pharmacy codes were identified. Progress notes were annotated to identify contraceptive use. Self-reported contraceptive use was identified from a baseline survey of health habits and healthcare practices and utilization. Sensitivity, specificity, and positive predictive value were calculated comparing administrative and pharmacy contraceptive codes and progress note-based contraceptive information to self-report survey data. Results showed that administrative and pharmacy codes were specific but not sensitive for identifying contraceptive use. For example, oral contraceptive pill codes were highly specific (1.00) but not sensitive (0.41). Data from clinical progress notes demonstrated greater sensitivity and comparable specificity. For example, for oral contraceptive pills, progress notes were both specific (0.85) and sensitive (0.73). Results suggest that the best approach for identifying contraceptive use, through either administrative codes or progress notes, depends on the research question.
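The validation reported above reduces to a 2x2 table comparing each data source against self-report as the reference standard. A self-contained sketch with toy vectors (not the study's data):

```python
def diagnostics(test_flags, reference):
    """Sensitivity, specificity, and positive predictive value of a binary
    indicator (e.g., a pharmacy code) against a reference standard
    (e.g., self-reported contraceptive use)."""
    pairs = list(zip(test_flags, reference))
    tp = sum(1 for t, r in pairs if t and r)          # true positives
    fp = sum(1 for t, r in pairs if t and not r)      # false positives
    fn = sum(1 for t, r in pairs if not t and r)      # false negatives
    tn = sum(1 for t, r in pairs if not t and not r)  # true negatives
    return tp / (tp + fn), tn / (tn + fp), tp / (tp + fp)

# Toy example: the coded indicator misses one true user (a false negative),
# mirroring the "specific but not sensitive" pattern the study found.
sens, spec, ppv = diagnostics([1, 0, 0, 1], [1, 1, 0, 1])
```

Here sensitivity suffers (one true user uncoded) while specificity and PPV stay perfect, which is the signature of under-capture in administrative codes.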
11
Xu H, Fu Z, Shah A, Chen Y, Peterson NB, Chen Q, Mani S, Levy MA, Dai Q, Denny JC. Extracting and integrating data from entire electronic health records for detecting colorectal cancer cases. AMIA Annu Symp Proc 2011; 2011:1564-1572. [PMID: 22195222 PMCID: PMC3243156]
Abstract
Identification of a cohort of patients with specific diseases is an important step for clinical research that is based on electronic health records (EHRs). Informatics approaches combining structured EHR data, such as billing records, with narrative text data have demonstrated utility for such tasks. This paper describes an algorithm combining machine learning and natural language processing to detect patients with colorectal cancer (CRC) from entire EHRs at Vanderbilt University Hospital. We developed a general case detection method that consists of two steps: 1) extraction of positive CRC concepts from all clinical notes (document-level concept identification); and 2) determination of CRC cases using aggregated information from both clinical narratives and structured billing data (patient-level case determination). For each step, we compared performance of rule-based and machine-learning-based approaches. Using a manually reviewed data set containing 300 possible CRC patients (150 for training and 150 for testing), we showed that our method achieved F-measures of 0.996 for document level concept identification, and 0.93 for patient level case detection.
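The two-step design above separates per-document concept flags from a patient-level decision that also draws on structured billing data. A toy aggregation rule, whose thresholds and signature are assumptions for illustration rather than the paper's tuned model:

```python
def patient_is_crc_case(doc_flags, has_crc_billing_code, min_positive_docs=2):
    """Patient-level case determination from document-level CRC concept flags
    plus structured billing data: either enough independently positive notes,
    or billing evidence corroborated by at least one positive note."""
    positives = sum(doc_flags)
    return positives >= min_positive_docs or (has_crc_billing_code and positives >= 1)
```

The appeal of this kind of rule is that neither source alone decides borderline patients: billing codes filter out incidental narrative mentions, and narrative mentions filter out miscoded bills.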
Affiliation(s)
- Hua Xu
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
12
Chapman BE, Lee S, Kang HP, Chapman WW. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm. J Biomed Inform 2011; 44:728-37. [PMID: 21459155 DOI: 10.1016/j.jbi.2011.03.011]
Abstract
In this paper we describe an application called peFinder for document-level classification of CT pulmonary angiography reports. peFinder is based on a generalized version of the ConText algorithm, a simple text processing algorithm for identifying features in clinical report documents. peFinder was used to answer questions about the disease state (pulmonary emboli present or absent), the certainty state of the diagnosis (uncertainty present or absent), the temporal state of an identified pulmonary embolus (acute or chronic), and the technical quality state of the exam (diagnostic or not diagnostic). Gold standard answers for each question were determined from the consensus classifications of three human annotators. peFinder results were compared to naive Bayes classifiers using unigrams and bigrams. The sensitivities (and positive predictive values) for peFinder were 0.98 (0.83), 0.86 (0.96), 0.94 (0.93), and 0.60 (0.90) for disease state, quality state, certainty state, and temporal state, respectively, compared to 0.68 (0.77), 0.67 (0.87), 0.62 (0.82), and 0.04 (0.25) for the naive Bayes classifier using unigrams, and 0.75 (0.79), 0.52 (0.69), 0.59 (0.84), and 0.04 (0.25) for the naive Bayes classifier using bigrams.
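ConText works by letting trigger terms assign a contextual property to target concepts within a forward scope that is closed by a termination cue. A toy single-property sketch of that mechanism, where the tiny trigger and terminator lexicons are illustrative assumptions, not peFinder's actual resources:

```python
# Toy ConText-style scoping: a trigger sets the status of target concepts
# that follow it, until a scope-terminating conjunction resets the status.
TRIGGERS = {"no": "absent", "without": "absent", "possible": "uncertain"}
TERMINATORS = {"but", "however"}

def apply_context(tokens, targets):
    """Label each target concept with the status set by the most recent trigger."""
    state, labels = "present", {}
    for tok in tokens:
        low = tok.lower().strip(".,")
        if low in TRIGGERS:
            state = TRIGGERS[low]
        elif low in TERMINATORS:
            state = "present"  # terminator closes the trigger's scope
        elif low in targets:
            labels[low] = state
    return labels

sent = "No evidence of embolus but atelectasis is noted".split()
labels = apply_context(sent, {"embolus", "atelectasis"})
```

Note how "but" stops the negation from "No" spilling onto "atelectasis"; scope termination of this kind is what lets simple lexical rules rival statistical classifiers on report-level questions.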
Affiliation(s)
- Brian E Chapman
- Division of Biomedical Informatics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093-0728, USA.