1
Zheng C, Ackerson B, Qiu S, Sy LS, Daily LIV, Song J, Qian L, Luo Y, Ku JH, Cheng Y, Wu J, Tseng HF. Natural Language Processing Versus Diagnosis Code-Based Methods for Postherpetic Neuralgia Identification: Algorithm Development and Validation. JMIR Med Inform 2024; 12:e57949. [PMID: 39254589 PMCID: PMC11407135 DOI: 10.2196/57949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 07/02/2024] [Accepted: 07/08/2024] [Indexed: 09/11/2024]
Abstract
Background Diagnosis codes and prescription data are used in algorithms to identify postherpetic neuralgia (PHN), a debilitating complication of herpes zoster (HZ). Because of the questionable accuracy of codes and prescription data, manual chart review is sometimes used to identify PHN in electronic health records (EHRs), which can be costly and time-consuming. Objective This study aims to develop and validate a natural language processing (NLP) algorithm for automatically identifying PHN from unstructured EHR data and to compare its performance with that of code-based methods. Methods This retrospective study used EHR data from Kaiser Permanente Southern California, a large integrated health care system that serves over 4.8 million members. The source population included members aged ≥50 years who received an incident HZ diagnosis and accompanying antiviral prescription between 2018 and 2020 and had ≥1 encounter within 90-180 days of the incident HZ diagnosis. The study team manually reviewed the EHR and identified PHN cases. For NLP development and validation, 500 and 800 random samples from the source population were selected, respectively. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F-score, and Matthews correlation coefficient (MCC) of NLP and the code-based methods were evaluated using chart-reviewed results as the reference standard. Results The NLP algorithm identified PHN cases with a 90.9% sensitivity, 98.5% specificity, 82% PPV, and 99.3% NPV. The composite scores of the NLP algorithm were 0.89 (F-score) and 0.85 (MCC). The prevalences of PHN in the validation data were 6.9% (reference standard), 7.6% (NLP), and 5.4%-13.1% (code-based). The code-based methods achieved a 52.7%-61.8% sensitivity, 89.8%-98.4% specificity, 27.6%-72.1% PPV, and 96.3%-97.1% NPV. The F-scores and MCCs ranged between 0.45 and 0.59 and between 0.32 and 0.61, respectively. 
Conclusions The automated NLP-based approach identified PHN cases from the EHR with good accuracy. This method could be useful in population-based PHN research.
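The validation metrics named in this abstract can all be derived from a 2x2 confusion matrix against the chart-review reference standard. The following is a minimal sketch, not the authors' code. The counts are hypothetical values chosen to be roughly consistent with the reported percentages for an 800-record sample; they are not taken from the paper. The abstract does not say which F-score variant was used, so `beta` is left as a parameter.

```python
from math import sqrt


def diagnostic_metrics(tp, fp, fn, tn, beta=1.0):
    """Compute validation metrics from confusion-matrix counts.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives against the reference standard.
    """
    sensitivity = tp / (tp + fn)   # recall: share of true cases found
    specificity = tn / (tn + fp)   # share of non-cases correctly excluded
    ppv = tp / (tp + fp)           # precision
    npv = tn / (tn + fn)
    b2 = beta ** 2
    # F-beta weights recall beta times as heavily as precision.
    f_beta = (1 + b2) * ppv * sensitivity / (b2 * ppv + sensitivity)
    # Matthews correlation coefficient uses all four cells.
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_beta": f_beta,
        "mcc": mcc,
    }


# Hypothetical counts (assumption, not from the paper): 800 records, 55 true cases.
metrics = diagnostic_metrics(tp=50, fp=11, fn=5, tn=734)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because MCC uses all four cells of the matrix, it is less sensitive to low prevalence than accuracy alone, which is one reason to report it next to an F-score for a rare outcome.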
Affiliation(s)
- Chengyi Zheng
  - Department of Research & Evaluation, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, United States, 1 626-986-8665, 1 626-564-7872
- Bradley Ackerson
  - South Bay Medical Center, Kaiser Permanente Southern California, Harbor City, CA, United States
- Sijia Qiu
  - Department of Research & Evaluation, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, United States, 1 626-986-8665, 1 626-564-7872
- Lina S Sy
  - Department of Research & Evaluation, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, United States, 1 626-986-8665, 1 626-564-7872
- Leticia I Vega Daily
  - Department of Research & Evaluation, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, United States, 1 626-986-8665, 1 626-564-7872
- Jeannie Song
  - Department of Research & Evaluation, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, United States, 1 626-986-8665, 1 626-564-7872
- Lei Qian
  - Department of Research & Evaluation, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, United States, 1 626-986-8665, 1 626-564-7872
- Yi Luo
  - Department of Research & Evaluation, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, United States, 1 626-986-8665, 1 626-564-7872
- Jennifer H Ku
  - Department of Research & Evaluation, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, United States, 1 626-986-8665, 1 626-564-7872
- Yanjun Cheng
  - Department of Research & Evaluation, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, United States, 1 626-986-8665, 1 626-564-7872
- Jun Wu
  - Department of Research & Evaluation, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, United States, 1 626-986-8665, 1 626-564-7872
- Hung Fu Tseng
  - Department of Research & Evaluation, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, United States, 1 626-986-8665, 1 626-564-7872
  - Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, CA, United States
2
Gergi M, Wilkinson K, Plante TB, Zakai NA. Ascertaining accurate exposure to aspirin and other antithrombotic medications using structured electronic health record data. Res Pract Thromb Haemost 2024; 8:102513. [PMID: 39192871 PMCID: PMC11347841 DOI: 10.1016/j.rpth.2024.102513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 05/21/2024] [Accepted: 06/28/2024] [Indexed: 08/29/2024]
Abstract
Background Ascertaining accurately the exposure to antithrombotic medications for both research and quality initiatives has been challenging due to a multitude of reasons: aspirin, the most commonly used antithrombotic, is available over the counter in the United States. Additionally, antithrombotic medications are frequently interrupted for bleeding and procedures. Objectives We aimed to develop and validate an algorithm to capture accurately the longitudinal exposure to antithrombotic medications including aspirin using the electronic health record. Methods We used the Medical Inpatient Thrombosis and Hemostasis cohort, which consists of primary care patients at a university medical center followed for a median of 6.2 years. Exposure to antithrombotic medications was captured using the medication reconciliation data linked to each ambulatory encounter. We developed an algorithm that used the taking "yes" or "no" tab as well as start and stop dates to define the duration of exposure for each medication. Eighty charts were reviewed and compared with results of the algorithm for validation. We estimated the sensitivity, specificity, and positive and negative predictive values. Results The algorithm was 97% (95% CI, 94%-100%) sensitive and 95% (95% CI, 90%-100%) specific in identifying exposure to any antithrombotic medication. This translated to a 93% (95% CI, 85%-100%) positive predictive value and 98% (95% CI, 96%-100%) negative predictive value. When looking at aspirin alone, the sensitivity and the positive predictive value were 95% (95% CI, 88%-100%) and 87% (95% CI, 71%-100%). Conclusion This current algorithm provides a new and easily adaptable strategy to capture accurately exposure to aspirin and other antithrombotic medications.
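The abstract describes the algorithm only at a high level: a "taking" yes/no flag plus start and stop dates from medication reconciliation at each ambulatory encounter. As a rough illustration of how such fields could be turned into exposure periods, the sketch below uses a hypothetical record layout (`MedRecord`) and hypothetical rules (fall back to the encounter date when a date is missing, ignore "no" records, merge overlapping spans). None of these details come from the paper.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MedRecord:
    """One medication-reconciliation row (hypothetical layout)."""
    medication: str
    encounter_date: date
    taking: bool                  # the "taking: yes/no" flag
    start: Optional[date] = None
    stop: Optional[date] = None


def exposure_intervals(records, medication):
    """Return merged (begin, end) exposure spans for one medication.

    Assumed rules: only "taking = yes" rows count; a missing start or
    stop date falls back to the encounter date; overlapping spans merge.
    """
    spans = []
    for r in records:
        if r.medication != medication or not r.taking:
            continue
        begin = r.start or r.encounter_date
        end = r.stop or r.encounter_date
        if end >= begin:          # skip rows with inconsistent dates
            spans.append((begin, end))
    spans.sort()
    merged = []
    for begin, end in spans:
        if merged and begin <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged


records = [
    MedRecord("aspirin", date(2020, 3, 1), True, start=date(2020, 1, 1)),
    MedRecord("aspirin", date(2020, 6, 1), True, start=date(2020, 1, 1)),
    MedRecord("aspirin", date(2020, 9, 1), False),
    MedRecord("aspirin", date(2021, 2, 1), True, start=date(2021, 1, 10)),
]
intervals = exposure_intervals(records, "aspirin")
for begin, end in intervals:
    print(begin.isoformat(), "to", end.isoformat())
```

In this toy data the two early "yes" encounters merge into one span and the "no" encounter separates it from the later restart, which is the kind of interruption (for bleeding or procedures) the abstract highlights.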
Affiliation(s)
- Mansour Gergi
  - Department of Medicine, Larner College of Medicine, University of Vermont, Burlington, Vermont, USA
  - Department of Medicine, University of Vermont Medical Center, Burlington, Vermont, USA
- Katherine Wilkinson
  - Department of Pathology and Laboratory Medicine, University of Vermont, Burlington, Vermont, USA
- Timothy B. Plante
  - Department of Medicine, Larner College of Medicine, University of Vermont, Burlington, Vermont, USA
  - Department of Medicine, University of Vermont Medical Center, Burlington, Vermont, USA
- Neil A. Zakai
  - Department of Medicine, Larner College of Medicine, University of Vermont, Burlington, Vermont, USA
  - Department of Medicine, University of Vermont Medical Center, Burlington, Vermont, USA
  - Department of Pathology and Laboratory Medicine, University of Vermont, Burlington, Vermont, USA
3
Alamer KA, Holden RJ, Chui MA, Stone JA, Campbell NL. Home medication inventory method to assess over-the-counter (OTC) medication possession and use: A pilot study on the feasibility of in-person and remote modalities with older adults. Res Social Adm Pharm 2024; 20:443-450. [PMID: 38320947 PMCID: PMC10947788 DOI: 10.1016/j.sapharm.2024.01.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 12/17/2023] [Accepted: 01/08/2024] [Indexed: 02/08/2024]
Abstract
BACKGROUND There is a need for reproducible methods to measure over-the-counter (OTC) medication possession and use. This is because OTC medications are self-managed, variably monitored by healthcare professionals, and in certain populations such as older adults some OTC medications may introduce risk and cause more harm than benefit. OBJECTIVE(S) To develop and assess the feasibility of the Home Medication Inventory Method (HMIM), a novel method to measure possession and use of OTC medications. METHODS We benchmarked, adapted, and standardized prior approaches to medication inventory to develop a method capable of addressing the limitations of existing methods. We then conducted a pilot study of the HMIM among older adults. Eligible participants were aged ≥60 years, reported purchasing or considering purchasing OTC medication, and screened for normal cognition. Interviews were conducted both in person and remotely. When possible, photographs of all OTC medications were obtained with participant consent and completion times were recorded for both in-person and remote modalities. RESULTS In total, 51 participants completed the pilot study. Home medication inventories were conducted in person (n = 15) and remotely (n = 36). Inventories were completed in a mean (SD) of 20.2 (12.7) min, and 96% of inventories were completed within 45 min. A total of 390 OTC medications were possessed by participants, for a mean (SD) of 7.6 (6.3) per participant. No differences in duration of interviews or number of medications reported were identified between in-person and remote modalities. Anticholinergic medications, a class targeted in the pilot as potentially harmful to older adults, were possessed by 31% of participants, and 14% of all participants reported use of such a medication within the previous 2 weeks.
CONCLUSIONS Implementing the HMIM using in-person and remote modalities is a feasible and ostensibly reproducible method for collecting OTC medication possession and use information. Larger studies are necessary to further generalize HMIM feasibility and reliability in diverse populations.
Affiliation(s)
- Khalid A Alamer
  - Department of Pharmacy Practice, Purdue University College of Pharmacy, West Lafayette, IN, USA; Department of Pharmacy Practice, Imam Abdulrahman bin Faisal University College of Clinical Pharmacy, Dammam, Saudi Arabia
- Richard J Holden
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA; Center for Aging Research, Regenstrief Institute, Inc., Indianapolis, IN, USA; Center for Health Innovation and Implementation Science, Indiana University School of Medicine and Regenstrief Institute, Inc., Indianapolis, IN, USA
- Michelle A Chui
  - Division of Social and Administrative Sciences, University of Wisconsin-Madison, School of Pharmacy, Madison, WI, USA; Sonderegger Research Center for Improved Medication Outcomes, University of Wisconsin-Madison, School of Pharmacy, Madison, WI, USA
- Jamie A Stone
  - Division of Social and Administrative Sciences, University of Wisconsin-Madison, School of Pharmacy, Madison, WI, USA; Sonderegger Research Center for Improved Medication Outcomes, University of Wisconsin-Madison, School of Pharmacy, Madison, WI, USA
- Noll L Campbell
  - Department of Pharmacy Practice, Purdue University College of Pharmacy, West Lafayette, IN, USA; Center for Aging Research, Regenstrief Institute, Inc., Indianapolis, IN, USA; Center for Health Innovation and Implementation Science, Indiana University School of Medicine and Regenstrief Institute, Inc., Indianapolis, IN, USA; Eskenazi Health, Indianapolis, IN, USA
4
Zheng C, Lee MS, Bansal N, Go AS, Chen C, Harrison TN, Fan D, Allen A, Garcia E, Lidgard B, Singer D, An J. Identification of recurrent atrial fibrillation using natural language processing applied to electronic health records. European Heart Journal. Quality of Care & Clinical Outcomes 2024; 10:77-88. [PMID: 36997334 PMCID: PMC10785579 DOI: 10.1093/ehjqcco/qcad021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 03/14/2023] [Accepted: 03/29/2023] [Indexed: 04/01/2023]
Abstract
AIMS This study aimed to develop and apply natural language processing (NLP) algorithms to identify recurrent atrial fibrillation (AF) episodes following rhythm control therapy initiation using electronic health records (EHRs). METHODS AND RESULTS We included adults with new-onset AF who initiated rhythm control therapies (ablation, cardioversion, or antiarrhythmic medication) within two US integrated healthcare delivery systems. A code-based algorithm identified potential AF recurrence using diagnosis and procedure codes. An automated NLP algorithm was developed and validated to capture AF recurrence from electrocardiograms, cardiac monitor reports, and clinical notes. Compared with the reference standard cases confirmed by physicians' adjudication, the F-scores, sensitivity, and specificity were all above 0.90 for the NLP algorithms at both sites. We applied the NLP and code-based algorithms to patients with incident AF (n = 22 970) during the 12 months after initiating rhythm control therapy. Applying the NLP algorithms, the percentages of patients with AF recurrence for sites 1 and 2 were 60.7% and 69.9% (ablation), 64.5% and 73.7% (cardioversion), and 49.6% and 55.5% (antiarrhythmic medication), respectively. In comparison, the percentages of patients with code-identified AF recurrence for sites 1 and 2 were 20.2% and 23.7% for ablation, 25.6% and 28.4% for cardioversion, and 20.0% and 27.5% for antiarrhythmic medication, respectively. CONCLUSION When compared with a code-based approach alone, this study's high-performing automated NLP method identified significantly more patients with recurrent AF. The NLP algorithms could enable efficient evaluation of treatment effectiveness of AF therapies in large populations and help develop tailored interventions.
Affiliation(s)
- Chengyi Zheng
  - Research and Evaluation Department, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA 91101, USA
- Ming-sum Lee
  - Department of Cardiology, Kaiser Permanente Los Angeles Medical Center, Los Angeles, CA 90027, USA
- Nisha Bansal
  - Kidney Research Institute, Division of Nephrology, University of Washington, Seattle, WA 98104, USA
- Alan S Go
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA 94612, USA
  - Department of Health Systems Science, Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, CA 91101, USA
  - Department of Medicine and Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA 94158, USA
  - Departments of Medicine, Stanford University, Palo Alto, CA 94305, USA
- Cheng Chen
  - Department of Cardiology, Kaiser Permanente Fontana Medical Center, Fontana, CA 92335, USA
- Teresa N Harrison
  - Research and Evaluation Department, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA 91101, USA
- Dongjie Fan
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA 94612, USA
- Amanda Allen
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA 94612, USA
- Elisha Garcia
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA 94612, USA
- Ben Lidgard
  - Department of Health Systems Science, Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, CA 91101, USA
- Daniel Singer
  - Clinical Epidemiology Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Jaejin An
  - Research and Evaluation Department, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA 91101, USA
  - Department of Health Systems Science, Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, CA 91101, USA
5
Scharp D, Hobensack M, Davoudi A, Topaz M. Natural Language Processing Applied to Clinical Documentation in Post-acute Care Settings: A Scoping Review. J Am Med Dir Assoc 2024; 25:69-83. [PMID: 37838000 PMCID: PMC10792659 DOI: 10.1016/j.jamda.2023.09.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 09/05/2023] [Accepted: 09/07/2023] [Indexed: 10/16/2023]
Abstract
OBJECTIVES To determine the scope of the application of natural language processing to free-text clinical notes in post-acute care and provide a foundation for future natural language processing-based research in these settings. DESIGN Scoping review; reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews guidelines. SETTING AND PARTICIPANTS Post-acute care (ie, home health care, long-term care, skilled nursing facilities, and inpatient rehabilitation facilities). METHODS PubMed, Cumulative Index of Nursing and Allied Health Literature, and Embase were searched in February 2023. Eligible studies had quantitative designs that used natural language processing applied to clinical documentation in post-acute care settings. The quality of each study was appraised. RESULTS Twenty-one studies were included. Almost all studies were conducted in home health care settings. Most studies extracted data from electronic health records to examine the risk for negative outcomes, including acute care utilization, medication errors, and suicide mortality. About half of the studies did not report age, sex, race, or ethnicity data or use standardized terminologies. Only 8 studies included variables from socio-behavioral domains. Most studies fulfilled all quality appraisal indicators. CONCLUSIONS AND IMPLICATIONS The application of natural language processing is nascent in post-acute care settings. Future research should apply natural language processing using standardized terminologies to leverage free-text clinical notes in post-acute care to promote timely, comprehensive, and equitable care. Natural language processing could be integrated with predictive models to help identify patients who are at risk of negative outcomes. Future research should incorporate socio-behavioral determinants and diverse samples to improve health equity in informatics tools.
Affiliation(s)
- Anahita Davoudi
  - VNS Health, Center for Home Care Policy & Research, New York, NY, USA
- Maxim Topaz
  - Columbia University School of Nursing, New York, NY, USA
6
Zheng C, Sun BC, Wu YL, Ferencik M, Lee MS, Redberg RF, Kawatkar AA, Musigdilok VV, Sharp AL. Automated interpretation of stress echocardiography reports using natural language processing. European Heart Journal. Digital Health 2022; 3:626-637. [PMID: 36710893 PMCID: PMC9779789 DOI: 10.1093/ehjdh/ztac047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 08/08/2022] [Indexed: 02/01/2023]
Abstract
Aims Stress echocardiography (SE) findings and interpretations are commonly documented in free-text reports. Reusing SE results requires laborious manual reviews. This study aimed to develop and validate an automated method for abstracting SE reports in a large cohort. Methods and results This study included adult patients who had SE within 30 days of their emergency department visit for suspected acute coronary syndrome in a large integrated healthcare system. An automated natural language processing (NLP) algorithm was developed to abstract SE reports and classify overall SE results into normal, non-diagnostic, infarction, and ischaemia categories. Randomly selected reports (n = 140) were double-blindly reviewed by cardiologists to assess the criterion validity of the NLP algorithm. Construct validity was tested on the entire cohort using abstracted SE data and additional clinical variables. The NLP algorithm abstracted 6346 consecutive SE reports. Cardiologists showed good agreement on the overall SE results for the 140 reports: Kappa (0.83) and intraclass correlation coefficient (0.89). The NLP algorithm achieved 98.6% specificity and negative predictive value, 95.7% sensitivity, positive predictive value, and F-score on the overall SE results and near-perfect scores on ischaemia findings. The 30-day acute myocardial infarction or death outcomes were highest among patients with ischaemia (5.0%), followed by infarction (1.4%), non-diagnostic (0.8%), and normal (0.3%) results. We found substantial variations in the format and quality of SE reports, even within the same institution. Conclusions Natural language processing is an accurate and efficient method for abstracting unstructured SE reports. This approach creates new opportunities for research, public health measures, and care improvement.
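Inter-rater agreement between two reviewers on a categorical result, as reported with Kappa above, can be sketched with Cohen's kappa. The function below is a generic textbook implementation, and the two rating lists are invented for illustration; they are not data from the study, and the abstract does not state which kappa variant the authors used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented example ratings using the four categories named in the abstract.
reviewer_1 = ["normal", "normal", "ischaemia", "infarction",
              "normal", "non-diagnostic", "ischaemia", "normal"]
reviewer_2 = ["normal", "normal", "ischaemia", "normal",
              "normal", "non-diagnostic", "ischaemia", "ischaemia"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw percent agreement for the agreement expected by chance, which matters when one category (here, normal results) dominates.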
Affiliation(s)
- Chengyi Zheng
  - Corresponding author. Tel: 1-626-376-7029, Fax: 626-564-3694,
- Benjamin C Sun
  - Department of Emergency Medicine and Leonard Davis Institute, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yi-Lin Wu
  - Research and Evaluation Department, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA 91101, USA
- Maros Ferencik
  - Oregon Health and Science University, Knight Cardiovascular Institute, Portland, OR 97239, USA
- Ming-Sum Lee
  - Division of Cardiology, Kaiser Permanente Southern California, Los Angeles Medical Center, Los Angeles, CA 90027, USA
- Rita F Redberg
  - Division of Cardiology, University of California, San Francisco, CA 94143, USA
- Aniket A Kawatkar
  - Research and Evaluation Department, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA 91101, USA
- Visanee V Musigdilok
  - Research and Evaluation Department, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA 91101, USA
- Adam L Sharp
  - Research and Evaluation Department, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA 91101, USA
  - Clinical Science Department, Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, CA 91101, USA
7
Zheng C, Sun BC, Wu YL, Ferencik M, Lee MS, Redberg RF, Kawatkar AA, Musigdilok VV, Sharp AL. Automated abstraction of myocardial perfusion imaging reports using natural language processing. J Nucl Cardiol 2022; 29:1178-1187. [PMID: 33155169 PMCID: PMC8096860 DOI: 10.1007/s12350-020-02401-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/09/2020] [Accepted: 09/29/2020] [Indexed: 01/11/2023]
Abstract
BACKGROUND Findings and interpretations of myocardial perfusion imaging (MPI) studies are documented in free-text MPI reports. MPI results are essential for research, but manual review is prohibitively time-consuming. This study aimed to develop and validate an automated method to abstract MPI reports. METHODS We developed a natural language processing (NLP) algorithm to abstract MPI reports. Randomly selected reports were double-blindly reviewed by two cardiologists to validate the NLP algorithm. Secondary analyses were performed to describe patient outcomes based on abstracted MPI results on 16,957 MPI tests from adult patients evaluated for suspected ACS. RESULTS The NLP algorithm achieved high sensitivity (96.7%) and specificity (98.9%) on the MPI categorical results and had a similar degree of agreement compared to the physician reviewers. Patients with abnormal MPI results had higher rates of 30-day acute myocardial infarction or death compared to patients with normal results. We identified issues related to the quality of the reports that not only affect communication with referring physicians but also pose challenges for automated abstraction. CONCLUSION NLP is an accurate and efficient strategy to abstract results from free-text MPI reports. Our findings will facilitate future research to understand the benefits of MPI studies but require validation in other settings.
Affiliation(s)
- Chengyi Zheng
  - Research and Evaluation Department, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, USA
- Benjamin C Sun
  - Department of Emergency Medicine and Leonard Davis Institute, University of Pennsylvania, Philadelphia, PA, USA
- Yi-Lin Wu
  - Research and Evaluation Department, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, USA
- Maros Ferencik
  - Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA
- Ming-Sum Lee
  - Division of Cardiology, Kaiser Permanente Southern California, Los Angeles Medical Center, Los Angeles, CA, USA
- Rita F Redberg
  - Division of Cardiology, University of California, San Francisco, San Francisco, CA, USA
- Aniket A Kawatkar
  - Research and Evaluation Department, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, USA
- Visanee V Musigdilok
  - Research and Evaluation Department, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, USA
- Adam L Sharp
  - Research and Evaluation Department, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd Floor, Pasadena, CA, 91101, USA
8
Shi J, Gao X, Kinsman WC, Ha C, Gao GG, Chen Y. DI++: A deep learning system for patient condition identification in clinical notes. Artif Intell Med 2022; 123:102224. [PMID: 34998515 PMCID: PMC8832473 DOI: 10.1016/j.artmed.2021.102224] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/23/2020] [Revised: 11/05/2020] [Accepted: 11/11/2021] [Indexed: 01/03/2023]
Abstract
Accurately recording a patient's medical conditions in an EHR system is the basis of effectively documenting patient health status, coding for billing, and supporting data-driven clinical decision making. However, patient conditions are often not fully captured in structured EHR systems, but may be documented in unstructured clinical notes. The challenge is that not all disease mentions in clinical notes actually refer to a patient's conditions. We developed a two-step workflow for identifying patient's conditions from clinical notes: disease mention extraction and disease mention classification. We implemented this workflow in a prototype system, DI++, for Disease Identification. An advanced deep learning model, CLSTM-Attention model, is developed for disease mention classification in DI++. Extensive empirical evaluation on about one million pages of de-identified clinical notes demonstrates that DI++ has significant performance advantage over existing systems on F1 Score, Area Under the Curve metrics, and efficiency. The proposed CLSTM-Attention model outperforms the existing deep learning models for disease mention classification.
Affiliation(s)
- Jinhe Shi
  - New Jersey Institute of Technology, Newark, NJ, United States of America
- Xiangyu Gao
  - New Jersey Institute of Technology, Newark, NJ, United States of America
- Chenyu Ha
  - Inovalon, Bowie, MD, United States of America
- Yi Chen
  - New Jersey Institute of Technology, Newark, NJ, United States of America. Corresponding author (Y. Chen)
9
Natural Language Processing to Identify Pulmonary Nodules and Extract Nodule Characteristics From Radiology Reports. Chest 2021; 160:1902-1914. [PMID: 34089738 DOI: 10.1016/j.chest.2021.05.048] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/13/2020] [Revised: 03/20/2021] [Accepted: 05/11/2021] [Indexed: 12/17/2022]
Abstract
BACKGROUND There is an urgent need for population-based studies on managing patients with pulmonary nodules. RESEARCH QUESTION Is it possible to identify pulmonary nodules and associated characteristics using an automated method? STUDY DESIGN AND METHODS We revised and refined an existing natural language processing (NLP) algorithm to identify radiology transcripts with pulmonary nodules and greatly expanded its functionality to identify the characteristics of the largest nodule, when present, including size, lobe, laterality, attenuation, calcification, and edge. We compared NLP results with a reference standard of manual transcript review in a random test sample of 200 radiology transcripts. We applied the final automated method to a larger cohort of patients who underwent chest CT scan in an integrated health care system from 2006 to 2016, and described their demographic and clinical characteristics. RESULTS In the test sample, the NLP algorithm had very high sensitivity (98.6%; 95% CI, 95.0%-99.8%) and specificity (100%; 95% CI, 93.9%-100%) for identifying pulmonary nodules. For attenuation, edge, and calcification, the NLP algorithm achieved similar accuracies, and it correctly identified the diameter of the largest nodule in 135 of 141 cases (95.7%; 95% CI, 91.0%-98.4%). In the larger cohort, the NLP found 217,771 reports with nodules among 717,304 chest CT reports (30.4%). From 2006 to 2016, the number of reports with nodules increased by 150%, and the mean size of the largest nodule gradually decreased from 11 to 8.9 mm. Radiologists documented the laterality and lobe (90%-95%) more often than the attenuation, calcification, and edge characteristics (11%-14%). INTERPRETATION The NLP algorithm identified pulmonary nodules and associated characteristics with high accuracy. In our community practice settings, the documentation of nodule characteristics is incomplete. Our results call for better documentation of nodule findings. 
The NLP algorithm can be used in population-based studies to identify pulmonary nodules, avoiding labor-intensive chart review.
10
Large-scale identification of aortic stenosis and its severity using natural language processing on electronic health records. Cardiovascular Digital Health Journal 2021; 2:156-163. [PMID: 35265904 PMCID: PMC8890044 DOI: 10.1016/j.cvdhj.2021.03.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 02/02/2023]
Abstract
Background Objective Methods Results Conclusion
|
11
|
Mahan M, Rafter D, Casey H, Engelking M, Abdallah T, Truwit C, Oswood M, Samadani U. tbiExtractor: A framework for extracting traumatic brain injury common data elements from radiology reports. PLoS One 2020; 15:e0214775. [PMID: 32609723 PMCID: PMC7329124 DOI: 10.1371/journal.pone.0214775] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/15/2019] [Accepted: 05/18/2020] [Indexed: 11/21/2022]
Abstract
BACKGROUND The manual extraction of valuable data from electronic medical records is cumbersome, error-prone, and inconsistent. By automating extraction in conjunction with standardized terminology, the quality and consistency of data utilized for research and clinical purposes would be substantially improved. Here, we set out to develop and validate a framework to extract pertinent clinical conditions for traumatic brain injury (TBI) from computed tomography (CT) reports. METHODS We developed tbiExtractor, which extends pyConTextNLP, a regular expression algorithm using negation detection and contextual features, to create a framework for extracting TBI common data elements from radiology reports. The algorithm inputs radiology reports and outputs a structured summary containing 27 clinical findings with their respective annotations. Development and validation of the algorithm was completed using two physician annotators as the gold standard. RESULTS tbiExtractor displayed high sensitivity (0.92-0.94) and specificity (0.99) when compared to the gold standard. The algorithm also demonstrated a high equivalence (94.6%) with the annotators. A majority of clinical findings (85%) had minimal errors (F1 Score ≥ 0.80). When compared to annotators, tbiExtractor extracted information in significantly less time (0.3 sec vs 1.7 min per report). CONCLUSION tbiExtractor is a validated algorithm for extraction of TBI common data elements from radiology reports. This automation reduces the time spent to extract structured data and improves the consistency of data extracted. Lastly, tbiExtractor can be used to stratify subjects into groups based on visible damage by partitioning the annotations of the pertinent clinical conditions on a radiology report.
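The approach described above (regular expressions combined with negation detection) can be sketched as follows. This is a minimal illustration, not tbiExtractor's or pyConTextNLP's actual API: it uses only Python's standard `re` module, and the trigger phrases, finding names, labels, and the `annotate` function are all hypothetical choices made for the example.

```python
import re

# Hypothetical negation triggers and target findings, for illustration only.
NEGATION_TRIGGERS = re.compile(
    r"\b(no evidence of|negative for|without|no)\b", re.IGNORECASE
)
FINDINGS = {
    "subdural hematoma": re.compile(r"\bsubdural hematoma\b", re.IGNORECASE),
    "skull fracture": re.compile(r"\bskull fracture\b", re.IGNORECASE),
    "midline shift": re.compile(r"\bmidline shift\b", re.IGNORECASE),
}


def annotate(report: str) -> dict:
    """Label each finding as 'present', 'negated', or 'not mentioned'."""
    result = {name: "not mentioned" for name in FINDINGS}
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        for name, pattern in FINDINGS.items():
            match = pattern.search(sentence)
            if match is None:
                continue
            # A finding counts as negated if a trigger precedes it in the sentence.
            preceding = sentence[: match.start()]
            negated = NEGATION_TRIGGERS.search(preceding) is not None
            # An affirmed mention anywhere in the report takes precedence.
            if result[name] != "present":
                result[name] = "negated" if negated else "present"
    return result


report = (
    "There is a left subdural hematoma. "
    "No skull fracture. No evidence of midline shift."
)
summary = annotate(report)
print(summary)
```

A real system would also need scope limits for triggers, handling of uncertainty and historical mentions, and a much larger lexicon; the sketch only shows how a structured summary can be derived from free text.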
Affiliation(s)
- Margaret Mahan
- Department of Biomedical Informatics and Computational Biology, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Daniel Rafter
- Department of Biomedical Informatics and Computational Biology, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Hannah Casey
- Department of Biomedical Informatics and Computational Biology, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Marta Engelking
- Department of Biomedical Informatics and Computational Biology, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Tessneem Abdallah
- Department of Biomedical Informatics and Computational Biology, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Charles Truwit
- Diagnostic Imaging, Philips Global, Maple Grove, Minnesota, United States of America
| | - Mark Oswood
- Department of Radiology, Hennepin Healthcare, Minneapolis, Minnesota, United States of America
- Department of Radiology, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Uzma Samadani
- Department of Biomedical Informatics and Computational Biology, University of Minnesota, Minneapolis, Minnesota, United States of America
- Department of Neurosurgery, Minneapolis VA Medical Center, Minneapolis, Minnesota, United States of America
| |
|
12
|
Zheng C, Sun BC, Wu YL, Lee MS, Shen E, Redberg RF, Ferencik M, Natsui S, Kawatkar AA, Musigdilok VV, Sharp AL. Automated Identification and Extraction of Exercise Treadmill Test Results. J Am Heart Assoc 2020; 9:e014940. [PMID: 32079480 PMCID: PMC7335560 DOI: 10.1161/jaha.119.014940] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/27/2022]
Abstract
Background Noninvasive cardiac tests, including exercise treadmill tests (ETTs), are commonly utilized in the evaluation of patients in the emergency department with suspected acute coronary syndrome. However, there are ongoing debates on their clinical utility and cost‐effectiveness. It is important to be able to use ETT results for research, but manual review is prohibitively time‐consuming for large studies. We developed and validated an automated method to interpret ETT results from electronic health records. To demonstrate the algorithm's utility, we tested the associations between ETT results with 30‐day patient outcomes in a large population. Methods and Results A retrospective analysis of adult emergency department encounters resulting in an ETT within 30 days was performed. A set of randomly selected reports were double‐blind reviewed by 2 physicians to validate a natural language processing algorithm designed to categorize ETT results into normal, ischemic, nondiagnostic, and equivocal categories. Natural language processing then searched and categorized results of 5214 ETT reports. The natural language processing algorithm achieved 96.4% sensitivity and 94.8% specificity in identifying normal versus all other categories. The rates of 30‐day death or acute myocardial infarction varied (P<0.001) by categories for normal (0.08%), ischemic (1.9%), nondiagnostic (0.77%), and equivocal (0.58%) groups achieving good discrimination (C‐statistic, 0.81; 95% CI, 0.7–0.92). Conclusions Natural language processing is an accurate and efficient strategy to facilitate large‐scale outcome studies of noninvasive cardiac tests. We found that most patients are at low risk and have normal ETT results, while those with abnormal, nondiagnostic, or equivocal results have slightly higher risks and warrant future investigation.
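The "normal versus all other categories" evaluation described above amounts to collapsing the four ETT categories into a binary comparison against the physician reference. The sketch below shows that calculation on a small invented sample; the labels in `reference` and `predicted` are toy data, not study data, and `sensitivity_specificity` is a hypothetical helper.

```python
def sensitivity_specificity(reference, predicted, positive="normal"):
    """Treat `positive` as the target class and every other category as negative."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference label")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for illustration only (reviewer reference vs. algorithm output).
reference = ["normal", "normal", "normal", "normal",
             "ischemic", "nondiagnostic", "equivocal", "normal"]
predicted = ["normal", "normal", "ischemic", "normal",
             "ischemic", "normal", "equivocal", "normal"]

sens, spec = sensitivity_specificity(reference, predicted)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Note that a prediction of "ischemic" for an "equivocal" reference still counts as a true negative in this binary view, which is why per-category agreement would have to be assessed separately.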
Affiliation(s)
- Chengyi Zheng
- Research and Evaluation Department Kaiser Permanente Southern California Pasadena CA
| | - Benjamin C Sun
- Department of Emergency Medicine University of Pennsylvania Philadelphia PA
| | - Yi-Lin Wu
- Research and Evaluation Department Kaiser Permanente Southern California Pasadena CA
| | - Ming-Sum Lee
- Division of Cardiology Kaiser Permanente Southern California, Los Angeles Medical Center Los Angeles CA
| | - Ernest Shen
- Research and Evaluation Department Kaiser Permanente Southern California Pasadena CA
| | - Rita F Redberg
- Division of Cardiology University of California, San Francisco San Francisco CA
| | - Maros Ferencik
- Knight Cardiovascular Institute Oregon Health and Science University Portland OR
| | - Shaw Natsui
- National Clinician Scholars Program Department of Emergency Medicine University of California, Los Angeles Los Angeles CA
| | - Aniket A Kawatkar
- Research and Evaluation Department Kaiser Permanente Southern California Pasadena CA
| | - Visanee V Musigdilok
- Research and Evaluation Department Kaiser Permanente Southern California Pasadena CA
| | - Adam L Sharp
- Research and Evaluation Department Kaiser Permanente Southern California Pasadena CA
| |
|
13
|
Ascertainment of Aspirin Exposure Using Structured and Unstructured Large-scale Electronic Health Record Data. Med Care 2020; 57:e60-e64. [PMID: 30807451 PMCID: PMC6703965 DOI: 10.1097/mlr.0000000000001065] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/16/2022]
Abstract
Aspirin impacts risk for important outcomes such as cancer, cardiovascular disease, and gastrointestinal bleeding. However, ascertaining exposure to medications available both by prescription and over-the-counter, such as aspirin, for research and quality improvement purposes is a challenge.
|
14
|
Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review. JMIR Med Inform 2019; 7:e12239. [PMID: 31066697 PMCID: PMC6528438 DOI: 10.2196/12239] [Citation(s) in RCA: 251] [Impact Index Per Article: 41.8] [Received: 09/17/2018] [Revised: 03/04/2019] [Accepted: 03/24/2019] [Indexed: 01/08/2023]
Abstract
BACKGROUND Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions on the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Methods based on machine learning to process EHRs are resulting in improved understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset. OBJECTIVE The goal of the research was to provide a comprehensive overview of the development and uptake of NLP methods applied to free-text clinical notes related to chronic diseases, including the investigation of challenges faced by NLP methodologies in understanding clinical narratives. METHODS Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed and searches were conducted in 5 databases using "clinical notes," "natural language processing," and "chronic disease" and their variations as keywords to maximize coverage of the articles. RESULTS Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers resulted in identification of 43 chronic diseases, which were then further classified into 10 disease categories using the International Classification of Diseases, 10th Revision. The majority of studies focused on diseases of the circulatory system (n=38) while endocrine and metabolic diseases were fewest (n=14). 
This was due to the structure of clinical records related to metabolic diseases, which typically contain much more structured data, compared with medical records for diseases of the circulatory system, which focus more on unstructured data and consequently have seen a stronger focus of NLP. The review has shown that there is a significant increase in the use of machine learning methods compared to rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype with only a handful of papers addressing extraction of comorbidities from the free text or integration of clinical notes with structured data. There is a notable use of relatively simple methods, such as shallow classifiers (or combination with rule-based methods), due to the interpretability of predictions, which still represents a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes. CONCLUSIONS Efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.
Affiliation(s)
- Seyedmostafa Sheikhalishahi
- eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
- Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
| | - Riccardo Miotto
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Joel T Dudley
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Alberto Lavelli
- NLP Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
| | - Fabio Rinaldi
- Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
| | - Venet Osmani
- eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
| |
|
15
|
An J, Niu F, Zheng C, Rashid N, Mendes RA, Dills D, Vo L, Singh P, Bruno A, Lang DT, Le PT, Jazdzewski KP, Aranda G. Warfarin Management and Outcomes in Patients with Nonvalvular Atrial Fibrillation Within an Integrated Health Care System. J Manag Care Spec Pharm 2018; 23:700-712. [PMID: 28530526 PMCID: PMC10398296 DOI: 10.18553/jmcp.2017.23.6.700] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/05/2022]
Abstract
BACKGROUND Warfarin is a common treatment option to manage patients with nonvalvular atrial fibrillation (NVAF) in clinical practice. Understanding current pharmacist-led anticoagulation clinic management patterns and associated outcomes is important for quality improvement; however, currently little evidence associating outcomes with management patterns exists. OBJECTIVES To (a) describe warfarin management patterns and (b) evaluate associations between warfarin treatment and clinical outcomes for patients with NVAF in an integrated health care system. METHODS A retrospective cohort study was conducted among NVAF patients with warfarin therapy between January 1, 2006, and December 31, 2011, using Kaiser Permanente Southern California data, and followed until December 31, 2013. Management patterns related to international normalized ratio (INR) monitoring, anticoagulation clinic pharmacist intervention (consultation), and warfarin dose adjustments were investigated along with yearly attrition rates, time-in-therapeutic ranges (TTRs), and clinical outcomes (stroke or systemic embolism and major bleeding). Descriptive statistics and multivariable Cox proportional hazard models were used to determine associations between TTR and clinical outcomes. RESULTS A total of 32,074 NVAF patients on warfarin treatment were identified and followed for a median of 3.8 years. About half (49%) of the patients were newly initiating warfarin therapy. INR monitoring and pharmacist interventions were conducted roughly every 3 weeks after 6 months of warfarin treatment. Sixty-three percent of the study population had ≥ 1 warfarin dose adjustments with a mean (SD) of 6.7 (6.3) annual dose adjustments. Warfarin dose adjustments occurred at a median of 1 day (interquartile ranges [IQR] 1-3) after the INR measurement. Yearly attrition rate was from 3.3% to 6.3% during the follow-up, and median (IQR) TTR was 61% (46%-73%). 
Patients who received frequent INR monitoring (≥ 27 times per year), pharmacist interventions (≥ 24 times per year), or frequently adjusted warfarin dose (≥ 11 times per year) consistently showed poor TTRs (mean TTR for the highest quartiles was 45.3%-48.3%). A higher TTR was associated with a lower risk of clinical outcomes regardless of frequency of INR monitoring, pharmacist interventions, or number of dose adjustments. Patients whose TTRs were < 65%, even with frequent pharmacist interventions, had similar stroke or systemic embolism event rates, as compared with patients with TTRs < 65% and less frequent interventions (1.88 vs. 1.54 stroke or systemic embolism rates per 100 person-years, respectively, P = 0.78). The lowest TTR quartile (< 46%) was associated with a 3 times higher risk of stroke or systemic embolism (hazard ratio [HR] = 3.19, 95% CI = 2.71-3.77) and a 2 times higher risk of major bleeding (HR = 2.10, 95% CI = 1.96-2.24) compared with the highest TTR quartile (≥ 73%). CONCLUSIONS Despite close monitoring with timely warfarin dose adjustments, there were still a substantial number of challenging patients whose TTRs were suboptimal despite a higher number of pharmacist interventions. These patients eventually experienced more stroke or systemic embolism and bleeding events among NVAF patients managed by anticoagulation clinics. New individualized treatment or management strategies for patients who are not able to reach optimal therapeutic ranges are necessary to improve outcomes. DISCLOSURES This research and manuscript were funded by Bristol-Myers Squibb Company and Pfizer. Authors from Bristol-Myers Squibb Company and Pfizer participated in the design of the study, interpretation of the data, review/revision of the manuscript, and approval of the final version of the manuscript. An received a grant for research support from Bristol-Myers Squibb/Pfizer. 
Niu, Rashid, and Zheng received a grant from Bristol-Myers Squibb/Pfizer to their institutions for salary reimbursement. Vo, Singh, and Aranda are employed by Bristol-Myers Squibb; Bruno was employed by Bristol-Myers Squibb at the time of this study. Mendes and Dills are employed by Pfizer, and Mendes was a member of the Pfizer Cardiovascular and Metabolic Field Medical Team during the time of this study. Lang, Jazdzewski, and Le have no known conflicts of interest to report. Study concept and design were contributed primarily by An and Rashid, along with the other authors. Niu took the lead in data collection, along with Zheng, and data interpretation was performed by An, along with Mendes and Dills, with assistance from the other authors. The manuscript was written by An and revised by Mendes, Dills, Vo, Singh, Bruno, and Aranda, along with Lang, Le, and Jazdzewski. Part of this study's findings was presented at the CHEST 2015 Annual Meeting in Montreal, Canada, on October 28, 2015.
Affiliation(s)
- JaeJin An
- 1 Department of Pharmacy Practice and Administration, College of Pharmacy, Western University of Health Sciences, Pomona, California
| | - Fang Niu
- 2 Drug Information Services, Kaiser Permanente Southern California, Downey
| | - Chengyi Zheng
- 4 Research and Evaluation, Kaiser Permanente Southern California, Pasadena
| | - Nazia Rashid
- 2 Drug Information Services, Kaiser Permanente Southern California, Downey
| | | | - Diana Dills
- 5 North America Medical Affairs, Pfizer, New York, New York
| | - Lien Vo
- 6 Health Economic Outcomes Research, Bristol-Myers Squibb, Plainsboro, New Jersey
| | - Prianka Singh
- 6 Health Economic Outcomes Research, Bristol-Myers Squibb, Plainsboro, New Jersey
| | - Amanda Bruno
- 6 Health Economic Outcomes Research, Bristol-Myers Squibb, Plainsboro, New Jersey
| | - Daniel T Lang
- 7 Los Angeles Medical Center, The Permanente Medical Group, Kaiser Permanente Southern California, Los Angeles
| | - Paul T Le
- 3 Medication Therapy Management, Kaiser Permanente Southern California, Downey
| | | | - Gustavus Aranda
- 6 Health Economic Outcomes Research, Bristol-Myers Squibb, Plainsboro, New Jersey
| |
|
16
|
Wang Y, Wang L, Rastegar-Mojarad M, Moon S, Shen F, Afzal N, Liu S, Zeng Y, Mehrabi S, Sohn S, Liu H. Clinical information extraction applications: A literature review. J Biomed Inform 2018; 77:34-49. [PMID: 29162496 PMCID: PMC5771858 DOI: 10.1016/j.jbi.2017.11.011] [Citation(s) in RCA: 364] [Impact Index Per Article: 52.0] [Received: 06/30/2017] [Revised: 11/01/2017] [Accepted: 11/17/2017] [Indexed: 12/24/2022]
Abstract
BACKGROUND With the rapid adoption of electronic health records (EHRs), it is desirable to harvest information and knowledge from EHRs to support automated systems at the point of care and to enable secondary use of EHRs for clinical and translational research. One critical component used to facilitate the secondary use of EHR data is the information extraction (IE) task, which automatically extracts and encodes clinical information from text. OBJECTIVES In this literature review, we present a review of recent published research on clinical information extraction (IE) applications. METHODS A literature search was conducted for articles published from January 2009 to September 2016 based on Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and ACM Digital Library. RESULTS A total of 1917 publications were identified for title and abstract screening. Of these publications, 263 articles were selected and discussed in this review in terms of publication venues and data sources, clinical IE tools, methods, and applications in the areas of disease- and drug-related studies, and clinical workflow optimizations. CONCLUSIONS Clinical IE has been used for a wide range of applications, however, there is a considerable gap between clinical studies using EHR data and studies using clinical IE. This study enabled us to gain a more concrete understanding of the gap and to provide potential solutions to bridge this gap.
Affiliation(s)
- Yanshan Wang
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Liwei Wang
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Majid Rastegar-Mojarad
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Sungrim Moon
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Feichen Shen
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Naveed Afzal
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Sijia Liu
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Yuqun Zeng
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Saeed Mehrabi
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Sunghwan Sohn
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Hongfang Liu
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| |
|
17
|
Kreimeyer K, Foster M, Pandey A, Arya N, Halford G, Jones SF, Forshee R, Walderhaug M, Botsis T. Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review. J Biomed Inform 2017; 73:14-29. [PMID: 28729030 DOI: 10.1016/j.jbi.2017.07.012] [Citation(s) in RCA: 313] [Impact Index Per Article: 39.1] [Received: 03/14/2017] [Revised: 06/07/2017] [Accepted: 07/14/2017] [Indexed: 12/24/2022]
Abstract
We followed a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses to identify existing clinical natural language processing (NLP) systems that generate structured information from unstructured free text. Seven literature databases were searched with a query combining the concepts of natural language processing and structured data capture. Two reviewers screened all records for relevance during two screening phases, and information about clinical NLP systems was collected from the final set of papers. A total of 7149 records (after removing duplicates) were retrieved and screened, and 86 were determined to fit the review criteria. These papers contained information about 71 different clinical NLP systems, which were then analyzed. The NLP systems address a wide variety of important clinical and research tasks. Certain tasks are well addressed by the existing systems, while others remain as open challenges that only a small number of systems attempt, such as extraction of temporal information or normalization of concepts to standard terminologies. This review has identified many NLP systems capable of processing clinical free text and generating structured output, and the information collected and evaluated here will be important for prioritizing development of new approaches for clinical NLP.
Affiliation(s)
- Kory Kreimeyer
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
| | - Matthew Foster
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
| | - Abhishek Pandey
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
| | - Nina Arya
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
| | - Gwendolyn Halford
- FDA Library, US Food and Drug Administration, Silver Spring, MD, United States
| | - Sandra F Jones
- Cancer Surveillance Branch, Division of Cancer Prevention and Control, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, United States
| | - Richard Forshee
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
| | - Mark Walderhaug
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
| | - Taxiarchis Botsis
- Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
| |
|
18
|
Berger ML, Curtis MD, Smith G, Harnett J, Abernethy AP. Opportunities and challenges in leveraging electronic health record data in oncology. Future Oncol 2016; 12:1261-74. [PMID: 27096309 DOI: 10.2217/fon-2015-0043] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Indexed: 11/21/2022]
Abstract
The widespread adoption of electronic health records (EHRs) and the growing wealth of digitized information sources about patients is ushering in an era of 'Big Data' that may revolutionize clinical research in oncology. Research will likely be more efficient and potentially more accurate than the current gold standard of manual chart review studies. However, EHRs as they exist today have significant limitations: important data elements are missing or are only captured in free text or PDF documents. Using two case studies, we illustrate the challenges of leveraging the data that are routinely collected by the healthcare system in EHRs (e.g., real-world data), specific challenges encountered in the cancer domain and opportunities that can be achieved when these are overcome.
Affiliation(s)
- Marc L Berger
- Pfizer Inc., 235 East 42nd Street, New York, NY 10017, USA
| | | | - Gregory Smith
- Pfizer Inc., 235 East 42nd Street, New York, NY 10017, USA
| | - James Harnett
- Pfizer Inc., 235 East 42nd Street, New York, NY 10017, USA
| | | |
|