1
Sim JA, Huang X, Horan MR, Stewart CM, Robison LL, Hudson MM, Baker JN, Huang IC. Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review. Artif Intell Med 2023; 146:102701. [PMID: 38042599 PMCID: PMC10693655 DOI: 10.1016/j.artmed.2023.102701] [Received: 04/06/2023] [Revised: 09/30/2023] [Accepted: 10/29/2023] [Indexed: 12/04/2023]
Abstract
OBJECTIVE Natural language processing (NLP) combined with machine learning (ML) techniques is increasingly used to process unstructured/free-text patient-reported outcome (PRO) data available in electronic health records (EHRs). This systematic review summarizes the literature reporting NLP/ML systems/toolkits for analyzing PROs in clinical narratives of EHRs and discusses future directions for the application of this modality in clinical care. METHODS We searched PubMed, Scopus, and Web of Science for studies written in English between 1/1/2000 and 12/31/2020. Seventy-nine studies meeting the eligibility criteria were included. We abstracted and summarized information related to the study purpose, patient population, type/source/amount of unstructured PRO data, linguistic features, and NLP systems/toolkits for processing unstructured PROs in EHRs. RESULTS Most of the studies used NLP/ML techniques to extract PROs from clinical narratives (n = 74) and mapped the extracted PROs into specific PRO domains for phenotyping or clustering purposes (n = 26). Some studies used NLP/ML to process PROs for predicting disease progression or onset of adverse events (n = 22) or for developing/validating NLP/ML pipelines for analyzing unstructured PROs (n = 19). Studies used different linguistic features, including lexical, syntactic, semantic, and contextual features, to process unstructured PROs. Among the 25 NLP systems/toolkits we identified, 15 used rule-based NLP, 6 used hybrid NLP, and 4 used non-neural ML algorithms embedded in NLP. CONCLUSIONS This study supports the potential utility of different NLP/ML techniques in processing unstructured PROs available in EHRs for clinical care. Although the use of annotation rules for NLP/ML analysis of unstructured PROs is dominant, deploying novel neural ML-based methods is warranted.
Affiliation(s)
- Jin-Ah Sim
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States; School of AI Convergence, Hallym University, Chuncheon, Republic of Korea
- Xiaolei Huang
- Department of Computer Science, University of Memphis, Memphis, TN, United States
- Madeline R Horan
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States
- Christopher M Stewart
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States
- Leslie L Robison
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States
- Melissa M Hudson
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States; Department of Oncology, St. Jude Children's Research Hospital, Memphis, TN, United States
- Justin N Baker
- Department of Pediatrics, Stanford University, Stanford, CA, United States
- I-Chan Huang
- Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States
2
Botsis T, Kreimeyer K. Improving drug safety with adverse event detection using natural language processing. Expert Opin Drug Saf 2023; 22:659-668. [PMID: 37339273 DOI: 10.1080/14740338.2023.2228197] [Received: 05/17/2023] [Accepted: 06/19/2023] [Indexed: 06/22/2023]
Abstract
INTRODUCTION Pharmacovigilance (PV) involves monitoring and aggregating adverse event information from a variety of data sources, including health records, biomedical literature, spontaneous adverse event reports, product labels, and patient-generated content like social media posts, but the most pertinent details in these sources are typically available in narrative free-text formats. Natural language processing (NLP) techniques can be used to extract clinically relevant information from PV texts to inform decision-making. AREAS COVERED We conducted a non-systematic literature review by querying the PubMed database to examine the uses of NLP in drug safety and distilled the findings to present our expert opinion on the topic. EXPERT OPINION New NLP techniques and approaches continue to be applied for drug safety use cases; however, systems that are fully deployed and in use in a clinical environment remain vanishingly rare. To see high-performing NLP techniques implemented in the real setting will require long-term engagement with end users and other stakeholders and revised workflows in fully formulated business plans for the targeted use cases. Additionally, we found little to no evidence of extracted information placed into standardized data models, which should be a way to make implementations more portable and adaptable.
Affiliation(s)
- Taxiarchis Botsis
- Department of Oncology, the Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kory Kreimeyer
- Department of Oncology, the Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
3
Carter RR, Chum AP, Sanchez R, Guha A, Dey AK, Reinbolt R, Kim L, Otchere P, Oppong‐Nkrumah O, Abraham WT, Lustberg M, Addison D. Hypertensive events after the initiation of contemporary cancer therapies for breast cancer control. Cancer Med 2023; 12:297-305. [PMID: 35633055 PMCID: PMC9844596 DOI: 10.1002/cam4.4862] [Received: 10/05/2021] [Revised: 05/03/2022] [Accepted: 05/11/2022] [Indexed: 01/25/2023]
Abstract
BACKGROUND Contemporary therapies improve breast cancer (BC) outcomes. Yet many of these therapies have been increasingly linked with serious cardiotoxicity, including reports of profound hypertension. However, the incidence, predictors, and impacts of these events are largely unknown. METHODS Leveraging two large U.S.-based registries, the National Inpatient Sample (NIS) and the Food and Drug Administration Adverse Event Reporting System (FAERS) databases, we assessed the incidence, factors, and outcomes of hypertensive events among BC patients from 2007 to 2015. Differences in baseline characteristics, hypertension-related discharges, and complications were examined over time. Further, we performed a disproportionality analysis using reporting odds ratios (RORs) to determine the association between individual BC drugs and hypertensive events. Using an ROR cutoff of >1.0, we quantified the associations of drug classes and individual drugs with the likelihood of excess hypertension. RESULTS Overall, there were 5,464,401 BC admissions, of which 46,989 (0.8%) presented with hypertension. Hypertensive BC patients were older and initially had increased in-hospital mortality, which equilibrated over time. The mean incidence of hypertension-related admissions was 732 per 100,000 among BC patients versus 96 per 100,000 among non-cancer patients (RR 7.71, p < 0.001). Moreover, in FAERS, patients with hypertension were more frequently hospitalized than those with other BC-treatment side effects (40.1% vs. 36.7%, p < 0.001), and these hypertension reports were most commonly associated with chemotherapy (45.9%). Apart from eribulin (ROR 3.36; 95% CI 1.37-8.22), no specific drug was associated with higher reporting of hypertension; however, BC drugs collectively were associated with higher odds of hypertension (ROR 1.66; 95% CI 1.09-2.53). CONCLUSIONS BC therapies are associated with a substantial increase in limiting hypertension.
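The disproportionality analysis above rests on the reporting odds ratio (ROR). As a generic illustration, not the authors' code, the sketch below computes an ROR and a Wald-type 95% confidence interval from a 2×2 table of spontaneous reports. The function name, the cell labels a, b, c, d, and the example counts are hypothetical. The abstract states only "an ROR cutoff of >1.0", so treating a lower confidence bound above 1.0 as a signal is an assumed convention, not necessarily the study's exact rule.

```python
import math


def reporting_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Return (ROR, lower, upper) for a 2x2 table of spontaneous reports.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with any other drug and the event of interest
    d: reports with any other drug and any other event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    ror = (a * d) / (b * c)
    # Standard error of log(ROR) (Woolf / Wald method)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se_log)
    upper = math.exp(math.log(ror) + z * se_log)
    return ror, lower, upper


# Hypothetical counts for one drug-event pair, not data from FAERS
ror, lower, upper = reporting_odds_ratio(a=10, b=90, c=50, d=950)
# Assumed convention: flag a signal when the whole interval lies above 1.0
is_signal = lower > 1.0
print(f"ROR {ror:.2f} (95% CI {lower:.2f}-{upper:.2f}), signal={is_signal}")
```

Working on the log scale keeps the interval asymmetric around the ROR, which is why published intervals such as 1.37-8.22 are not centered on their point estimate.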
Affiliation(s)
- Rebecca R. Carter
- Cardio‐Oncology Program, Division of Cardiology, Ohio State University, Columbus, Ohio, USA
- The Center for the Advancement of Team Science, Analytics, and Systems Thinking (CATALYST), Ohio State University, Columbus, Ohio, USA
- Aaron P. Chum
- Cardio‐Oncology Program, Division of Cardiology, Ohio State University, Columbus, Ohio, USA
- Reynaldo Sanchez
- Cardio‐Oncology Program, Division of Cardiology, Ohio State University, Columbus, Ohio, USA
- Avirup Guha
- Cardio‐Oncology Program, Division of Cardiology, Ohio State University, Columbus, Ohio, USA
- Harrington Heart and Vascular Institute, Case Western Reserve University, Cleveland, Ohio, USA
- Amit K. Dey
- National Heart Lung and Blood Institute, Bethesda, Maryland, USA
- Raquel Reinbolt
- Solove Research Institute, The Ohio State University Comprehensive Cancer Center – James Cancer Hospital, Columbus, Ohio, USA
- Lisa Kim
- Cardio‐Oncology Program, Division of Cardiology, Ohio State University, Columbus, Ohio, USA
- Prince Otchere
- Cardio‐Oncology Program, Division of Cardiology, Ohio State University, Columbus, Ohio, USA
- Oduro Oppong‐Nkrumah
- Cardio‐Oncology Program, Division of Cardiology, Ohio State University, Columbus, Ohio, USA
- William T. Abraham
- Cardio‐Oncology Program, Division of Cardiology, Ohio State University, Columbus, Ohio, USA
- Maryam Lustberg
- Solove Research Institute, The Ohio State University Comprehensive Cancer Center – James Cancer Hospital, Columbus, Ohio, USA
- Daniel Addison
- Cardio‐Oncology Program, Division of Cardiology, Ohio State University, Columbus, Ohio, USA
- Cancer Control Program, Department of Medicine, Ohio State University Comprehensive Cancer Center, Columbus, Ohio, USA
4
Lindvall C, Deng CY, Agaronnik ND, Kwok A, Samineni S, Umeton R, Mackie-Jenkins W, Kehl KL, Tulsky JA, Enzinger AC. Deep Learning for Cancer Symptoms Monitoring on the Basis of Electronic Health Record Unstructured Clinical Notes. JCO Clin Cancer Inform 2022; 6:e2100136. [PMID: 35714301 PMCID: PMC9232368 DOI: 10.1200/cci.21.00136] [Indexed: 11/20/2022]
Abstract
PURPOSE Symptoms are vital outcomes for cancer clinical trials, observational research, and population-level surveillance. Patient-reported outcomes (PROs) are valuable for monitoring symptoms, yet there are many challenges to collecting PROs at scale. We sought to develop, test, and externally validate a deep learning model to extract symptoms from unstructured clinical notes in the electronic health record. METHODS We randomly selected 1,225 outpatient progress notes from among patients treated at the Dana-Farber Cancer Institute between January 2016 and December 2019 and used 1,125 notes as our training/validation data set and 100 notes as our test data set. We evaluated the performance of 10 deep learning models for detecting 80 symptoms included in the National Cancer Institute's Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) framework. Model performance relative to manual chart abstraction was assessed using standard metrics, and the highest performer was externally validated on a sample of 100 physician notes from a different clinical context. RESULTS In our training and test data sets, 75 of the 80 candidate symptoms were identified. The ELECTRA-small model had the highest performance for symptom identification at the token level (ie, at the individual symptom level), with an F1 of 0.87 and a processing time of 3.95 seconds per note. For the 10 most common symptoms in the test data set, the F1 score ranged from 0.98 for anxious to 0.86 for fatigue. For external validation of the same symptoms, the note-level performance ranged from F1 = 0.97 for diarrhea and dizziness to F1 = 0.73 for swelling. CONCLUSION Training a deep learning model to identify a wide range of electronic health record-documented symptoms relevant to cancer care is feasible. This approach could be used at the health system scale to complement electronic PROs.
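The abstract reports F1 at the token level. F1 is the harmonic mean of precision and recall, and the sketch below shows that computation over per-token binary labels. It is a deliberate simplification and an assumption: the study scored 80 PRO-CTCAE symptoms, whereas this hypothetical example uses a single symptom/not-symptom label, and the function name, example note, and labels are invented for illustration.

```python
from typing import Sequence, Tuple


def token_prf(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over per-token binary labels (1 = symptom token)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must label the same tokens")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical note, one label per token (tokens shown only for readability)
tokens = ["patient", "reports", "fatigue", "and", "mild", "nausea", "today"]
gold = [0, 0, 1, 0, 0, 1, 0]
pred = [0, 0, 1, 0, 1, 1, 0]  # the model also tags "mild" (a false positive)
print(token_prf(gold, pred))
```

Because F1 ignores true negatives, the many non-symptom tokens in a clinical note do not inflate the score the way plain accuracy would.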
Affiliation(s)
- Charlotta Lindvall
- Dana-Farber Cancer Institute, Boston, MA; Harvard Medical School, Boston, MA; Brigham and Women's Hospital, Boston, MA
- Nicole D Agaronnik
- Dana-Farber Cancer Institute, Boston, MA; Harvard Medical School, Boston, MA
- Anne Kwok
- Dana-Farber Cancer Institute, Boston, MA
- Kenneth L Kehl
- Dana-Farber Cancer Institute, Boston, MA; Harvard Medical School, Boston, MA; Brigham and Women's Hospital, Boston, MA
- James A Tulsky
- Dana-Farber Cancer Institute, Boston, MA; Harvard Medical School, Boston, MA; Brigham and Women's Hospital, Boston, MA
- Andrea C Enzinger
- Dana-Farber Cancer Institute, Boston, MA; Harvard Medical School, Boston, MA; Brigham and Women's Hospital, Boston, MA
5
Shin H, Kim N, Cha J, Kim GJ, Kim JH, Kim JY, Lee S. Geriatrics on beers criteria medications at risk of adverse drug events using real-world data. Int J Med Inform 2021; 154:104542. [PMID: 34411951 DOI: 10.1016/j.ijmedinf.2021.104542] [Received: 04/27/2021] [Revised: 05/31/2021] [Accepted: 06/17/2021] [Indexed: 11/30/2022]
Abstract
OBJECTIVES The established Beers Criteria address side effects and safety concerns when prescribing drugs to the elderly. Because the criteria are intended to direct attention to prescribing practice rather than to serve as lists of prohibited drugs, Beers Criteria medications (BCMs) may still be used appropriately when unavoidable. METHODS Patients aged ≥ 65 years who had been prescribed inappropriate medications at Konyang University Hospital, South Korea, were selected. We analyzed data from the Korea Adverse Event Reporting System (KAERS) and the United States Food and Drug Administration Adverse Event Reporting System (FAERS) to identify medication-induced adverse drug events (ADEs). The actual incidence was predicted by multiplying the incidence by the number of BCMs prescribed to the patients. The proportional reporting ratio (PRR) and reporting odds ratio (ROR) were calculated using KAERS and FAERS data. RESULTS We predicted that the incidence of ADEs would be higher for metoclopramide, chlorpheniramine, and amitriptyline in patients using medications for more than 1 day, and for metoclopramide, chlorpheniramine, and ketoprofen in patients using medications for only 1 day. Among the ADEs reported to KAERS and FAERS, significant ROR and PRR values were noted for clonazepam (drowsiness), nortriptyline (sleepiness), and zolpidem (amnesia, somnambulism, agitation, dependence, nightmare, and dysgeusia). CONCLUSION This study highlighted the actual status of BCM prescriptions in clinical institutions and predicted the incidence of ADEs. We concluded that greater care must be taken when prescribing BCMs to the elderly and that indicators such as PRR and ROR should be monitored regularly.
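The abstract names the proportional reporting ratio (PRR) without giving a formula. The PRR compares the proportion of a drug's reports that mention a given event with the same proportion among reports for all other drugs. A minimal sketch follows, assuming the conventional 2×2 cell layout and a log-normal (Wald-type) 95% interval; the function name and counts are hypothetical, and because the abstract does not state which threshold the authors treated as "significant", none is applied here.

```python
import math


def proportional_reporting_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Return (PRR, lower, upper) from a 2x2 table of spontaneous reports.

    a: drug of interest with the event; b: drug of interest with other events
    c: other drugs with the event;      d: other drugs with other events
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    # Share of the drug's reports that mention the event, relative to the
    # same share among all other drugs' reports
    prr = (a / (a + b)) / (c / (c + d))
    # Standard error of log(PRR)
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(prr) - z * se_log)
    upper = math.exp(math.log(prr) + z * se_log)
    return prr, lower, upper


# Hypothetical counts for one drug-event pair, not data from KAERS or FAERS
prr, lower, upper = proportional_reporting_ratio(a=10, b=90, c=50, d=950)
print(f"PRR {prr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

The PRR's denominators are row totals (all reports for a drug), whereas the ROR divides by the "other events" cell, so the two measures diverge as an event becomes common within a drug's reports.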
Affiliation(s)
- Hyunah Shin
- Health Care Data Science Center, Konyang University Hospital, Daejeon, Republic of Korea
- Nanyeong Kim
- Departments of Biomedical Informatics, Konyang University College of Medicine, Daejeon, Republic of Korea
- Jaehun Cha
- Health Care Data Science Center, Konyang University Hospital, Daejeon, Republic of Korea
- Grace Juyun Kim
- Seoul National University Biomedical Informatics (SNUBI), Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, Republic of Korea
- Ju Han Kim
- Seoul National University Biomedical Informatics (SNUBI), Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, Republic of Korea
- Jong-Yeup Kim
- Health Care Data Science Center, Konyang University Hospital, Daejeon, Republic of Korea; Departments of Biomedical Informatics, Konyang University College of Medicine, Daejeon, Republic of Korea
- Suehyun Lee
- Health Care Data Science Center, Konyang University Hospital, Daejeon, Republic of Korea; Departments of Biomedical Informatics, Konyang University College of Medicine, Daejeon, Republic of Korea
6
Ni Y, Bachtel A, Nause K, Beal S. Automated detection of substance use information from electronic health records for a pediatric population. J Am Med Inform Assoc 2021; 28:2116-2127. [PMID: 34333636 PMCID: PMC8449626 DOI: 10.1093/jamia/ocab116] [Received: 03/26/2021] [Revised: 05/06/2021] [Accepted: 05/26/2021] [Indexed: 11/12/2022]
Abstract
OBJECTIVE Substance use screening in adolescence is unstandardized and often documented in clinical notes, rather than in structured electronic health records (EHRs). The objective of this study was to integrate logic rules with state-of-the-art natural language processing (NLP) and machine learning technologies to detect substance use information from both structured and unstructured EHR data. MATERIALS AND METHODS Pediatric patients (10-20 years of age) with any encounter between July 1, 2012, and October 31, 2017, were included (n = 3890 patients; 19 478 encounters). EHR data were extracted at each encounter, manually reviewed for substance use (alcohol, tobacco, marijuana, opiate, any use), and coded as lifetime use, current use, or family use. Logic rules mapped structured EHR indicators to screening results. A knowledge-based NLP system and a deep learning model detected substance use information from unstructured clinical narratives. System performance was evaluated using positive predictive value, sensitivity, negative predictive value, specificity, and area under the receiver-operating characteristic curve (AUC). RESULTS The dataset included 17 235 structured indicators and 27 141 clinical narratives. Manual review of clinical narratives captured 94.0% of positive screening results, while structured EHR data captured 22.0%. Logic rules detected screening results from structured data with 1.0 and 0.99 for sensitivity and specificity, respectively. The knowledge-based system detected substance use information from clinical narratives with 0.86, 0.79, and 0.88 for AUC, sensitivity, and specificity, respectively. The deep learning model further improved detection capacity, achieving 0.88, 0.81, and 0.85 for AUC, sensitivity, and specificity, respectively. Finally, integrating predictions from structured and unstructured data achieved high detection capacity across all cases (0.96, 0.85, and 0.87 for AUC, sensitivity, and specificity, respectively). 
CONCLUSIONS It is feasible to detect substance use screening and results among pediatric patients using logic rules, NLP, and machine learning technologies.
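The evaluation metrics listed in the methods (positive predictive value, sensitivity, negative predictive value, specificity) all derive from a 2×2 confusion matrix, while AUC summarizes how well continuous scores rank positive cases above negative ones. The sketch below is a generic illustration rather than the authors' evaluation code; the function names, labels, and scores are hypothetical, and AUC is computed with the rank-based (Mann-Whitney) formulation, one of several equivalent ways to obtain it.

```python
from typing import Sequence


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """PPV, sensitivity, NPV, and specificity for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than dividing by zero when a denominator is empty
        return num / den if den else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "npv": ratio(tn, tn + fn),
        "specificity": ratio(tn, tn + fp),
    }


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Rank-based AUC: the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney U / (n_pos * n_neg))."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical reference labels and predictions, for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(screening_metrics(y_true, y_pred))
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch; for large cohorts a sort-based rank computation gives the same value more efficiently.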
Affiliation(s)
- Yizhao Ni
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, Ohio, USA
- Corresponding Author: Yizhao Ni, PhD, Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Department of Pediatrics, University of Cincinnati, 3333 Burnet Avenue, Cincinnati, OH 45229-3039, USA;
- Alycia Bachtel
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
- Katie Nause
- Division of Psychology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
- Sarah Beal
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, Ohio, USA
- Division of Psychology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
7
Geva A, Abman SH, Manzi SF, Ivy DD, Mullen MP, Griffin J, Lin C, Savova GK, Mandl KD. Adverse drug event rates in pediatric pulmonary hypertension: a comparison of real-world data sources. J Am Med Inform Assoc 2020; 27:294-300. [PMID: 31769835 PMCID: PMC7025334 DOI: 10.1093/jamia/ocz194] [Received: 05/13/2019] [Revised: 10/08/2019] [Accepted: 10/21/2019] [Indexed: 11/14/2022]
Abstract
Objective Real-world data (RWD) are increasingly used for pharmacoepidemiology and regulatory innovation. Our objective was to compare adverse drug event (ADE) rates determined from two RWD sources, electronic health records and administrative claims data, among children treated with drugs for pulmonary hypertension. Materials and Methods Textual mentions of medications and signs/symptoms that may represent ADEs were identified in clinical notes using natural language processing. Diagnostic codes for the same signs/symptoms were identified in our electronic data warehouse for the patients with textual evidence of taking pulmonary hypertension-targeted drugs. We compared rates of ADEs identified in clinical notes to those identified from diagnostic code data. In addition, we compared putative ADE rates from clinical notes to those from a healthcare claims dataset from a large, national insurer. Results Analysis of clinical notes identified up to 7-fold higher ADE rates than those ascertained from diagnostic codes. However, certain ADEs (eg, hearing loss) were more often identified in diagnostic code data. Similar results were found when ADE rates ascertained from clinical notes and national claims data were compared. Discussion While administrative claims and clinical notes are both increasingly used for RWD-based pharmacovigilance, ADE rates substantially differ depending on data source. Conclusion Pharmacovigilance based on RWD may lead to discrepant results depending on the data source analyzed. Further work is needed to confirm the validity of identified ADEs, to distinguish them from disease effects, and to understand tradeoffs in sensitivity and specificity between data sources.
Affiliation(s)
- Alon Geva
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA; Division of Critical Care Medicine, Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, Massachusetts, USA; Department of Anaesthesia, Harvard Medical School, Boston, Massachusetts, USA
- Steven H Abman
- Division of Pediatric Pulmonary Medicine, Children's Hospital Colorado, Aurora, Colorado, USA; Department of Pediatrics, University of Colorado School of Medicine, Aurora, Colorado, USA
- Shannon F Manzi
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA; Division of Genetics & Genomics, Clinical Pharmacogenomics Service, Department of Pharmacy, Boston Children's Hospital, Boston, Massachusetts, USA; Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Dunbar D Ivy
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, Colorado, USA; Division of Cardiology, Heart Institute, Children's Hospital Colorado, Aurora, Colorado, USA
- Mary P Mullen
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA; Department of Cardiology, Boston Children's Hospital, Boston, Massachusetts, USA
- John Griffin
- Division of Critical Care Medicine, Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, Massachusetts, USA
- Chen Lin
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
- Guergana K Savova
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA; Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Kenneth D Mandl
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA; Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
8
Yu Y, Ruddy KJ, Wen A, Zong N, Tsuji S, Chen J, Shah ND, Jiang G. Integrating Electronic Health Record Data into the ADEpedia-on-OHDSI Platform for Improved Signal Detection: A Case Study of Immune-related Adverse Events. AMIA Jt Summits Transl Sci Proc 2020; 2020:710-719. [PMID: 32477694 PMCID: PMC7233056] [Indexed: 06/11/2023]
Abstract
With widespread adoption of electronic health records (EHRs), real-world data (RWD) and real-world evidence (RWE) have been increasingly used by the FDA for evaluating drug safety and effectiveness. However, integration of heterogeneous drug safety data sources and systems remains an impediment to effective pharmacovigilance studies. In an ongoing project, we have developed a next-generation pharmacovigilance signal detection framework known as ADEpedia-on-OHDSI using the OMOP common data model (CDM). The objective of the study is to demonstrate the feasibility of the framework for integrating both spontaneous reporting data and EHR data for improved signal detection, with a case study of immune-related adverse events. We first loaded the OMOP CDM with both recent and legacy FAERS (FDA Adverse Event Reporting System) data (from the time period between Jan. 2004 and Dec. 2018). We also integrated the clinical data from the Mayo Clinic EHR system for six oncological immunotherapy drugs. We implemented a signal detection algorithm and compared the timelines of positive signals detected from both FAERS and EHR data. We found that, for ipilimumab-induced hypopituitarism, the signals detected from EHRs were 4 months earlier than the signals detected from the FAERS database (depending on the signal detection methods used). Our CDM-based approach could provide a scalable solution for integrating drug safety data and EHR data to generate RWE for improved signal detection.
Affiliation(s)
- Yue Yu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Andrew Wen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Nansu Zong
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Shintaro Tsuji
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Jun Chen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Nilay D Shah
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN
- Guoqian Jiang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
9
Ni Y, Barzman D, Bachtel A, Griffey M, Osborn A, Sorter M. Finding warning markers: Leveraging natural language processing and machine learning technologies to detect risk of school violence. Int J Med Inform 2020; 139:104137. [PMID: 32361146 DOI: 10.1016/j.ijmedinf.2020.104137] [Received: 08/01/2019] [Revised: 02/20/2020] [Accepted: 03/28/2020] [Indexed: 10/24/2022]
Abstract
INTRODUCTION School violence has a far-reaching effect, impacting the entire school population, including staff, students, and their families. Among youth attending the most violent schools, studies have reported higher dropout rates, poor school attendance, and poor scholastic achievement. The largest crime-prevention effects have been observed when youth at elevated risk were given an individualized prevention program. However, much work is needed to establish an effective approach to identify at-risk subjects. OBJECTIVE In our earlier research, we developed a risk assessment program to interview subjects, identify risk and protective factors, and evaluate risk for school violence. This study focused on developing natural language processing (NLP) and machine learning technologies to automate the risk assessment process. MATERIAL AND METHODS We prospectively recruited 131 students with or without behavioral concerns from 89 schools between 05/01/2015 and 04/30/2018. The subjects were interviewed with two risk assessment scales and a questionnaire, and their risk of violence was determined by pediatric psychiatrists based on clinical judgment. Using NLP technologies, different types of linguistic features were extracted from the interview content. Machine learning classifiers were then applied to predict risk of school violence for individual subjects. A two-stage feature selection procedure was implemented to identify violence-related predictors. Performance was validated against the psychiatrist-generated reference standard of risk levels, using positive predictive value (PPV), sensitivity (SEN), negative predictive value (NPV), specificity (SPEC), and area under the ROC curve (AUC). RESULTS Compared with subjects' sociodemographic information, use of linguistic features significantly improved classifiers' predictive performance (P < 0.01).
The best-performing classifier with n-gram features achieved 86.5 %/86.5 %/85.7 %/85.7 %/94.0 % (PPV/SEN/NPV/SPEC/AUC) on the cross-validation set and 83.3 %/93.8 %/91.7 %/78.6 %/94.6 % (PPV/SEN/NPV/SPEC/AUC) on the test data. The feature selection process identified a set of predictors covering the discussion of subjects' thoughts, perspectives, behaviors, individual characteristics, peers and family dynamics, and protective factors. CONCLUSIONS By analyzing the content from subject interviews, the NLP and machine learning algorithms showed good capacity for detecting risk of school violence. The feature selection uncovered multiple warning markers that could deliver useful clinical insights to assist personalizing intervention. Consequently, the developed approach offered the promise of an accurate and scalable computerized screening service for preventing school violence.
Affiliation(s)
- Yizhao Ni
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, United States
- Drew Barzman
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, United States; Division of Child and Adolescent Psychiatry, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Alycia Bachtel
- Division of Child and Adolescent Psychiatry, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Marcus Griffey
- Division of Child and Adolescent Psychiatry, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Alexander Osborn
- Division of Child and Adolescent Psychiatry, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Michael Sorter
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, United States; Division of Child and Adolescent Psychiatry, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
10
Koleck TA, Dreisbach C, Bourne PE, Bakken S. Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review. J Am Med Inform Assoc 2019; 26:364-379. [PMID: 30726935 DOI: 10.1093/jamia/ocy173] [Citation(s) in RCA: 182] [Impact Index Per Article: 45.5] [Received: 06/29/2018] [Revised: 11/20/2018] [Accepted: 11/27/2018] [Indexed: 12/26/2022]
Abstract
OBJECTIVE Natural language processing (NLP) of symptoms from electronic health records (EHRs) could contribute to the advancement of symptom science. We aim to synthesize the literature on the use of NLP to process or analyze symptom information documented in EHR free-text narratives. MATERIALS AND METHODS Our search of 1964 records from PubMed and EMBASE was narrowed to 27 eligible articles. Data related to the purpose, free-text corpus, patients, symptoms, NLP methodology, evaluation metrics, and quality indicators were extracted for each study. RESULTS Symptom-related information was presented as a primary outcome in 14 studies. EHR narratives represented various inpatient and outpatient clinical specialties, with general, cardiology, and mental health occurring most frequently. Studies encompassed a wide variety of symptoms, including shortness of breath, pain, nausea, dizziness, disturbed sleep, constipation, and depressed mood. NLP approaches included previously developed NLP tools, classification methods, and manually curated rule-based processing. Only one-third (n = 9) of studies reported patient demographic characteristics. DISCUSSION NLP is used to extract information from EHR free-text narratives written by a variety of healthcare providers on an expansive range of symptoms across diverse clinical specialties. The current focus of this field is on the development of methods to extract symptom information and the use of symptom information for disease classification tasks rather than the examination of symptoms themselves. CONCLUSION Future NLP studies should concentrate on the investigation of symptoms and symptom documentation in EHR free-text narratives. Efforts should be undertaken to examine patient characteristics and make symptom-related NLP algorithms or pipelines and vocabularies openly available.
Affiliation(s)
- Caitlin Dreisbach
- School of Nursing, University of Virginia, Charlottesville, Virginia, USA; Data Science Institute, University of Virginia, Charlottesville, Virginia, USA
- Philip E Bourne
- Data Science Institute, University of Virginia, Charlottesville, Virginia, USA
- Suzanne Bakken
- School of Nursing, Columbia University, New York, New York, USA; Department of Biomedical Informatics, Columbia University, New York, New York, USA; Data Science Institute, Columbia University, New York, New York, USA
11
Wong A, Plasek JM, Montecalvo SP, Zhou L. Natural Language Processing and Its Implications for the Future of Medication Safety: A Narrative Review of Recent Advances and Challenges. Pharmacotherapy 2018; 38:822-841. [PMID: 29884988 DOI: 10.1002/phar.2151] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 01/12/2023]
Abstract
The safety of medication use has been a priority in the United States since the late 1930s. Recently, it has gained prominence due to the increasing amount of data suggesting that a large amount of patient harm is preventable and can be mitigated with effective risk strategies that have not been sufficiently adopted. Adverse events from medications are part of clinical practice, but the ability to identify a patient's risk and to minimize that risk must be a priority. The ability to identify adverse events has been a challenge due to limitations of available data sources, which are often free text. The use of natural language processing (NLP) may help to address these limitations. NLP is the artificial intelligence domain of computer science that uses computers to manipulate unstructured data (i.e., narrative text or speech data) in the context of a specific task. In this narrative review, we illustrate the fundamentals of NLP and discuss NLP's application to medication safety in four data sources: electronic health records, Internet-based data, published literature, and reporting systems. Given the magnitude of available data from these sources, a growing area is the use of computer algorithms to help automatically detect associations between medications and adverse effects. The main benefit of NLP is in the time savings associated with automation of various medication safety tasks such as the medication reconciliation process facilitated by computers, as well as the potential for near-real-time identification of adverse events for postmarketing surveillance such as those posted on social media that would otherwise go unanalyzed. NLP is limited by a lack of data sharing between health care organizations due to insufficient interoperability capabilities, inhibiting large-scale adverse event monitoring across populations. 
We anticipate that future work in this area will focus on the integration of data sources from different domains to improve the ability to identify potential adverse events more quickly and to improve clinical decision support with regard to a patient's estimated risk for specific adverse events at the time of medication prescription or review.
Affiliation(s)
- Adrian Wong
- Department of Pharmacy and Therapeutics, University of Pittsburgh, Pittsburgh, Pennsylvania; Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts
- Joseph M Plasek
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts; Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, Utah
- Li Zhou
- Harvard Medical School, Boston, Massachusetts