1
|
Classification of neurologic outcomes from medical notes using natural language processing. EXPERT SYSTEMS WITH APPLICATIONS 2023; 214:119171. [PMID: 36865787 PMCID: PMC9974159 DOI: 10.1016/j.eswa.2022.119171] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/18/2023]
Abstract
Neurologic disability level at hospital discharge is an important outcome in many clinical research studies. Outside of clinical trials, neurologic outcomes must typically be extracted by labor-intensive manual review of clinical notes in the electronic health record (EHR). To overcome this challenge, we set out to develop a natural language processing (NLP) approach that automatically reads clinical notes to determine neurologic outcomes, making larger-scale neurologic outcome studies possible. We obtained 7314 notes from 3632 patients hospitalized at two large Boston hospitals between January 2012 and June 2020, including discharge summaries (3485), occupational therapy notes (1472), and physical therapy notes (2357). Fourteen clinical experts reviewed notes to assign scores on the Glasgow Outcome Scale (GOS), with 4 classes ('good recovery', 'moderate disability', 'severe disability', and 'death'), and on the Modified Rankin Scale (mRS), with 7 classes ('no symptoms', 'no significant disability', 'slight disability', 'moderate disability', 'moderately severe disability', 'severe disability', and 'death'). For 428 patients' notes, 2 experts scored the cases, generating interrater reliability estimates for GOS and mRS. After preprocessing and extracting features from the notes, we trained a multiclass logistic regression model using LASSO regularization and 5-fold cross-validation for hyperparameter tuning. The model performed well on the test set, achieving a micro-average area under the receiver operating characteristic curve and F-score of 0.94 (95% CI 0.93-0.95) and 0.77 (0.75-0.80) for GOS, and 0.90 (0.89-0.91) and 0.59 (0.57-0.62) for mRS, respectively. Our work demonstrates that an NLP algorithm can accurately assign neurologic outcomes based on free-text clinical notes. This algorithm increases the scale of research on neurologic outcomes that is possible with EHR data.
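The micro-averaged F-score reported in the abstract above pools true/false positives across all classes before computing precision and recall. A minimal sketch in Python; the GOS class labels come from the abstract, while the toy predictions are hypothetical:

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool per-class TP/FP/FN counts before computing F1."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for lab in labels:
        tp += sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp += sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn += sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gos_true = ["good recovery", "death", "severe disability", "moderate disability"]
gos_pred = ["good recovery", "death", "good recovery", "moderate disability"]
print(micro_f1(gos_true, gos_pred))  # 0.75
```

Note that with exactly one label per note, micro-F1 reduces to plain accuracy; it diverges from accuracy only in multi-label settings.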
|
2
|
Automatic Extraction of Medication Mentions from Tweets-Overview of the BioCreative VII Shared Task 3 Competition. Database (Oxford) 2023; 2023:baac108. [PMID: 36734300 PMCID: PMC9896308 DOI: 10.1093/database/baac108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Revised: 10/28/2022] [Accepted: 12/13/2022] [Indexed: 02/04/2023]
Abstract
This study presents the outcomes of the shared task competition BioCreative VII (Task 3) focusing on the extraction of medication names from a Twitter user's publicly available tweets (the user's 'timeline'). In general, detecting health-related tweets is notoriously challenging for natural language processing tools. The main challenge, aside from the informality of the language used, is that people tweet about any and all topics, and most of their tweets are not related to health. Thus, finding those tweets in a user's timeline that mention specific health-related concepts such as medications requires addressing extreme imbalance. Task 3 called for detecting tweets in a user's timeline that mentions a medication name and, for each detected mention, extracting its span. The organizers made available a corpus consisting of 182 049 tweets publicly posted by 212 Twitter users with all medication mentions manually annotated. The corpus exhibits the natural distribution of positive tweets, with only 442 tweets (0.2%) mentioning a medication. This task was an opportunity for participants to evaluate methods that are robust to class imbalance beyond the simple lexical match. A total of 65 teams registered, and 16 teams submitted a system run. This study summarizes the corpus created by the organizers and the approaches taken by the participating teams for this challenge. The corpus is freely available at https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-3/. The methods and the results of the competing systems are analyzed with a focus on the approaches taken for learning from class-imbalanced data.
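One common baseline for the extreme imbalance described above (0.2% positives) is to undersample the negative class before training. A minimal sketch, assuming a binary 1/0 labeling; the 4:1 negative-to-positive ratio is chosen purely for illustration:

```python
import random

def undersample(texts, labels, neg_per_pos=4, seed=0):
    """Keep every positive example; sample negatives down to neg_per_pos per positive."""
    rng = random.Random(seed)
    pos = [(x, y) for x, y in zip(texts, labels) if y == 1]
    neg = [(x, y) for x, y in zip(texts, labels) if y == 0]
    kept = pos + rng.sample(neg, min(len(neg), neg_per_pos * len(pos)))
    rng.shuffle(kept)
    return [x for x, _ in kept], [y for _, y in kept]

# Toy timeline: 2 medication-mentioning tweets among 100.
texts = [f"tweet {i}" for i in range(100)]
labels = [1, 1] + [0] * 98
x, y = undersample(texts, labels)
print(len(x), sum(y))  # 10 2
```

Evaluation should still be run on the unaltered natural distribution, since resampling only changes what the classifier sees at training time.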
|
3
|
MedLexSp - a medical lexicon for Spanish medical natural language processing. J Biomed Semantics 2023; 14:2. [PMID: 36732862 PMCID: PMC9892682 DOI: 10.1186/s13326-022-00281-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 12/03/2022] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Medical lexicons enable the natural language processing (NLP) of health texts. Lexicons gather terms and concepts from thesauri and ontologies, along with linguistic data for part-of-speech (PoS) tagging, lemmatization, or natural language generation. To date, no such resource exists for Spanish. CONSTRUCTION AND CONTENT This article describes a unified medical lexicon for medical natural language processing in Spanish. MedLexSp includes terms and inflected word forms with PoS information and Unified Medical Language System (UMLS) semantic types, semantic groups, and Concept Unique Identifiers (CUIs). To create it, we used NLP techniques and domain corpora (e.g. MedlinePlus). We also collected terms from the Dictionary of Medical Terms of the Spanish Royal Academy of Medicine, the Medical Subject Headings (MeSH), the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT), the Medical Dictionary for Regulatory Activities Terminology (MedDRA), the International Classification of Diseases version 10, the Anatomical Therapeutic Chemical Classification, the National Cancer Institute (NCI) Dictionary, Online Mendelian Inheritance in Man (OMIM), and OrphaData. Terms related to COVID-19 were assembled by applying a similarity-based approach with word embeddings trained on a large corpus. MedLexSp includes 100 887 lemmas, 302 543 inflected forms (conjugated verbs and number/gender variants), and 42 958 UMLS CUIs. We report two use cases of MedLexSp: first, applying the lexicon to pre-annotate a corpus of 1200 texts related to clinical trials; second, PoS tagging and lemmatizing texts about clinical cases. MedLexSp improved the scores for PoS tagging and lemmatization compared to the default spaCy and Stanza Python libraries.
CONCLUSIONS The lexicon is distributed as a delimiter-separated value file; an XML file following the Lexical Markup Framework; a lemmatizer module for the spaCy and Stanza libraries; and complementary Lexical Record (LR) files. The embeddings and code used to extract COVID-19 terms, as well as the spaCy and Stanza lemmatizers enriched with medical terms, are provided in a public repository.
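A lexicon of this kind is typically consumed as a lookup table from inflected form to lemma, PoS tag, and UMLS CUI. A minimal sketch of that lookup pattern; the two Spanish entries and their CUIs below are illustrative stand-ins, not actual MedLexSp records:

```python
# form -> (lemma, PoS, UMLS CUI); entries and CUIs are illustrative stand-ins
LEXICON = {
    "pulmones": ("pulmón", "NOUN", "C0000001"),
    "fiebres": ("fiebre", "NOUN", "C0000002"),
}

def lemmatize(token):
    """Return the lexicon lemma for a token, falling back to the token itself."""
    entry = LEXICON.get(token.lower())
    return entry[0] if entry else token

print([lemmatize(t) for t in ["Pulmones", "fiebres", "paciente"]])
# ['pulmón', 'fiebre', 'paciente']
```

In practice, spaCy and Stanza expose hooks for plugging such a table in as a lookup lemmatizer, which is presumably how the distributed lemmatizer modules work.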
|
4
|
MonkeyPox2022Tweets: A Large-Scale Twitter Dataset on the 2022 Monkeypox Outbreak, Findings from Analysis of Tweets, and Open Research Questions. Infect Dis Rep 2022; 14:855-883. [PMID: 36412745 PMCID: PMC9680479 DOI: 10.3390/idr14060087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 10/13/2022] [Accepted: 11/08/2022] [Indexed: 11/16/2022] Open
Abstract
The mining of Tweets to develop datasets on recent issues, global challenges, pandemics, virus outbreaks, emerging technologies, and trending matters has been of significant interest to the scientific community in the recent past, as such datasets serve as a rich data resource for investigating different research questions. Furthermore, past virus outbreaks, such as COVID-19, Ebola, Zika, and flu, to name a few, prompted various works analyzing the multimodal components of Tweets to infer the characteristics of the Twitter conversations related to each respective outbreak. The ongoing outbreak of the monkeypox virus, declared a Global Public Health Emergency (GPHE) by the World Health Organization (WHO), has resulted in a surge of conversations about this outbreak on Twitter, generating tremendous amounts of Big Data. No prior work in this field has focused on mining such conversations to develop a Twitter dataset, and no prior work has performed a comprehensive analysis of Tweets about this ongoing outbreak. To address these challenges, this work makes three scientific contributions to the field. First, it presents an open-access dataset of 556,427 Tweets about monkeypox posted on Twitter since the first detected case of this outbreak. A comparative study against 36 prior works that developed Twitter datasets is also presented, further upholding the novelty, relevance, and usefulness of this dataset. Second, the paper reports the results of a comprehensive analysis of the Tweets in this dataset.
This analysis presents several novel findings: of the 34 languages supported by Twitter, English has been the most used language for posting Tweets about monkeypox; about 40,000 monkeypox-related Tweets were posted on the day the WHO declared monkeypox a GPHE; a total of 5470 distinct hashtags have been used on Twitter about this outbreak, of which #monkeypox is the most used; and Twitter for iPhone has been the leading source of Tweets about the outbreak. Sentiment analysis of the Tweets was also performed, and the results show that despite extensive discussion, debate, opinion, information, and misinformation on Twitter on various related topics, such as monkeypox and the LGBTQI+ community, monkeypox and COVID-19, and vaccines for monkeypox, 'neutral' sentiment was present in most of the Tweets, followed by 'negative' and 'positive' sentiments, respectively. Finally, to support research and development in this field, the paper presents a list of 50 open research questions related to the outbreak in the areas of Big Data, Data Mining, Natural Language Processing, and Machine Learning that may be investigated based on this dataset.
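The hashtag statistics reported above reduce to an extract-and-count pass over the corpus. A minimal sketch with invented example Tweets (the real dataset would be streamed, not held in a list):

```python
import re
from collections import Counter

def top_hashtags(tweets, n=2):
    """Extract #hashtags case-insensitively and return the n most frequent."""
    tags = (tag.lower() for t in tweets for tag in re.findall(r"#\w+", t))
    return Counter(tags).most_common(n)

tweets = [
    "WHO declares emergency #monkeypox #publichealth",
    "Stay informed #Monkeypox",
    "New cases reported #monkeypox",
]
print(top_hashtags(tweets))  # [('#monkeypox', 3), ('#publichealth', 1)]
```

Lowercasing before counting is what lets #Monkeypox and #monkeypox collapse into a single tally, which matters for the "most used hashtag" statistic.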
|
5
|
Natural language processing in toxicology: Delineating adverse outcome pathways and guiding the application of new approach methodologies. BIOMATERIALS AND BIOSYSTEMS 2022; 7:100061. [PMID: 36824484 PMCID: PMC9934466 DOI: 10.1016/j.bbiosy.2022.100061] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/25/2022] [Revised: 07/26/2022] [Accepted: 07/27/2022] [Indexed: 11/24/2022] Open
Abstract
Adverse Outcome Pathways (AOPs) are conceptual frameworks that tie an initial perturbation (molecular initiating event) to a phenotypic toxicological manifestation (adverse outcome) through a series of steps (key events). They therefore provide a standardized way to map and organize toxicological mechanistic information. As such, AOPs inform on the key events underlying toxicity, thus supporting the development of New Approach Methodologies (NAMs), which aim to reduce the use of animal testing for toxicology purposes. However, establishing a novel AOP relies on gathering multiple streams of evidence and information, from the available literature to knowledge databases. Often, this information comes as free text, also called unstructured text, which is not immediately digestible by a computer and is thus tedious and, given the growing volume of available data, increasingly time-consuming to process manually. Advances in machine learning provide alternative solutions to this challenge. To extract and organize information from relevant sources, it seems valuable to employ deep learning natural language processing (NLP) techniques. We review here some of the recent progress in the NLP field and show how these techniques have already demonstrated value in the biomedical and toxicology areas. We also propose an approach to efficiently and reliably extract and combine relevant toxicological information from text. These data can be used to map underlying mechanisms that lead to toxicological effects and to start building quantitative models, in particular AOPs, ultimately allowing animal-free, human-based hazard and risk assessment.
|
6
|
Deep neural networks for simultaneously capturing public topics and sentiments during a pandemic. Application to a COVID-19 tweet dataset. JMIR Med Inform 2022; 10:e34306. [PMID: 35533390 PMCID: PMC9135113 DOI: 10.2196/34306] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/16/2021] [Revised: 02/14/2022] [Accepted: 04/21/2022] [Indexed: 11/24/2022] Open
Abstract
Background Public engagement is a key element for mitigating pandemics, and a good understanding of public opinion can help encourage the successful adoption of public health measures by the population. In recent years, deep learning has been increasingly applied to the analysis of text from social networks. However, most of the developed approaches capture topics or sentiments alone, but not both together. Objective Here, we aimed to develop a new approach, based on deep neural networks, for simultaneously capturing public topics and sentiments, and applied it to tweets sent just after the announcement of the COVID-19 pandemic by the World Health Organization (WHO). Methods A total of 1,386,496 tweets were collected, preprocessed, and split with a ratio of 80:20 into training and validation sets, respectively. We combined lexicons and convolutional neural networks to improve sentiment prediction. The trained model achieved an overall accuracy of 81% and a precision of 82% and was able to simultaneously capture the weighted words associated with a predicted sentiment intensity score. These outputs were then visualized via an interactive and customizable web interface based on a word cloud representation. Using word cloud analysis, we captured the main topics for extreme positive and negative sentiment intensity scores. Results In reaction to the announcement of the pandemic by the WHO, 6 negative and 5 positive topics were discussed on Twitter. Twitter users seemed worried about the international situation, the economic consequences, and the medical situation. Conversely, they seemed satisfied with the commitment of medical and social workers and with the collaboration between people. Conclusions We propose a new method based on deep neural networks for simultaneously extracting public topics and sentiments from tweets. This method could be helpful for monitoring public opinion during crises such as pandemics.
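The abstract above combines lexicons with a CNN; the lexicon half can be sketched as a simple average of per-word valence scores. The words and scores below are invented for illustration and are not the study's actual lexicon:

```python
# Invented valence lexicon: word -> intensity in [-3, 3]
VALENCE = {"worried": -2, "crisis": -3, "grateful": 3, "together": 1, "fear": -2}

def sentiment_intensity(text):
    """Average the valence of known words; 0.0 when no lexicon word appears."""
    hits = [VALENCE[w] for w in text.lower().split() if w in VALENCE]
    return sum(hits) / len(hits) if hits else 0.0

print(sentiment_intensity("Grateful we are in this together"))  # 2.0
print(sentiment_intensity("worried about the crisis"))          # -2.5
```

In a combined system such lexicon scores are typically fed to the neural model as extra input features rather than used as the final prediction, letting the CNN correct for negation and context that a bag of valences misses.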
|
7
|
Applications of quantitative social media listening to patient-centric drug development. Drug Discov Today 2022; 27:1523-1530. [PMID: 35114364 DOI: 10.1016/j.drudis.2022.01.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2021] [Revised: 08/13/2021] [Accepted: 01/26/2022] [Indexed: 11/27/2022]
Abstract
Social media listening has been increasingly acknowledged as a tool with applications in many stages of the drug development process. These applications were created to meet the need for patient-centric therapies that are fit for purpose and meaningful to patients. Such applications, however, require leveraging new quantitative approaches and analytical methods that draw on developments in artificial intelligence and real-world data (RWD) analysis. Here, we review the state of the art in quantitative social media listening (QSML) methods applied to drug discovery from the perspective of the pharmaceutical industry.
|
8
|
Active neural networks to detect mentions of changes to medication treatment in social media. J Am Med Inform Assoc 2021; 28:2551-2561. [PMID: 34613417 PMCID: PMC8633624 DOI: 10.1093/jamia/ocab158] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/09/2020] [Revised: 04/13/2021] [Accepted: 07/23/2021] [Indexed: 12/30/2022] Open
Abstract
Objective We address a first step toward using social media data to supplement current efforts in monitoring population-level medication nonadherence: detecting changes to medication treatment. Medication treatment changes, such as changes to dosage or frequency of intake, that are not overseen by physicians constitute, by definition, nonadherence to medication. Despite the consequences, including worsening health conditions or death, an estimated 50% of patients do not take medications as indicated. Current methods to identify nonadherence have major limitations: direct observation may be intrusive or expensive, and indirect observation through patient surveys relies heavily on patients' memory and candor. Using social media data in these studies may address these limitations. Methods We annotated 9830 tweets mentioning medications and trained a convolutional neural network (CNN) to find mentions of medication treatment changes, regardless of whether the change was recommended by a physician. We used active and transfer learning from 12 972 reviews we annotated from WebMD to address the class imbalance of our Twitter corpus. To validate our CNN and explore future directions, we annotated 1956 positive tweets as to whether they reflect nonadherence and categorized the reasons given. Results Our CNN achieved a 0.50 F1-score on this new corpus. Manual analysis of the positive tweets revealed that nonadherence is evident in a subset, with 9 categories of reasons for nonadherence. Conclusion We showed that social media users publicly discuss medication treatment changes and may explain their reasons, including when a change constitutes nonadherence. This approach may be useful to supplement current efforts in adherence monitoring.
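Active learning, as used above, repeatedly sends the unlabeled examples the current model is least sure about to the annotators. A minimal uncertainty-sampling sketch; the probabilities are hypothetical classifier outputs, not values from the study:

```python
def most_uncertain(probs, k=2):
    """Indices of the k examples whose positive-class probability is closest to 0.5."""
    return sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))[:k]

# Hypothetical classifier confidences for 5 unlabeled tweets
probs = [0.97, 0.52, 0.08, 0.44, 0.61]
print(most_uncertain(probs))  # [1, 3]: the borderline tweets go to annotators first
```

Prioritizing borderline examples is what makes annotation budgets stretch further on imbalanced corpora: confidently negative tweets, the overwhelming majority, are rarely worth a human label.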
|
9
|
An Efficient Parallelized Ontology Network-Based Semantic Similarity Measure for Big Biomedical Document Clustering. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:7937573. [PMID: 34795792 PMCID: PMC8594978 DOI: 10.1155/2021/7937573] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/27/2021] [Accepted: 10/11/2021] [Indexed: 01/03/2023]
Abstract
Semantic mining remains a challenge for big biomedical text data. Ontologies have been widely validated and used to extract semantic information. However, ontology-based semantic similarity calculation is too computationally complex to measure similarity at the scale of big text data. To solve this problem, we propose a parallelized semantic similarity measurement method for big text data based on Hadoop MapReduce. First, we preprocess the documents and extract their semantic features. Then, we calculate document semantic similarity based on the ontology network structure under the MapReduce framework. Finally, based on the generated semantic document similarities, document clusters are produced via clustering algorithms. To validate effectiveness, we use two kinds of open datasets. The experimental results show that traditional methods can hardly handle more than ten thousand biomedical documents, whereas the proposed method remains efficient and accurate on big datasets and offers high parallelism and scalability.
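The pairwise similarity step that MapReduce distributes in the abstract above can be sketched sequentially. Here documents are bag-of-words vectors and the "map" step, which MapReduce would spread across workers, emits one similarity record per document pair; plain cosine is used for illustration, whereas the paper's measure is ontology-network-based:

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    num = sum(a[t] * b.get(t, 0) for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def pairwise_similarities(docs):
    """'Map' step: emit one (pair, similarity) record per document pair."""
    vecs = [Counter(d.lower().split()) for d in docs]
    return {(i, j): cosine(vecs[i], vecs[j])
            for i, j in combinations(range(len(vecs)), 2)}

sims = pairwise_similarities(["gene expression in cancer",
                              "gene expression in cancer",
                              "protein folding dynamics"])
print(round(sims[(0, 1)], 2), round(sims[(0, 2)], 2))  # 1.0 0.0
```

The pair count grows quadratically with the corpus, which is precisely why the all-pairs computation is the part worth parallelizing; a subsequent "reduce" step would group the emitted records for the clustering stage.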
|
10
|
A Neural Network Approach for Understanding Patient Experiences of Chronic Obstructive Pulmonary Disease (COPD): Retrospective, Cross-sectional Study of Social Media Content. JMIR Med Inform 2021; 9:e26272. [PMID: 34762056 PMCID: PMC8663584 DOI: 10.2196/26272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2020] [Revised: 04/18/2021] [Accepted: 09/20/2021] [Indexed: 11/29/2022] Open
Abstract
Background The abundance of online content contributed by patients is a rich source of insight about the lived experience of disease. Patients share disease experiences with other members of the patient and caregiver community and do so using their own lexicon of words and phrases. This lexicon and the topics that are communicated using words and phrases belonging to the lexicon help us better understand disease burden. Insights from social media may ultimately guide clinical development in ways that ensure that future treatments are fit for purpose from the patient’s perspective. Objective We sought insights into the patient experience of chronic obstructive pulmonary disease (COPD) by analyzing a substantial corpus of social media content. The corpus was sufficiently large to make manual review and manual coding all but impossible to perform in a consistent and systematic fashion. Advanced analytics were applied to the corpus content in the search for associations between symptoms and impacts across the entire text corpus. Methods We conducted a retrospective, cross-sectional study of 5663 posts sourced from open blogs and online forum posts published by COPD patients between February 2016 and August 2019. We applied a novel neural network approach to identify a lexicon of community words and phrases used by patients to describe their symptoms. We used this lexicon to explore the relationship between COPD symptoms and disease-related impacts. Results We identified a diverse lexicon of community words and phrases for COPD symptoms, including gasping, wheezy, mucus-y, and muck. These symptoms were mentioned in association with specific words and phrases for disease impact such as frightening, breathing discomfort, and difficulty exercising. Furthermore, we found an association between mucus hypersecretion and moderate disease severity, which distinguished mucus from the other main COPD symptoms, namely breathlessness and cough. 
Conclusions We demonstrated the potential of neural networks and advanced analytics to gain patient-focused insights about how each distinct COPD symptom contributes to the burden of chronic and acute respiratory illness. Using a neural network approach, we identified words and phrases for COPD symptoms that were specific to the patient community. Identifying patterns in the association between symptoms and impacts deepened our understanding of the patient experience of COPD. This approach can be readily applied to other disease areas.
|
11
|
Explainable ICD multi-label classification of EHRs in Spanish with convolutional attention. Int J Med Inform 2021; 157:104615. [PMID: 34741890 DOI: 10.1016/j.ijmedinf.2021.104615] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/27/2021] [Revised: 09/23/2021] [Accepted: 10/08/2021] [Indexed: 01/16/2023]
Abstract
BACKGROUND This work deals with Natural Language Processing (NLP) applied to Electronic Health Records (EHRs). EHRs are coded following the International Classification of Diseases (ICD), leading to a multi-label classification problem. Previously proposed approaches act as black boxes without giving further insights. Explainable Artificial Intelligence (XAI) helps clarify what led the model to its predictions. GOAL This work aims to obtain explainable predictions of the diseases and procedures contained in EHRs. As an application, we show visualizations of the stored attention and propose a prototype of a Decision Support System (DSS) that highlights the text that motivated the choice of each proposed ICD code. METHODS Convolutional Neural Networks (CNNs) with attention mechanisms were used. Attention mechanisms allow the model to detect which part of the input (the EHR) motivates the output (medical codes), producing explainable predictions. RESULTS We successfully applied these methods to a Spanish corpus, obtaining competitive results on a challenging task. Finally, we present the idea of extracting the chronological order of the ICD codes in a given EHR by anchoring the codes to different stages of the clinical admission. CONCLUSIONS We found that explainable deep learning models applied to predict medical codes store helpful information that could be used to assist medical experts, while reaching solid performance. In particular, we show that the information stored in the attention mechanisms enables a DSS and a shallow chronology of diagnoses.
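The highlighting idea above, softmax attention over token scores, then surfacing the heaviest tokens as evidence for a predicted code, can be sketched as follows. The token scores are hypothetical; in the real model they are learned per ICD code:

```python
import math

def softmax(scores):
    """Normalize raw scores into attention weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def highlight(tokens, scores, k=2):
    """Return the k tokens with the highest attention weight, in text order."""
    weights = softmax(scores)
    top = sorted(range(len(tokens)), key=lambda i: -weights[i])[:k]
    return [tokens[i] for i in sorted(top)]

tokens = ["paciente", "con", "neumonía", "bilateral", "grave"]
scores = [0.1, 0.0, 3.2, 2.5, 0.4]  # hypothetical learned scores for one ICD code
print(highlight(tokens, scores))  # ['neumonía', 'bilateral']
```

Rendering these top-weighted tokens back in the note is what turns the classifier into the decision-support prototype the abstract describes: the clinician sees which phrase motivated each suggested code.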
|
12
|
A Proposed Approach for Conducting Studies That Use Data From Social Media Platforms. Mayo Clin Proc 2021; 96:2218-2229. [PMID: 34353473 DOI: 10.1016/j.mayocp.2021.02.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/12/2020] [Revised: 02/06/2021] [Accepted: 02/16/2021] [Indexed: 11/27/2022]
Abstract
The prominence of social media in contemporary society has extended significantly into the health care arena, where both patients and health care providers use social media platforms to gather, communicate, learn, and share medical content and personal experience in real time. The medical literature has also seen an exponential increase in the number of studies that use data derived from social media coverage of various medical issues and topics. In this guide, we present a step-by-step framework for health care professionals and researchers to conduct studies that use data from social media platforms. We present 6 overarching steps: framing a question that is appropriate for social media evaluation, identification of the social media outlet and content selection criteria, systematic data extraction, assessment of the quality of content and sources of bias, analysis of data, and interpretation of study findings. Each step is illustrated with published examples.
|
13
|
DeepADEMiner: a deep learning pharmacovigilance pipeline for extraction and normalization of adverse drug event mentions on Twitter. J Am Med Inform Assoc 2021; 28:2184-2192. [PMID: 34270701 PMCID: PMC8449608 DOI: 10.1093/jamia/ocab114] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/20/2021] [Revised: 05/20/2021] [Accepted: 06/08/2021] [Indexed: 11/17/2022] Open
Abstract
Objective Research on pharmacovigilance from social media data has focused on mining adverse drug events (ADEs) using annotated datasets, with publications generally focusing on 1 of 3 tasks: ADE classification, named entity recognition for identifying the span of ADE mentions, and ADE mention normalization to standardized terminologies. While the common goal of such systems is to detect ADE signals that can be used to inform public policy, progress has been impeded largely by the lack of end-to-end solutions for large-scale analysis of social media reports for different drugs. Materials and Methods We present a dataset for training and evaluating ADE pipelines in which the ADE distribution is closer to the natural balance, with ADEs present in about 7% of the tweets. The deep learning architecture involves an ADE extraction pipeline with individual components for all 3 tasks. Results The system achieved state-of-the-art performance on comparable datasets and scored a classification performance of F1 = 0.63, a span extraction performance of F1 = 0.44, and an end-to-end entity resolution performance of F1 = 0.34 on the presented dataset. Discussion The performance of the models continues to highlight multiple challenges in deploying pharmacovigilance systems that use social media data. We discuss the implications of such models for the downstream tasks of signal detection and suggest future enhancements. Conclusion Mining ADEs from Twitter posts using a pipeline architecture requires the different components to be trained and tuned to the input data imbalance in order to ensure optimal performance on the end-to-end resolution task.
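The three tasks compose into the kind of end-to-end pipeline the abstract describes. A structural sketch with trivial stand-in components; a real system would use trained models at each stage and a real terminology for normalization:

```python
def ade_pipeline(tweet, classify, extract_spans, normalize):
    """classification -> span extraction -> normalization; skip non-ADE tweets early."""
    if not classify(tweet):          # task 1: does the tweet report an ADE at all?
        return []
    spans = extract_spans(tweet)     # task 2: character spans of ADE mentions
    return [normalize(tweet[s:e]) for s, e in spans]  # task 3: map to standard terms

# Trivial stand-ins for demonstration only
classify = lambda t: "headache" in t
extract_spans = lambda t: [(t.index("headache"), t.index("headache") + len("headache"))]
normalize = lambda m: {"headache": "Headache (MedDRA preferred term)"}.get(m.lower(), "UNKNOWN")

print(ade_pipeline("this med gives me a headache", classify, extract_spans, normalize))
# ['Headache (MedDRA preferred term)']
```

The early exit in stage 1 is what makes the pipeline tractable at scale, and it is also why end-to-end F1 (0.34 above) is bounded by the product of errors accumulated across the stages.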
|
14
|
Incorporating Unstructured Patient Narratives and Health Insurance Claims Data in Pharmacovigilance: Natural Language Processing Analysis of Patient-Generated Texts About Systemic Lupus Erythematosus. JMIR Public Health Surveill 2021; 7:e29238. [PMID: 34255719 PMCID: PMC8278300 DOI: 10.2196/29238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2021] [Revised: 05/12/2021] [Accepted: 05/19/2021] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND Gaining insights from patients that cannot be obtained from health care databases has become an important topic in pharmacovigilance. OBJECTIVE Our objective was to demonstrate a use case in which patient-generated data were incorporated in pharmacovigilance to understand the epidemiology and burden of illness in Japanese patients with systemic lupus erythematosus. METHODS We used data on systemic lupus erythematosus, an autoimmune disease that substantially impairs quality of life, from 2 independent data sets. To understand the disease's epidemiology, we analyzed a Japanese health insurance claims database. To understand the disease's burden, we analyzed text data collected from Japanese disease blogs (tōbyōki) written by patients with systemic lupus erythematosus. Natural language processing was applied to these texts to identify frequent patient-level complaints, and term frequency-inverse document frequency (TF-IDF) was used to explore patient burden during treatment. We also explored health-related quality of life based on patient descriptions. RESULTS We analyzed data from 4694 and 635 patients with systemic lupus erythematosus in the health insurance claims database and tōbyōki blogs, respectively. Based on health insurance claims data, the prevalence of systemic lupus erythematosus is 107.70 per 100,000 persons. Tōbyōki text data analysis showed that pain-related words (eg, pain, severe pain, arthralgia) became more important after starting treatment. We also found an increase in patients' references to mobility and self-care over time, indicating increased attention to physical disability due to disease progression. CONCLUSIONS A classical medical database represents only part of a patient's entire treatment experience, and analysis using solely such a database cannot capture patient-level symptoms or patient concerns about treatments. This study showed that analysis of tōbyōki blogs can provide added patient-level detail, advancing patient-centric pharmacovigilance.
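The term frequency-inverse document frequency weighting used above can be computed directly. A minimal sketch with toy English token lists standing in for the Japanese blog texts:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Per-document term weights: tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    return [{t: (c / len(d)) * math.log(n / df[t]) for t, c in Counter(d).items()}
            for d in docs]

docs = [["pain", "joint", "pain"], ["pain", "fatigue"], ["rash", "fatigue"]]
weights = tf_idf(docs)
# "pain" appears in 2 of 3 docs, so it is down-weighted relative to "joint"
print(weights[0]["joint"] > weights[0]["pain"])  # True
```

The IDF factor is what lets terms that dominate a single patient's narrative stand out against complaints common to the whole cohort, which is the effect the study relies on to track shifting burden over treatment.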
|
15
|
A New Era in Pharmacovigilance: Toward Real-World Data and Digital Monitoring. Clin Pharmacol Ther 2021; 109:1197-1202. [PMID: 33492663 PMCID: PMC8058244 DOI: 10.1002/cpt.2172] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 10/15/2019] [Accepted: 01/08/2021] [Indexed: 12/20/2022]
Abstract
Adverse drug reactions (ADRs) are a major concern for patients, clinicians, and regulatory agencies. The discovery of serious ADRs leading to substantial morbidity and mortality has resulted in mandatory phase IV clinical trials, black box warnings, and withdrawal of drugs from the market. Real-world data (data collected during routine clinical care) is being adopted by innovators, regulators, payors, and providers to inform decision making throughout the product life cycle. We outline several different approaches to modern pharmacovigilance, including spontaneous reporting databases, electronic health record monitoring and research frameworks, social media surveillance, and the use of digital devices. Some of these platforms are well-established while others are still emerging or experimental. We highlight both the potential opportunities and the existing challenges within these pharmacovigilance systems that have already begun to impact the drug development process, as well as the landscape of postmarket drug safety monitoring. Further research and investment into different and complementary pharmacovigilance systems is needed to ensure the continued safety of pharmacotherapy.
|
16
|
Addressing Extreme Imbalance for Detecting Medications Mentioned in Twitter User Timelines. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-77211-6_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
17
|
A comprehensive survey of deep learning in the field of medical imaging and medical natural language processing: Challenges and research directions. JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES 2021. [DOI: 10.1016/j.jksuci.2021.01.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
18
|
Abstract
Despite huge technological advances in the capabilities to capture, store, link and analyse data electronically, the impact on routine pharmacovigilance has so far been limited. We discuss emerging research in the use of artificial intelligence, machine learning and automation across the pharmacovigilance lifecycle, including pre-licensure. We explain why adoption is challenging and offer a perspective on the changes needed to accelerate adoption, and thereby improve patient safety. Last, we make clear that while technologies could be superimposed on existing pharmacovigilance processes for incremental improvements, these great societal advances in data and technology also provide us with a timely opportunity to reconsider everything we do in pharmacovigilance operations to maximise the benefit of these advances.
|
19
|
An insight analysis and detection of drug-abuse risk behavior on Twitter with self-taught deep learning. COMPUTATIONAL SOCIAL NETWORKS 2019. [DOI: 10.1186/s40649-019-0071-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Drug abuse continues to accelerate toward becoming the most severe public health problem in the United States. The ability to detect drug-abuse risk behavior at a population scale, such as among the population of Twitter users, can help us to monitor the trend of drug-abuse incidents. Unfortunately, traditional methods do not effectively detect drug-abuse risk behavior from tweets. This is because: (1) tweets usually are noisy and sparse and (2) the availability of labeled data is limited. To address these challenging problems, we propose a deep self-taught learning system to detect and monitor drug-abuse risk behaviors in the Twitter sphere, by leveraging a large amount of unlabeled data. Our models automatically augment annotated data: (i) to improve the classification performance and (ii) to capture the evolving picture of drug abuse on online social media. Our extensive experiments have been conducted on three million drug-abuse-related tweets with geo-location information. Results show that our approach is highly effective in detecting drug-abuse risk behaviors.
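The core self-taught idea described above (train on a small labeled set, pseudo-label confident unlabeled tweets, fold them back into training) can be caricatured in a few lines of pure Python. Everything here is an illustrative assumption: the keyword-overlap "classifier", the toy tweets, and the confidence threshold stand in for the paper's deep models.

```python
def self_train(labeled, unlabeled, threshold=0.3):
    """Grow the labeled set by pseudo-labeling confident unlabeled tweets."""
    # Seed vocabulary from tweets already labeled as risk behavior (y == 1).
    risk_terms = {w for text, y in labeled if y == 1
                  for w in text.lower().split()}
    augmented = list(labeled)
    for text in unlabeled:
        tokens = set(text.lower().split())
        # Toy confidence score: share of this tweet's tokens that are
        # already-known risk terms.
        score = len(tokens & risk_terms) / len(tokens)
        if score >= threshold:
            augmented.append((text, 1))  # pseudo-label as positive
            risk_terms |= tokens         # let the vocabulary evolve
    return augmented

labeled = [("popping pills before class", 1), ("lovely sunny morning", 0)]
unlabeled = ["popping pills at the party", "coffee and a good book"]
augmented = self_train(labeled, unlabeled)
```

Only the first unlabeled tweet overlaps the seed vocabulary enough to be pseudo-labeled, so the training set grows from two to three examples; a real system would iterate this loop with a learned classifier and calibrated confidence.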
|