1
He L, Yin T, Zheng K. They May not Work! An Evaluation of Eleven Sentiment Analysis Tools on Seven Social Media Datasets. J Biomed Inform 2022; 132:104142. [PMID: 35835437 DOI: 10.1016/j.jbi.2022.104142] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/14/2022] [Revised: 07/05/2022] [Accepted: 07/07/2022] [Indexed: 11/29/2022]
Abstract
OBJECTIVE Sentiment analysis is an important method for understanding emotions and opinions expressed through social media exchanges. Little work has been done to evaluate the performance of existing sentiment analysis tools on social media datasets, particularly those related to health, healthcare, or public health. This study aims to address the gap. MATERIALS AND METHODS We evaluated 11 commonly used sentiment analysis tools on five health-related social media datasets curated in previously published studies. These datasets include Human Papillomavirus Vaccine, Health Care Reform, COVID-19 Masking, Vitals.com Physician Reviews, and the Breast Cancer Forum from MedHelp.org. For comparison, we also analyzed two non-health datasets based on movie reviews and generic tweets. We conducted a qualitative error analysis on the social media posts that were incorrectly classified by all tools. RESULTS The existing sentiment analysis tools performed poorly, with an average weighted F1 score below 0.6. The inter-tool agreement was also low; the average Fleiss' kappa score was 0.066. The qualitative error analysis identified two major causes for misclassification: (1) correct sentiment but on wrong subject(s) and (2) failure to properly interpret inexplicit/indirect sentiment expressions. DISCUSSION AND CONCLUSION The performance of the existing sentiment analysis tools is insufficient to generate accurate sentiment classification results. The low inter-tool agreement suggests that the conclusion of a study could be entirely driven by the idiosyncrasies of the tool selected, rather than by the data. This is very concerning, especially if the results may be used to inform important policy decisions such as mask or vaccination mandates.
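For readers unfamiliar with the inter-tool agreement statistic reported in this abstract, the following is a minimal sketch of how Fleiss' kappa can be computed when each tool is treated as a rater. It is not the authors' code: the function name `fleiss_kappa`, the three-class label set, and the toy counts are assumptions made purely for illustration.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of per-item category counts.

    counts[i][j] is the number of raters (here: sentiment tools) that
    assigned item i (a post) to category j. Every row must sum to the
    same number of raters.
    """
    if not counts:
        raise ValueError("need at least one rated item")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Share of all assignments that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement on each item, then averaged over items.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_cat)

    if p_expected == 1.0:
        # Degenerate case: only one category was ever used. Treating it
        # as full agreement is a convention assumed in this sketch.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical example: 3 tools label 4 posts as
    # (negative, neutral, positive); each row holds the vote counts.
    toy_counts = [
        [3, 0, 0],
        [1, 1, 1],
        [0, 2, 1],
        [1, 0, 2],
    ]
    print(f"Fleiss' kappa: {fleiss_kappa(toy_counts):.3f}")
```

Values near 0, such as the 0.066 reported above, indicate agreement barely better than chance, while 1 indicates perfect agreement.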
Affiliation(s)
- Lu He
- Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, California, United States
- Tingjue Yin
- Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, California, United States
- Kai Zheng
- Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, California, United States; Department of Emergency Medicine, School of Medicine, University of California, Irvine, Irvine, California, United States
2
Dehdarirad T, Yaghtin M. Gender differences in citation sentiment: A case study in life sciences and biomedicine. J Inf Sci 2022. [DOI: 10.1177/01655515221074327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In this study, we investigated whether female and male authors in the field of life sciences and biomedicine differed in their tendency for citation and citation sentiment. The data comprised two sets, a cited set and a citing set. The cited set comprised 17,237 articles, whereas the citing set comprised 115,935 articles. The cited set, which covers the area Life Sciences & Biomedicine and was published during 2012–2016, was retrieved from the Web of Science Medline. The citing set and its citation contexts were retrieved using the Colil database. The analysis was done using a combination of homophily analysis, regression analysis and a chi-square test. The covariates in the regression analyses were features related to authors, journal, institution, country and abstract readability. The homophily analysis showed a significant tendency for female (8%) and male (14%) authorship teams to cite papers by teams of the same gender composition. In addition, the results of regression analysis (Model 1) and pairwise comparisons showed that male-authored papers received significantly higher positive sentiment than female-authored papers. The results of regression analysis (Model 2) showed a small but significant positive association between gender similarity of cited and citing authorship teams and the sentiment score. However, further analysis using the chi-square test showed a significantly lower tendency for women to use positive terms when citing the research findings of papers with the same gender composition. Men, in contrast, used significantly more positive terms when citing papers with the same gender composition. Finally, a lay summary for a cited paper, country similarity and the venue of the cited publication when it was a mega journal had a significant positive association with the sentiment score received.
Affiliation(s)
- Tahereh Dehdarirad
- Department of Communication and Learning in Science, Chalmers University of Technology, Sweden
- Maryam Yaghtin
- Department of Scientometrics, Islamic World Science Citation Center (ISC), Iran
3
He L, Yin T, Hu Z, Chen Y, Hanauer DA, Zheng K. Developing a standardized protocol for computational sentiment analysis research using health-related social media data. J Am Med Inform Assoc 2021; 28:1125-1134. [PMID: 33355353 DOI: 10.1093/jamia/ocaa298] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/01/2020] [Accepted: 12/04/2020] [Indexed: 12/18/2022]
Abstract
OBJECTIVE Sentiment analysis is a popular tool for analyzing health-related social media content. However, existing studies exhibit numerous methodological issues and inconsistencies with respect to research design and results reporting, which could lead to biased data, imprecise or incorrect conclusions, or incomparable results across studies. This article reports a systematic analysis of the literature with respect to such issues. The objective was to develop a standardized protocol for improving the research validity and comparability of results in future relevant studies. MATERIALS AND METHODS We developed the Protocol of Analysis of senTiment in Health (PATH) based on a systematic review that analyzed common research design choices and how such choices were made, or reported, among eligible studies published 2010-2019. RESULTS Of 409 articles screened, 89 met the inclusion criteria. A total of 16 distinctive research design choices were identified, 9 of which have significant methodological or reporting inconsistencies among the articles reviewed, ranging from how relevance of study data was determined to how the sentiment analysis tool selected was validated. Based on this result, we developed the PATH protocol that encompasses all these distinctive design choices and highlights the ones for which careful consideration and detailed reporting are particularly warranted. CONCLUSIONS A substantial degree of methodological and reporting inconsistencies exist in the extant literature that applied sentiment analysis to analyzing health-related social media data. The PATH protocol developed through this research may contribute to mitigating such issues in future relevant studies.
Affiliation(s)
- Lu He
- Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, California, USA
- Tingjue Yin
- Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, California, USA
- Zhaoxian Hu
- Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, California, USA
- Yunan Chen
- Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, California, USA
- David A Hanauer
- Department of Learning Health Sciences, School of Medicine, University of Michigan, Ann Arbor, Michigan, USA; Department of Pediatrics, School of Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Kai Zheng
- Department of Informatics, Donald Bren School of Information and Computer Science, University of California, Irvine, Irvine, California, USA; Department of Emergency Medicine, School of Medicine, University of California, Irvine, Irvine, California, USA
4
Balakrishnan A, Idicula SM, Jones J. Deep learning based analysis of sentiment dynamics in online cancer community forums: An experience. Health Informatics J 2021; 27:14604582211007537. [PMID: 33832380 DOI: 10.1177/14604582211007537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Online health communities (OHCs) provide various opportunities for patients with chronic or life-threatening illnesses, especially for cancer patients and survivors. A better understanding of the sentiment dynamics of patients in OHCs can help in the precise formulation of their needs during treatment. The current study investigated the sentiment dynamics in patients' narratives in a Breast Cancer community group (Breastcancer.org) to identify the changes in emotions, thoughts, stress, and coping mechanisms while undergoing treatment options, particularly chemotherapy, radiation, and surgery. The sentiment dynamics of users' posts were analyzed using a deep learning model. A sentiment change analysis was performed to measure change in the satisfaction level of the users. The deep learning model BiLSTM with sentiment embedding features provided a better F1-score of 91.9%. Sentiment dynamics can assess the difference in satisfaction level the users acquire by interacting with other users in the forum. A comparison of the proposed model with existing models revealed the effectiveness of this methodology.
5
Glinert LH. Communicative and Discursive Perspectives on the Medication Experience. PHARMACY 2021; 9:pharmacy9010042. [PMID: 33671135 PMCID: PMC8006053 DOI: 10.3390/pharmacy9010042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2020] [Revised: 02/01/2021] [Accepted: 02/11/2021] [Indexed: 11/29/2022]
Abstract
Taking the ‘medication experience’ in the broad sense of what individuals hear and say about their medication, as well as how they experience it, this paper explores diverse research on medication information available to patients and their modes and capacities for interaction, including personal circles, doctors and pharmacists, labeling and promotion, websites, and the patient’s own inner conversations and self-expression. The goal is to illustrate, for nonspecialists in communication, how the actors, messages, mediums, genres, and contextual factors within a standard ethnographic and social semiotic model of discourse and communication are operating, not always effectively or beneficially, to mediate or construct a patient’s medication experience. We also suggest how disparate insights can be integrated through such a model and might generate new research questions.
Affiliation(s)
- Lewis H Glinert
- Middle Eastern Studies and Linguistics, Dartmouth College, NH 03755, USA
6
Colón-Ruiz C, Segura-Bedmar I. Comparing deep learning architectures for sentiment analysis on drug reviews. J Biomed Inform 2020; 110:103539. [PMID: 32818665 DOI: 10.1016/j.jbi.2020.103539] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 02/11/2020] [Revised: 08/08/2020] [Accepted: 08/10/2020] [Indexed: 11/24/2022]
Abstract
Since the turn of the century, as millions of users' opinions have become available on the web, sentiment analysis has become one of the most fruitful research fields in Natural Language Processing (NLP). Research on sentiment analysis has covered a wide range of domains such as economy, polity, and medicine, among others. In the pharmaceutical field, automatic analysis of online user reviews makes it possible to analyze large volumes of users' opinions and to obtain relevant information about the effectiveness and side effects of drugs, which could be used to improve pharmacovigilance systems. Throughout the years, approaches for sentiment analysis have progressed from simple rules to advanced machine learning techniques such as deep learning, which has become an emerging technology in many NLP tasks. Sentiment analysis is not oblivious to this success, and several systems based on deep learning have recently demonstrated their superiority over former methods, achieving state-of-the-art results on standard sentiment analysis datasets. However, prior work shows that very few attempts have been made to apply deep learning to sentiment analysis of drug reviews. We present a benchmark comparison of various deep learning architectures such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) recurrent neural networks. We propose several combinations of these models and also study the effect of different pre-trained word embedding models. As transformers have revolutionized the NLP field, achieving state-of-the-art results for many NLP tasks, we also explore Bidirectional Encoder Representations from Transformers (BERT) with a Bi-LSTM for the sentiment analysis of drug reviews. Our experiments show that the usage of BERT obtains the best results, but with a very high training time. On the other hand, CNN achieves acceptable results while requiring less training time.
Affiliation(s)
- Cristóbal Colón-Ruiz
- Computer Science Department, University Carlos III of Madrid, Avenida de la Universidad 30, 28911, Leganés, Madrid, Spain
- Isabel Segura-Bedmar
- Computer Science Department, University Carlos III of Madrid, Avenida de la Universidad 30, 28911, Leganés, Madrid, Spain
7
Le N, Wiley M, Loza A, Hristidis V, El-Kareh R. Prediction of Medical Concepts in Electronic Health Records: Similar Patient Analysis. JMIR Med Inform 2020; 8:e16008. [PMID: 32706678 PMCID: PMC7395257 DOI: 10.2196/16008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/28/2019] [Revised: 03/01/2020] [Accepted: 03/28/2020] [Indexed: 11/13/2022]
Abstract
Background Medicine 2.0—the adoption of Web 2.0 technologies such as social networks in health care—creates the need for apps that can find other patients with similar experiences and health conditions based on a patient’s electronic health record (EHR). Concurrently, there is an increasing number of longitudinal EHR data sets with rich information, which are essential to fulfill this need. Objective This study aimed to evaluate the hypothesis that we can leverage similar EHRs to predict possible future medical concepts (eg, disorders) from a patient’s EHR. Methods We represented patients’ EHRs using time-based prefixes and suffixes, where each prefix or suffix is a set of medical concepts from a medical ontology. We compared the prefixes of other patients in the collection with the state of the current patient using various interpatient distance measures. The set of similar prefixes yields a set of suffixes, which we used to determine probable future concepts for the current patient’s EHR. Results We evaluated our methods on the Multiparameter Intelligent Monitoring in Intensive Care II data set of patients, where we achieved precision up to 56.1% and recall up to 69.5%. For a limited set of clinically interesting concepts, specifically a set of procedures, we found that 86.9% (353/406) of the true-positives are clinically useful, that is, these procedures were actually performed later on the patient, and only 4.7% (19/406) of true-positives were completely irrelevant. Conclusions These initial results indicate that predicting patients’ future medical concepts is feasible. Effectively predicting medical concepts can have several applications, such as managing resources in a hospital.
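The prefix/suffix idea in the Methods above can be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: the abstract only mentions "various interpatient distance measures", so Jaccard distance is swapped in here as one plausible choice, and the function names, the `k` and `top_n` parameters, and the toy concept names are all hypothetical.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Jaccard distance between two sets of medical concepts."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def predict_future_concepts(current, patients, k=2, top_n=3):
    """Suggest likely future concepts for the current patient.

    current  -- set of concepts observed so far for the current patient
    patients -- list of (prefix, suffix) pairs of concept sets, where the
                prefix is a patient's earlier record and the suffix is
                what followed it
    k        -- number of most similar prefixes to consult (assumed)
    top_n    -- number of concepts to return (assumed)
    """
    # Rank other patients by how close their prefix is to the current state.
    ranked = sorted(patients, key=lambda pair: jaccard_distance(current, pair[0]))
    # Let the suffixes of the k nearest prefixes vote on future concepts,
    # ignoring concepts the current patient already has.
    votes = Counter()
    for _prefix, suffix in ranked[:k]:
        votes.update(suffix - current)
    return [concept for concept, _count in votes.most_common(top_n)]


if __name__ == "__main__":
    # Hypothetical toy records, not real patient data.
    history = [
        ({"diabetes", "hypertension"}, {"insulin therapy", "retinopathy"}),
        ({"diabetes", "obesity"}, {"insulin therapy"}),
        ({"fracture"}, {"cast"}),
    ]
    print(predict_future_concepts({"diabetes", "hypertension"}, history))
```

Voting over the nearest suffixes is only one way to turn similar prefixes into predictions; a distance-weighted vote would be a natural variation.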
Affiliation(s)
- Nhat Le
- Department of Computer Science & Engineering, University of California, Riverside, Riverside, CA, United States
- Matthew Wiley
- Department of Computer Science & Engineering, University of California, Riverside, Riverside, CA, United States
- Antonio Loza
- School of Medicine, University of California, Riverside, Riverside, CA, United States
- Vagelis Hristidis
- Department of Computer Science & Engineering, University of California, Riverside, Riverside, CA, United States
- Robert El-Kareh
- Department of Medicine, University of California, San Diego, San Diego, CA, United States
8
Rivas R, Sadah SA, Guo Y, Hristidis V. Classification of Health-Related Social Media Posts: Evaluation of Post Content-Classifier Models and Analysis of User Demographics. JMIR Public Health Surveill 2020; 6:e14952. [PMID: 32234706 PMCID: PMC7160708 DOI: 10.2196/14952] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/05/2019] [Revised: 08/06/2019] [Accepted: 01/27/2020] [Indexed: 11/23/2022]
Abstract
Background The increasing volume of health-related social media activity, where users connect, collaborate, and engage, has increased the significance of analyzing how people use health-related social media. Objective The aim of this study was to classify the content (eg, posts that share experiences and seek support) of users who write health-related social media posts and study the effect of user demographics on post content. Methods We analyzed two different types of health-related social media: (1) health-related online forums—WebMD and DailyStrength—and (2) general online social networks—Twitter and Google+. We identified several categories of post content and built classifiers to automatically detect these categories. These classifiers were used to study the distribution of categories for various demographic groups. Results We achieved an accuracy of at least 84% and a balanced accuracy of at least 0.81 for half of the post content categories in our experiments. In addition, 70.04% (4741/6769) of posts by male WebMD users asked for advice, and male users’ WebMD posts were more likely to ask for medical advice than female users’ posts. The majority of posts on DailyStrength shared experiences, regardless of the gender, age group, or location of their authors. Furthermore, health-related posts on Twitter and Google+ were used to share experiences less frequently than posts on WebMD and DailyStrength. Conclusions We studied and analyzed the content of health-related social media posts. Our results can guide health advocates and researchers to better target patient populations based on the application type. Given a research question or an outreach goal, our results can be used to choose the best online forums to answer the question or disseminate a message.
Affiliation(s)
- Ryan Rivas
- Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, United States
- Shouq A Sadah
- Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, United States
- Yuhang Guo
- Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, United States
- Vagelis Hristidis
- Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, United States
9
Padmavathy P, Pakkir Mohideen S. An efficient two-pass classifier system for patient opinion mining to analyze drugs satisfaction. Biomed Signal Process Control 2020. [DOI: 10.1016/j.bspc.2019.101755] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/26/2022]
10
Zunic A, Corcoran P, Spasic I. Sentiment Analysis in Health and Well-Being: Systematic Review. JMIR Med Inform 2020; 8:e16023. [PMID: 32012057 PMCID: PMC7013658 DOI: 10.2196/16023] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 08/27/2019] [Revised: 10/26/2019] [Accepted: 10/27/2019] [Indexed: 12/22/2022]
Abstract
Background Sentiment analysis (SA) is a subfield of natural language processing whose aim is to automatically classify the sentiment expressed in a free text. It has found practical applications across a wide range of societal contexts including marketing, economy, and politics. This review focuses specifically on applications related to health, which is defined as “a state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity.” Objective This study aimed to establish the state of the art in SA related to health and well-being by conducting a systematic review of the recent literature. To capture the perspective of those individuals whose health and well-being are affected, we focused specifically on spontaneously generated content and not necessarily that of health care professionals. Methods Our methodology is based on the guidelines for performing systematic reviews. In January 2019, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified a total of 86 relevant studies and extracted data about the datasets analyzed, discourse topics, data creators, downstream applications, algorithms used, and their evaluation. Results The majority of data were collected from social networking and Web-based retailing platforms. The primary purpose of online conversations is to exchange information and provide social support online. These communities tend to form around health conditions with high severity and chronicity rates. Different treatments and services discussed include medications, vaccination, surgery, orthodontic services, individual physicians, and health care services in general. We identified 5 roles with respect to health and well-being among the authors of the types of spontaneously generated narratives considered in this review: a sufferer, an addict, a patient, a carer, and a suicide victim. 
Out of 86 studies considered, only 4 reported the demographic characteristics. A wide range of methods were used to perform SA. Most common choices included support vector machines, naïve Bayesian learning, decision trees, logistic regression, and adaptive boosting. In contrast with general trends in SA research, only 1 study used deep learning. The performance lags behind the state of the art achieved in other domains when measured by F-score, which was found to be below 60% on average. In the context of SA, the domain of health and well-being was found to be resource poor: few domain-specific corpora and lexica are shared publicly for research purposes. Conclusions SA results in the area of health and well-being lag behind those in other domains. It is yet unclear if this is because of the intrinsic differences between the domains and their respective sublanguages, the size of training datasets, the lack of domain-specific sentiment lexica, or the choice of algorithms.
Affiliation(s)
- Anastazia Zunic
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Padraig Corcoran
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Irena Spasic
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
11
Audeh B, Calvier FE, Bellet F, Beyens MN, Pariente A, Lillo-Le Louet A, Bousquet C. Pharmacology and social media: Potentials and biases of web forums for drug mention analysis-case study of France. Health Informatics J 2019; 26:1253-1272. [PMID: 31566468 DOI: 10.1177/1460458219865128] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/15/2022]
Abstract
The aim of this study is to analyze drug mentions in web forums to evaluate the utility of this data source for drug post-marketing studies. We automatically annotated over 60 million posts extracted from 21 French web forums. Drug mentions detected in this corpus were matched to drug names in a French drug database (Theriaque®). Our analysis showed that a high proportion of the most frequent drug mentions in the selected web forums correspond to drugs that are usually prescribed to young women, such as combined oral contraceptives. The most mentioned drugs in our corpus correlated weakly to the most prescribed drugs in France but seemed to be influenced by events widely reported in traditional media. In this article, we conclude that web forums have high potential for post-marketing drug-related studies, such as pharmacovigilance, and observation of drug utilization. However, the bias related to forum selection and the corresponding population representativeness should always be taken into account.
Affiliation(s)
- Bissan Audeh
- Sorbonne Université and Université Paris 13, France
- Cedric Bousquet
- Sorbonne Université and Université Paris 13, France; CHU University Hospital of Saint-Etienne, France
12
Harnessing social media data for pharmacovigilance: a review of current state of the art, challenges and future directions. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2019. [DOI: 10.1007/s41060-019-00175-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 01/07/2023]
13
Rathore AK, Kar AK, Ilavarasan PV. Social Media Analytics: Literature Review and Directions for Future Research. DECISION ANALYSIS 2017. [DOI: 10.1287/deca.2017.0355] [Citation(s) in RCA: 76] [Impact Index Per Article: 10.9] [Indexed: 11/20/2022]
Affiliation(s)
- Ashish K. Rathore
- Department of Management Studies, Indian Institute of Technology Delhi, New Delhi, Delhi 110016 India
- Arpan K. Kar
- Department of Management Studies, Indian Institute of Technology Delhi, New Delhi, Delhi 110016 India
- P. Vigneswara Ilavarasan
- Department of Management Studies, Indian Institute of Technology Delhi, New Delhi, Delhi 110016 India
14
Gonzalez-Hernandez G, Sarker A, O’Connor K, Savova G. Capturing the Patient's Perspective: a Review of Advances in Natural Language Processing of Health-Related Text. Yearb Med Inform 2017; 26:214-227. [PMID: 29063568 PMCID: PMC6250990 DOI: 10.15265/iy-2017-029] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Indexed: 11/24/2022]
Abstract
Background: Natural Language Processing (NLP) methods are increasingly being utilized to mine knowledge from unstructured health-related texts. Recent advances in noisy text processing techniques are enabling researchers and medical domain experts to go beyond the information encapsulated in published texts (e.g., clinical trials and systematic reviews) and structured questionnaires, and obtain perspectives from other unstructured sources such as Electronic Health Records (EHRs) and social media posts. Objectives: To review the recently published literature discussing the application of NLP techniques for mining health-related information from EHRs and social media posts. Methods: Literature review included the research published over the last five years based on searches of PubMed, conference proceedings, and the ACM Digital Library, as well as on relevant publications referenced in papers. We particularly focused on the techniques employed on EHRs and social media data. Results: A set of 62 studies involving EHRs and 87 studies involving social media matched our criteria and were included in this paper. We present the purposes of these studies, outline the key NLP contributions, and discuss the general trends observed in the field, the current state of research, and important outstanding problems. Conclusions: Over the recent years, there has been a continuing transition from lexical and rule-based systems to learning-based approaches, because of the growth of annotated data sets and advances in data science. For EHRs, publicly available annotated data is still scarce and this acts as an obstacle to research progress. On the contrary, research on social media mining has seen a rapid growth, particularly because the large amount of unlabeled data available via this resource compensates for the uncertainty inherent to the data. 
Effective mechanisms to filter out noise and for mapping social media expressions to standard medical concepts are crucial and latent research problems. Shared tasks and other competitive challenges have been driving factors behind the implementation of open systems, and they are likely to play an imperative role in the development of future systems.
Affiliation(s)
- G. Gonzalez-Hernandez
- Department of Epidemiology, Biostatistics, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- A. Sarker
- Department of Epidemiology, Biostatistics, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- K. O’Connor
- Department of Epidemiology, Biostatistics, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- G. Savova
- Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
15
16
Piryani R, Madhavi D, Singh V. Analytical mapping of opinion mining and sentiment analysis research during 2000–2015. Inf Process Manag 2017. [DOI: 10.1016/j.ipm.2016.07.001] [Citation(s) in RCA: 107] [Impact Index Per Article: 15.3] [Indexed: 11/28/2022]
17
Dias-Souza MV. Strategies for Expanding Access and Improving the Quality of Pharmaceutical Services. PHARMACEUTICAL SCIENCES 2017. [DOI: 10.4018/978-1-5225-1762-7.ch014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Pharmaceutical services are among the most accessible healthcare assistance systems worldwide, and are generally provided in enterprises such as Drugstores and Compounding Pharmacies. Pharmacists are highly accessible healthcare professionals, considering also the availability, geographic distribution and location of pharmaceutical enterprises. However, there are several challenges in providing these services to patients with limitations such as low education, difficulty reaching the Pharmacist, and the need for individualized monitoring (due to the complexity of therapy). Reports of low-quality services are growing worldwide, and in order to expand access and improve the quality of pharmaceutical services, Pharmacists must move from being medication dispensers with a focus on administrative management to a clinically oriented practice with a humanistic view. The aim of this chapter is to discuss the implementation of effective strategies and ways to improve the quality of Pharmacists' work as specialized healthcare providers.
|
18
|
Lim S, Tucker CS, Kumara S. An unsupervised machine learning model for discovering latent infectious diseases using social media data. J Biomed Inform 2016; 66:82-94. [PMID: 28034788 DOI: 10.1016/j.jbi.2016.12.007] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 08/04/2016] [Revised: 12/03/2016] [Accepted: 12/14/2016] [Indexed: 10/20/2022]
Abstract
INTRODUCTION The authors of this work propose an unsupervised machine learning model that has the ability to identify real-world latent infectious diseases by mining social media data. In this study, a latent infectious disease is defined as a communicable disease that has not yet been formalized by national public health institutes and explicitly communicated to the general public. Most existing approaches to modeling infectious-disease-related knowledge discovery through social media networks are top-down approaches that are based on already known information, such as the names of diseases and their symptoms. In existing top-down approaches, necessary but unknown information, such as disease names and symptoms, is mostly unidentified in social media data until national public health institutes have formalized that disease. Most of the formalizing processes for latent infectious diseases are time consuming. Therefore, this study presents a bottom-up approach for latent infectious disease discovery in a given location without prior information, such as disease names and related symptoms. METHODS Social media messages with user and temporal information are extracted during the data preprocessing stage. An unsupervised sentiment analysis model is then presented. Users' expressions about symptoms, body parts, and pain locations are also identified from social media data. Then, symptom weighting vectors for each individual and time period are created, based on their sentiment and social media expressions. Finally, latent-infectious-disease-related information is retrieved from individuals' symptom weighting vectors. DATASETS AND RESULTS Twitter data from August 2012 to May 2013 are used to validate this study. Real electronic medical records for 104 individuals, who were diagnosed with influenza in the same period, are used to serve as ground truth validation. The results are promising, with the highest precision, recall, and F1 score values of 0.773, 0.680, and 0.724, respectively. CONCLUSION This work uses individuals' social media messages to identify latent infectious diseases, without prior information, quicker than when the disease(s) is formalized by national public health institutes. In particular, the unsupervised machine learning model using user, textual, and temporal information in social media data, along with sentiment analysis, identifies latent infectious diseases in a given location.
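Note on the reported metrics: the F1 score is the harmonic mean of precision and recall. The short sketch below is an illustration only, not the authors' code, and the helper name `f1_score` is hypothetical. It assumes the three reported values come from the same experimental configuration, which the abstract does not state explicitly, and checks that they are mutually consistent.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract above; treating them as one configuration is an assumption.
reported_precision = 0.773
reported_recall = 0.680

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # 0.724, matching the reported F1
```

Under that assumption the quoted precision and recall reproduce the quoted F1 to three decimal places.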
Affiliation(s)
- Sunghoon Lim
- Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
- Conrad S Tucker
- School of Engineering Design, Technology, and Professional Programs, The Pennsylvania State University, University Park, PA 16802, USA; Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
- Soundar Kumara
- Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802, USA
|
19
|
Sadah SA, Shahbazi M, Wiley MT, Hristidis V. Demographic-Based Content Analysis of Web-Based Health-Related Social Media. J Med Internet Res 2016; 18:e148. [PMID: 27296242 PMCID: PMC4923586 DOI: 10.2196/jmir.5327] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 11/09/2015] [Revised: 03/16/2016] [Accepted: 04/11/2016] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND An increasing number of patients from diverse demographic groups share and search for health-related information on Web-based social media. However, little is known about the content of the posted information with respect to the users' demographics. OBJECTIVE The aims of this study were to analyze the content of Web-based health-related social media based on users' demographics to identify which health topics are discussed in which social media by which demographic groups and to help guide educational and research activities. METHODS We analyzed 3 different types of health-related social media: (1) general Web-based social networks Twitter and Google+; (2) drug review websites; and (3) health Web forums, with a total of about 6 million users and 20 million posts. We analyzed the content of these posts based on the demographic group of their authors, in terms of sentiment and emotion, top distinctive terms, and top medical concepts. RESULTS The results of this study are: (1) pregnancy is the dominant topic for female users in drug review websites and health Web forums, whereas for male users, it is cardiac problems, HIV, and back pain, but this is not the case for Twitter; (2) younger users (0-17 years) mainly talk about attention-deficit hyperactivity disorder (ADHD) and depression-related drugs, users aged 35-44 years discuss multiple sclerosis (MS) drugs, and middle-aged users (45-64 years) talk about alcohol and smoking; (3) users from the Northeast United States talk about physical disorders, whereas users from the West United States talk about mental disorders and addictive behaviors; (4) users with a higher writing level express less anger in their posts. CONCLUSION We studied the popular topics and the sentiment based on users' demographics in Web-based health-related social media. Our results provide valuable information, which can help create targeted and effective educational campaigns and guide experts to reach the right users on Web-based social chatter.
Affiliation(s)
- Shouq A Sadah
- University of California, Riverside, Department of Computer Science and Engineering, Riverside, CA, United States.
|
20
|
Moccia M, Lavorgna L, Lanzillo R, Brescia Morra V, Tedeschi G, Bonavita S. The Dress: Transforming a web viral event into a scientific survey. Mult Scler Relat Disord 2016; 7:41-6. [DOI: 10.1016/j.msard.2016.03.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 01/18/2016] [Accepted: 03/01/2016] [Indexed: 10/22/2022]
|
21
|
Chen LS, Lin ZC, Chang JR. FIR: An Effective Scheme for Extracting Useful Metadata from Social Media. J Med Syst 2015; 39:139. [PMID: 26330225 DOI: 10.1007/s10916-015-0333-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/31/2015] [Accepted: 08/21/2015] [Indexed: 11/27/2022]
Abstract
Recently, the use of social media for health information exchange has been expanding among patients, physicians, and other health care professionals. In medical areas, social media allows non-experts to access, interpret, and generate medical information for their own care and the care of others. Researchers have paid much attention to social media in medical education, patient-pharmacist communication, adverse drug reaction detection, the impact of social media on medicine and healthcare, and so on. However, relatively few papers discuss how to effectively extract useful knowledge from a huge amount of textual comments in social media. Therefore, this study proposes a Fuzzy adaptive resonance theory network based Information Retrieval (FIR) scheme that combines a Fuzzy adaptive resonance theory (ART) network, Latent Semantic Indexing (LSI), and association rules (AR) discovery to extract knowledge from social media. In our FIR scheme, the Fuzzy ART network is first employed to segment comments. Next, for each customer segment, we use the LSI technique to retrieve important keywords. Then, in order to make the extracted keywords understandable, association rules mining is used to organize these extracted keywords to build metadata. These extracted useful voices of customers will be transformed into design needs by using Quality Function Deployment (QFD) for further decision making. Unlike conventional information retrieval techniques, which acquire too many keywords to get key points, our FIR scheme can extract understandable metadata from social media.
|
22
|
Sadah SA, Shahbazi M, Wiley MT, Hristidis V. A Study of the Demographics of Web-Based Health-Related Social Media Users. J Med Internet Res 2015; 17:e194. [PMID: 26250986 PMCID: PMC4705027 DOI: 10.2196/jmir.4308] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 02/02/2015] [Revised: 06/15/2015] [Accepted: 07/03/2015] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND The rapid spread of Web-based social media in recent years has impacted how patients share health-related information. However, little work has studied the demographics of these users. OBJECTIVE Our aim was to study the demographics of users who participate in health-related Web-based social outlets to identify possible links to health care disparities. METHODS We analyze and compare three different types of health-related social outlets: (1) general Web-based social networks, Twitter and Google+, (2) drug review websites, and (3) health Web forums. We focus on the following demographic attributes: age, gender, ethnicity, location, and writing level. We build and evaluate domain-specific classifiers to infer missing data where possible. The estimated demographic statistics are compared against various baselines, such as Internet and social networks usage of the population. RESULTS We found that (1) drug review websites and health Web forums are dominated by female users, (2) the participants of health-related social outlets are generally older with the exception of the 65+ years bracket, (3) blacks are underrepresented in health-related social networks, (4) users in areas with better access to health care participate more in Web-based health-related social outlets, and (5) the writing level of users in health-related social outlets is significantly lower than the reading level of the population. CONCLUSIONS We identified interesting and actionable disparities in the participation of various demographic groups to various types of health-related social outlets. These disparities are significantly distinct from the disparities in Internet usage or general social outlets participation.
Affiliation(s)
- Shouq A Sadah
- Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, United States.
|
23
|
Abstract
Polarity classification is the main subtask of sentiment analysis and opinion mining, well-known problems in natural language processing that have attracted increasing attention in recent years. Existing approaches mainly rely on the subjective part of text in which sentiment is expressed explicitly through specific words, called sentiment words. These approaches, however, are still far from being good in the polarity classification of patients' experiences since they are often expressed without any explicit expression of sentiment, but an undesirable or desirable effect of the experience implicitly indicates a positive or negative sentiment. This paper presents a method for polarity classification of patients' experiences of drugs using domain knowledge. We first build a knowledge base of polar facts about drugs, called FactNet, using extracted patterns from Linked Data sources and relation extraction techniques. Then, we extract generalized semantic patterns of polar facts and organize them into a hierarchy in order to overcome the missing knowledge issue. Finally, we apply the extracted knowledge, i.e., polar fact instances and generalized patterns, for the polarity classification task. Different from previous approaches for personal experience classification, the proposed method explores the potential benefits of polar facts in domain knowledge aiming to improve the polarity classification performance, especially in the case of indirect implicit experiences, i.e., experiences which express the effect of one entity on other ones without any sentiment words. Using our approach, we have extracted 9703 triplets of polar facts at a precision of 92.26 percent. In addition, experiments on drug reviews demonstrate that our approach can achieve 79.78 percent precision in polarity classification task, and outperforms the state-of-the-art sentiment analysis and opinion mining methods.
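Illustration of the "polar fact" idea: an experience such as "drug X causes headache" contains no sentiment word, yet the undesirable effect implies a negative polarity. The toy lookup below is only a sketch of that idea under stated assumptions. The triples, the `POLAR_FACTS` table, and the `classify_experience` helper are all hypothetical and are not taken from FactNet or from the authors' method, which additionally uses generalized semantic patterns organized in a hierarchy.

```python
# Hypothetical polar facts keyed by (relation, effect); not taken from FactNet.
POLAR_FACTS = {
    ("causes", "headache"): "negative",
    ("causes", "nausea"): "negative",
    ("relieves", "pain"): "positive",
    ("reduces", "fever"): "positive",
}


def classify_experience(triple):
    """Classify a (drug, relation, effect) triple by looking up its polar fact.

    Returns "unknown" when the fact is missing, which is the gap the
    abstract's generalized patterns are meant to cover.
    """
    _drug, relation, effect = triple
    return POLAR_FACTS.get((relation, effect), "unknown")


print(classify_experience(("drug X", "causes", "headache")))  # negative
print(classify_experience(("drug X", "relieves", "pain")))    # positive
print(classify_experience(("drug X", "improves", "sleep")))   # unknown
```

The third call shows the missing-knowledge case: a plain lookup fails on facts it has never seen.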
Affiliation(s)
- Samira Noferesti
- Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran
- Mehrnoush Shamsfard
- Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran
|