1
Zammarchi G, Mola F, Conversano C. Using sentiment analysis to evaluate the impact of the COVID-19 outbreak on Italy's country reputation and stock market performance. STAT METHOD APPL-GER 2023; 32:1-22. [PMID: 37360253 PMCID: PMC10068702 DOI: 10.1007/s10260-023-00690-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 03/12/2023] [Indexed: 04/05/2023]
Abstract
During the recent coronavirus disease 2019 (COVID-19) outbreak, the microblogging service Twitter was widely used to share opinions and reactions to events. Italy was one of the first European countries to be severely affected by the outbreak and to establish lockdown and stay-at-home orders, potentially damaging the country's reputation. We use sentiment analysis to investigate changes in opinions about Italy reported on Twitter before and after the COVID-19 outbreak. Using different lexicon-based methods, we find a breakpoint, corresponding to the date of the first established case of COVID-19 in Italy, that marks a relevant change in the sentiment scores used as a proxy for the country's reputation. Next, we demonstrate that sentiment scores about Italy are associated with the values of the FTSE-MIB index, the main index of the Italian Stock Exchange, as they serve as early detection signals of changes in the values of FTSE-MIB. Lastly, we evaluate whether different machine learning classifiers were able to determine the polarity of tweets posted before and after the outbreak with a different level of accuracy.
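A lexicon-based polarity score of the kind this abstract refers to can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the tiny `LEXICON` dictionary and the `polarity` function are hypothetical, and published lexicons contain thousands of weighted entries.

```python
import re

# Hypothetical toy lexicon (word -> polarity weight); an assumption for illustration only.
LEXICON = {
    "beautiful": 1.0, "love": 1.0, "safe": 0.5,
    "outbreak": -1.0, "fear": -1.0, "lockdown": -0.5,
}

def polarity(text):
    """Return the mean lexicon weight of the matched words, or 0.0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    weights = [LEXICON[w] for w in words if w in LEXICON]
    return sum(weights) / len(weights) if weights else 0.0
```

Averaging such scores per day would give a time series in which a change point could then be sought; that step is not shown here.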
Affiliation(s)
- Gianpaolo Zammarchi
  - Department of Economics and Business Science, University of Cagliari, Cagliari, Italy
- Francesco Mola
  - Department of Economics and Business Science, University of Cagliari, Cagliari, Italy
- Claudio Conversano
  - Department of Economics and Business Science, University of Cagliari, Cagliari, Italy
2
Khademi Habibabadi S, Hallinan C, Bonomo Y, Conway M. Consumer-Generated Discourse on Cannabis as a Medicine: Scoping Review of Techniques. J Med Internet Res 2022; 24:e35974. [PMID: 36383417 PMCID: PMC9713623 DOI: 10.2196/35974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2021] [Revised: 06/16/2022] [Accepted: 07/27/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND: Medicinal cannabis is increasingly being used for a variety of physical and mental health conditions. Social media and web-based health platforms provide valuable, real-time, and cost-effective surveillance resources for gleaning insights regarding individuals who use cannabis for medicinal purposes. This is particularly important considering that the evidence for the optimal use of medicinal cannabis is still emerging. Despite the web-based marketing of medicinal cannabis to consumers, currently, there is no robust regulatory framework to measure clinical health benefits or individual experiences of adverse events. In a previous study, we conducted a systematic scoping review of studies that contained themes of the medicinal use of cannabis and used data from social media and search engine results. This study analyzed the methodological approaches and limitations of these studies.
OBJECTIVE: We aimed to examine research approaches and study methodologies that use web-based user-generated text to study the use of cannabis as a medicine.
METHODS: We searched MEDLINE, Scopus, Web of Science, and Embase databases for primary studies in the English language from January 1974 to April 2022. Studies were included if they aimed to understand web-based user-generated text related to health conditions where cannabis is used as a medicine or where health was mentioned in general cannabis-related conversations.
RESULTS: We included 42 articles in this review. In these articles, Twitter was used 3 times more than other computer-generated sources, including Reddit, web-based forums, GoFundMe, YouTube, and Google Trends. Analytical methods included sentiment assessment, thematic analysis (manual and automatic), social network analysis, and geographic analysis.
CONCLUSIONS: This study is the first to review techniques used by research on consumer-generated text for understanding cannabis as a medicine. It is increasingly evident that consumer-generated data offer opportunities for a greater understanding of individual behavior and population health outcomes. However, research using these data has some limitations that include difficulties in establishing sample representativeness and a lack of methodological best practices. To address these limitations, deidentified annotated data sources should be made publicly available, researchers should determine the origins of posts (organizations, bots, power users, or ordinary individuals), and powerful analytical techniques should be used.
Affiliation(s)
- Sedigheh Khademi Habibabadi
  - Department of General Practice, Melbourne Medical School, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
- Christine Hallinan
  - Department of General Practice, Melbourne Medical School, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
  - Health & Biomedical Research Information Technology Unit, The University of Melbourne, Melbourne, Australia
- Yvonne Bonomo
  - Department of General Practice, Melbourne Medical School, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
- Mike Conway
  - School of Computing & Information Systems, The University of Melbourne, Melbourne, Australia
3
Jahja M, Chin A, Tibshirani RJ. Real-Time Estimation of COVID-19 Infections: Deconvolution and Sensor Fusion. Stat Sci 2022. [DOI: 10.1214/22-sts856] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022]
Affiliation(s)
- Maria Jahja
  - Maria Jahja is Ph.D. Candidate, Department of Statistics & Data Science, Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Andrew Chin
  - Andrew Chin is Statistical Developer, Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Ryan J. Tibshirani
  - Ryan J. Tibshirani is Professor, Department of Statistics & Data Science, Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
4
Dennis A, Robin C, Carter H. The social media response to twice-weekly mass asymptomatic testing in England. BMC Public Health 2022; 22:182. [PMID: 35081908 PMCID: PMC8791807 DOI: 10.1186/s12889-022-12605-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/19/2021] [Accepted: 01/13/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: From 9th April 2021, everyone in England has been encouraged to take two COVID-19 tests per week. This is the first time that national mass asymptomatic testing has been introduced in the UK, and the effectiveness of the policy depends on uptake of testing and willingness to self-isolate following a positive test result. This paper examines attitudes towards twice-weekly testing, as well as barriers and facilitators to engaging in testing.
METHODS: Between 5th April and 28th May 2021 we searched Twitter, Facebook, and online news articles with publicly available comment sections to identify comments relating to twice-weekly testing. We identified 5783 comments which were then analysed using a framework analysis.
RESULTS: We identified nine main themes. Five themes related to barriers to engaging in testing: low perceived risk from COVID-19; mistrust in the government; concern about taking a test; perceived ineffectiveness of twice-weekly testing policy; and perceived negative impact of twice-weekly testing policy. Four themes related to facilitators to engaging in testing: wanting to protect others; positive perceptions of tests; a desire to return to normal; and perceived efficacy for reducing asymptomatic transmission.
CONCLUSIONS: Overall, the comments identified indicated predominantly negative attitudes towards the twice-weekly testing policy. Several recommendations can be made to improve engagement with twice-weekly testing, including: 1) communicate openly and honestly about the purpose of testing; 2) provide information about the accuracy of tests; 3) provide financial support for those required to self-isolate; and 4) emphasise accessibility of testing.
Affiliation(s)
- Amelia Dennis
  - Behavioural Science and Insights Unit, Emergency Response Department, Public Health England, Porton Down, Salisbury, SP4 0JG, UK
- Charlotte Robin
  - Behavioural Science and Insights Unit, Emergency Response Department, Public Health England, Porton Down, Salisbury, SP4 0JG, UK
- Holly Carter
  - Behavioural Science and Insights Unit, Emergency Response Department, Public Health England, Porton Down, Salisbury, SP4 0JG, UK
5
Khademi Habibabadi S, Delir Haghighi P, Burstein F, Buttery J. Vaccine adverse event mentions in social media: Mining the language of Twitter conversations (Preprint). JMIR Med Inform 2021; 10:e34305. [PMID: 35708760 PMCID: PMC9247809 DOI: 10.2196/34305] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/16/2021] [Revised: 02/22/2022] [Accepted: 04/11/2022] [Indexed: 11/13/2022] Open
Abstract
Background: Traditional monitoring for adverse events following immunization (AEFI) relies on various established reporting systems, where there is inevitable lag between an AEFI occurring and its potential reporting and subsequent processing of reports. AEFI safety signal detection strives to detect AEFI as early as possible, ideally close to real time. Monitoring social media data holds promise as a resource for this.
Objective: The primary aim of this study is to investigate the utility of monitoring social media for gaining early insights into vaccine safety issues, by extracting vaccine adverse event mentions (VAEMs) from Twitter, using natural language processing techniques. The secondary aims are to document the natural language processing techniques used and identify the most effective of them for identifying tweets that contain VAEM, with a view to defining an approach that might be applicable to other similar social media surveillance tasks.
Methods: A VAEM-Mine method was developed that combines topic modeling with classification techniques to extract maximal VAEM posts from a vaccine-related Twitter stream, with a high degree of confidence. The approach does not require a targeted search for specific vaccine reaction–indicative words, but instead, identifies VAEM posts according to their language structure.
Results: The VAEM-Mine method isolated 8992 VAEMs from 811,010 vaccine-related Twitter posts and achieved an F1 score of 0.91 in the classification phase.
Conclusions: Social media can assist with the detection of vaccine safety signals as a valuable complementary source for monitoring mentions of vaccine adverse events. A social media–based VAEM data stream can be assessed for changes to detect possible emerging vaccine safety signals, helping to address the well-recognized limitations of passive reporting systems, including lack of timeliness and underreporting.
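For readers unfamiliar with the F1 score cited in this abstract, it is the harmonic mean of precision and recall. The sketch below computes it from confusion counts; the function name `f1_score` and the example counts are illustrative assumptions, since the abstract reports only the final score and not the underlying precision or recall.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall), from confusion counts."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, not taken from the study.
example = f1_score(tp=90, fp=10, fn=10)
```

Because it is a harmonic mean, a classifier cannot reach a high F1 by trading away either precision or recall entirely.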
Affiliation(s)
- Sedigheh Khademi Habibabadi
  - Centre for Health Analytics, Melbourne Children's Campus, Melbourne, Australia
  - Department of General Practice, University of Melbourne, Melbourne, Australia
- Pari Delir Haghighi
  - Department of Human-Centred Computing, Faculty of Information Technology, Monash University, Melbourne, Australia
- Frada Burstein
  - Department of Human-Centred Computing, Faculty of Information Technology, Monash University, Melbourne, Australia
- Jim Buttery
  - Centre for Health Analytics, Melbourne Children's Campus, Melbourne, Australia
  - Department of Paediatrics, University of Melbourne, Melbourne, Australia
6
Warastuti W. The Effect of Tiwari Onion (Eleutherine americana Merr) Tablet on Blood Pressure Stability in Diagnosed Hypertension Patients. Open Access Maced J Med Sci 2021. [DOI: 10.3889/oamjms.2021.6273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Abstract
BACKGROUND: Non-pharmacological management of hypertension combines lifestyle adjustment with non-pharmacological therapy. Many patients use herbal therapy, such as Eleutherine americana Merr tea, which is believed to have few side effects and to be easy to use and inexpensive.
AIM: This study aimed to analyze the effect of E. americana Merr tablets on blood pressure in hypertensive patients, for use as a supportive therapy to reduce and stabilize blood pressure.
METHODS: This study used a quasi-experimental pre-post-test design with a control group, involving 30 respondents. Data collection was carried out for 1 month. Blood pressure was observed every week for a month after E. americana tablets were given. The sampling technique was purposive sampling. Data were analyzed with the independent-samples t-test at a significance level of p < 0.05.
RESULTS: By age, the largest group of respondents (15 people, 50%) was in the late-elderly range of 56–65 years. Most respondents were female (18 people, 60%). The respondents’ hypertension categories were Grade I (57%) and Grade II (43%). The independent-samples t-test yielded p = 0.029 for systolic and p < 0.001 for diastolic blood pressure, both below the 0.05 significance level. This indicates a significant difference in blood pressure before and after E. americana tablets were given to hypertensive patients.
CONCLUSION: Systolic and diastolic blood pressure decreased significantly in patients with suspected hypertension who received E. americana tablets.
7
Karami A, Dahl AA, Shaw G, Valappil SP, Turner-McGrievy G, Kharrazi H, Bozorgi P. Analysis of Social Media Discussions on (#)Diet by Blue, Red, and Swing States in the U.S. Healthcare (Basel) 2021; 9:healthcare9050518. [PMID: 33946659 PMCID: PMC8145395 DOI: 10.3390/healthcare9050518] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/26/2021] [Revised: 04/08/2021] [Accepted: 04/20/2021] [Indexed: 12/14/2022] Open
Abstract
The relationship between political affiliations and diet-related discussions on social media has not been studied on a population level. This study used a cost- and time-effective framework to leverage, aggregate, and analyze data from social media. This paper enhances our understanding of diet-related discussions with respect to political orientations in U.S. states. This mixed methods study used computational methods to collect tweets containing "diet" or "#diet" shared in a year, identified tweets posted by U.S. Twitter users, disclosed topics of tweets, and compared Democratic, Republican, and swing states based on the weight of topics. A qualitative method was employed to code topics. We found 32 unique topics extracted from more than 800,000 tweets, including a wide range of themes, such as diet types and chronic conditions. Based on the comparative analysis of the topic weights, our results revealed a significant difference between Democratic, Republican, and swing states. The largest difference was detected between swing and Democratic states, and the smallest difference was identified between swing and Republican states. Our study provides initial insight on the association of potential political leanings with health (e.g., dietary behaviors). Our results show diet discussions differ depending on the political orientation of the state in which Twitter users reside. Understanding the correlation of dietary preferences based on political orientation can help develop targeted and effective health promotion, communication, and policymaking strategies.
Affiliation(s)
- Amir Karami
  - School of Information Science, University of South Carolina, Columbia, SC 29208, USA
- Alicia A. Dahl
  - Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- George Shaw
  - Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Sruthi Puthan Valappil
  - Computer Science and Engineering Department, University of South Carolina, Columbia, SC 29208, USA
- Gabrielle Turner-McGrievy
  - Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
- Hadi Kharrazi
  - Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
- Parisa Bozorgi
  - Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
  - South Carolina Department of Health and Environmental Control, Columbia, SC 29201, USA
8
Abstract
This article describes the current landscape in the fields of social media and socio-technical systems. In particular, it analyzes the different ways in which social media are adopted in organizations, workplaces, and educational and smart environments. One interesting aspect of this integration is the use of social media for members’ participation in, and access to, the processes and services of their organization. Those services cover many different types of daily routines and life activities, such as health, education, and transport. In this survey, we compare and classify current research works according to multiple features, including the use of Social Network Analysis and Social Capital models, users’ motivations for participation and organizational costs, and adoption of the social media platform from below. Our results show that many of these current systems are developed without taking into proper consideration the social structures and processes, with some notable and positive exceptions.
9
Xu P, Dredze M, Broniatowski DA. The Twitter Social Mobility Index: Measuring Social Distancing Practices With Geolocated Tweets. J Med Internet Res 2020; 22:e21499. [PMID: 33048823 PMCID: PMC7717895 DOI: 10.2196/21499] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 06/16/2020] [Revised: 08/04/2020] [Accepted: 10/11/2020] [Indexed: 11/21/2022] Open
Abstract
Background: Social distancing is an important component of the response to the COVID-19 pandemic. Minimizing social interactions and travel reduces the rate at which the infection spreads and “flattens the curve” so that the medical system is better equipped to treat infected individuals. However, it remains unclear how the public will respond to these policies as the pandemic continues.
Objective: The aim of this study is to present the Twitter Social Mobility Index, a measure of social distancing and travel derived from Twitter data. We used public geolocated Twitter data to measure how much users travel in a given week.
Methods: We collected 469,669,925 tweets geotagged in the United States from January 1, 2019, to April 27, 2020. We analyzed the aggregated mobility variance of a total of 3,768,959 Twitter users at the city and state level from the start of the COVID-19 pandemic.
Results: We found a large reduction (61.83%) in travel in the United States after the implementation of social distancing policies. However, the variance by state was high, ranging from 38.54% to 76.80%. The eight states that had not issued statewide social distancing orders as of the start of April ranked poorly in terms of travel reduction: Arkansas (45), Iowa (37), Nebraska (35), North Dakota (22), South Carolina (38), South Dakota (46), Oklahoma (50), Utah (14), and Wyoming (53). We are presenting our findings on the internet and will continue to update our analysis during the pandemic.
Conclusions: We observed larger travel reductions in states that were early adopters of social distancing policies and smaller changes in states without such policies. The results were also consistent with those based on other mobility data to a certain extent. Therefore, geolocated tweets are an effective way to track social distancing practices using a public resource, and this tracking may be useful as part of ongoing pandemic response planning.
Affiliation(s)
- Paiheng Xu
  - Malone Center for Engineering in Healthcare, Center for Language and Speech Processing, Department of Computer Science, Johns Hopkins University, Baltimore, MD, United States
- Mark Dredze
  - Malone Center for Engineering in Healthcare, Center for Language and Speech Processing, Department of Computer Science, Johns Hopkins University, Baltimore, MD, United States
- David A Broniatowski
  - Department of Engineering Management and Systems Engineering, The George Washington University, Washington, DC, United States
10
Safarnejad L, Xu Q, Ge Y, Bagavathi A, Krishnan S, Chen S. Identifying Influential Factors in the Discussion Dynamics of Emerging Health Issues on Social Media: Computational Study. JMIR Public Health Surveill 2020; 6:e17175. [PMID: 32348275 PMCID: PMC7420635 DOI: 10.2196/17175] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/24/2019] [Revised: 02/08/2020] [Accepted: 03/06/2020] [Indexed: 12/23/2022] Open
Abstract
Background: Social media has become a major resource for observing and understanding public opinions using infodemiology and infoveillance methods, especially during emergencies such as disease outbreaks. For public health agencies, understanding the driving forces of web-based discussions will help deliver more effective and efficient information to general users on social media and the web.
Objective: The study aimed to identify the major contributors that drove overall Zika-related tweeting dynamics during the 2016 epidemic. In total, 3 hypothetical drivers were proposed: (1) the underlying Zika epidemic quantified as a time series of case counts; (2) sporadic but critical real-world events such as the 2016 Rio Olympics and World Health Organization’s Public Health Emergency of International Concern (PHEIC) announcement, and (3) a few influential users’ tweeting activities.
Methods: All tweets and retweets (RTs) containing the keyword Zika posted in 2016 were collected via the Gnip application programming interface (API). We developed an analytical pipeline, EventPeriscope, to identify co-occurring trending events with Zika and quantify the strength of these events. We also retrieved Zika case data and identified the top influencers of the Zika discussion on Twitter. The influence of 3 potential drivers was examined via a multivariate time series analysis, signal processing, a content analysis, and text mining techniques.
Results: Zika-related tweeting dynamics were not significantly correlated with the underlying Zika epidemic in the United States in any of the four quarters in 2016 nor in the entire year. Instead, peaks of Zika-related tweeting activity were strongly associated with a few critical real-world events, both planned, such as the Rio Olympics, and unplanned, such as the PHEIC announcement. The Rio Olympics was mentioned in >15% of all Zika-related tweets and PHEIC occurred in 27% of Zika-related tweets around their respective peaks. In addition, the overall tweeting dynamics of the top 100 most actively tweeting users on the Zika topic, the top 100 users receiving most RTs, and the top 100 users mentioned were the most highly correlated to and preceded the overall tweeting dynamics, making these groups of users the potential drivers of tweeting dynamics. The top 100 users who retweeted the most were not critical in driving the overall tweeting dynamics. There were very few overlaps among these different groups of potentially influential users.
Conclusions: Using our proposed analytical workflow, EventPeriscope, we identified that Zika discussion dynamics on Twitter were decoupled from the actual disease epidemic in the United States but were closely related to and highly influenced by certain sporadic real-world events as well as by a few influential users. This study provided a methodology framework and insights to better understand the driving forces of web-based public discourse during health emergencies. Therefore, health agencies could deliver more effective and efficient web-based communications in emerging crises.
Affiliation(s)
- Lida Safarnejad
  - College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, United States
- Qian Xu
  - School of Communications, Elon University, Elon, NC, United States
- Yaorong Ge
  - College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, United States
- Siddharth Krishnan
  - College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, United States
- Shi Chen
  - College of Health and Human Services, University of North Carolina at Charlotte, Charlotte, NC, United States
11
Nobles AL, Leas EC, Latkin CA, Dredze M, Strathdee SA, Ayers JW. #HIV: Alignment of HIV-Related Visual Content on Instagram with Public Health Priorities in the US. AIDS Behav 2020; 24:2045-2053. [PMID: 31916098 PMCID: PMC10712936 DOI: 10.1007/s10461-019-02765-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/16/2023]
Abstract
Instagram, with more than 1 billion monthly users, is the go-to social media platform to chronicle one's life via images, but how are people using the platform to present visual content about HIV? We analyzed public Instagram posts containing the hashtag "#HIV" (because they are self-tagged as related to HIV) between January 2017 and July 2018. We described the prevalence of co-occurring hashtags and explored thematic concepts in the images using automated image recognition and topic modeling. Twenty-eight percent of all #HIV posts included hashtags focused on awareness, followed by LGBTQ (24.5%) and living with HIV (17.9%). However, specific strategies were rarely cited, including testing (10.8%), treatment (10.3%), PrEP (6.2%) and condoms (4.1%). Image analyses revealed 44.5% of posts included infographics followed by people (21.3%) thereby humanizing HIV and stigmatized populations and promoting community mobilization. Novel content such as the handwriting image-theme (3.8%) where posters shared their HIV test results appeared. We discuss how this visual content aligns with public health priorities to reduce HIV in the US and the novel, organic messages that public health could help amplify.
Affiliation(s)
- Alicia L Nobles
  - The Center for Data Driven Health at Qualcomm Institute, California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, CA, USA
  - Division of Infectious Diseases and Global Public Health, Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Eric C Leas
  - The Center for Data Driven Health at Qualcomm Institute, California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, CA, USA
  - Division of Health Policy, Department of Family Medicine and Public Health, University of California San Diego, La Jolla, CA, USA
- Carl A Latkin
  - Department of Health, Behavior and Society, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, USA
- Mark Dredze
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Steffanie A Strathdee
  - Division of Infectious Diseases and Global Public Health, Department of Medicine, University of California San Diego, La Jolla, CA, USA
- John W Ayers
  - The Center for Data Driven Health at Qualcomm Institute, California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, CA, USA
  - Division of Infectious Diseases and Global Public Health, Department of Medicine, University of California San Diego, La Jolla, CA, USA
12
Chancellor S, De Choudhury M. Methods in predictive techniques for mental health status on social media: a critical review. NPJ Digit Med 2020; 3:43. [PMID: 32219184 PMCID: PMC7093465 DOI: 10.1038/s41746-020-0233-7] [Citation(s) in RCA: 109] [Impact Index Per Article: 21.8] [Received: 07/22/2019] [Accepted: 01/17/2020] [Indexed: 01/03/2023] Open
Abstract
Social media is now being used to model mental well-being, and for understanding health outcomes. Computer scientists are now using quantitative techniques to predict the presence of specific mental disorders and symptomatology, such as depression, suicidality, and anxiety. This research promises great benefits to monitoring efforts, diagnostics, and intervention design for these mental health statuses. Yet, there is no standardized process for evaluating the validity of this research and the methods adopted in the design of these studies. We conduct a systematic literature review of the state-of-the-art in predicting mental health status using social media data, focusing on characteristics of the study design, methods, and research design. We find 75 studies in this area published between 2013 and 2018. Our results outline the methods of data annotation for mental health status, data collection and quality management, pre-processing and feature selection, and model selection and verification. Despite growing interest in this field, we identify concerning trends around construct validity, and a lack of reflection in the methods used to operationalize and identify mental health status. We provide some recommendations to address these challenges, including a list of proposed reporting standards for publications and collaboration opportunities in this interdisciplinary space.
Affiliation(s)
- Stevie Chancellor
  - Department of Computer Science, Northwestern University, Evanston, IL USA
13
Hochberg I, Allon R, Yom-Tov E. Assessment of the Frequency of Online Searches for Symptoms Before Diagnosis: Analysis of Archival Data. J Med Internet Res 2020; 22:e15065. [PMID: 32141835 PMCID: PMC7084283 DOI: 10.2196/15065] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 06/17/2019] [Revised: 10/07/2019] [Accepted: 12/16/2019] [Indexed: 12/18/2022] Open
Abstract
Background: Surveys suggest that a large proportion of people use the internet to search for information on medical symptoms they experience and that around one-third of the people in the United States self-diagnose using online information. However, surveys are known to be biased, and the true rates at which people search for information on their medical symptoms before receiving a formal medical diagnosis are unknown.
Objective: This study aimed to estimate the rate at which people search for information on their medical symptoms before receiving a formal medical diagnosis by a health professional.
Methods: We collected queries made on a general-purpose internet search engine by people in the United States who self-identified their diagnosis from 1 of 20 medical conditions. We focused on conditions that have evident symptoms and are neither screened systematically nor a part of usual medical care. Thus, they are generally diagnosed after the investigation of specific symptoms. We evaluated how many of these people queried for symptoms associated with their medical condition before their formal diagnosis. In addition, we used a survey questionnaire to assess the familiarity of laypeople with the symptoms associated with these conditions.
Results: On average, 15.49% (1792/12,367, SD 8.4%) of people queried about symptoms associated with their medical condition before receiving a medical diagnosis. A longer duration between the first query for a symptom and the corresponding diagnosis was correlated with an increased likelihood of people querying about those symptoms (rho=0.6; P=.005); similarly, unfamiliarity with the association between a condition and its symptom was correlated with an increased likelihood of people querying about those symptoms (rho=−0.47; P=.08). In addition, worrying symptoms were 14% more likely to be queried about.
Conclusions: Our results indicate that there is large variability in the percentage of people who query the internet for their symptoms before a formal medical diagnosis is made. This finding has important implications for systems that attempt to screen for medical conditions.
Affiliation(s)
- Irit Hochberg
- Institute of Endocrinology, Diabetes, and Metabolism, Rambam Health Care Campus, Haifa, Israel.,Bruce Rappaport Faculty of Medicine, Technion - Israel Institute of Technology, Haifa, Israel
| | - Raviv Allon
- Bruce Rappaport Faculty of Medicine, Technion - Israel Institute of Technology, Haifa, Israel
| | - Elad Yom-Tov
- Microsoft Research, Herzeliya, Israel.,Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa, Israel
| |
|
14
|
Tao D, Yang P, Feng H. Utilization of text mining as a big data analysis tool for food science and nutrition. Compr Rev Food Sci Food Saf 2020; 19:875-894. [PMID: 33325182 DOI: 10.1111/1541-4337.12540] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 08/22/2019] [Revised: 12/26/2019] [Accepted: 01/13/2020] [Indexed: 12/21/2022]
Abstract
Big data analysis has found applications in many industries due to its ability to turn huge amounts of data into insights for informed business and operational decisions. Advanced data mining techniques have been applied in many sectors of supply chains in the food industry. However, the previous work has mainly focused on the analysis of instrument-generated data such as those from hyperspectral imaging, spectroscopy, and biometric receptors. The importance of digital text data in the food and nutrition has only recently gained attention due to advancements in big data analytics. The purpose of this review is to provide an overview of the data sources, computational methods, and applications of text data in the food industry. Text mining techniques such as word-level analysis (e.g., frequency analysis), word association analysis (e.g., network analysis), and advanced techniques (e.g., text classification, text clustering, topic modeling, information retrieval, and sentiment analysis) will be discussed. Applications of text data analysis will be illustrated with respect to food safety and food fraud surveillance, dietary pattern characterization, consumer-opinion mining, new-product development, food knowledge discovery, food supply-chain management, and online food services. The goal is to provide insights for intelligent decision-making to improve food production, food safety, and human nutrition.
Affiliation(s)
- Dandan Tao
- Department of Food Science and Human Nutrition, College of Agricultural, Consumer and Environmental Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois
| | - Pengkun Yang
- Department of Electrical Engineering, Princeton University, Princeton, New Jersey
| | - Hao Feng
- Department of Food Science and Human Nutrition, College of Agricultural, Consumer and Environmental Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois
| |
|
15
|
Evans WD, Thomas CN, Favatas D, Smyser J, Briggs J. Digital Segmentation of Priority Populations in Public Health. Health Education & Behavior 2019; 46:81-89. [PMID: 31742454 DOI: 10.1177/1090198119871246] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 12/19/2022]
Abstract
The rapid growth and diffusion of digital media technologies has changed the landscape of market segmentation in the last two decades, including its use in promoting prosocial and behavior change. New, population-specific and culturally appropriate prevention strategies can leverage the potential of digital media to influence health outcomes, especially for the greatest users of digital technology, including youth and young adults. Health behavior change campaigns are increasingly shifting resources to social media, creating opportunities for innovative interventions and new research methods. This article examines three case studies of digital segmentation: (1) tobacco control from the Truth Initiative, (2) community-based public health programs from the Centers for Disease Control and Prevention, and (3) substance use (including opioids) and other risk behavior prevention from Public Good Projects. These case studies of recent digital segmentation efforts in the not-for-profit, government, and academic sectors show that it increases reach and frequency of messages delivered to priority populations. The practice of digital segmentation is rapidly growing, shows early signs of effectiveness, and may enhance future public health campaigns. Additional research could optimize its use and effectiveness in promoting prosocial and behavior change campaign outcomes.
Affiliation(s)
| | - Christopher N Thomas
- Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Nutrition, Physical Activity, and Obesity, Atlanta, GA, USA
| | | | | | | |
|
16
|
Lossio-Ventura JA, Morzan J, Alatrista-Salas H, Hernandez-Boussard T, Bian J. Clustering and topic modeling over tweets: A comparison over a health dataset. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2019; 2019:1544-1547. [PMID: 35463811 PMCID: PMC9028681 DOI: 10.1109/bibm47256.2019.8983167] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/02/2023]
Abstract
Twitter became the most popular form of social interactions in the healthcare domain. Thus, various teams have evaluated Twitter as an additional source where patients share information about their healthcare with the potential goal to improve their outcomes. Several existing topic modeling and document clustering applications have been adapted to assess tweets showing that the performances of the applications are negatively affected due to the nature and characteristics of tweets. Moreover, Twitter health research has become difficult to measure because of the absence of comparisons between the existing applications. In this paper, we perform an evaluation based on internal indexes of different topic modeling and document clustering applications over two Twitter health-related datasets. Our results show that Online Twitter LDA and Gibbs LDA get a better performance for extracting topics and grouping tweets. We want to provide health practitioners this comparison to select the most suitable application for their tasks.
Affiliation(s)
| | | | | | | | - Jiang Bian
- Health Outcomes & Biomedical Informatics, University of Florida, USA
| |
|
17
|
Strathdee SA, Nobles AL, Ayers JW. Harnessing digital data and data science to achieve 90-90-90 goals to end the HIV epidemic. Curr Opin HIV AIDS 2019; 14:481-485. [PMID: 31449089 PMCID: PMC6956609 DOI: 10.1097/coh.0000000000000584] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/26/2022]
Abstract
PURPOSE OF REVIEW Effective public health interventions depend on timely, accurate surveillance. Harnessing digital data (including internet searches, social media, and online media) and data science is an emerging approach to complement traditional surveillance in public health but has been underutilized in HIV prevention and treatment. RECENT FINDINGS We highlight recent examples that illustrate how social media data can be applied to HIV surveillance and prevention interventions. SUMMARY To achieve 90-90-90 goals to end the HIV epidemic, we encourage traditional public health researchers to partner with data scientists to supplement HIV surveillance programs with social media analytics to refine estimates of HIV infections and key populations at risk and to identify subgroups and regions where prevention and treatment efforts need to be bolstered. We also encourage interdisciplinary teams to design interventions to promote HIV prevention and linkage to care by leveraging digital media, such as search engines and social media, that have the potential to reach millions of people instantaneously.
Affiliation(s)
- Steffanie A. Strathdee
- Division of Infectious Disease and Global Public Health, Department of Medicine, UC San Diego, La Jolla, CA
| | - Alicia Lynn Nobles
- Division of Infectious Disease and Global Public Health, Department of Medicine, UC San Diego, La Jolla, CA
| | - John W. Ayers
- Division of Infectious Disease and Global Public Health, Department of Medicine, UC San Diego, La Jolla, CA
| |
|
18
|
Daughton AR, Paul MJ. Identifying Protective Health Behaviors on Twitter: Observational Study of Travel Advisories and Zika Virus. J Med Internet Res 2019; 21:e13090. [PMID: 31094347 PMCID: PMC6535980 DOI: 10.2196/13090] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 12/18/2018] [Revised: 03/18/2019] [Accepted: 04/02/2019] [Indexed: 12/27/2022]
Abstract
BACKGROUND An estimated 3.9 billion individuals live in a location endemic for common mosquito-borne diseases. The emergence of Zika virus in South America in 2015 marked the largest known Zika outbreak and caused hundreds of thousands of infections. Internet data have shown promise in identifying human behaviors relevant for tracking and understanding other diseases. OBJECTIVE Using Twitter posts regarding the 2015-16 Zika virus outbreak, we sought to identify and describe considerations and self-disclosures of a specific behavior change relevant to the spread of disease: travel cancellation. If this type of behavior is identifiable in Twitter, this approach may provide an additional source of data for disease modeling. METHODS We combined keyword filtering and machine learning classification to identify first-person reactions to Zika in 29,386 English-language tweets in the context of travel, including considerations and reports of travel cancellation. We further explored demographic, network, and linguistic characteristics of users who change their behavior compared with control groups. RESULTS We found differences in the demographics, social networks, and linguistic patterns of 1567 individuals identified as changing or considering changing travel behavior in response to Zika as compared with a control sample of Twitter users. We found significant differences between geographic areas in the United States, significantly more discussion by women than men, and some evidence of differences in levels of exposure to Zika-related information. CONCLUSIONS Our findings have implications for informing the ways in which public health organizations communicate with the public on social media, and the findings contribute to our understanding of the ways in which the public perceives and acts on risks of emerging infectious diseases.
Affiliation(s)
- Ashlynn R Daughton
- Analytics, Intelligence, and Technology, Los Alamos National Laboratory, Los Alamos, NM, United States
- Information Science, University of Colorado, Boulder, Boulder, CO, United States
| | - Michael J Paul
- Information Science, University of Colorado, Boulder, Boulder, CO, United States
| |
|
19
|
Abstract
In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is discussed. This overview covers different text feature extractions, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application in real-world problems are discussed.
|
20
|
Abstract
In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is discussed. This overview covers different text feature extractions, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application in real-world problems are discussed.
|
21
|
Muralidhara S, Paul MJ. #Healthy Selfies: Exploration of Health Topics on Instagram. JMIR Public Health Surveill 2018; 4:e10150. [PMID: 29959106 PMCID: PMC6045785 DOI: 10.2196/10150] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 02/15/2018] [Revised: 05/17/2018] [Accepted: 05/29/2018] [Indexed: 12/28/2022]
Abstract
Background Social media provides a complementary source of information for public health surveillance. The dominant data source for this type of monitoring is the microblogging platform Twitter, which is convenient due to the free availability of public data. Less is known about the utility of other social media platforms, despite their popularity. Objective This work aims to characterize the health topics that are prominently discussed in the image-sharing platform Instagram, as a step toward understanding how this data might be used for public health research. Methods The study uses a topic modeling approach to discover topics in a dataset of 96,426 Instagram posts containing hashtags related to health. We use a polylingual topic model, initially developed for datasets in different natural languages, to model different modalities of data: hashtags, caption words, and image tags automatically extracted using a computer vision tool. Results We identified 47 health-related topics in the data (kappa=.77), covering ten broad categories: acute illness, alternative medicine, chronic illness and pain, diet, exercise, health care & medicine, mental health, musculoskeletal health and dermatology, sleep, and substance use. The most prevalent topics were related to diet (8,293/96,426; 8.6% of posts) and exercise (7,328/96,426; 7.6% of posts). Conclusions A large and diverse set of health topics are discussed in Instagram. The extracted image tags were generally too coarse and noisy to be used for identifying posts but were in some cases accurate for identifying images relevant to studying diet and substance use. Instagram shows potential as a source of public health information, though limitations in data collection and metadata availability may limit its use in comparison to platforms like Twitter.
Affiliation(s)
- Sachin Muralidhara
- Department of Computer Science, University of Colorado Boulder, Boulder, CO, United States
| | - Michael J Paul
- Department of Information Science, University of Colorado Boulder, Boulder, CO, United States
| |
|