1
Mangalik S, Eichstaedt JC, Giorgi S, Mun J, Ahmed F, Gill G, V Ganesan A, Subrahmanya S, Soni N, Clouston SAP, Schwartz HA. Robust language-based mental health assessments in time and space through social media. NPJ Digit Med 2024; 7:109. [PMID: 38698174 PMCID: PMC11065872 DOI: 10.1038/s41746-024-01100-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Accepted: 04/04/2024] [Indexed: 05/05/2024] Open
Abstract
In the most comprehensive population surveys, mental health is only broadly captured through questionnaires asking about "mentally unhealthy days" or feelings of "sadness." Further, population mental health estimates are predominantly consolidated to yearly estimates at the state level, which is considerably coarser than the best estimates of physical health. Through the large-scale analysis of social media, robust estimation of population mental health is feasible at finer resolutions. In this study, we created a pipeline that used ~1 billion Tweets from 2 million geo-located users to estimate mental health levels and changes for depression and anxiety, the two leading mental health conditions. Language-based mental health assessments (LBMHAs) had substantially higher levels of reliability across space and time than available survey measures. This work presents reliable assessments of depression and anxiety down to the county-weeks level. Where surveys were available, we found moderate to strong associations between the LBMHAs and survey scores for multiple levels of granularity, from the national level down to weekly county measurements (fixed effects β = 0.34 to 1.82; p < 0.001). LBMHAs demonstrated temporal validity, showing clear absolute increases after a list of major societal events (+23% absolute change for depression assessments). LBMHAs showed improved external validity, evidenced by stronger correlations with measures of health and socioeconomic status than population surveys. This study shows that the careful aggregation of social media data yields spatiotemporal estimates of population mental health that exceed the granularity achievable by existing population surveys, and does so with generally greater reliability and validity.
Affiliation(s)
- Siddharth Mangalik
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA.
- Johannes C Eichstaedt
  - Department of Psychology, Stanford University, Stanford, CA, USA.
  - Institute for Human-Centered A.I., Stanford University, Stanford, CA, USA.
- Salvatore Giorgi
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA
- Jihu Mun
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Farhan Ahmed
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Gilvir Gill
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Adithya V Ganesan
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Nikita Soni
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Sean A P Clouston
  - Department of Family, Population, and Preventive Medicine, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY, USA
- H Andrew Schwartz
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA.
2
Bhogal AN, Berrocal VJ, Romero DM, Willis MA, Vydiswaran VGV, Veinot TC. Social Acceptability of Health Behavior Posts on Social Media: An Experiment. Am J Prev Med 2024; 66:870-876. [PMID: 38191003 DOI: 10.1016/j.amepre.2024.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 01/03/2024] [Accepted: 01/03/2024] [Indexed: 01/10/2024]
Abstract
INTRODUCTION Social media sites like Twitter (now X) are increasingly used to create health behavior metrics for public health surveillance. Yet little is known about social norms that may bias the content of posts about health behaviors. Social norms for posts about four health behaviors (smoking tobacco, drinking alcohol, physical activity, eating food) on Twitter/X were evaluated. METHODS This was a randomized experiment delivered via web-based survey to adult, English-speaking Twitter/X users in three Michigan, USA, counties from 2020 to 2022 (n=559). Each participant viewed 24 posts presenting experimental manipulations regarding four health behaviors and answered questions about each post's social acceptability. Principal component analysis was used to combine survey responses into one perceived social acceptability measure. Linear mixed models with the Benjamini-Hochberg correction were implemented to test seven study hypotheses in 2023. RESULTS Supporting six hypotheses, posts presenting healthier (CI: 0.028, 0.454) and less stigmatized (CI: 0.552, 0.157) behaviors were more socially acceptable than posts regarding unhealthier, stigmatized behaviors. Unhealthy (CI: -0.268, -0.109) and stigmatized behavior (CI: -0.261, -0.103) posts were less acceptable for more educated participants. Posts about collocated activities (CI: 0.410, 0.573) and posts accompanied by expressions of liking (CI: 0.906, 1.11) were more acceptable than posts about activities undertaken alone or disliked. Contrary to one hypothesis, posts reporting unusual activities were less acceptable than usual ones (CI: -0.472, 0.312). CONCLUSIONS Perceived social acceptability may be associated with the frequency and content of health behavior posts. Those who use posts from Twitter/X and other social media platforms to estimate health behavior prevalence should account for potential estimation biases from the perceived social acceptability of posts.
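The Benjamini-Hochberg correction named in the methods above can be illustrated with a minimal sketch. The p-values below are hypothetical and are not taken from the study; the function name `benjamini_hochberg` and the 0.05 false discovery rate are assumptions chosen only for illustration, and the study's linear mixed models are not reproduced here.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if it is rejected.

    Step-up rule: sort the p-values, find the largest rank k such that
    p_(k) <= (k / m) * alpha, then reject every hypothesis ranked 1..k.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls under its own threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject everything at or below that rank, in the original order.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for seven hypotheses (illustrative only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074]
example_decisions = benjamini_hochberg(example_p, alpha=0.05)
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the sorted list meets its threshold.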
Affiliation(s)
- Ashley N Bhogal
  - School of Information, University of Michigan, Ann Arbor, Michigan
- Veronica J Berrocal
  - Department of Statistics, University of California Irvine Donald Bren School of Information and Computer Sciences, Irvine, California
- Daniel M Romero
  - School of Information, University of Michigan, Ann Arbor, Michigan; Center for the Study of Complex Systems, University of Michigan College of Literature, Science, and the Arts, Ann Arbor, Michigan; Department of Electrical Engineering and Computer Science, College of Engineering, University of Michigan, Ann Arbor, Michigan
- Matthew A Willis
  - School of Information, University of Michigan, Ann Arbor, Michigan
- V G Vinod Vydiswaran
  - School of Information, University of Michigan, Ann Arbor, Michigan; Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan
- Tiffany C Veinot
  - School of Information, University of Michigan, Ann Arbor, Michigan; Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan; Department of Health Behavior and Health Education, University of Michigan School of Public Health, Ann Arbor, Michigan.
3
Chart-Pascual JP, Montero-Torres M, Ortega MA, Mar-Barrutia L, Zorrilla Martinez I, Alvarez-Mon M, Gonzalez-Pinto A, Alvarez-Mon MA. Areas of interest and sentiment analysis towards second generation antipsychotics, lithium and mood stabilizing anticonvulsants: Unsupervised analysis using Twitter. J Affect Disord 2024; 351:649-660. [PMID: 38290587 DOI: 10.1016/j.jad.2024.01.234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 01/23/2024] [Accepted: 01/26/2024] [Indexed: 02/01/2024]
Abstract
BACKGROUND Severe mental disorders like Schizophrenia and related psychotic disorders (SRD) or Bipolar Disorder (BD) require pharmacological treatment for relapse prevention and quality of life improvement. Yet, treatment adherence is a challenge, partly due to patients' attitudes and beliefs towards their medication. Social media listening offers insights into patient experiences and preferences, particularly in severe mental disorders. METHODS All tweets posted between 2008 and 2022 mentioning the names of the main drugs used in SRD and BD were analyzed using advanced artificial intelligence techniques such as machine learning and deep learning, along with natural language processing. RESULTS In this 15-year study analyzing 893,289 tweets, second generation antipsychotics received more mentions in English tweets, whereas mood stabilizers received more mentions in Spanish tweets. English tweets about economic and legal aspects displayed negative emotions, while Spanish tweets seeking advice showed surprise. Moreover, a recurring theme in Spanish tweets was the shortage of medications, evoking feelings of anger among users. LIMITATIONS This study's analysis of Twitter data, while insightful, may not fully capture the nuances of discussions due to the platform's brevity. Additionally, the wide therapeutic use of the studied drugs complicates the isolation of disorder-specific discourse. Only English and Spanish tweets were examined, limiting the cultural breadth of the findings. CONCLUSION This study emphasizes the importance of social media research in understanding user perceptions of SRD and BD treatments. The results provide valuable insights for clinicians when considering how patients and the general public view and communicate about these treatments in the digital environment.
Affiliation(s)
- Juan Pablo Chart-Pascual
  - Psychiatry Department, Osakidetza Basque Health Service, Araba University Hospital, Vitoria-Gasteiz, Spain; University of the Basque Country UPV/EHU, Vitoria-Gasteiz, Spain; Bioaraba Health Research Institute, Vitoria-Gasteiz, Spain; CIBERSAM.
- Maria Montero-Torres
  - Ramón y Cajal Institute of Sanitary Research (IRYCIS), 28034 Madrid, Spain; Department of Medicine and Medical Specialities, University of Alcala, 28801 Alcala de Henares, Madrid, Spain
- Miguel Angel Ortega
  - Cancer Registry and Pathology Department, Hospital Universitario Príncipe de Asturias, Alcalá de Henares, Spain; Ramón y Cajal Institute of Sanitary Research (IRYCIS), 28034 Madrid, Spain; Department of Medicine and Medical Specialities, University of Alcala, 28801 Alcala de Henares, Madrid, Spain
- Lorea Mar-Barrutia
  - Psychiatry Department, Osakidetza Basque Health Service, Araba University Hospital, Vitoria-Gasteiz, Spain; Bioaraba Health Research Institute, Vitoria-Gasteiz, Spain; CIBERSAM
- Iñaki Zorrilla Martinez
  - Psychiatry Department, Osakidetza Basque Health Service, Araba University Hospital, Vitoria-Gasteiz, Spain; University of the Basque Country UPV/EHU, Vitoria-Gasteiz, Spain; Bioaraba Health Research Institute, Vitoria-Gasteiz, Spain; CIBERSAM
- Melchor Alvarez-Mon
  - Immune System Diseases-Rheumatology and Internal Medicine Service, Centro de Investigación Biomédica en Red Enfermedades Hepáticas y Digestivas, University Hospital Príncipe de Asturias, Alcala de Henares, Spain; Ramón y Cajal Institute of Sanitary Research (IRYCIS), 28034 Madrid, Spain; Department of Medicine and Medical Specialities, University of Alcala, 28801 Alcala de Henares, Madrid, Spain
- Ana Gonzalez-Pinto
  - Psychiatry Department, Osakidetza Basque Health Service, Araba University Hospital, Vitoria-Gasteiz, Spain; University of the Basque Country UPV/EHU, Vitoria-Gasteiz, Spain; Bioaraba Health Research Institute, Vitoria-Gasteiz, Spain; CIBERSAM
- Miguel Angel Alvarez-Mon
  - Ramón y Cajal Institute of Sanitary Research (IRYCIS), 28034 Madrid, Spain; Department of Medicine and Medical Specialities, University of Alcala, 28801 Alcala de Henares, Madrid, Spain; Department of Psychiatry and Mental Health, Hospital Universitario Infanta Leonor, Madrid, Spain
4
Kjell ONE, Kjell K, Schwartz HA. Beyond rating scales: With targeted evaluation, large language models are poised for psychological assessment. Psychiatry Res 2024; 333:115667. [PMID: 38290286 DOI: 10.1016/j.psychres.2023.115667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Revised: 12/03/2023] [Accepted: 12/05/2023] [Indexed: 02/01/2024]
Abstract
In this narrative review, we survey recent empirical evaluations of AI-based language assessments and present a case that large language model technology is poised to change standardized psychological assessment. Artificial intelligence has been undergoing a purported "paradigm shift" initiated by new machine learning models, large language models (e.g., BERT, LLaMA, and the model behind ChatGPT). These models have led to unprecedented accuracy over most computerized language processing tasks, from web searches to automatic machine translation and question answering, while their dialogue-based forms, like ChatGPT, have captured the interest of over a million users. The success of the large language model is mostly attributed to its capability to numerically represent words in their context, long a weakness of previous attempts to automate psychological assessment from language. While potential applications for automated therapy are beginning to be studied on the heels of ChatGPT's success, here we present evidence suggesting that, with thorough validation of targeted deployment scenarios, AI's newest technology can move mental health assessment away from rating scales and toward how people naturally communicate: in language.
Affiliation(s)
- Oscar N E Kjell
  - Psychology Department, Lund University, Sweden; Computer Science Department, Stony Brook University, United States.
- H Andrew Schwartz
  - Psychology Department, Lund University, Sweden; Computer Science Department, Stony Brook University, United States
5
Lu Y. Disease, Scapegoating, and Social Contexts: Examining Social Contexts of the Support for Racist Naming of COVID-19 on Twitter. Journal of Health and Social Behavior 2024; 65:75-93. [PMID: 37688490 DOI: 10.1177/00221465231194355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2023]
Abstract
In early 2020, when COVID-19 began to spread in the United States, many Twitter users called it the "Chinese virus," blaming racial outgroups for the pandemic. I collected tweets containing the "Chinese virus" derivatives posted from March to August 2020 by users within the United States and created a data set with 141,290 tweets published by 50,695 users. I calculated the ratio of users who supported the racist naming of COVID-19 per county and merged Twitter data with the county-level census. Multilevel regression models show that counties with higher COVID-19 mortality or infection rates have more support for the racist naming. Second, the mortality and infection rates effects are stronger in counties with faster minority growth. Moreover, it is mainly in poor counties that minority growth enlarges the effects of infection and mortality rates. These findings relate to the theories on disease-induced xenophobia and the debate between conflict and contact theories.
Affiliation(s)
- Yun Lu
  - South China University of Technology, Guangzhou, China
6
Giorgi S, Habib DRS, Bellew D, Sherman G, Curtis B. A linguistic analysis of dehumanization toward substance use across three decades of news articles. Front Public Health 2023; 11:1275975. [PMID: 38074754 PMCID: PMC10701530 DOI: 10.3389/fpubh.2023.1275975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 10/09/2023] [Indexed: 12/18/2023] Open
Abstract
Introduction Substances and the people who use them have been dehumanized for decades. As a result, lawmakers and healthcare providers have implemented policies that subjected millions to criminalization, incarceration, and inadequate resources to support health and wellbeing. While there have been recent shifts in public opinion on issues such as legalization, in the case of marijuana in the U.S., or addiction as a disease, dehumanization and stigma are still leading barriers for individuals seeking treatment. Integral to the narrative of "substance users" as thoughtless zombies or violent criminals is their portrayal in popular media, such as films and news. Methods This study attempts to quantify the dehumanization of people who use substances (PWUS) across time using a large corpus of over 3 million news articles. We apply a computational linguistic framework for measuring dehumanization across three decades of New York Times articles. Results We show that (1) levels of dehumanization remain high and (2) while marijuana has become less dehumanized over time, attitudes toward other substances such as heroin and cocaine remain stable. Discussion This work highlights the importance of a holistic view of substance use that places all substances within the context of addiction as a disease, prioritizes the humanization of PWUS, and centers around harm reduction.
Affiliation(s)
- Salvatore Giorgi
  - National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, United States
- Daniel Roy Sadek Habib
  - National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
- Douglas Bellew
  - National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
- Garrick Sherman
  - National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
- Brenda Curtis
  - National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
7
Carabot F, Fraile-Martínez O, Donat-Vargas C, Santoma J, Garcia-Montero C, Pinto da Costa M, Molina-Ruiz RM, Ortega MA, Alvarez-Mon M, Alvarez-Mon MA. Understanding Public Perceptions and Discussions on Opioids Through Twitter: Cross-Sectional Infodemiology Study. J Med Internet Res 2023; 25:e50013. [PMID: 37906234 PMCID: PMC10646670 DOI: 10.2196/50013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 08/24/2023] [Accepted: 09/05/2023] [Indexed: 11/02/2023] Open
Abstract
BACKGROUND Opioids are used for the treatment of refractory pain, but their inappropriate use has detrimental consequences for health. Understanding the current experiences and perceptions of patients in a spontaneous and colloquial environment regarding the key drugs involved in the opioid crisis is of utmost significance. OBJECTIVE The study aims to analyze Twitter content related to opioids, with objectives including characterizing users participating in these conversations, identifying prevalent topics and gauging public perception, assessing opinions on drug efficacy and tolerability, and detecting discussions related to drug dispensing, prescription, or acquisition. METHODS In this cross-sectional study, we gathered public tweets concerning major opioids posted in English or Spanish between January 1, 2019, and December 31, 2020. A total of 256,218 tweets were collected. Approximately 27% (69,222/256,218) were excluded. Subsequently, 7000 tweets were subjected to manual analysis based on a codebook developed by the researchers. The remaining databases underwent analysis using machine learning classifiers. In the codebook, the type of user was the initial classification domain. We differentiated between patients, family members and friends, health care professionals, and institutions. Next, a distinction was made between medical and nonmedical content. If it was medical in nature, we classified it according to whether it referred to the drug's efficacy or adverse effects. In nonmedical content tweets, we analyzed whether the content referred to management issues (eg, pharmacy dispensation, medical appointment prescriptions, commercial advertisements, or legal aspects) or the trivialization of the drug. RESULTS Among the entire array of scrutinized pharmaceuticals, fentanyl emerged as the predominant subject, featuring in 27% (39,997/148,335 posts) of the tweets. Concerning user categorization, roughly 70% (101,259/148,335) were classified as patients. 
Nevertheless, tweets posted by health care professionals obtained the highest number of retweets (37/16,956, 0.2% of their posts received over 100 retweets). We found statistically significant differences in the distribution concerning efficacy and side effects among distinct drug categories (P<.001). Nearly 60% (84,401/148,335) of the posts were devoted to nonmedical subjects. Within this category, legal facets and recreational use surfaced as the most prevalent themes, while in the medical discourse, efficacy constituted the most frequent topic, with over 90% (45,621/48,777) of instances characterizing it as poor or null. The opioid with the greatest proportion of tweets concerning legal considerations was fentanyl. Furthermore, fentanyl was the drug most frequently offered for sale on Twitter, while methadone generated the most tweets about pharmacy delivery. CONCLUSIONS The opioid crisis is present on social media, where tweets discuss legal and recreational use. Opioid users are the most active participants, prioritizing medication efficacy over side effects. Surprisingly, health care professionals generate the most engagement, indicating their positive reception. Authorities must monitor web-based opioid discussions to detect illicit acquisitions and recreational use.
Affiliation(s)
- Federico Carabot
  - Department of Medicine and Medical Specialities, University of Alcala, Alcala de Henares, Spain
  - Ramón y Cajal Institute of Sanitary Research, Madrid, Spain
- Oscar Fraile-Martínez
  - Ramón y Cajal Institute of Sanitary Research, Madrid, Spain
  - Department of Medicine and Medical Specialities, University of Alcala, Alcala de Henares, Madrid, Spain
- Carolina Donat-Vargas
  - Institute for Global Health, Barcelona, Spain
  - Centro de Investigación Biomédica en Red Epidemiología y Salud Pública (CIBER), Madrid, Spain
  - Cardiovascular and Nutritional Epidemiology, Unit of Institute of Environmental Medicine, Karolinska Institute, Stockholm, Sweden
- Javier Santoma
  - Department of Medicine and Medical Specialities, University of Alcala, Alcala de Henares, Madrid, Spain
  - Filament Consultancy Group, London, United Kingdom
- Cielo Garcia-Montero
  - Ramón y Cajal Institute of Sanitary Research, Madrid, Spain
  - Department of Medicine and Medical Specialities, University of Alcala, Alcala de Henares, Madrid, Spain
- Mariana Pinto da Costa
  - South London and Maudsley NHS Foundation Trust, London, United Kingdom
  - Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom
- Rosa M Molina-Ruiz
  - Department of Psychiatry and Mental Health, San Carlos Clinical University Hospital, IdiSSC, Madrid, Spain
- Miguel A Ortega
  - Department of Medicine and Medical Specialities, University of Alcala, Alcala de Henares, Spain
  - Ramón y Cajal Institute of Sanitary Research, Madrid, Spain
- Melchor Alvarez-Mon
  - Ramón y Cajal Institute of Sanitary Research, Madrid, Spain
  - Department of Medicine and Medical Specialities, University of Alcala, Alcala de Henares, Madrid, Spain
  - Immune System Diseases-Rheumatology and Internal Medicine Service, University Hospital Príncipe de Asturias, Centro de Investigación Biomédica en Red Enfermedades Hepáticas y Digestivas (CIBEREHD), Alcalá de Henares, Spain
- Miguel Angel Alvarez-Mon
  - Department of Medicine and Medical Specialities, University of Alcala, Alcala de Henares, Spain
  - Ramón y Cajal Institute of Sanitary Research, Madrid, Spain
  - Department of Psychiatry and Mental Health, Hospital Universitario Infanta Leonor, Madrid, Spain
8
Giorgi S, Eichstaedt JC, Preoţiuc-Pietro D, Gardner JR, Schwartz HA, Ungar LH. Filling in the white space: Spatial interpolation with Gaussian processes and social media data. Current Research in Ecological and Social Psychology 2023; 5:100159. [PMID: 38125747 PMCID: PMC10732585 DOI: 10.1016/j.cresp.2023.100159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Full national coverage below the state level is difficult to attain through survey-based data collection. Even the largest survey-based data collections, such as the CDC's Behavioral Risk Factor Surveillance System or the Gallup-Healthways Well-being Index (both with more than 300,000 responses p.a.), only allow for the estimation of annual averages for about 260 out of roughly 3,000 U.S. counties when a threshold of 300 responses per county is used. Using a relatively high threshold of 300 responses gives substantially higher convergent validity (higher correlations with health variables) than lower thresholds but covers a reduced and biased sample of the population. We present principled methods to interpolate spatial estimates and show that including large-scale geotagged social media data can increase interpolation accuracy. In this work, we focus on Gallup-reported life satisfaction, a widely-used measure of subjective well-being. We use Gaussian Processes (GP), a formal Bayesian model, to interpolate life satisfaction, which we optimally combine with estimates from low-count data. We interpolate over several spaces (geographic and socioeconomic) and extend these evaluations to the space created by variables encoding language frequencies of approximately 6 million geotagged Twitter users. We find that Twitter language use can serve as a rough aggregate measure of socioeconomic and cultural similarity, and improves upon estimates derived from a wide variety of socioeconomic, demographic, and geographic similarity measures. We show that applying Gaussian Processes to the limited Gallup data allows us to generate estimates for a much larger number of counties while maintaining the same level of convergent validity with external criteria (i.e., N = 1,133 vs. 2,954 counties). 
This work suggests that spatial coverage of psychological variables can be reliably extended through Bayesian techniques while maintaining out-of-sample prediction accuracy and that Twitter language adds important information about cultural similarity over and above traditional socio-demographic and geographic similarity measures. Finally, to facilitate the adoption of these methods, we have also open-sourced an online tool that researchers can freely use to interpolate their data across geographies.
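As a rough illustration of the Gaussian process interpolation described in this abstract, the sketch below computes a GP posterior mean with a squared-exponential kernel in plain Python. The kernel choice, length scale, noise level, coordinates, and life-satisfaction values are all hypothetical assumptions; this is not the authors' model or their open-source tool. The coordinate tuples could stand for geographic position or any other similarity space, such as the language-feature space the abstract mentions.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two coordinate tuples."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_posterior_mean(train_x, train_y, query_x, noise=1e-6, length_scale=1.0):
    """Posterior mean of a GP with a constant prior mean (the sample mean)."""
    mean_y = sum(train_y) / len(train_y)
    centered = [y - mean_y for y in train_y]
    n = len(train_x)
    # Gram matrix K + noise * I over the observed locations.
    gram = [[rbf(train_x[i], train_x[j], length_scale) + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    weights = solve(gram, centered)
    # Prediction at each query point: mean + k(query, train) @ weights.
    return [mean_y + sum(w * rbf(q, x, length_scale)
                         for w, x in zip(weights, train_x))
            for q in query_x]


# Hypothetical counties with survey estimates, and one county without data.
observed_xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
observed_score = [6.0, 7.0, 6.5]
estimate = gp_posterior_mean(observed_xy, observed_score, [(0.5, 0.5)])
```

A larger `noise` value would pull predictions at observed counties toward their neighbors, which is one simple way to express lower trust in low-count survey estimates.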
Affiliation(s)
- Salvatore Giorgi
  - Department of Computer and Information Science, University of Pennsylvania, United States of America
- Johannes C. Eichstaedt
  - Department of Psychology & Institute for Human-Centered AI, Stanford University, United States of America
- Jacob R. Gardner
  - Department of Computer and Information Science, University of Pennsylvania, United States of America
- H. Andrew Schwartz
  - Department of Computer Science, Stony Brook University, United States of America
- Lyle H. Ungar
  - Department of Computer and Information Science, University of Pennsylvania, United States of America
9
Curtis B, Giorgi S, Ungar L, Vu H, Yaden D, Liu T, Yadeta K, Schwartz HA. AI-based analysis of social media language predicts addiction treatment dropout at 90 days. Neuropsychopharmacology 2023; 48:1579-1585. [PMID: 37095253 PMCID: PMC10517013 DOI: 10.1038/s41386-023-01585-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Revised: 04/03/2023] [Accepted: 04/05/2023] [Indexed: 04/26/2023]
Abstract
The reoccurrence of use (relapse) and treatment dropout are frequently observed in substance use disorder (SUD) treatment. In the current paper, we evaluated the predictive capability of an AI-based digital phenotype using the social media language of patients receiving treatment for substance use disorders (N = 269). We found that language phenotypes outperformed a standard intake psychometric assessment scale when predicting patients' 90-day treatment outcomes. We also used a modern deep learning-based AI model, Bidirectional Encoder Representations from Transformers (BERT), to generate risk scores using pre-treatment digital phenotype and intake clinic data to predict dropout probabilities. Nearly all individuals labeled as low-risk remained in treatment while those identified as high-risk dropped out (risk score for dropout AUC = 0.81; p < 0.001). The current study suggests the possibility of utilizing social media digital phenotypes as a new tool for intake risk assessment to identify individuals most at risk of treatment dropout and relapse.
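To make the reported AUC concrete, the following minimal sketch computes the area under the ROC curve as the probability that a randomly chosen dropout case receives a higher risk score than a randomly chosen retained case, with ties counted as one half. The scores and labels are hypothetical and unrelated to the study's data; the function name `auc` is an assumption for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. dropped out), 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores: two dropouts (label 1) and two retained (label 0).
example_auc = auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.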
Affiliation(s)
- Brenda Curtis
  - Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA.
- Salvatore Giorgi
  - Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- Lyle Ungar
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
  - Positive Psychology Center, University of Pennsylvania, Philadelphia, PA, USA
- Huy Vu
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- David Yaden
  - Positive Psychology Center, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Tingting Liu
  - Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
  - Positive Psychology Center, University of Pennsylvania, Philadelphia, PA, USA
- Kenna Yadeta
  - Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
- H Andrew Schwartz
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
10
Bonela AA, Nibali A, He Z, Riordan B, Anderson-Luxford D, Kuntsche E. The promise of zero-shot learning for alcohol image detection: comparison with a task-specific deep learning algorithm. Sci Rep 2023; 13:11891. [PMID: 37482586 PMCID: PMC10363523 DOI: 10.1038/s41598-023-39169-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/14/2023] [Accepted: 07/20/2023] [Indexed: 07/25/2023] Open
Abstract
Exposure to alcohol content in media increases alcohol consumption and related harm. With exponential growth of media content, it is important to use algorithms to automatically detect and quantify alcohol exposure. Foundation models such as Contrastive Language-Image Pretraining (CLIP) can detect alcohol exposure through Zero-Shot Learning (ZSL) without any additional training. In this paper, we evaluated the ZSL performance of CLIP against a supervised algorithm called Alcoholic Beverage Identification Deep Learning Algorithm Version-2 (ABIDLA2), which is specifically trained to recognise alcoholic beverages in images, across three tasks. We found that ZSL achieved performance similar to ABIDLA2 in two out of three tasks. However, ABIDLA2 outperformed ZSL in a fine-grained classification task in which determining subtle differences among alcoholic beverages (including containers) is essential. We also found that phrase engineering is essential for improving the performance of ZSL. To conclude, like ABIDLA2, ZSL with little phrase engineering can achieve promising performance in identifying alcohol exposure in images. This makes it easier for researchers, with little or no programming background, to implement ZSL effectively to obtain insightful analytics from digital media. Such analytics can assist researchers and policy makers to propose regulations that can prevent alcohol exposure and eventually prevent alcohol consumption.
Affiliation(s)
- Abraham Albert Bonela
  - Centre for Alcohol Policy Research, La Trobe University, Melbourne, Australia
  - Computer Science and Information Technology, La Trobe University, Melbourne, Australia
- Aiden Nibali
  - Computer Science and Information Technology, La Trobe University, Melbourne, Australia
- Zhen He
  - Computer Science and Information Technology, La Trobe University, Melbourne, Australia
- Benjamin Riordan
  - Centre for Alcohol Policy Research, La Trobe University, Melbourne, Australia
- Emmanuel Kuntsche
  - Centre for Alcohol Policy Research, La Trobe University, Melbourne, Australia
|
11
|
Lane JM, Habib D, Curtis B. Linguistic Methodologies to Surveil the Leading Causes of Mortality: Scoping Review of Twitter for Public Health Data. J Med Internet Res 2023; 25:e39484. [PMID: 37307062 PMCID: PMC10337472 DOI: 10.2196/39484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 01/26/2023] [Accepted: 02/07/2023] [Indexed: 02/10/2023] Open
Abstract
BACKGROUND Twitter has become a dominant source of public health data and a widely used method to investigate and understand public health-related issues internationally. By leveraging big data methodologies to mine Twitter for health-related data at the individual and community levels, scientists can use the data as a rapid and less expensive source for both epidemiological surveillance and studies on human behavior. However, limited reviews have focused on novel applications of language analyses that examine human health and behavior and the surveillance of several emerging diseases, chronic conditions, and risky behaviors. OBJECTIVE The primary focus of this scoping review was to provide a comprehensive overview of relevant studies that have used Twitter as a data source in public health research to analyze users' tweets to identify and understand physical and mental health conditions and remotely monitor the leading causes of mortality related to emerging disease epidemics, chronic diseases, and risk behaviors. METHODS A literature search strategy following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) extended guidelines for scoping reviews was used to search specific keywords on Twitter and public health on 5 databases: Web of Science, PubMed, CINAHL, PsycINFO, and Google Scholar. We reviewed the literature comprising peer-reviewed empirical research articles that included original research published in English-language journals between 2008 and 2021. Key information on Twitter data being leveraged for analyzing user language to study physical and mental health and public health surveillance was extracted. RESULTS A total of 38 articles that focused primarily on Twitter as a data source met the inclusion criteria for review. In total, two themes emerged from the literature: (1) language analysis to identify health threats and physical and mental health understandings about people and societies and (2) public health surveillance related to leading causes of mortality, primarily representing 3 categories (ie, respiratory infections, cardiovascular disease, and COVID-19). The findings suggest that Twitter language data can be mined to detect mental health conditions, disease surveillance, and death rates; identify heart-related content; show how health-related information is shared and discussed; and provide access to users' opinions and feelings. CONCLUSIONS Twitter analysis shows promise in the field of public health communication and surveillance. It may be essential to use Twitter to supplement more conventional public health surveillance approaches. Twitter can potentially fortify researchers' ability to collect data in a timely way and improve the early identification of potential health threats. Twitter can also help identify subtle signals in language for understanding physical and mental health conditions.
Affiliation(s)
- Jamil M Lane
  - Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Daniel Habib
  - Technology and Translational Research Unit, National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
- Brenda Curtis
  - Technology and Translational Research Unit, National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
|
12
|
Giorgi S, Yaden DB, Eichstaedt JC, Ungar LH, Schwartz HA, Kwarteng A, Curtis B. Predicting U.S. county opioid poisoning mortality from multi-modal social media and psychological self-report data. Sci Rep 2023; 13:9027. [PMID: 37270657 DOI: 10.1038/s41598-023-34468-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/30/2022] [Accepted: 04/30/2023] [Indexed: 06/05/2023] Open
Abstract
Opioid poisoning mortality is a substantial public health crisis in the United States, with opioids involved in approximately 75% of the nearly 1 million drug related deaths since 1999. Research suggests that the epidemic is driven by both over-prescribing and social and psychological determinants such as economic stability, hopelessness, and isolation. Hindering this research is a lack of measurements of these social and psychological constructs at fine-grained spatial and temporal resolutions. To address this issue, we use a multi-modal data set consisting of natural language from Twitter, psychometric self-reports of depression and well-being, and traditional area-based measures of socio-demographics and health-related risk factors. Unlike previous work using social media data, we do not rely on opioid or substance related keywords to track community poisonings. Instead, we leverage a large, open vocabulary of thousands of words in order to fully characterize communities suffering from opioid poisoning, using a sample of 1.5 billion tweets from 6 million U.S. county mapped Twitter users. Results show that Twitter language predicted opioid poisoning mortality better than factors relating to socio-demographics, access to healthcare, physical pain, and psychological well-being. Additionally, risk factors revealed by the Twitter language analysis included negative emotions, discussions of long work hours, and boredom, whereas protective factors included resilience, travel/leisure, and positive emotions, dovetailing with results from the psychometric self-report data. The results show that natural language from public social media can be used as a surveillance tool for both predicting community opioid poisonings and understanding the dynamic social and psychological nature of the epidemic.
Affiliation(s)
- Salvatore Giorgi
  - National Institute on Drug Abuse, Intramural Research Program, Baltimore, MD, USA
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- David B Yaden
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Johannes C Eichstaedt
  - Department of Psychology, Stanford University, Stanford, CA, USA
  - Institute for Human-Centered AI, Stanford University, Stanford, CA, USA
- Lyle H Ungar
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- H Andrew Schwartz
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Amy Kwarteng
  - National Institute on Drug Abuse, Intramural Research Program, Baltimore, MD, USA
- Brenda Curtis
  - National Institute on Drug Abuse, Intramural Research Program, Baltimore, MD, USA
|
13
|
Matero M, Giorgi S, Curtis B, Ungar LH, Schwartz HA. Opioid death projections with AI-based forecasts using social media language. NPJ Digit Med 2023; 6:35. [PMID: 36882633 PMCID: PMC9992514 DOI: 10.1038/s41746-023-00776-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/08/2022] [Accepted: 02/13/2023] [Indexed: 03/09/2023] Open
Abstract
Targeting of location-specific aid for the U.S. opioid epidemic is difficult due to our inability to accurately predict changes in opioid mortality across heterogeneous communities. AI-based language analyses, having recently shown promise in cross-sectional (between-community) well-being assessments, may offer a way to more accurately longitudinally predict community-level overdose mortality. Here, we develop and evaluate TROP (Transformer for Opioid Prediction), a model for community-specific trend projection that uses community-specific social media language along with past opioid-related mortality data to predict future changes in opioid-related deaths. TROP builds on recent advances in sequence modeling, namely transformer networks, to use changes in yearly language on Twitter and past mortality to project the following year's mortality rates by county. Trained over five years and evaluated over the next two years, TROP demonstrated state-of-the-art accuracy in predicting future county-specific opioid trends. A model built using linear auto-regression and traditional socioeconomic data gave 7% error (MAPE) or within 2.93 deaths per 100,000 people on average; our proposed architecture was able to forecast yearly death rates with less than half that error: 3% MAPE and within 1.15 per 100,000 people.
Affiliation(s)
- Matthew Matero
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Salvatore Giorgi
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
  - National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, USA
- Brenda Curtis
  - National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, USA
- Lyle H Ungar
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- H Andrew Schwartz
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
|
14
|
Culp F, Wu Y, Wu D, Ren Y, Raynor P, Hung P, Qiao S, Li X, Eichelberger K. Understanding Alcohol Use Discourse and Stigma Patterns in Perinatal Care on Twitter. Healthcare (Basel) 2022; 10:2375. [PMID: 36553899 PMCID: PMC9778089 DOI: 10.3390/healthcare10122375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Revised: 11/21/2022] [Accepted: 11/24/2022] [Indexed: 11/29/2022] Open
Abstract
(1) Background: perinatal alcohol use generates a variety of health risks. Social media platforms discuss fetal alcohol spectrum disorder (FASD) and other widespread outcomes, providing personalized user-generated content about the perceptions and behaviors related to alcohol use during pregnancy. Data collected from Twitter underscores various narrative structures and sentiments in tweets that reflect large-scale discourses and foster societal stigmas; (2) Methods: We extracted alcohol-related tweets from May 2019 to October 2021 using an official Twitter search API based on a set of keywords provided by our clinical team. Our exploratory study utilized thematic content analysis and inductive qualitative coding methods to analyze user content. Iterative line-by-line coding categorized dynamic descriptive themes from a random sample of 500 tweets; (3) Results: qualitative methods from content analysis revealed underlying patterns among inter-user engagements, outlining individual, interpersonal and population-level stigmas about perinatal alcohol use and negative sentiment towards drinking mothers. As a result, the overall silence surrounding personal experiences with alcohol use during pregnancy suggests an unwillingness and sense of reluctance from pregnant adults to leverage the platform for support and assistance due to societal stigmas; (4) Conclusions: identifying these discursive factors will facilitate more effective public health programs that take into account specific challenges related to social media networks and develop prevention strategies to help Twitter users struggling with perinatal alcohol use.
Affiliation(s)
- Fritz Culp
  - College of Engineering and Computing, University of South Carolina, Columbia, SC 29208, USA
- Yuqi Wu
  - Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
- Dezhi Wu
  - College of Engineering and Computing, University of South Carolina, Columbia, SC 29208, USA
- Yang Ren
  - College of Engineering and Computing, University of South Carolina, Columbia, SC 29208, USA
- Phyllis Raynor
  - College of Nursing, University of South Carolina, Columbia, SC 29208, USA
- Peiyin Hung
  - Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
- Shan Qiao
  - Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
- Xiaoming Li
  - Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
- Kacey Eichelberger
  - Prisma Health Upstate, University of South Carolina School of Medicine Greenville, Greenville, SC 29605, USA
|
15
|
Stone JA, Ryerson NC. Tweeting about alcohol: Exploring differences in Twitter sentiment during the onset of the COVID-19 pandemic. PLoS One 2022; 17:e0276863. [PMID: 36327323 PMCID: PMC9632796 DOI: 10.1371/journal.pone.0276863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Accepted: 10/16/2022] [Indexed: 11/06/2022] Open
Abstract
This study explores pandemic-related changes in Twitter communication by examining differences in emotional, psychological and social sentiment between alcohol-related tweets and a random sample of non-alcohol tweets during the onset of the COVID-19 pandemic. Two equivalent size sets of English-language, COVID-specific tweets posted between February 1st and April 20th, 2020 are examined. The first set includes 1.5 million tweets containing alcohol-related keywords, while the second set does not contain such references. LIWC software analyzed the tweets for sentiment factors. ANCOVAs were used to determine whether language use significantly differed between the sets, considering differences in the pandemic period (before or after the pandemic declaration) while controlling for the number of tweets. The study found that tweets in the 40 days after March 11, 2020 contained more authentic language, more affiliation-oriented language, and exhibited more positive emotion than tweets in the 40 days pre-declaration. Alcohol-related status was a significant factor only when tweets contained personal concerns, regardless of pandemic period. Authenticity levels increased significantly in alcohol-related tweets post-declaration. The findings suggest alcohol may play a lesser role in the expression of psychological, social, and emotional sentiment than the pandemic period, but interaction between authentic language and alcohol references may reflect an increased use of alcohol for coping.
Affiliation(s)
- Jeffrey A. Stone
  - Department of Information Sciences and Technology, Penn State University, Center Valley, PA, United States of America
- Nicole C. Ryerson
  - Department of Psychology, Penn State University, Center Valley, PA, United States of America
|
16
|
Deng T, Barman-Adhikari A, Lee YJ, Dewri R, Bender K. Substance use and sentiment and topical tendencies: a study using social media conversations of youth experiencing homelessness. INFORMATION TECHNOLOGY & PEOPLE 2022. [DOI: 10.1108/itp-12-2020-0860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Purpose: This study investigates associations between Facebook (FB) conversations and self-reports of substance use among youth experiencing homelessness (YEH). YEH engage in high rates of substance use and are often difficult to reach, for both research and interventions. Social media sites provide rich digital trace data for observing the social context of YEH's health behaviors. The authors aim to investigate the feasibility of using these big data and text mining techniques as a supplement to self-report surveys in detecting and understanding YEH attitudes and engagement in substance use. Design/methodology/approach: Participants took a self-report survey in addition to providing consent for researchers to download their Facebook feed data retrospectively. The authors collected survey responses from 92 participants and retrieved 33,204 textual Facebook conversations. The authors performed text mining analysis and statistical analysis including ANOVA and logistic regression to examine the relationship between YEH's Facebook conversations and their substance use. Findings: Facebook posts of YEH have a moderately positive sentiment. YEH substance users and non-users differed in their Facebook posts regarding: (1) overall sentiment and (2) topics discussed. Logistic regressions show that more positive sentiment in a respondent's FB conversation suggests a lower likelihood of marijuana usage. On the other hand, discussing money-related topics in the conversation increases YEH's likelihood of marijuana use. Originality/value: Digital trace data on social media sites represent a vast source of ecological data. This study demonstrates the feasibility of using such data from a hard-to-reach population to gain unique insights into YEH's health behaviors. The authors provide a text-mining-based toolkit for analyzing social media data for interpretation by experts from a variety of domains.
|
17
|
Giorgi S, Lynn VE, Gupta K, Ahmed F, Matz S, Ungar LH, Schwartz HA. Correcting Sociodemographic Selection Biases for Population Prediction from Social Media. PROCEEDINGS OF THE ... INTERNATIONAL AAAI CONFERENCE ON WEBLOGS AND SOCIAL MEDIA. INTERNATIONAL AAAI CONFERENCE ON WEBLOGS AND SOCIAL MEDIA 2022; 16:228-240. [PMID: 36467573 PMCID: PMC9714525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Social media is increasingly used for large-scale population predictions, such as estimating community health statistics. However, social media users are not typically a representative sample of the intended population - a "selection bias". Within the social sciences, such a bias is typically addressed with restratification techniques, where observations are reweighted according to how under- or over-sampled their socio-demographic groups are. Yet, restratification is rarely evaluated for improving prediction. In this two-part study, we first evaluate standard, "out-of-the-box" restratification techniques, finding they provide no improvement and often even degraded prediction accuracies across four tasks of estimating U.S. county population health statistics from Twitter. The core reasons for degraded performance seem to be tied to their reliance on either sparse or shrunken estimates of each population's socio-demographics. In the second part of our study, we develop and evaluate Robust Poststratification, which consists of three methods to address these problems: (1) estimator redistribution to account for shrinking, as well as (2) adaptive binning and (3) informed smoothing to handle sparse socio-demographic estimates. We show that each of these methods leads to significant improvement in prediction accuracies over the standard restratification approaches. Taken together, Robust Poststratification enables state-of-the-art prediction accuracies, yielding a 53.0% increase in variance explained (R²) in the case of surveyed life satisfaction, and a 17.8% average increase across all tasks.
|
18
|
Jose R, Matero M, Sherman G, Curtis B, Giorgi S, Schwartz HA, Ungar LH. Using Facebook language to predict and describe excessive alcohol use. Alcohol Clin Exp Res 2022; 46:836-847. [PMID: 35575955 PMCID: PMC9179895 DOI: 10.1111/acer.14807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2021] [Revised: 02/10/2022] [Accepted: 03/10/2022] [Indexed: 11/28/2022]
Abstract
BACKGROUND Assessing risk for excessive alcohol use is important for applications ranging from recruitment into research studies to targeted public health messaging. Social media language provides an ecologically embedded source of information for assessing individuals who may be at risk for harmful drinking. METHODS Using data collected on 3664 respondents from the general population, we examine how accurately language used on social media classifies individuals as at-risk for alcohol problems based on Alcohol Use Disorder Identification Test-Consumption score benchmarks. RESULTS We find that social media language is moderately accurate (area under the curve = 0.75) at identifying individuals at risk for alcohol problems (i.e., hazardous drinking/alcohol use disorders) when used with models based on contextual word embeddings. High-risk alcohol use was predicted by individuals' usage of words related to alcohol, partying, informal expressions, swearing, and anger. Low-risk alcohol use was predicted by individuals' usage of social, affiliative, and faith-based words. CONCLUSIONS The use of social media data to study drinking behavior in the general public is promising and could eventually support primary and secondary prevention efforts among Americans whose at-risk drinking may have otherwise gone "under the radar."
Affiliation(s)
- Rupa Jose
  - Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Matthew Matero
  - Department of Computer Science, Stony Brook University, Stony Brook, New York, USA
- Garrick Sherman
  - Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Brenda Curtis
  - Technology and Translational Research Unit, National Institute on Drug Abuse, Baltimore, Maryland, USA
- Salvatore Giorgi
  - Technology and Translational Research Unit, National Institute on Drug Abuse, Baltimore, Maryland, USA
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Lyle H Ungar
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Psychology, Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
|
19
|
Kelley SW, Mhaonaigh CN, Burke L, Whelan R, Gillan CM. Machine learning of language use on Twitter reveals weak and non-specific predictions. NPJ Digit Med 2022; 5:35. [PMID: 35338248 PMCID: PMC8956571 DOI: 10.1038/s41746-022-00576-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/23/2021] [Accepted: 02/11/2022] [Indexed: 11/30/2022] Open
Abstract
Depressed individuals use language differently than healthy controls and it has been proposed that social media posts can be used to identify depression. Much of the evidence behind this claim relies on indirect measures of mental health and few studies have tested if these language features are specific to depression versus other aspects of mental health. We analysed the Tweets of 1006 participants who completed questionnaires assessing symptoms of depression and 8 other mental health conditions. Daily Tweets were subjected to textual analysis and the resulting linguistic features were used to train an Elastic Net model on depression severity, using nested cross-validation. We then tested performance in a held-out test set (30%), comparing predictions of depression versus 8 other aspects of mental health. The depression trained model had modest out-of-sample predictive performance, explaining 2.5% of variance in depression symptoms (R² = 0.025, r = 0.16). The performance of this model was as-good or superior when used to identify other aspects of mental health: schizotypy, social anxiety, eating disorders, generalised anxiety, above chance for obsessive-compulsive disorder, apathy, but not significant for alcohol abuse or impulsivity. Machine learning analysis of social media data, when trained on well-validated clinical instruments, could not make meaningful individualised predictions regarding users' mental health. Furthermore, language use associated with depression was non-specific, having similar performance in predicting other mental health problems.
Affiliation(s)
- Sean W Kelley
  - School of Psychology, Trinity College Dublin, Dublin, Ireland
  - Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
- Louise Burke
  - School of Psychology, Trinity College Dublin, Dublin, Ireland
- Robert Whelan
  - School of Psychology, Trinity College Dublin, Dublin, Ireland
  - Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
  - Global Brain Health Institute, Trinity College Dublin, Dublin, Ireland
- Claire M Gillan
  - School of Psychology, Trinity College Dublin, Dublin, Ireland
  - Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
  - Global Brain Health Institute, Trinity College Dublin, Dublin, Ireland
|
20
|
Amgalan A, Mujica-Parodi LR, Skiena SS. Fast spatial autocorrelation. Knowl Inf Syst 2022. [DOI: 10.1007/s10115-021-01640-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
21
|
Natural language analyzed with AI-based transformers predict traditional subjective well-being measures approaching the theoretical upper limits in accuracy. Sci Rep 2022; 12:3918. [PMID: 35273198 PMCID: PMC8913644 DOI: 10.1038/s41598-022-07520-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/29/2021] [Accepted: 02/21/2022] [Indexed: 01/07/2023] Open
Abstract
We show that using a recent break-through in artificial intelligence –transformers–, psychological assessments from text-responses can approach theoretical upper limits in accuracy, converging with standard psychological rating scales. Text-responses use people's primary form of communication –natural language– and have been suggested as a more ecologically-valid response format than closed-ended rating scales that dominate social science. However, previous language analysis techniques left a gap between how accurately they converged with standard rating scales and how well ratings scales converge with themselves – a theoretical upper-limit in accuracy. Most recently, AI-based language analysis has gone through a transformation as nearly all of its applications, from Web search to personalized assistants (e.g., Alexa and Siri), have shown unprecedented improvement by using transformers. We evaluate transformers for estimating psychological well-being from questionnaire text- and descriptive word-responses, and find accuracies converging with rating scales that approach the theoretical upper limits (Pearson r = 0.85, p < 0.001, N = 608; in line with most metrics of rating scale reliability). These findings suggest an avenue for modernizing the ubiquitous questionnaire and ultimately opening doors to a greater understanding of the human condition.
|
22
|
Riordan BC, Merrill JE, Ward RM, Raubenheimer J. When are alcohol-related blackout Tweets written in the United States? Addict Behav 2022; 124:107110. [PMID: 34530209 DOI: 10.1016/j.addbeh.2021.107110] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/18/2021] [Revised: 07/14/2021] [Accepted: 09/01/2021] [Indexed: 11/16/2022]
Abstract
BACKGROUND Alcohol use varies throughout the year and often peaks on weekends or during celebrations (e.g., New Year's). There is not a perfect correlation between alcohol use and negative consequences, and the extent to which one particularly risky consequence, an alcohol-related blackout, is more common during certain times of the year is unknown. Identifying when blackouts occur may help identify which periods are associated with more risk and be critical in designing public health campaigns. Thus, we examined Twitter data to ascertain whether alcohol-related blackouts occur more during certain holidays/celebrations than typical weekends and whether they differed in timing from general alcohol-related Tweets. METHODS We used a Twitter-sponsored platform to access unique Tweets written in the United States referencing blackouts (e.g., "blackout") and alcohol generally (e.g., "drunk"). RESULTS The final dataset included 3.5 million blackout Tweets and 591 million alcohol Tweets (written between 2009 and 2020). Both blackout and alcohol Tweets were written in the late evening, on weekends, and during certain holidays (New Year's, St. Patrick's). However, relative to typical weekends, only blackout Tweets were more common during Thanksgiving and only general alcohol-related Tweets were more common during Cinco de Mayo. CONCLUSION While blackout and alcohol-related Tweets were similar in time of day (peaking in the evening) and day of week (peaking on weekends), they differed during certain celebrations/holidays, suggesting that while alcohol use may be more common during some celebrations, others are more associated with serious harms.
Affiliation(s)
- Benjamin C Riordan
  - Centre for Alcohol Policy Research, La Trobe University, Melbourne, Victoria, Australia
- Jennifer E Merrill
  - Department of Behavioral and Social Sciences, Center for Alcohol and Addiction Studies, Brown University School of Public Health, Providence, RI, USA
- Rose Marie Ward
  - Department of Kinesiology and Health, Miami University, Oxford, OH, USA
- Jacques Raubenheimer
  - The University of Sydney, Biomedical Informatics and Digital Health: School of Medical Sciences, Sydney, New South Wales, Australia
|
23
|
Sachini E, Sioumalas-Christodoulou K, Bouras N, Karampekios N. Lessons for science and technology policy? Probing the Linkedin network of an RDI organisation. SN SOCIAL SCIENCES 2022; 2:271. [PMCID: PMC9734916 DOI: 10.1007/s43545-022-00586-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Accepted: 11/30/2022] [Indexed: 12/14/2022]
Abstract
In this paper, we seek to examine the network of the Greek National Documentation Centre (EKT) as formed by its LinkedIn followers. By applying specific data collection and processing techniques, we explore the network of all the individuals that follow EKT’s LinkedIn page. Significant manual and automatic approaches have been implemented with regard to data extraction, data curation and data homogenization. The aim is to identify the network’s advancement over time, the institutions involved and the countries. The timeframe of the study spans from when the relevant LinkedIn page was constructed in 2015 to 2020. Findings indicate that there is a steady increase in the number of new followers, peaking in 2020. On an international scale, the evolution of the network of followers is imprinted and distributed in worldwide maps. In total, 68 countries have followed EKT over the examined time period. Also, in terms of followers’ institutional sector the Business Sector (BES) stands out (46.5%). Higher Education (HES) and Government Sector (GOV) are associated with 26.4 and 22.2% of the followers, respectively. Lastly, this paper provides a first institutional and country-level mapping of who constitutes the organisation’s interlocutors in the national and global RDI ecosystem.
Affiliation(s)
- Evi Sachini
- National Documentation Centre, 48 Vas. Konstantinou Str., 11635 Athens, Greece
| | - Konstantinos Sioumalas-Christodoulou
- National Documentation Centre, 48 Vas. Konstantinou Str., 11635 Athens, Greece; Department of History and Philosophy of Science, National and Kapodistrian University of Athens, Athens, Greece
| | - Nikias Bouras
- National Documentation Centre, 48 Vas. Konstantinou Str., 11635 Athens, Greece
| | - Nikolaos Karampekios
- National Documentation Centre, 48 Vas. Konstantinou Str., 11635 Athens, Greece
| |
|
24
|
Giorgi S, Nguyen KL, Eichstaedt JC, Kern ML, Yaden DB, Kosinski M, Seligman MEP, Ungar LH, Schwartz HA, Park G. Regional personality assessment through social media language. J Pers 2021; 90:405-425. [PMID: 34536229 DOI: 10.1111/jopy.12674] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/26/2021] [Revised: 08/26/2021] [Accepted: 09/12/2021] [Indexed: 11/30/2022]
Abstract
OBJECTIVE We explore the personality of counties as assessed through linguistic patterns on social media. Such studies were previously limited by the cost and feasibility of large-scale surveys; however, language-based computational models applied to large social media datasets now allow for large-scale personality assessment. METHOD We applied a language-based assessment of the five factor model of personality to 6,064,267 U.S. Twitter users. We aggregated the Twitter-based personality scores to 2,041 counties and compared to political, economic, social, and health outcomes measured through surveys and by government agencies. RESULTS There was significant personality variation across counties. Openness to experience was higher on the coasts, conscientiousness was uniformly spread, extraversion was higher in southern states, agreeableness was higher in western states, and emotional stability was highest in the south. Across 13 outcomes, language-based personality estimates replicated patterns that have been observed in individual-level and geographic studies. This includes higher Republican vote share in less agreeable counties and increased life satisfaction in more conscientious counties. CONCLUSIONS Results suggest that regions vary in their personality and that these differences can be studied through computational linguistic analysis of social media. Furthermore, these methods may be used to explore other psychological constructs across geographies.
Affiliation(s)
- Salvatore Giorgi
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Khoa Le Nguyen
- Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Johannes C Eichstaedt
- Department of Psychology, Institute for Human-Centered A.I., Stanford University, Stanford, California, USA
| | - Margaret L Kern
- Melbourne Graduate School of Education, University of Melbourne, Melbourne, Victoria, Australia
| | - David B Yaden
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Michal Kosinski
- Graduate School of Business, Stanford University, Stanford, California, USA
| | - Martin E P Seligman
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Lyle H Ungar
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - H Andrew Schwartz
- Department of Computer Science, Stony Brook University, Stony Brook, New York, USA
| | - Gregory Park
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| |
|
25
|
Ricard BJ, Hassanpour S. Deep Learning for Identification of Alcohol-Related Content on Social Media (Reddit and Twitter): Exploratory Analysis of Alcohol-Related Outcomes. J Med Internet Res 2021; 23:e27314. [PMID: 34524095 PMCID: PMC8482254 DOI: 10.2196/27314] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/21/2021] [Revised: 03/30/2021] [Accepted: 08/01/2021] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Many social media studies have explored the ability of thematic structures, such as hashtags and subreddits, to identify information related to a wide variety of mental health disorders. However, studies and models trained on specific themed communities are often difficult to apply to different social media platforms and related outcomes. A deep learning framework using thematic structures from Reddit and Twitter can have distinct advantages for studying alcohol abuse, particularly among the youth in the United States. OBJECTIVE This study proposes a new deep learning pipeline that uses thematic structures to identify alcohol-related content across different platforms. We apply our method on Twitter to determine the association of the prevalence of alcohol-related tweets with alcohol-related outcomes reported from the National Institute of Alcoholism and Alcohol Abuse, Centers for Disease Control Behavioral Risk Factor Surveillance System, county health rankings, and the National Industry Classification System. METHODS The Bidirectional Encoder Representations From Transformers neural network learned to classify 1,302,524 Reddit posts as either alcohol-related or control subreddits. The trained model identified 24 alcohol-related hashtags from an unlabeled data set of 843,769 random tweets. Querying alcohol-related hashtags identified 25,558,846 alcohol-related tweets, including 790,544 location-specific (geotagged) tweets. We calculated the correlation between the prevalence of alcohol-related tweets and alcohol-related outcomes, controlling for confounding effects of age, sex, income, education, and self-reported race, as recorded by the 2013-2018 American Community Survey. 
RESULTS Significant associations were observed: between alcohol-hashtagged tweets and alcohol consumption (P=.01) and heavy drinking (P=.005) but not binge drinking (P=.37), self-reported at the metropolitan-micropolitan statistical area level; between alcohol-hashtagged tweets and self-reported excessive drinking behavior (P=.03) but not motor vehicle fatalities involving alcohol (P=.21); between alcohol-hashtagged tweets and the number of breweries (P<.001), wineries (P<.001), and beer, wine, and liquor stores (P<.001) but not drinking places (P=.23), per capita at the US county and county-equivalent level; and between alcohol-hashtagged tweets and all gallons of ethanol consumed (P<.001), as well as ethanol consumed from wine (P<.001) and liquor (P=.01) sources but not beer (P=.63), at the US state level. CONCLUSIONS Here, we present a novel natural language processing pipeline developed using Reddit's alcohol-related subreddits that identify highly specific alcohol-related Twitter hashtags. The prevalence of identified hashtags contains interpretable information about alcohol consumption at both coarse (eg, US state) and fine-grained (eg, metropolitan-micropolitan statistical area level and county) geographical designations. This approach can expand research and deep learning interventions on alcohol abuse and other behavioral health outcomes.
Affiliation(s)
| | - Saeed Hassanpour
- Department of Biomedical Data Science, Dartmouth College, Lebanon, NH, United States
- Department of Epidemiology, Dartmouth College, Hanover, NH, United States
- Department of Computer Science, Dartmouth College, Hanover, NH, United States
| |
|
26
|
Leon DA, Yom-Tov E, Johnson AM, Petticrew M, Williamson E, Lampos V, Cox I. What on-line searches tell us about public interest and potential impact on behaviour in response to minimum unit pricing of alcohol in Scotland. Addiction 2021; 116:2008-2015. [PMID: 33394517 DOI: 10.1111/add.15388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2020] [Revised: 09/02/2020] [Accepted: 12/23/2020] [Indexed: 11/27/2022]
Abstract
AIMS To investigate whether the introduction of minimum unit pricing (MUP) in Scotland on 1 May 2018 was reflected in changes in the likelihood of alcohol-related queries submitted to an internet search engine, and in particular whether there was any evidence of increased interest in purchasing of alcohol from outside Scotland. DESIGN Observational study in which individual queries to the internet Bing search engine for 2018 in Scotland and England were captured and analysed. Fluctuations over time in the likelihood of specific topic searches were examined. The patterns seen in Scotland were contrasted with those in England. SETTING Scotland and England. PARTICIPANTS People who used the Bing search engine during 2018. MEASUREMENTS Numbers of daily queries submitted to Bing in 2018 on eight alcohol-related topics expressed as a proportion of queries on that day on any topic. These daily likelihoods were smoothed using a 14-day moving average for Scotland and England separately. FINDINGS There were substantial peaks in queries about MUP itself, cheap sources of alcohol and online alcohol outlets at the time of introduction of MUP in May 2018 in Scotland, but not England. These were relatively short-lived. Queries related to intoxication and alcohol problems did not show a MUP peak, but were appreciably higher in Scotland than in England throughout 2018. CONCLUSIONS Analysis of internet search engine queries appears to show that a fraction of people in Scotland may have considered circumventing minimum unit pricing in 2018 by looking for on-line alcohol retailers. The overall higher levels of queries related to alcohol problems in Scotland compared with England mirrors the corresponding differences in alcohol consumption and harms between the countries.
Affiliation(s)
- David A Leon
- London School of Hygiene and Tropical Medicine, London, UK; Department of Community Medicine, UiT, Arctic University of Norway, Tromsø, Norway; International Laboratory for Population and Health, National Research University Higher School of Economics, Moscow, Russia
| | | | - Anne M Johnson
- Institute of Global Health, University College London, London, UK
| | - Mark Petticrew
- London School of Hygiene and Tropical Medicine, London, UK
| | | | - Vasileios Lampos
- Department of Computer Science, University College London, London, UK
| | - Ingemar Cox
- Department of Computer Science, University College London, London, UK; Centre for Communication and Computing, University of Copenhagen, Copenhagen, Denmark
| |
|
27
|
Riordan BC, Winter DT, Haber PS, Day CA, Morley KC. What are people saying on social networking sites about the Australian alcohol consumption guidelines? Med J Aust 2021; 214:105-107.e1. [PMID: 33429457 DOI: 10.5694/mja2.50902] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
|
28
|
Riordan BC, Raubenheimer J, Ward RM, Merrill JE, Winter T, Scarf D. Monitoring the sentiment of cannabis-related tweets in the lead up to New Zealand's cannabis referendum. Drug Alcohol Rev 2020; 40:835-841. [PMID: 33022132 DOI: 10.1111/dar.13184] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/22/2020] [Revised: 09/02/2020] [Accepted: 09/07/2020] [Indexed: 11/30/2022]
Abstract
INTRODUCTION AND AIMS In October 2020, New Zealanders will vote on whether cannabis should be legalised for recreational use. With this in mind, the aim of the present study is to gauge the views and opinions of the New Zealand population on cannabis via tweets. To achieve this, we conducted a sentiment analysis of all historic cannabis-related tweets and referendum-specific tweets written in New Zealand. DESIGN AND METHODS We used a Twitter-sponsored commercial platform to access all historic cannabis-related tweets written in New Zealand and used search terms to remove non-cannabis-related terms. Next, we used the platform's machine learning function to code the sentiment of tweets (i.e. positive/pro-cannabis, negative/anti-cannabis or neutral). RESULTS Between July 2009 and August 2020, 304 760 cannabis-related tweets were written in New Zealand. Overall, the tweets were predominantly positive (62.0%) and there was a higher proportion of positive tweets written in 2020 (65.3%) compared to negative or neutral tweets. Similarly, for referendum-specific tweets, the 2020 data reveal a generally positive view of cannabis (53.5%). DISCUSSION AND CONCLUSIONS Both cannabis-related, and referendum-specific tweets, suggest that Twitter users in New Zealand have a generally positive view of cannabis. Given the nature of Twitter, the current method will allow us to study whether views toward cannabis change as the referendum nears and capture any late swings in pro- or anti-cannabis sentiment (abcd-lab.shinyapps.io/cannabis_sentiment/).
Affiliation(s)
- Benjamin C Riordan
- Center for Alcohol Policy Research, La Trobe University, Melbourne, Australia; Discipline of Addiction Medicine, Central Clinical School, Faculty of Medicine and Health, University of Sydney, Sydney, Australia
| | - Jacques Raubenheimer
- Biomedical Informatics and Digital Health, School of Medical Sciences, University of Sydney, Sydney, Australia
| | - Rose Marie Ward
- Department of Kinesiology and Health, Miami University, Oxford, USA
| | - Jennifer E Merrill
- Department of Behavioral and Social Sciences, Center for Alcohol and Addiction Studies, Brown University School of Public Health, Providence, USA
| | - Taylor Winter
- Department of Psychology, Victoria University of Wellington, Wellington, New Zealand
| | - Damian Scarf
- Department of Psychology, University of Otago, Otago, New Zealand
| |
|
29
|
Giorgi S, Yaden DB, Eichstaedt JC, Ashford RD, Buffone AE, Schwartz HA, Ungar LH, Curtis B. Cultural Differences in Tweeting about Drinking Across the US. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:ijerph17041125. [PMID: 32053866 PMCID: PMC7068559 DOI: 10.3390/ijerph17041125] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/01/2020] [Revised: 02/06/2020] [Accepted: 02/08/2020] [Indexed: 11/16/2022]
Abstract
Excessive alcohol use in the US contributes to over 88,000 deaths per year and costs over $250 billion annually. While previous studies have shown that excessive alcohol use can be detected from general patterns of social media engagement, we characterized how drinking-specific language varies across regions and cultures in the US. From a database of 38 billion public tweets, we selected those mentioning “drunk”, found the words and phrases distinctive of drinking posts, and then clustered these into topics and sets of semantically related words. We identified geolocated “drunk” tweets and correlated their language with the prevalence of self-reported excessive alcohol consumption (Behavioral Risk Factor Surveillance System; BRFSS). We then identified linguistic markers associated with excessive drinking in different regions and cultural communities as identified by the American Community Project. “Drunk” tweet frequency (of the 3.3 million geolocated “drunk” tweets) correlated with excessive alcohol consumption at both the county and state levels (r = 0.26 and 0.45, respectively, p < 0.01). Topic analyses revealed that excessive alcohol consumption was most correlated with references to drinking with friends (r = 0.20), family (r = 0.15), and driving under the influence (r = 0.14). Using the American Community Project classification, we found a number of cultural markers of drinking: religious communities had a high frequency of anti-drunk driving tweets, Hispanic centers discussed family members drinking, and college towns discussed sexual behavior. This study shows that Twitter can be used to explore the specific sociocultural contexts in which excessive alcohol use occurs within particular regions and communities. These findings can inform more targeted public health messaging and help to better understand cultural determinants of substance abuse.
Affiliation(s)
- Salvatore Giorgi
- Computer and Information Science Department, University of Pennsylvania, Philadelphia, PA 19104, USA
- National Institutes of Health, National Institute on Drug Abuse, Bethesda, MD 20892, USA
| | - David B. Yaden
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Johannes C. Eichstaedt
- Department of Psychology & Institute for Human-Centered Artificial Intelligence, Stanford University, Stanford, CA 94305, USA
| | - Robert D. Ashford
- Substance Use Disorders Institute, University of the Sciences, Philadelphia, PA 19104, USA
| | - Anneke E.K. Buffone
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - H. Andrew Schwartz
- Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA
| | - Lyle H. Ungar
- Computer and Information Science Department, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Brenda Curtis
- National Institutes of Health, National Institute on Drug Abuse, Bethesda, MD 20892, USA
| |
|
30
|
Exploring the association between problem drinking and language use on Facebook in young adults. Heliyon 2019; 5:e02523. [PMID: 31667380 PMCID: PMC6812202 DOI: 10.1016/j.heliyon.2019.e02523] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/24/2019] [Revised: 08/26/2019] [Accepted: 09/23/2019] [Indexed: 11/21/2022] Open
Abstract
Recent literature suggests that variations in both formal and content aspects of texts shared on social media tend to reflect user-level differences in demographic, psychosocial, and behavioral characteristics. In the present study, we examined associations between language use on Facebook and problematic alcohol use. We collected texts shared on Facebook by a sample of 296 adult social media users (66.9% females; mean age = 28.44 years (SD = 7.38)). Texts were mined using the closed-vocabulary approach based on the Linguistic Inquiry and Word Count (LIWC) semantic dictionary, and an open-vocabulary approach performed via Latent Dirichlet Allocation (LDA). Then, we examined associations between emerging textual features and alcohol-drinking scores as assessed using the AUDIT-C questionnaire. As a final aim, we employed the Random Forest machine-learning algorithm to determine and compare the predictive accuracy of closed- and open-vocabulary features over users' AUDIT-C scores. We found use of words about family, school, and positive feelings and emotions to be negatively associated with alcohol use and problematic drinking, while words suggesting interest in sport events, politics and economics, nightlife, and use of coarse language were more frequent among problematic drinkers. Results coming from LIWC and LDA analyses were quite similar, but LDA added information that could not be retrieved only with LIWC analysis. Furthermore, open-vocabulary features outperformed closed-vocabulary features in terms of predictive power over participants’ AUDIT-C scores (r = .46 vs. r = .28, respectively). Emerging relationships between text features and offline behaviors may have important implications for alcohol screening purposes in the online environment.
|
31
|
Conway M, Hu M, Chapman WW. Recent Advances in Using Natural Language Processing to Address Public Health Research Questions Using Social Media and Consumer-Generated Data. Yearb Med Inform 2019; 28:208-217. [PMID: 31419834 PMCID: PMC6697505 DOI: 10.1055/s-0039-1677918] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/11/2022] Open
Abstract
OBJECTIVE We present a narrative review of recent work on the utilisation of Natural Language Processing (NLP) for the analysis of social media (including online health communities) specifically for public health applications. METHODS We conducted a literature review of NLP research that utilised social media or online consumer-generated text for public health applications, focussing on the years 2016 to 2018. Papers were identified in several ways, including PubMed searches and the inspection of recent conference proceedings from the Association of Computational Linguistics (ACL), the Conference on Human Factors in Computing Systems (CHI), and the International AAAI (Association for the Advancement of Artificial Intelligence) Conference on Web and Social Media (ICWSM). Popular data sources included Twitter, Reddit, various online health communities, and Facebook. RESULTS In the recent past, communicable diseases (e.g., influenza, dengue) have been the focus of much social media-based NLP health research. However, mental health and substance use and abuse (including the use of tobacco, alcohol, marijuana, and opioids) have been the subject of an increasing volume of research in the 2016 - 2018 period. Associated with this trend, the use of lexicon-based methods remains popular given the availability of psychologically validated lexical resources suitable for mental health and substance abuse research. Finally, we found that in the period under review "modern" machine learning methods (i.e. deep neural-network-based methods), while increasing in popularity, remain less widely used than "classical" machine learning methods.
Affiliation(s)
- Mike Conway
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States
| | - Mengke Hu
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States
| | - Wendy W Chapman
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States
| |
|
32
|
Riordan BC, Merrill JE, Ward RM. "Can't Wait to Blackout Tonight": An Analysis of the Motives to Drink to Blackout Expressed on Twitter. Alcohol Clin Exp Res 2019; 43:1769-1776. [PMID: 31373703 PMCID: PMC6684310 DOI: 10.1111/acer.14132] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 12/10/2018] [Accepted: 06/09/2019] [Indexed: 12/31/2022]
Abstract
BACKGROUND Alcohol-related blackouts are associated with a range of negative consequences and are common among social drinkers. Discussing alcohol use on social networking platforms (e.g., Twitter) is common and related to higher alcohol consumption levels. Due to the widespread nature of alcohol-related social networking posts and alcohol-related blackouts, we examined the content of alcohol-related blackouts posts/"Tweets" on Twitter, with a focus on intentions to blackout and specific motivations for blacking out. METHODS A set of Tweets containing "blackout," "black out," "blacking out," "blacked out," or "blacks out" were collected between April 26, 2018, and April 29, 2018. Using NVivo software, we coded all preblackout Tweets (i.e., before the blackout experience) for intentions and motives to blackout. RESULTS Most Tweets that we collected expressed an intention to blackout and these intentions ranged in strength (i.e., will blackout vs. might blackout). With respect to specific motives for blacking out, celebration motives were identified. For example, Tweets addressed blacking out to celebrate one's birthday, someone else's birthday, a school or work accomplishment, a sports win, during a vacation, or a holiday. Another endorsed motive for blacking out was loss or coping motives. For example, the Tweets commented on blacking out to deal with stress or a bad day. CONCLUSION Our findings suggest that Twitter users express intentions to blackout due to celebration or coping reasons. Given the consequences associated with blackout drinking, future research should consider the link between blackout intentions, blackout motives, and alcohol-related harm.
Affiliation(s)
- Benjamin C. Riordan
- Discipline of Addiction Medicine, Central Clinical School, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
| | - Jennifer E. Merrill
- Department of Behavioral and Social Sciences, Center for Alcohol and Addiction Studies, Brown University School of Public Health, Providence, Rhode Island, U.S.A
| | - Rose Marie Ward
- Department of Kinesiology and Health, Miami University, Oxford, Ohio, U.S.A
| |
|
33
|
Curtis BL, Ashford RD, Magnuson KI, Ryan-Pettes SR. Comparison of Smartphone Ownership, Social Media Use, and Willingness to Use Digital Interventions Between Generation Z and Millennials in the Treatment of Substance Use: Cross-Sectional Questionnaire Study. J Med Internet Res 2019; 21:e13050. [PMID: 30994464 PMCID: PMC6492066 DOI: 10.2196/13050] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 12/11/2018] [Revised: 02/11/2019] [Accepted: 02/17/2019] [Indexed: 01/05/2023] Open
Abstract
Background Problematic substance use in adolescence and emerging adulthood is a significant public health concern in the United States due to high recurrence of use rates and unmet treatment needs coupled with increased use. Consequently, there is a need for both improved service utilization and availability of recovery supports. Given the ubiquitous use of the internet and social media via smartphones, a viable option is to design digital treatments and recovery support services to include internet and social media platforms. Objective Although digital treatments delivered through social media and the internet are a possibility, it is unclear how interventions using these tools should be tailored for groups with problematic substance use. There is limited research comparing consumer trends of use of social media platforms, use of platform features, and vulnerability of exposure to drug cues online. The goal of this study was to compare digital platforms used among adolescents (Generation Zs, age 13-17) and emerging adults (Millennials, age 18-35) attending outpatient substance use treatment and to examine receptiveness toward these platforms in order to support substance use treatment and recovery. Methods Generation Zs and Millennials enrolled in outpatient substance use treatment (n=164) completed a survey examining social media use, digital intervention acceptability, frequency of substance exposure, and substance use experiences. Generation Zs (n=53) completed the survey in July 2018. Millennials (n=111) completed the survey in May 2016. Results Generation Zs had an average age of 15.66 (SD 1.18) years and primarily identified as male (50.9%). Millennials had an average age of 27.66 (SD 5.12) years and also primarily identified as male (75.7%). 
Most participants owned a social media account (Millennials: 82.0%, Generation Zs: 94.3%) and used it daily (Millennials: 67.6%, Generation Zs: 79.2%); however, Generation Zs were more likely to use Instagram and Snapchat, whereas Millennials were more likely to use Facebook. Further, Generation Zs were more likely to use the features within social media platforms (eg, instant messaging: Millennials: 55.0%, Generation Zs: 79.2%; watching videos: Millennials: 56.8%, Generation Zs: 81.1%). Many participants observed drug cues on social media (Millennials: 67.5%, Generation Zs: 71.7%). However, fewer observed recovery information on social media (Millennials: 30.6%, Generation Zs: 34.0%). Participants felt that social media (Millennials: 55.0%, Generation Zs: 49.1%), a mobile phone app (Millennials: 36.9%, Generation Zs: 45.3%), texting (Millennials: 28.8%, Generation Zs: 45.3%), or a website (Millennials: 39.6%, Generation Zs: 32.1%) would be useful in delivering recovery support. Conclusions Given the high rates of exposure to drug cues on social media, disseminating recovery support within a social media platform may be the ideal just-in-time intervention needed to decrease the rates of recurrent drug use. However, our results suggest that cross-platform solutions capable of transcending generational preferences are necessary and one-size-fits-all digital interventions should be avoided.
Affiliation(s)
- Brenda L Curtis
- Psychology-Addictions Treatment Research Center, University of Pennsylvania, Philadelphia, PA, United States
| | - Robert D Ashford
- Psychology-Addictions Treatment Research Center, University of Pennsylvania, Philadelphia, PA, United States; Substance Use Disorders Institute, University of the Sciences, Pennsylvania, PA, United States
| | - Katherine I Magnuson
- Department of Psychology and Neuroscience, Baylor University, Waco, TX, United States
| | - Stacy R Ryan-Pettes
- Department of Psychology and Neuroscience, Baylor University, Waco, TX, United States
| |
|
34
|
Ashford RD, Curtis BL. Commentary on Cohn and Colleagues: Discussions of Alcohol Use in an Online Social Network for Smoking Cessation: Analysis of Topics, Sentiment, and Social Network Centrality (ACER, 2019). Alcohol Clin Exp Res 2019; 43:401-404. [PMID: 30589438 PMCID: PMC10765966 DOI: 10.1111/acer.13945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2018] [Accepted: 12/19/2018] [Indexed: 11/30/2022]
Affiliation(s)
- Robert D Ashford
- Substance Use Disorders Institute, University of the Sciences, Philadelphia, Pennsylvania
| | - Brenda L Curtis
- Department of Psychiatry, Center for Studies of Addiction, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
| |
|
35
|
Identifying substance use risk based on deep neural networks and Instagram social media data. Neuropsychopharmacology 2019; 44:487-494. [PMID: 30356094 PMCID: PMC6333814 DOI: 10.1038/s41386-018-0247-x] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 06/18/2018] [Revised: 09/19/2018] [Accepted: 10/15/2018] [Indexed: 01/10/2023]
Abstract
Social media may provide new insight into our understanding of substance use and addiction. In this study, we developed a deep-learning method to automatically classify individuals' risk for alcohol, tobacco, and drug use based on the content from their Instagram profiles. In total, 2287 active Instagram users participated in the study. Deep convolutional neural networks for images and long short-term memory (LSTM) for text were used to extract predictive features from these data for risk assessment. The evaluation of our approach on a held-out test set of 228 individuals showed that among the substances we evaluated, our method could estimate the risk of alcohol abuse with statistical significance. These results are the first to suggest that deep-learning approaches applied to social media data can be used to identify potential substance use risk behavior, such as alcohol use. Utilization of automated estimation techniques can provide new insights for the next generation of population-level risk assessment and intervention delivery.
|