1
Macanovic A, Przepiorka W. A systematic evaluation of text mining methods for short texts: Mapping individuals' internal states from online posts. Behav Res Methods 2024;56:2782-2803. [PMID: 38575776; PMCID: PMC11133038; DOI: 10.3758/s13428-024-02381-9]
Abstract
Short texts generated by individuals in online environments can provide social and behavioral scientists with rich insights into these individuals' internal states. Trained manual coders can reliably interpret expressions of such internal states in text. However, manual coding imposes restrictions on the number of texts that can be analyzed, limiting our ability to extract insights from large-scale textual data. We evaluate the performance of several automatic text analysis methods in approximating trained human coders' evaluations across four coding tasks encompassing expressions of motives, norms, emotions, and stances. Our findings suggest that commonly used dictionaries, although performing well in identifying infrequent categories, generate false positives too frequently compared to other methods. We show that large language models trained on manually coded data yield the highest performance across all case studies. However, there are also instances where simpler methods show almost equal performance. Additionally, we evaluate the effectiveness of cutting-edge generative language models like GPT-4 in coding texts for internal states with the help of short instructions (so-called zero-shot classification). While promising, these models fall short of the performance of models trained on manually analyzed data. We discuss the strengths and weaknesses of various models and explore the trade-offs between model complexity and performance in different applications. Our work informs social and behavioral scientists of the challenges associated with text mining of large textual datasets, while providing best-practice recommendations.
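The contrast the abstract draws between dictionary methods (strong at catching infrequent categories, but prone to false positives) and classifiers trained on manual codes can be made concrete with a toy evaluation. A minimal sketch, assuming a hypothetical "motive" wordlist and invented gold labels, not the study's actual materials:

```python
# Minimal sketch of why dictionary coders can over-trigger: a hypothetical
# "motive" wordlist is matched against short texts with gold-standard codes,
# and precision/recall are computed against the manual labels.
MOTIVE_WORDS = {"want", "need", "goal", "because"}  # illustrative, not a real dictionary

texts = [
    ("I want to help others", 1),
    ("She left because of the rain", 0),   # "because" fires -> false positive
    ("Great game last night", 0),
    ("My goal is to finish the degree", 1),
    ("We need milk", 0),                   # "need" fires -> false positive
]

def dictionary_code(text):
    """Code the text 1 if any dictionary word occurs, else 0."""
    tokens = text.lower().split()
    return int(any(t in MOTIVE_WORDS for t in tokens))

tp = sum(1 for t, y in texts if dictionary_code(t) == 1 and y == 1)
fp = sum(1 for t, y in texts if dictionary_code(t) == 1 and y == 0)
fn = sum(1 for t, y in texts if dictionary_code(t) == 0 and y == 1)

precision = tp / (tp + fp)  # dragged down when the dictionary over-triggers
recall = tp / (tp + fn)     # dictionaries often do well here
print(precision, recall)
```

In this toy case the dictionary recovers every true positive (recall = 1.0) but half of its hits are false positives (precision = 0.5), which is the failure mode the study attributes to dictionary approaches.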
Affiliation(s)
- Ana Macanovic: Department of Sociology/ICS, Utrecht University, Utrecht, The Netherlands
- Wojtek Przepiorka: Department of Sociology/ICS, Utrecht University, Utrecht, The Netherlands
2
Hou XD, Guntuku SC, Cho YM, Sherman G, Zhang T, Li M, Ungar L, Tay L. A cross-cultural examination of temporal orientation through everyday language on social media. PLoS One 2024;19:e0292963. [PMID: 38457381; PMCID: PMC10923455; DOI: 10.1371/journal.pone.0292963]
Abstract
Past research has shown that culture can form and shape our temporal orientation: the relative emphasis on the past, present, or future. However, there are mixed findings on how temporal orientations vary between North American and East Asian cultures due to the limitations of survey methodology and sampling. In this study, we applied an inductive approach and leveraged big data and natural language processing across two popular social media platforms, Twitter and Weibo, to assess the similarities and differences in temporal orientation in the United States of America and China, respectively. We first established predictive models from annotation data and used them to classify a larger set of English Twitter sentences (N_TW = 1,549,136) and a larger set of Chinese Weibo sentences (N_WB = 95,181) into four temporal categories: past, future, atemporal present, and temporal present. Results show that there is no significant difference between Twitter and Weibo on past or future orientations; the large temporal orientation difference between North Americans and Chinese derives from their different prevailing focus on the atemporal present (e.g., facts, ideas) on Twitter versus the temporal present (e.g., the "here" and "now") on Weibo. Our findings contribute to the debate on cultural differences in temporal orientations with new perspectives following a new methodological approach. The study's implications call for a reevaluation of how temporal orientation is measured in cross-cultural studies, emphasizing the use of large-scale language data and acknowledging the atemporal present category. Understanding temporal orientations can guide effective cross-cultural communication strategies that tailor approaches for different audiences based on temporal orientations, enhancing intercultural understanding and engagement.
Affiliation(s)
- Xin Daphne Hou: Department of Psychological Sciences, Purdue University, West Lafayette, IN, United States of America
- Sharath Chandra Guntuku: Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, United States of America
- Young-Min Cho: Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, United States of America
- Garrick Sherman: Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, United States of America
- Tingdan Zhang: Collaborative Innovation Center of Assessment Toward Basic Education Quality, Beijing Normal University, Beijing, China
- Mingyang Li: Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, United States of America
- Lyle Ungar: Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, United States of America
- Louis Tay: Department of Psychological Sciences, Purdue University, West Lafayette, IN, United States of America
3
Mayor E, Bietti LM. Language use on Twitter reflects social structure and social disparities. Heliyon 2024;10:e23528. [PMID: 38293550; PMCID: PMC10825303; DOI: 10.1016/j.heliyon.2023.e23528]
Abstract
Large-scale mental health assessments increasingly rely upon user-contributed social media data. It is widely known that mental health and well-being are affected by minority group membership and social disparity. But do these factors manifest in the language use of social media users? We addressed this question using spatial lag regressions. We examined the county-level (N = 1069) associations of lexical indicators linked to well-being and mental health, notably depression (e.g., first-person singular pronouns, negative emotions), with markers of social disparity (e.g., the Area Deprivation Index-3) and ethnicity, using a sample of approximately 30 million content-coded tweets (U.S. county-level aggregation). Results confirmed most expected associations: county-level lexical indicators of depression are positively linked with county-level area disparity (e.g., economic hardship and inequity) and the percentage of ethnic minority groups. Predictive validity checks show that lexical indicators are related to future health and mental health outcomes. Lexical indicators of depression and adjustment coded from tweets aggregated at the county level could play a crucial role in prioritizing public health campaigns, particularly in socially deprived counties.
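The county-level lexical indicators described above are, at their core, per-tweet rates of marker words averaged within counties, which then enter the spatial regressions. A minimal sketch of that aggregation step, using invented FIPS codes and tweets (the spatial lag model itself is omitted):

```python
# Hedged sketch of the aggregation step: per-tweet rates of a depression-linked
# lexical marker (first-person singular pronouns) are averaged within counties,
# yielding one indicator value per county. FIPS codes and tweets are made up.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "i'm"}

tweets = [
    ("17031", "I think my day was awful"),
    ("17031", "great weather today"),
    ("06037", "me and my dog went out"),
]

def fps_rate(text):
    """Share of tokens that are first-person singular pronouns."""
    tokens = text.lower().split()
    return sum(t in FIRST_PERSON_SINGULAR for t in tokens) / len(tokens)

# Group tweet-level rates by county, then average within each county.
county_rates = {}
for fips, text in tweets:
    county_rates.setdefault(fips, []).append(fps_rate(text))

county_indicator = {fips: sum(r) / len(r) for fips, r in county_rates.items()}
print(county_indicator)
```

Each county's value can then serve as the dependent or independent variable in a county-level (spatial lag) regression against deprivation markers.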
4
Sametoğlu S, Pelt DHM, Eichstaedt JC, Ungar LH, Bartels M. Comparison of wellbeing structures based on survey responses and social media language: A network analysis. Appl Psychol Health Well Being 2023;15:1555-1582. [PMID: 37161901; DOI: 10.1111/aphw.12451]
Abstract
Wellbeing is predominantly measured through surveys but is increasingly measured by analysing individuals' language on social media platforms using social media text mining (SMTM). To investigate whether the structure of wellbeing is similar across both data collection methods, we compared networks derived from survey items and social media language features collected from the same participants. The dataset was split into an independent exploration subset (n = 1169) and a final subset (n = 1000). After estimating exploration networks, redundant survey items and language topics were eliminated. Final networks were then estimated using exploratory graph analysis (EGA). The networks of survey items and those of language topics were similar, both consisting of five wellbeing dimensions. The dimensions in the survey- and SMTM-based assessments of wellbeing showed convergent structures congruent with theories of wellbeing. Specific dimensions found in each network reflected the unique aspects of each type of data (survey and social media language). Survey and SMTM methods may thus provide complementary means of understanding differences in human wellbeing.
Affiliation(s)
- Selim Sametoğlu: Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Dirk H M Pelt: Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Johannes C Eichstaedt: Department of Psychology, Stanford University, Stanford, California, USA; Institute for Human-Centered AI, Stanford University, Stanford, California, USA
- Lyle H Ungar: Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Meike Bartels: Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, Amsterdam, The Netherlands
5
Mazonde N, Goldstein S. Online health communities' portrayal of obesity on social media platforms in South Africa. J Health Commun 2023;28:15-24. [PMID: 38146160; DOI: 10.1080/10810730.2023.2231374]
Abstract
The rapidly increasing prevalence of obesity in South Africa, intertwined with extensive changes in diet, life expectancy, and nutritional status, has led to a complex framing of obesity on social media. This has prompted the prioritization of media-based social and behavior change communication interventions leveraging social media for obesity prevention. This study was conducted to understand how obesity is constructed and represented on social media in South Africa. A media review of Facebook and Twitter platforms in South Africa was conducted over a six-month period using Meltwater software for data collection. The search yielded 13,500 posts and tweets. Data were cleaned and coded in Microsoft Excel. Content and framing analyses were performed to add insight into the nature of obesity discourse on social media. Portrayals of obesity on social media were dominated by stigmatizing imagery blaming individuals for unhealthy lifestyles, poor diets, and lack of physical activity. Future media-based social and behavior change communication interventions for obesity prevention can leverage social media to reach the broader public, and insights into media portrayals of obesity can inform the shape and development of these behavioral interventions.
Affiliation(s)
- Natasha Mazonde: SAMRC/Wits Centre for Health Economics & Decision Science (PRICELESS-SA), School of Public Health, Faculty of Health Sciences, University of Witwatersrand, Parktown, Johannesburg, South Africa
- Susan Goldstein: SAMRC/Wits Centre for Health Economics & Decision Science (PRICELESS-SA), School of Public Health, Faculty of Health Sciences, University of Witwatersrand, Parktown, Johannesburg, South Africa
6
Curtis B, Giorgi S, Ungar L, Vu H, Yaden D, Liu T, Yadeta K, Schwartz HA. AI-based analysis of social media language predicts addiction treatment dropout at 90 days. Neuropsychopharmacology 2023;48:1579-1585. [PMID: 37095253; PMCID: PMC10517013; DOI: 10.1038/s41386-023-01585-5]
Abstract
The reoccurrence of use (relapse) and treatment dropout are frequently observed in substance use disorder (SUD) treatment. In the current paper, we evaluated the predictive capability of an AI-based digital phenotype using the social media language of patients receiving treatment for substance use disorders (N = 269). We found that language phenotypes outperformed a standard intake psychometric assessment scale when predicting patients' 90-day treatment outcomes. We also used a modern deep learning-based AI model, Bidirectional Encoder Representations from Transformers (BERT), to generate risk scores from the pre-treatment digital phenotype and intake clinic data to predict dropout probabilities. Nearly all individuals labeled as low-risk remained in treatment, while those identified as high-risk dropped out (risk score for dropout AUC = 0.81; p < 0.001). The current study suggests the possibility of utilizing social media digital phenotypes as a new tool for intake risk assessment to identify individuals most at risk of treatment dropout and relapse.
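The reported AUC of 0.81 summarizes how well the risk scores rank dropouts above completers. A rank-based AUC can be sketched in a few lines; the scores and labels below are invented for illustration, not the study's data:

```python
# Rank-based AUC: the probability that a randomly chosen positive case
# (dropout) receives a higher risk score than a randomly chosen negative
# case (retained), with ties counted as half a win.
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

risk_scores = [0.9, 0.8, 0.35, 0.3, 0.2, 0.85]  # model outputs (invented)
dropped_out = [1,   1,   0,    0,   0,   0]     # 1 = dropped out of treatment
print(auc(risk_scores, dropped_out))
```

An AUC of 1.0 would mean every dropout outranks every completer; 0.5 is chance-level ranking. Here one retained patient (0.85) outranks one dropout (0.8), giving 7 of 8 correctly ordered pairs.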
Affiliation(s)
- Brenda Curtis: Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
- Salvatore Giorgi: Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- Lyle Ungar: Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA; Positive Psychology Center, University of Pennsylvania, Philadelphia, PA, USA
- Huy Vu: Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- David Yaden: Positive Psychology Center, University of Pennsylvania, Philadelphia, PA, USA; Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Tingting Liu: Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA; Positive Psychology Center, University of Pennsylvania, Philadelphia, PA, USA
- Kenna Yadeta: Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
- H Andrew Schwartz: Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
7
Fedorova M, Shevyakova T. Shocktainment techniques in the Mirror, a British daily tabloid: Linguocultural features of news and semantic-cognitive analysis of leading concepts. J Psycholinguist Res 2023;52:1669-1683. [PMID: 37171684; DOI: 10.1007/s10936-023-09971-2]
Abstract
This study analyzed 300 news articles from the online version of the British daily tabloid The Mirror. Quantitative analysis showed that shocktainment elements are present in 150 of the 300 news articles. The results showed that some shocktainment categories are either incompletely represented or overshadowed by other categories. At the same time, the techniques of familiarity (800), absurdity (908), outrage (5,105), influence (4,200), and persuasion (1,101) are used much less frequently than others. The study showed that the authors of the articles actively use shocktainment techniques, since these allow them to explain the linguocultural features of the country to readers in an easy and understandable way. The analysis also revealed the main topics that are disclosed in news articles using shocktainment and that trigger cognitive processes in readers. The study's results can help close theoretical and empirical gaps in research on news media discourse.
Affiliation(s)
- Mariya Fedorova: Faculty of Postgraduate Education, Kazakh Ablai Khan University of International Relations and World Languages (KAUIR&WL), Almaty, Kazakhstan
- Tatyana Shevyakova: Faculty of Postgraduate Education, Kazakh Ablai Khan University of International Relations and World Languages (KAUIR&WL), Almaty, Kazakhstan
8
Lane JM, Habib D, Curtis B. Linguistic methodologies to surveil the leading causes of mortality: Scoping review of Twitter for public health data. J Med Internet Res 2023;25:e39484. [PMID: 37307062; PMCID: PMC10337472; DOI: 10.2196/39484]
Abstract
BACKGROUND Twitter has become a dominant source of public health data and a widely used method to investigate and understand public health-related issues internationally. By leveraging big data methodologies to mine Twitter for health-related data at the individual and community levels, scientists can use the data as a rapid and less expensive source for both epidemiological surveillance and studies on human behavior. However, limited reviews have focused on novel applications of language analyses that examine human health and behavior and the surveillance of several emerging diseases, chronic conditions, and risky behaviors. OBJECTIVE The primary focus of this scoping review was to provide a comprehensive overview of relevant studies that have used Twitter as a data source in public health research to analyze users' tweets to identify and understand physical and mental health conditions and remotely monitor the leading causes of mortality related to emerging disease epidemics, chronic diseases, and risk behaviors. METHODS A literature search strategy following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) extended guidelines for scoping reviews was used to search specific keywords on Twitter and public health on 5 databases: Web of Science, PubMed, CINAHL, PsycINFO, and Google Scholar. We reviewed the literature comprising peer-reviewed empirical research articles that included original research published in English-language journals between 2008 and 2021. Key information on Twitter data being leveraged for analyzing user language to study physical and mental health and public health surveillance was extracted. RESULTS A total of 38 articles that focused primarily on Twitter as a data source met the inclusion criteria for review. In total, two themes emerged from the literature: (1) language analysis to identify health threats and physical and mental health understandings about people and societies and (2) public health surveillance related to leading causes of mortality, primarily representing 3 categories (ie, respiratory infections, cardiovascular disease, and COVID-19). The findings suggest that Twitter language data can be mined to detect mental health conditions, disease surveillance, and death rates; identify heart-related content; show how health-related information is shared and discussed; and provide access to users' opinions and feelings. CONCLUSIONS Twitter analysis shows promise in the field of public health communication and surveillance. It may be essential to use Twitter to supplement more conventional public health surveillance approaches. Twitter can potentially fortify researchers' ability to collect data in a timely way and improve the early identification of potential health threats. Twitter can also help identify subtle signals in language for understanding physical and mental health conditions.
Affiliation(s)
- Jamil M Lane: Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Daniel Habib: Technology and Translational Research Unit, National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
- Brenda Curtis: Technology and Translational Research Unit, National Institute on Drug Abuse, National Institutes of Health, Baltimore, MD, United States
9
Stade EC, Ungar L, Havaldar S, Ruscio AM. Perseverative thinking is associated with features of spoken language. Behav Res Ther 2023;165:104307. [PMID: 37121016; DOI: 10.1016/j.brat.2023.104307]
Abstract
Perseverative thinking (PT), such as rumination or worry, is a transdiagnostic process implicated in the onset and maintenance of emotional disorders. Existing measures of PT are limited by demand and expectancy effects, cognitive biases, and reflexivity, leading to calls for unobtrusive, behavioral measures. In response, we developed a behavioral measure of PT based on language. A mixed sample of 188 participants with major depressive disorder, generalized anxiety disorder, or no psychopathology completed self-report PT measures. Participants were also interviewed, providing a natural language sample. We examined language features associated with PT, then built a language-based PT model and examined its predictive power. PT was associated with multiple language features, most notably I-usage (e.g., "I", "me"; β = 0.25) and negative emotion language (e.g., "anxiety", "difficult"; β = 0.19). In machine learning analyses, language features accounted for 14% of the variance in self-reported PT. Language-based PT predicted the presence and severity of depression and anxiety, psychiatric comorbidity, and treatment seeking, with effects in the r = 0.15-0.41 range. PT has face-valid linguistic correlates and our language-based measure holds promise for assessing PT unobtrusively. With further development, this measure could be used to passively detect PT for deployment of "just-in-time" interventions.
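A language-based PT measure of the kind described combines token-level feature rates (I-usage, negative emotion language) with fitted weights. A toy sketch under invented word lists and weights (loosely echoing the reported βs, but not the study's actual model):

```python
# Illustrative sketch of a language-based score: token rates for I-usage and
# negative-emotion words are combined linearly. The word lists, weights, and
# text are invented for illustration, not the study's fitted model.
I_WORDS = {"i", "me", "my", "i'm"}
NEG_WORDS = {"anxiety", "difficult", "worried", "bad"}

def features(text):
    """Return (I-usage rate, negative-emotion rate) over the token count."""
    tokens = text.lower().split()
    n = len(tokens)
    return (sum(t in I_WORDS for t in tokens) / n,
            sum(t in NEG_WORDS for t in tokens) / n)

def pt_score(text, w_i=0.25, w_neg=0.19, intercept=0.0):
    """Toy linear model: weighted sum of the two language features."""
    f_i, f_neg = features(text)
    return intercept + w_i * f_i + w_neg * f_neg

print(pt_score("i was worried it was too difficult for me"))
```

In the actual study, such weights would be estimated by machine learning over many language features rather than fixed by hand, and the score would then be validated against self-reported PT.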
10
Son Y, Clouston SAP, Kotov R, Eichstaedt JC, Bromet EJ, Luft BJ, Schwartz HA. World Trade Center responders in their own words: predicting PTSD symptom trajectories with AI-based language analyses of interviews. Psychol Med 2023;53:918-926. [PMID: 34154682; PMCID: PMC8692489; DOI: 10.1017/s0033291721002294]
Abstract
BACKGROUND Oral histories from 9/11 responders to the World Trade Center (WTC) attacks provide rich narratives about distress and resilience. Artificial Intelligence (AI) models promise to detect psychopathology in natural language, but they have been evaluated primarily in non-clinical settings using social media. This study sought to test the ability of AI-based language assessments to predict PTSD symptom trajectories among responders. METHODS Participants were 124 responders whose health was monitored at the Stony Brook WTC Health and Wellness Program who completed oral history interviews about their initial WTC experiences. PTSD symptom severity was measured longitudinally using the PTSD Checklist (PCL) for up to 7 years post-interview. AI-based indicators were computed for depression, anxiety, neuroticism, and extraversion along with dictionary-based measures of linguistic and interpersonal style. Linear regression and multilevel models estimated associations of AI indicators with concurrent and subsequent PTSD symptom severity (significance adjusted by false discovery rate). RESULTS Cross-sectionally, greater depressive language (β = 0.32; p = 0.049) and first-person singular usage (β = 0.31; p = 0.049) were associated with increased symptom severity. Longitudinally, anxious language predicted future worsening in PCL scores (β = 0.30; p = 0.049), whereas first-person plural usage (β = -0.36; p = 0.014) and longer word usage (β = -0.35; p = 0.014) predicted improvement. CONCLUSIONS This is the first study to demonstrate the value of AI in understanding PTSD in a vulnerable population. Future studies should extend this application to other trauma exposures and to other demographic groups, especially under-represented minorities.
Affiliation(s)
- Youngseo Son: Department of Computer Science, Stony Brook University, New York, USA
- Sean A. P. Clouston: Program in Public Health, Stony Brook University, New York, USA; Department of Family, Population and Preventive Medicine, Stony Brook University, New York, USA
- Roman Kotov: Department of Psychiatry, Stony Brook University, New York, USA
- Johannes C. Eichstaedt: Department of Psychology & Institute for Human-Centered A.I., Stanford University, Stanford, California, USA
11
Williams CYK, Li RX, Luo MY, Bance M. Exploring patient experiences and concerns in the online cochlear implant community: A cross-sectional study and validation of automated topic modelling. Clin Otolaryngol 2023;48:442-450. [PMID: 36645237; DOI: 10.1111/coa.14037]
Abstract
OBJECTIVE There is a paucity of research examining patient experiences of cochlear implants. We sought to use natural language processing methods to explore patient experiences and concerns in the online cochlear implant (CI) community. MATERIALS AND METHODS Cross-sectional study of posts on the online Reddit r/CochlearImplants forum from 1 March 2015 to 11 November 2021. Natural language processing using the BERTopic automated topic modelling technique was employed to cluster posts into semantically similar topics. Topic categorisation was manually validated by two independent reviewers, and Cohen's kappa was calculated to determine inter-rater reliability for machine vs human and human vs human categorisation. RESULTS We retrieved 987 posts from 588 unique Reddit users on the r/CochlearImplants forum. Posts were initially categorised by BERTopic into 16 different topics, which were increased to 23 topics following manual inspection. The most popular topics related to CI connectivity (n = 112), adults considering getting a CI (n = 107), surgery-related posts (n = 89), and day-to-day living with a CI (n = 85). Cohen's kappa among all posts was 0.62 (machine vs human) and 0.72 (human vs human), and among categorised posts was 0.85 (machine vs human) and 0.84 (human vs human). CONCLUSIONS This cross-sectional study of social media discussions among the online cochlear implant community identified common attitudes, experiences, and concerns of patients living with, or seeking, a cochlear implant. Our validation of natural language processing methods to categorise topics shows that automated analysis of similar otolaryngology-related content is a viable and accurate alternative to manual qualitative approaches.
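Cohen's kappa, used above to compare machine-vs-human and human-vs-human categorisation, corrects raw percentage agreement for agreement expected by chance. A minimal sketch with invented topic labels:

```python
# Cohen's kappa for two raters assigning posts to topic categories.
# The label sequences below are invented for illustration.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: product of each category's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

machine = ["connectivity", "surgery", "surgery", "daily-life", "connectivity", "daily-life"]
human   = ["connectivity", "surgery", "daily-life", "daily-life", "connectivity", "daily-life"]
print(round(cohens_kappa(machine, human), 3))
```

Here raw agreement is 5/6 ≈ 0.83, but chance agreement is 1/3, giving kappa = 0.75, in the same range as the study's reported machine-vs-human values.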
Affiliation(s)
- Christopher Y K Williams: School of Clinical Medicine, University of Cambridge, Cambridge, UK; Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Rosia X Li: School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Michael Y Luo: School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Manohar Bance: Department of Otolaryngology-Head and Neck Surgery, Addenbrooke's Hospital, Cambridge, UK
12
Gawda B. The novel narrative technique uncovers emotional scripts in individuals with psychopathy and high trait anxiety. PLoS One 2023;18:e0283391. [PMID: 36952499; PMCID: PMC10045615; DOI: 10.1371/journal.pone.0283391]
Abstract
Mental representations are of great importance for understanding human behaviour. The aim of this article is to present an innovative way to assess emotional scripts, which are a form of mental representation of emotional events, based on an analysis of narratives and their contents. Theoretical background on emotional schemas and scripts is provided, along with information about the types of related measures. Then, a rationale is presented for introducing an assessment of scripts related to specific emotions such as love, hate, and anxiety in a psychopathological context. This is followed by a perspective explaining the procedure of the relevant technique based on narrative data analysis. The technique has been successfully applied in two studies (Study I, n = 200; Study II, n = 280). A total of 1440 narratives about specific emotions were analyzed to identify the indicators of scripts. The psychometric properties of the proposed technique, such as reliability, inter-rater agreement, and accuracy, have been established. The results show the value of assessing emotional scripts in individuals, particularly those with high psychopathy and high trait anxiety. The contents of love and hate scripts illustrate cognitive distortions and deficits in emotional information processing in individuals with psychopathy. The method enables the collection of informative data on romantic love, hate, and anxiety scripts, which provides insight into how people may perceive and experience emotions and how they behave emotionally. Future research should focus on verification of the technique in other types of psychopathology and on the improvement of computer software dedicated to the narrative technique described in this paper.
Affiliation(s)
- Barbara Gawda: Department of Psychology of Emotion & Personality, Maria Curie-Sklodowska University, Lublin, Poland
13
Ravenda D, Valencia-Silva MM, Argiles-Bosch JM, García-Blandón J. The strategic usage of Facebook by local governments: A structural topic modelling analysis. Inf Manage 2022. [DOI: 10.1016/j.im.2022.103704]
14
Liu T, Ungar LH, Curtis B, Sherman G, Yadeta K, Tay L, Eichstaedt JC, Guntuku SC. Head versus heart: social media reveals differential language of loneliness from depression. NPJ Ment Health Res 2022;1:16. [PMID: 38609477; PMCID: PMC10955894; DOI: 10.1038/s44184-022-00014-7]
Abstract
We study the language differentially associated with loneliness and depression using 3.4 million Facebook posts from 2986 individuals, and uncover the statistical associations of survey-based depression and loneliness with both dictionary-based (Linguistic Inquiry Word Count 2015) and open-vocabulary linguistic features (words, phrases, and topics). Loneliness and depression were found to have highly overlapping language profiles, including sickness, pain, and negative emotions as (cross-sectional) risk factors, and social relationships and activities as protective factors. Compared to depression, the language associated with loneliness reflects a stronger cognitive focus, including more references to cognitive processes (i.e., differentiation and tentative language, thoughts, and the observation of irregularities), and cognitive activities like reading and writing. As might be expected, less lonely users were more likely to reference social relationships (e.g., friends and family, romantic relationships), and use first-person plural pronouns. Our findings suggest that the mechanisms of loneliness include self-oriented cognitive activities (i.e., reading) and an overattention to the interpretation of information in the environment. These data-driven ecological findings suggest interventions for loneliness that target maladaptive social cognitions (e.g., through reframing the perception of social environments), strengthen social relationships, and treat other affective distress (i.e., depression).
Affiliation(s)
- Tingting Liu
- National Institute on Drug Abuse (NIDA IRP), National Institutes of Health (NIH), Baltimore, MD, USA
- Positive Psychology Center, University of Pennsylvania, Philadelphia, PA, USA
- Lyle H Ungar
- Positive Psychology Center, University of Pennsylvania, Philadelphia, PA, USA
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- Brenda Curtis
- National Institute on Drug Abuse (NIDA IRP), National Institutes of Health (NIH), Baltimore, MD, USA
- Garrick Sherman
- Positive Psychology Center, University of Pennsylvania, Philadelphia, PA, USA
- Kenna Yadeta
- National Institute on Drug Abuse (NIDA IRP), National Institutes of Health (NIH), Baltimore, MD, USA
- Louis Tay
- Department of Psychological Sciences, Purdue University, West Lafayette, IN, USA
- Johannes C Eichstaedt
- Department of Psychology, Institute for Human-Centered A.I., Stanford University, Stanford, CA, USA
- Sharath Chandra Guntuku
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
15
Santos LA, Voelkel JG, Willer R, Zaki J. Belief in the Utility of Cross-Partisan Empathy Reduces Partisan Animosity and Facilitates Political Persuasion. Psychol Sci 2022; 33:1557-1573. [PMID: 36041234 DOI: 10.1177/09567976221098594] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023] Open
Abstract
In polarized political environments, partisans tend to deploy empathy parochially, furthering division. We propose that belief in the usefulness of cross-partisan empathy (striving to understand other people with whom one disagrees politically) promotes out-group empathy and has powerful ramifications for both intra- and interpersonal processes. Across four studies (total N = 4,748), we examined these predictions in online and college samples using surveys, social-network analysis, preregistered experiments, and natural-language processing. Believing that cross-partisan empathy is useful is associated with less partisan division and politically diverse friendship networks (Studies 1 and 2). When prompted to believe that empathy is a political resource, rather than a political weakness, people become less affectively polarized (Study 3) and communicate in ways that decrease out-partisans' animosity and attitudinal polarization (Study 4). These findings demonstrate that belief in cross-partisan empathy impacts not only individuals' own attitudes and behaviors but also the attitudes of those they communicate with.
Affiliation(s)
- Robb Willer
- Department of Sociology, Stanford University
- Jamil Zaki
- Department of Psychology, Stanford University
16
The Prediction of Consumer Behavior from Social Media Activities. Behav Sci (Basel) 2022; 12:bs12080284. [PMID: 36004855 PMCID: PMC9404982 DOI: 10.3390/bs12080284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2022] [Revised: 08/08/2022] [Accepted: 08/10/2022] [Indexed: 11/21/2022] Open
Abstract
Consumer behavior variants are evolving by utilizing advanced packing models. These models can make consumer behavior detection considerably problematic. New techniques that are superior to customary models must be utilized to efficiently observe consumer behaviors. Machine learning models are no longer efficient in identifying complex consumer behavior variants. Deep learning models can be a capable solution for detecting all consumer behavior variants. In this paper, we propose a new deep learning model to classify consumer behavior variants using an ensemble architecture. The new model incorporates two pretrained learning algorithms in an optimized fashion. This model has four main phases, namely, data gathering, deep neural modeling, model training, and deep learning model evaluation. The ensemble model is tested on the Facemg BIG-D15 and TwitD databases. The experimental results show that the ensemble model can efficiently classify consumer behavior with high precision, outperforming recent models in the literature. The ensemble model achieved 98.78% accuracy on the Facemg database, which is higher than most machine learning consumer behavior detection models by more than 8%.
17
A methodology for preprocessing structured big data in the behavioral sciences. Behav Res Methods 2022:10.3758/s13428-022-01895-4. [PMID: 35768746 DOI: 10.3758/s13428-022-01895-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/27/2022] [Indexed: 11/08/2022]
Abstract
The characteristics of big data, including high volume, increased variety, and velocity, pose special challenges for data analysis. As these characteristics generally preclude manual data inspection and processing, researchers must often use computational methodologies to deal with this type of data, techniques that may be unfamiliar to nonspecialists, including behavioral scientists. However, previous data analytics methodologies within the field of computer science, developed to handle the generic tasks of data collection, preprocessing, and analysis, can be appropriated for use in other disciplines. These methodologies involve a sequential pipeline of quality checks to prepare data sets for analysis and application. Building upon these methodologies, this paper describes the Big Data Quality & Statistical Assurance (BDQSA) model, applicable to researchers in the behavioral sciences. It involves a series of data preprocessing tasks to achieve data understanding, as well as data screening, cleaning, and transformation. These are followed by a statistical quality phase, which includes extraction of the relevant data subset, type conversions, ensuring sample representativeness when appropriate, and assessing statistical assumptions. The resulting model thereby provides methodological guidance for the preprocessing of behavioral science big data, aimed at ensuring acceptable data quality before analysis is undertaken. Sample R code snippets demonstrating the application of this model are provided throughout the paper.
18
The development and validation of the Romanian version of Linguistic Inquiry and Word Count 2015 (Ro-LIWC2015). CURRENT PSYCHOLOGY 2022. [DOI: 10.1007/s12144-020-00872-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Abstract
Today, performing automatic language analysis to extract meaning from natural language is one of the leading directions in social science research, but it can be challenging. Linguistic Inquiry and Word Count 2015 (LIWC2015; Pennebaker et al. 2015) is one of the most versatile, yet easy-to-master, instruments for transforming any text into data, meeting the needs of psychologists who are not usually proficient in data science. Moreover, LIWC2015 is already available in multiple languages, which opens the door to exciting intercultural quests. The current article introduces the first Romanian version of LIWC2015, Ro-LIWC2015, and thus contributes to the line of research concerning multilingual analysis. Throughout the paper, we describe the challenges of creating the Romanian dictionary and discuss other linguistic aspects that could be useful for new adaptations of LIWC2015. We also present the results of two studies assessing the criterion validity of Ro-LIWC2015. The first study focuses on the consistency between the Romanian and the English dictionaries in analyzing a corpus of books. The second study tests whether Ro-LIWC2015 can capture linguistic differences in contrasting corpora. For this purpose, we analyzed posts from help-seeking forums for anxiety, depression, and health issues, and leveraged supervised learning to address several classification problems. The selected algorithm allows feature ranking, which facilitates more thorough interpretations. The linguistic markers extracted with Ro-LIWC2015 mirrored a number of disorder-specific features of depression and anxiety. Given the obtained results, this research encourages the use of Ro-LIWC2015 for hypothesis testing.
19
Jose R, Matero M, Sherman G, Curtis B, Giorgi S, Schwartz HA, Ungar LH. Using Facebook language to predict and describe excessive alcohol use. Alcohol Clin Exp Res 2022; 46:836-847. [PMID: 35575955 PMCID: PMC9179895 DOI: 10.1111/acer.14807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2021] [Revised: 02/10/2022] [Accepted: 03/10/2022] [Indexed: 11/28/2022]
Abstract
BACKGROUND: Assessing risk for excessive alcohol use is important for applications ranging from recruitment into research studies to targeted public health messaging. Social media language provides an ecologically embedded source of information for assessing individuals who may be at risk for harmful drinking.
METHODS: Using data collected on 3664 respondents from the general population, we examine how accurately language used on social media classifies individuals as at-risk for alcohol problems based on Alcohol Use Disorder Identification Test-Consumption score benchmarks.
RESULTS: We find that social media language is moderately accurate (area under the curve = 0.75) at identifying individuals at risk for alcohol problems (i.e., hazardous drinking/alcohol use disorders) when used with models based on contextual word embeddings. High-risk alcohol use was predicted by individuals' usage of words related to alcohol, partying, informal expressions, swearing, and anger. Low-risk alcohol use was predicted by individuals' usage of social, affiliative, and faith-based words.
CONCLUSIONS: The use of social media data to study drinking behavior in the general public is promising and could eventually support primary and secondary prevention efforts among Americans whose at-risk drinking may have otherwise gone "under the radar."
Affiliation(s)
- Rupa Jose
- Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Matthew Matero
- Department of Computer Science, Stony Brook University, Stony Brook, New York, USA
- Garrick Sherman
- Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Brenda Curtis
- Technology and Translational Research Unit, National Institute on Drug Abuse, Baltimore, Maryland, USA
- Salvatore Giorgi
- Technology and Translational Research Unit, National Institute on Drug Abuse, Baltimore, Maryland, USA
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Lyle H Ungar
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Psychology, Positive Psychology Center, University of Pennsylvania, Philadelphia, Pennsylvania, USA
20
Ashokkumar A, Pennebaker JW. Tracking group identity through natural language within groups. PNAS NEXUS 2022; 1:pgac022. [PMID: 35774418 PMCID: PMC9229362 DOI: 10.1093/pnasnexus/pgac022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/22/2021] [Revised: 02/16/2022] [Accepted: 03/28/2022] [Indexed: 01/29/2023]
Abstract
To what degree can we determine people's connections with groups through the language they use? In recent years, large archives of behavioral data from social media communities have become available to social scientists, opening the possibility of tracking naturally occurring group identity processes. A feature of most digital groups is that they rely exclusively on the written word. Across 3 studies, we developed and validated a language-based metric of group identity strength and demonstrated its potential in tracking identity processes in online communities. In Studies 1a-1c, 873 people wrote about their connections to various groups (country, college, or religion). Two language markers of group identity strength were found: high affiliation (more words like we, togetherness) and low cognitive processing or questioning (fewer words like think, unsure). Using these markers, a language-based unquestioning affiliation index was developed and applied to in-class stream-of-consciousness essays of 2,161 college students (Study 2). Greater levels of unquestioning affiliation expressed in language predicted not only self-reported university identity but also students' likelihood of remaining enrolled in college a year later. In Study 3, the index was applied to naturalistic Reddit conversations of 270,784 people in 2 online communities of supporters of the 2016 presidential candidates, Hillary Clinton and Donald Trump. The index predicted how long people would remain in the group (3a) and revealed temporal shifts mirroring members' joining and leaving of groups (3b). Together, the studies highlight the promise of a language-based approach for tracking and studying group identity processes in online groups.
Affiliation(s)
- Ashwini Ashokkumar
- Polarization and Social Change Lab, 450 Jane Stanford Way Building 120, Room 201, Stanford, CA 94305, USA
- James W Pennebaker
- Department of Psychology, University of Texas at Austin, 108 E. Dean Keeton, Austin, TX 78712-0187, USA
21
Nanath K, Balasubramanian S, Shukla V, Islam N, Kaitheri S. Developing a mental health index using a machine learning approach: Assessing the impact of mobility and lockdown during the COVID-19 pandemic. TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE 2022; 178:121560. [PMID: 35185222 PMCID: PMC8841156 DOI: 10.1016/j.techfore.2022.121560] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/03/2021] [Revised: 02/03/2022] [Accepted: 02/05/2022] [Indexed: 06/14/2023]
Abstract
Governments worldwide have implemented stringent restrictions to curtail the spread of the COVID-19 pandemic. Although beneficial to physical health, these preventive measures could have a profound detrimental effect on the mental health of the population. This study focuses on the impact of lockdowns and mobility restrictions on mental health during the COVID-19 pandemic. We first develop a novel mental health index based on the analysis of data from over three million global tweets using the Microsoft Azure machine learning approach. The computed mental health index scores are then regressed with the lockdown strictness index and Google mobility index using fixed-effects ordinary least squares (OLS) regression. The results reveal that the reduction in workplace mobility, reduction in retail and recreational mobility, and increase in residential mobility (confinement to the residence) have harmed mental health. However, restrictions on mobility to parks, grocery stores, and pharmacy outlets were found to have no significant impact. The proposed mental health index provides a path for theoretical and empirical mental health studies using social media.
Affiliation(s)
- Nazrul Islam
- Department of Science, Innovation, Technology and Entrepreneurship, University of Exeter Business School, UK
22
Pradhan R, Sharma DK. An ensemble deep learning classifier for sentiment analysis on code-mix Hindi-English data. Soft comput 2022; 27:1-18. [PMID: 35493275 PMCID: PMC9034263 DOI: 10.1007/s00500-022-07091-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/24/2022] [Indexed: 11/25/2022]
Abstract
Code-mixing on social media is a trend in many countries where people speak multiple languages, such as India, where Hindi and English are major communication languages. Sentiment analysis is beneficial in understanding users' opinions and thoughts on social, economic, and political issues. It eliminates the manual monitoring of each and every review, which is a cumbersome task. However, performing sentiment analysis on code-mix data is challenging, as it involves various out-of-vocabulary terms and numerous issues, making it a new field in natural language processing. This work deals with such text and ensembles a classifier to detect sentiment polarity. Our classifier ensembles a multilingual variant of RoBERTa and a sentence-level embedding from the Universal Sentence Encoder to identify the sentiments of these code-mixed tweets with higher accuracy. This ensemble optimises the classifier's performance by using the strengths of both for transfer learning. Experiments were conducted on real-life benchmark datasets to reveal their sentiment. The performance of the proposed classifier framework is compared with other baselines and deep learning models on five datasets to show the superiority of our results. Results showed improved performance in the proposed classifier's accuracy, precision, and recall. The accuracy achieved by our classifier on code-mix datasets is 66% on Joshi et al. 2016, 60% on SAIL 2017, and 67% on the SemEval 2020 Task-9 dataset, which is on average around 3% higher than contemporary baselines.
23
Jarman HK, McLean SA, Griffiths S, Teague SJ, Rodgers RF, Paxton SJ, Austen E, Harris E, Steward T, Shatte A, Khanh-Dao Le L, Anwar T, Mihalopoulos C, Parker AG, Yager Z, Fuller-Tyszkiewicz M. Critical measurement issues in the assessment of social media influence on body image. Body Image 2022; 40:225-236. [PMID: 35032949 DOI: 10.1016/j.bodyim.2021.12.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/25/2021] [Accepted: 12/16/2021] [Indexed: 02/06/2023]
Abstract
Progress towards understanding how social media impacts body image hinges on the use of appropriate measurement tools and methodologies. This review provides an overview of common (qualitative, self-report survey, lab-based experiments) and emerging (momentary assessment, computational) methodological approaches to the exploration of the impact of social media on body image. The potential of these methodologies is detailed, with examples illustrating current use as well as opportunities for expansion. A key theme from our review is that each methodology has provided insights for the body image research field, yet is insufficient in isolation to fully capture the nuance and complexity of social media experiences. Thus, in consideration of gaps in methodology, we emphasise the need for big picture thinking that leverages and combines the strengths of each of these methodologies to yield a more comprehensive, nuanced, and robust picture of the positive and negative impacts of social media.
Affiliation(s)
- Hannah K Jarman
- School of Psychology, Deakin University, 1 Gheringhap Street, Geelong, Victoria, Australia; Centre for Social and Early Emotional Development, School of Psychology, Deakin University, Burwood, Australia
- Siân A McLean
- The Bouverie Centre, School of Psychology & Public Health, La Trobe University, Melbourne, Australia
- Scott Griffiths
- Melbourne School of Psychological Sciences, University of Melbourne, Melbourne, Victoria, Australia
- Samantha J Teague
- School of Psychology, Deakin University, 1 Gheringhap Street, Geelong, Victoria, Australia; Centre for Social and Early Emotional Development, School of Psychology, Deakin University, Burwood, Australia
- Rachel F Rodgers
- APPEAR, Department of Applied Psychology, Northeastern University, Boston, USA; Department of Psychiatric Emergency & Acute Care, Lapeyronie Hospital, CHRU Montpellier, France
- Susan J Paxton
- School of Psychology & Public Health, La Trobe University, Melbourne, Australia
- Emma Austen
- Melbourne School of Psychological Sciences, University of Melbourne, Melbourne, Victoria, Australia
- Emily Harris
- Melbourne School of Psychological Sciences, University of Melbourne, Melbourne, Victoria, Australia
- Trevor Steward
- Melbourne School of Psychological Sciences, University of Melbourne, Melbourne, Victoria, Australia
- Adrian Shatte
- School of Engineering, Information Technology & Physical Sciences, Federation University, Melbourne, Australia
- Long Khanh-Dao Le
- Deakin Health Economics, Institute for Health Transformation, School of Health and Social Development, Deakin University, Burwood, Australia
- Tarique Anwar
- Department of Computing Technologies, Swinburne University of Technology, Melbourne, Australia
- Cathrine Mihalopoulos
- Deakin Health Economics, Institute for Health Transformation, School of Health and Social Development, Deakin University, Burwood, Australia
- Alexandra G Parker
- Institute for Health and Sport, Victoria University, Melbourne, Australia; Orygen and Centre for Youth Mental Health, University of Melbourne, Australia
- Zali Yager
- Institute for Health and Sport, Victoria University, Melbourne, Australia
- Matthew Fuller-Tyszkiewicz
- School of Psychology, Deakin University, 1 Gheringhap Street, Geelong, Victoria, Australia; Centre for Social and Early Emotional Development, School of Psychology, Deakin University, Burwood, Australia
24
D NK, P V G D PR, Venkata Rao K. Emotion recognition in election day tweets using optimised kernel extreme learning machine classifier. J EXP THEOR ARTIF IN 2022. [DOI: 10.1080/0952813x.2021.1960633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Affiliation(s)
- N.S.B. Kavitha D
- Department of Computer Science & Systems Engineering, Andhra University College of Engineering (A), Andhra University, Visakhapatnam, India
- Prasad Reddy P V G D
- Department of Computer Science & Systems Engineering, Andhra University College of Engineering (A), Andhra University, Visakhapatnam, India
- K. Venkata Rao
- Department of Computer Science & Systems Engineering, Andhra University College of Engineering (A), Andhra University, Visakhapatnam, India
25
Tay L, Woo SE, Hickman L, Booth BM, D’Mello S. A Conceptual Framework for Investigating and Mitigating Machine-Learning Measurement Bias (MLMB) in Psychological Assessment. ADVANCES IN METHODS AND PRACTICES IN PSYCHOLOGICAL SCIENCE 2022. [DOI: 10.1177/25152459211061337] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Given significant concerns about fairness and bias in the use of artificial intelligence (AI) and machine learning (ML) for psychological assessment, we provide a conceptual framework for investigating and mitigating machine-learning measurement bias (MLMB) from a psychometric perspective. MLMB is defined as differential functioning of the trained ML model between subgroups. MLMB manifests empirically when a trained ML model produces different predicted score levels for different subgroups (e.g., race, gender) despite them having the same ground-truth levels for the underlying construct of interest (e.g., personality) and/or when the model yields differential predictive accuracies across the subgroups. Because the development of ML models involves both data and algorithms, both biased data and algorithm-training bias are potential sources of MLMB. Data bias can occur in the form of nonequivalence between subgroups in the ground truth, platform-based construct, behavioral expression, and/or feature computing. Algorithm-training bias can occur when algorithms are developed with nonequivalence in the relation between extracted features and ground truth (i.e., algorithm features are differentially used, weighted, or transformed between subgroups). We explain how these potential sources of bias may manifest during ML model development and share initial ideas for mitigating them, including recognizing that new statistical and algorithmic procedures need to be developed. We also discuss how this framework clarifies MLMB but does not reduce the complexity of the issue.
Affiliation(s)
- Louis Tay
- Department of Psychological Sciences, Purdue University, West Lafayette, Indiana
- Sang Eun Woo
- Department of Psychological Sciences, Purdue University, West Lafayette, Indiana
- Louis Hickman
- The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania
- Brandon M. Booth
- Institute of Cognitive Science, University of Colorado Boulder, Boulder, Colorado
- Sidney D’Mello
- Institute of Cognitive Science, University of Colorado Boulder, Boulder, Colorado
26
Turnwald BP, Perry MA, Jurgens D, Prabhakaran V, Jurafsky D, Markus HR, Crum AJ. Language in popular American culture constructs the meaning of healthy and unhealthy eating: Narratives of craveability, excitement, and social connection in movies, television, social media, recipes, and food reviews. Appetite 2022; 172:105949. [PMID: 35090976 DOI: 10.1016/j.appet.2022.105949] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2021] [Revised: 12/31/2021] [Accepted: 01/21/2022] [Indexed: 11/02/2022]
Abstract
Many people want to eat healthier but struggle to do so, in part due to a dominant perception that healthy foods are at odds with hedonic goals. Is the perception that healthy foods are less appealing than unhealthy foods represented in language across popular entertainment media and social media? Six studies analyzed dialogue about food in six cultural products (creations of a culture that reflect its perspectives), including movies, television, social media posts, food recipes, and food reviews. In Study 1 (N = 617 movies) and Study 2 (N = 27 television shows), healthy foods were described with fewer appealing descriptions (e.g., "couldn't stop eating"; d = 0.59 and d = 0.37, respectively) and more unappealing descriptions (e.g., "I hate peas"; d = -.57 and d = -.63, respectively) than unhealthy foods in characters' speech from the film and television industries. Using sources with richer descriptive language, Studies 3-6 analyzed popular American restaurants' Facebook posts (Study 3, N = 2275), recipe descriptions from Allrecipes.com (Study 4, N = 1000), Yelp reviews from six U.S. cities (Study 5, N = 4403), and Twitter tweets (Study 6, N = 10,000) for seven specific themes. Meta-analytic results across Studies 3-6 showed that healthy foods were specifically described as less craveworthy (d = 0.51, 95% CI: 0.44-0.59), less exciting (d = 0.40, 95% CI: 0.31-0.49), and less social (d = 0.36, 95% CI: 0.04-0.68) than unhealthy foods. Machine learning methods further generalized these patterns across 1.6 million tweets spanning 42 different foods representing a range of nutritional quality. These data suggest that strategies to encourage healthy choices must counteract pervasive narratives that dissociate healthy foods from craveability, excitement, and social connection in individuals' everyday lives.
Affiliation(s)
- Bradley P Turnwald
- University of Chicago Booth School of Business, USA; Stanford University, Department of Psychology, USA
- Dan Jurafsky
- Stanford University, Department of Computer Science and Department of Linguistics, USA
- Alia J Crum
- Stanford University, Department of Psychology, USA
27
Koch TK, Romero P, Stachl C. Age and gender in language, emoji, and emoticon usage in instant messages. COMPUTERS IN HUMAN BEHAVIOR 2022. [DOI: 10.1016/j.chb.2021.106990] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
28
Bettis AH, Burke TA, Nesi J, Liu RT. Digital Technologies for Emotion-Regulation Assessment and Intervention: A Conceptual Review. Clin Psychol Sci 2022; 10:3-26. [PMID: 35174006 PMCID: PMC8846444 DOI: 10.1177/21677026211011982] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 07/25/2023]
Abstract
The ability to regulate emotions in response to stress is central to healthy development. While early research in emotion regulation predominantly employed static, self-report measurement, the past decade has seen a shift in focus toward understanding the dynamic nature of regulation processes. This is reflected in recent refinements in the definition of emotion regulation, which emphasize the importance of the ability to flexibly adapt regulation efforts across contexts. The latest proliferation of digital technologies employed in mental health research offers the opportunity to capture the state- and context-sensitive nature of emotion regulation. In this conceptual review, we examine the use of digital technologies (ecological momentary assessment; wearable and smartphone technology, physical activity, acoustic data, visual data, and geo-location; smart home technology; virtual reality; social media) in the assessment of emotion regulation and describe their application to interventions. We also discuss challenges and ethical considerations, and outline areas for future research.
Affiliation(s)
- Richard T Liu
- Harvard Medical School
- Massachusetts General Hospital
29
Batzdorfer V, Steinmetz H, Biella M, Alizadeh M. Conspiracy theories on Twitter: emerging motifs and temporal dynamics during the COVID-19 pandemic. International Journal of Data Science and Analytics 2021; 13:315-333. [PMID: 34977334 PMCID: PMC8703214 DOI: 10.1007/s41060-021-00298-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2021] [Accepted: 12/01/2021] [Indexed: 11/29/2022]
Abstract
The COVID-19 pandemic resulted in an upsurge in the spread of diverse conspiracy theories (CTs) with real-life impact. However, the dynamics of user engagement remain under-researched. In the present study, we leverage Twitter data across 11 months in 2020 from the timelines of 109 CT posters and a comparison group (non-CT group) of equal size. Within this approach, we used word embeddings to distinguish non-CT content from CT-related content and analysed which elements of CT content emerged during the pandemic. Subsequently, we applied time series analyses on the aggregate and individual level to investigate whether CT posters and non-CT posters differ in their non-CT tweets, as well as the temporal dynamics of CT tweets. In this regard, we described the aggregate and individual series, conducted an STL decomposition into trends, seasons, and errors, ran an autocorrelation analysis, and applied generalised additive mixed models to analyse nonlinear trends and their differences across users. The narrative motifs, characterised by word embeddings, address pandemic-specific motifs alongside broader motifs and can be related to several psychological needs (epistemic, existential, or social). Overall, the comparison of the CT group and non-CT group showed a substantially higher level of overall COVID-19-related tweets in the non-CT group and a higher level of random fluctuations. Focussing on conspiracy tweets, we found a slight positive trend but, more importantly, an increase in users in 2020. Moreover, the aggregate series of CT content revealed two breaks in 2020 and a significant albeit weak positive trend since June. On the individual level, the series showed strong differences in temporal dynamics and a high degree of randomness and day-specific sensitivity. The results stress the importance of Twitter as a means of communication during the pandemic and illustrate that these beliefs travel very fast and are quickly endorsed.
Supplementary Information The online version contains supplementary material available at 10.1007/s41060-021-00298-6.
Affiliation(s)
- Veronika Batzdorfer
- Computational Social Science, GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
- Marco Biella
- Department of Psychology, Eberhard Karls Universität Tuebingen, Tuebingen, Germany
- Meysam Alizadeh
- Kennedy School of Government, Harvard University, Cambridge, USA
30
Mahmic S, Kern ML, Janson A. Identifying and Shifting Disempowering Paradigms for Families of Children With Disability Through a System Informed Positive Psychology Approach. Front Psychol 2021; 12:663640. [PMID: 35002821 PMCID: PMC8734639 DOI: 10.3389/fpsyg.2021.663640] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2021] [Accepted: 11/11/2021] [Indexed: 11/13/2022] Open
Abstract
Despite the emergence of socio-ecological, strength-based, and capacity-building approaches, care for children with disability remains primarily grounded in a deficit-based perspective. Diagnoses and interventions primarily focus on what children and families cannot do, rather than what might be possible, often undermining the competence, mental health, and functioning of both the children and their families. We first critically examine typical approaches to disability care for families of young children, describe the importance of a systems-informed positive psychology (SIPP) approach to care, and identify the existence of two dominant paradigms, disability is a disadvantage and experts know best. Then, we present a case study investigating families’ experiences with these two paradigms and whether shifts to alternative perspectives could occur through participation in a SIPP-based program co-designed by professionals and families. Of program participants, nine parents and five early intervention professionals participated in two separate focus groups, and ten e-books were randomly selected for review. Thematic analysis of the e-books and focus group data identified two primary themes representing alternative perspectives that arose through the intervention: we will start with our strengths and we’ve got this. Participant comments indicated that they developed a greater sense of hope, empowerment, engagement, and wellbeing, enabled by embedding wellbeing concepts and practices in their routines and communications with their children. We suggest that benefits arose in part from the structure of the program and the development of wellbeing literacy in participants. 
While care needs to be taken in generalizing the results, the case study provides clear examples of shifts in perspectives that occurred and suggests that the incorporation of SIPP principles within early intervention approaches provides a potential pathway for shifting the problematic paradigms that dominate disability care.
Affiliation(s)
- Sylvana Mahmic
- Plumtree Children’s Services, Sydney, NSW, Australia
- School of Education, Western Sydney University, Sydney, NSW, Australia
- Margaret L. Kern
- Centre for Wellbeing Science, Melbourne Graduate School of Education, The University of Melbourne, Melbourne, VIC, Australia
- Annick Janson
- Centre for Cross Cultural Research, Victoria University of Wellington, Wellington, New Zealand
31
Oltmanns JR, Schwartz HA, Ruggero C, Son Y, Miao J, Waszczuk M, Clouston SAP, Bromet EJ, Luft BJ, Kotov R. Artificial intelligence language predictors of two-year trauma-related outcomes. J Psychiatr Res 2021; 143:239-245. [PMID: 34509091 PMCID: PMC8935804 DOI: 10.1016/j.jpsychires.2021.09.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/23/2021] [Revised: 06/29/2021] [Accepted: 09/01/2021] [Indexed: 11/19/2022]
Abstract
BACKGROUND Recent research on artificial intelligence has demonstrated that natural language can be used to provide valid indicators of psychopathology. The present study examined artificial intelligence-based language predictors (ALPs) of seven trauma-related mental and physical health outcomes in responders to the World Trade Center disaster. METHODS The responders (N = 174, mean age = 55.4 years) provided daily voicemail updates over 14 days. Algorithms developed using machine learning in large social media discovery samples were applied to the voicemail transcriptions to derive ALP scores for several risk factors (depressivity, anxiousness, anger proneness, stress, and personality). Responders also completed self-report assessments of these risk factors at baseline and trauma-related mental and physical health outcomes at two-year follow-up (including symptoms of depression, posttraumatic stress disorder, sleep disturbance, respiratory problems, and GERD). RESULTS Voicemail ALPs were significantly associated with a majority of the trauma-related outcomes at two-year follow-up, over and above corresponding baseline self-reports. ALPs showed significant convergence with corresponding self-report scales, but also considerable uniqueness from each other and from self-report scales. LIMITATIONS The study has a relatively short follow-up period relative to trauma occurrence and a limited sample size. CONCLUSIONS This study shows evidence that ALPs may provide a novel, objective, and clinically useful approach to forecasting, and may in the future help to identify individuals at risk for negative health outcomes.
32
Ashokkumar A, Pennebaker JW. Social media conversations reveal large psychological shifts caused by COVID-19's onset across U.S. cities. Science Advances 2021; 7:eabg7843. [PMID: 34550738 PMCID: PMC8457655 DOI: 10.1126/sciadv.abg7843] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/27/2021] [Accepted: 07/28/2021] [Indexed: 05/11/2023]
Abstract
The current research chronicles the unfolding of the early psychological impacts of coronavirus disease 2019 (COVID-19) by analyzing Reddit language from 18 U.S. cities (200,000+ people) and large-scale survey data (11,000+ people). Large psychological shifts were found reflecting three distinct phases. When COVID-19 warnings first emerged (“warning phase”), people’s attentional focus switched to the impending threat. Anxiety levels surged, and positive emotion and anger dropped. In parallel, people’s thinking became more intuitive rather than analytic. When lockdowns began (“isolation phase”), analytic thinking dropped further. People became sadder, and their thinking reflected attempts to process the uncertainty. Familial ties strengthened, but ties to broader social groups weakened. Six weeks after COVID-19’s onset (“normalization phase”), people’s psychological states stabilized but remained elevated. Most psychological shifts were stronger when the threat of COVID-19 was greater. The magnitude of the observed shifts dwarfed responses to other events that occurred in the previous decade.
Affiliation(s)
- Ashwini Ashokkumar
- Department of Psychology, University of Texas Austin, 108 E. Dean Keeton, Austin, TX 78712-0187, USA
33
Giorgi S, Nguyen KL, Eichstaedt JC, Kern ML, Yaden DB, Kosinski M, Seligman MEP, Ungar LH, Schwartz HA, Park G. Regional personality assessment through social media language. J Pers 2021; 90:405-425. [PMID: 34536229 DOI: 10.1111/jopy.12674] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2021] [Revised: 08/26/2021] [Accepted: 09/12/2021] [Indexed: 11/30/2022]
Abstract
OBJECTIVE We explore the personality of counties as assessed through linguistic patterns on social media. Such studies were previously limited by the cost and feasibility of large-scale surveys; however, language-based computational models applied to large social media datasets now allow for large-scale personality assessment. METHOD We applied a language-based assessment of the five factor model of personality to 6,064,267 U.S. Twitter users. We aggregated the Twitter-based personality scores to 2,041 counties and compared them to political, economic, social, and health outcomes measured through surveys and by government agencies. RESULTS There was significant personality variation across counties. Openness to experience was higher on the coasts, conscientiousness was uniformly spread, extraversion was higher in southern states, agreeableness was higher in western states, and emotional stability was highest in the south. Across 13 outcomes, language-based personality estimates replicated patterns that have been observed in individual-level and geographic studies. This includes higher Republican vote share in less agreeable counties and increased life satisfaction in more conscientious counties. CONCLUSIONS Results suggest that regions vary in their personality and that these differences can be studied through computational linguistic analysis of social media. Furthermore, these methods may be used to explore other psychological constructs across geographies.
Affiliation(s)
- Salvatore Giorgi
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Khoa Le Nguyen
- Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Johannes C Eichstaedt
- Department of Psychology, Institute for Human-Centered A.I., Stanford University, Stanford, California, USA
- Margaret L Kern
- Melbourne Graduate School of Education, University of Melbourne, Melbourne, Victoria, Australia
- David B Yaden
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Michal Kosinski
- Graduate School of Business, Stanford University, Stanford, California, USA
- Martin E P Seligman
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Lyle H Ungar
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- H Andrew Schwartz
- Department of Computer Science, Stony Brook University, Stony Brook, New York, USA
- Gregory Park
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
34
Côté M, Lamarche B. Artificial intelligence in nutrition research: perspectives on current and future applications. Appl Physiol Nutr Metab 2021; 47:1-8. [PMID: 34525321 DOI: 10.1139/apnm-2021-0448] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Artificial intelligence (AI) is a rapidly evolving area that offers unparalleled opportunities for progress and applications in many healthcare fields. In this review, we provide an overview of the main and latest applications of AI in nutrition research and identify gaps to address in order to realize the potential of this emerging field. AI algorithms may help better understand and predict the complex and non-linear interactions between nutrition-related data and health outcomes, particularly when large amounts of data need to be structured and integrated, such as in metabolomics. AI-based approaches, including image recognition, may also improve dietary assessment by maximizing efficiency and addressing systematic and random errors associated with self-reported measurements of dietary intakes. Finally, AI applications can extract, structure and analyze large amounts of data from social media platforms to better understand dietary behaviours and perceptions among the population. In summary, AI-based approaches will likely improve and advance nutrition research as well as help explore new applications. However, further research is needed to identify areas where AI delivers added value compared with traditional approaches, and other areas where AI is simply not likely to advance the field. Novelty: Artificial intelligence offers unparalleled opportunities for progress and applications in nutrition. There remain gaps to address in order to realize the potential of this emerging field.
Affiliation(s)
- Mélina Côté
- Centre de recherche Nutrition, santé et société (NUTRISS), INAF, Université Laval, Québec, QC, Canada
- School of Nutrition, Université Laval, Québec, QC, Canada
- Benoît Lamarche
- Centre de recherche Nutrition, santé et société (NUTRISS), INAF, Université Laval, Québec, QC, Canada
- School of Nutrition, Université Laval, Québec, QC, Canada
35
Dudău DP, Sava FA. Performing Multilingual Analysis With Linguistic Inquiry and Word Count 2015 (LIWC2015). An Equivalence Study of Four Languages. Front Psychol 2021; 12:570568. [PMID: 34322047 PMCID: PMC8311520 DOI: 10.3389/fpsyg.2021.570568] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2020] [Accepted: 06/18/2021] [Indexed: 11/13/2022] Open
Abstract
Today, there is a range of computer-aided techniques to convert text into data. However, they convey not only strengths but also vulnerabilities compared to traditional content analysis. One of the challenges that have gained increasing attention is performing automatic language analysis to make sound inferences in a multilingual assessment setting. The current study is the first to test the equivalence of multiple versions of one of the most appealing and widely used lexicon-based tools worldwide, Linguistic Inquiry and Word Count 2015 (LIWC2015). For this purpose, we employed supervised learning in a classification problem and computed Pearson's correlations and intraclass correlation coefficients on a large corpus of parallel texts in English, Dutch, Brazilian Portuguese, and Romanian. Our findings suggested that LIWC2015 is a valuable tool for multilingual analysis, but within-language standardization is needed when the aim is to analyze texts sourced from different languages.
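A minimal sketch of the kind of equivalence check this abstract describes: correlating per-document category scores across two language versions of parallel texts. The scores below are made-up stand-ins for LIWC2015 output, and the one-way ICC formula is a standard textbook form rather than the authors' exact procedure.

```python
import numpy as np
from scipy.stats import pearsonr

# Stand-in category scores (e.g., % positive-emotion words per document)
# for 50 parallel texts in two languages; not real LIWC2015 output.
rng = np.random.default_rng(1)
english = rng.uniform(0, 10, 50)
dutch = english + rng.normal(0, 1, 50)  # translation adds some noise

r, _ = pearsonr(english, dutch)

# One-way ICC(1,1) for k = 2 paired measurements per document.
pairs = np.stack([english, dutch], axis=1)
ms_between = 2 * pairs.mean(axis=1).var(ddof=1)            # between-document MS
ms_within = ((pairs[:, 0] - pairs[:, 1]) ** 2 / 2).mean()  # within-document MS
icc = (ms_between - ms_within) / (ms_between + ms_within)

print(f"r = {r:.2f}, ICC = {icc:.2f}")
```

High values of both statistics on parallel corpora are what would support treating two language versions of a dictionary as equivalent.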
Affiliation(s)
- Florin Alin Sava
- Department of Psychology, West University of Timisoara, Timisoara, Romania
36
Shim Y, Scotney VS, Tay L. Conducting mobile-enabled ecological momentary intervention research in positive psychology: key considerations and recommended practices. The Journal of Positive Psychology 2021. [DOI: 10.1080/17439760.2021.1913642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Affiliation(s)
- Yerin Shim
- Department of Psychology, Chungnam National University, Daejeon, South Korea
- Louis Tay
- Department of Psychological Sciences, Purdue University, West Lafayette, IN, USA
37
Jensen M, Hussong A. Text message content as a window into college student drinking: Development and initial validation of a dictionary of "alcohol talk". International Journal of Behavioral Development 2021; 45:3-10. [PMID: 33456098 DOI: 10.1177/0165025419889175] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
The ubiquity of digital communication within the high-risk drinking environment of college students raises exciting new directions for prevention research. However, we are lacking relevant constructs and tools to analyze digital platforms that serve to facilitate, discuss, and rehash alcohol use. In the current study, we introduce the construct of alcohol-talk (or the extent to which college students use alcohol-related words in text messaging exchanges) as well as introduce and validate a novel tool for measuring this construct. We describe a closed-vocabulary, dictionary-based method for assessing alcohol-talk. Analyses of 569,172 text messages from 267 college students indicate that this method produces a reliable and valid measure that correlates as expected with self-reported alcohol and related risk constructs. We discuss the potential utility of this method for prevention studies.
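As a rough illustration of how such a closed-vocabulary, dictionary-based measure works: each message is scored by the share of its words that match the dictionary. The dictionary entries below are invented stand-ins, not the validated "alcohol talk" dictionary.

```python
# Hypothetical dictionary entries; the validated dictionary is far larger.
ALCOHOL_DICT = {"beer", "wine", "drunk", "shots", "bar", "drinks"}

def alcohol_talk_score(message: str) -> float:
    """Share of words in a message that match the dictionary."""
    words = message.lower().split()
    if not words:
        return 0.0
    return sum(w in ALCOHOL_DICT for w in words) / len(words)

print(alcohol_talk_score("meet at the bar for beer"))  # 2 of 6 words match
```

Averaging such scores across a person's messages yields a per-person "alcohol talk" measure that can be correlated with self-reported drinking.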
Affiliation(s)
- Michaeline Jensen
- University of North Carolina at Greensboro, Department of Psychology, 296 Eberhart Bldg, PO Box 26170, Greensboro, NC 27412-5001
38
Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality. Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics 2021; 2021:4515-4532. [PMID: 34296226 DOI: 10.18653/v1/2021.naacl-main.357] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
Abstract
In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the standard 768+ hidden state sizes of each layer within modern transformer-based language models, limiting the ability to effectively leverage transformers. Here, we provide a systematic study on the role of dimension reduction methods (principal components analysis, factorization techniques, or multi-layer auto-encoders) as well as the dimensionality of embedding vectors and sample sizes as a function of predictive performance. We first find that fine-tuning large models with a limited amount of data poses a significant difficulty which can be overcome with a pre-trained dimension reduction regime. RoBERTa consistently achieves top performance in human-level tasks, with PCA giving benefit over other reduction methods in better handling users that write longer texts. Finally, we observe that a majority of the tasks achieve results comparable to the best performance with just 1/12 of the embedding dimensions.
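The pre-trained dimension-reduction regime can be sketched roughly as below. The embeddings are random stand-ins for 768-dimensional transformer hidden states, and the 64-component PCA mirrors the roughly one-twelfth reduction the abstract reports; this is not the paper's pipeline, just the shape of the idea.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 768))  # 200 "users" x 768-dim embedding stand-ins
y = X[:, :5].sum(axis=1) + rng.normal(0, 0.1, 200)  # toy human-level outcome

# Reduce to 768 // 12 = 64 components before fitting on the small sample,
# so the downstream model estimates far fewer parameters than observations.
pca = PCA(n_components=64)
X_reduced = pca.fit_transform(X)
model = Ridge().fit(X_reduced, y)

print(X_reduced.shape)  # (200, 64)
```

In the paper's setting the reduction itself could also be pre-trained on a larger unlabeled sample and then applied to the small labeled one.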
39
Bittermann A, Batzdorfer V, Müller SM, Steinmetz H. Mining Twitter to Detect Hotspots in Psychology. Zeitschrift für Psychologie - Journal of Psychology 2021. [DOI: 10.1027/2151-2604/a000437] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
For identifying psychological hotspot topics, a mere focus on bibliometric data suffers from publication delay. To overcome this issue, we introduce Twitter mining of ongoing online communication among scientists for the detection of psychological research topics. Specifically, we collected all 69,963 tweets posted between August 2007 and July 2020 from 139 accounts of psychology professors, departments, and research institutes from the German-speaking countries, as well as sections of the German Psychological Society (DGPs). To examine whether Twitter topics are hotspots in terms of indicating future publication trends, 346,361 references in the PSYNDEX database were extracted. To determine the additional value of our approach in contrast to traditional conference analysis, we gathered all available conference programs of the DGPs and its sections since 2010 and compared dates of topic emergence. Results revealed 21 topics addressing societal issues (e.g., COVID-19), methodology (e.g., machine learning), scientific research (e.g., replication crisis), and different areas of psychological research. Ten topics indicated an increasing publication trend, particularly topics related to methodology or scientific transparency. Seven topics emerged earlier on Twitter than at conferences. A total of four topics could be anticipated neither by bibliometric forecasting nor by conference contents: "methodological issues in meta-analyses", "playfulness", "preregistration", and "mobile brain/body imaging". Taken together, Twitter mining is a worthwhile endeavor for identifying psychological hotspot topics, especially regarding societal issues, novel research methods, and research transparency in psychology. In order to get the most comprehensive picture of research hotspots, Twitter mining is recommended in addition to bibliometric analyses of publication trends and monitoring of conference topics.
40
School Values: A Comparison of Academic Motivation, Mental Health Promotion, and School Belonging With Student Achievement. The Educational and Developmental Psychologist 2020. [DOI: 10.1017/edp.2017.5] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
41
Hickman L, Thapa S, Tay L, Cao M, Srinivasan P. Text Preprocessing for Text Mining in Organizational Research: Review and Recommendations. Organizational Research Methods 2020. [DOI: 10.1177/1094428120971683] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Recent advances in text mining have provided new methods for capitalizing on the voluminous natural language text data created by organizations, their employees, and their customers. Although often overlooked, decisions made during text preprocessing affect whether the content and/or style of language are captured, the statistical power of subsequent analyses, and the validity of insights derived from text mining. Past methodological articles have described the general process of obtaining and analyzing text data, but recommendations for preprocessing text data were inconsistent. Furthermore, primary studies use and report different preprocessing techniques. To address this, we conduct two complementary reviews of computational linguistics and organizational text mining research to provide empirically grounded text preprocessing decision-making recommendations that account for the type of text mining conducted (i.e., open or closed vocabulary), the research question under investigation, and the data set’s characteristics (i.e., corpus size and average document length). Notably, deviations from these recommendations will be appropriate and, at times, necessary due to the unique characteristics of one’s text data. We also provide recommendations for reporting text mining to promote transparency and reproducibility.
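A minimal sketch of the kind of preprocessing decisions the review covers: lowercasing, punctuation stripping, tokenization, and stopword removal. The stopword list is a toy stand-in, not a recommendation; the paper's guidance varies with vocabulary type, corpus size, and document length.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to"}  # toy list only

def preprocess(text: str) -> list[str]:
    """Lowercase, strip non-letters, tokenize on whitespace, drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop punctuation and digits
    tokens = text.split()
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("The employees ARE satisfied with the new policy!"))
# ['employees', 'satisfied', 'with', 'new', 'policy']
```

Each of these steps is a choice, not a default: for closed-vocabulary tools such as LIWC, for example, stopwords often carry signal and should not be removed.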
Affiliation(s)
- Louis Hickman
- Purdue University College of Health and Human Sciences, West Lafayette, IN, USA
- Stuti Thapa
- Purdue University College of Health and Human Sciences, West Lafayette, IN, USA
- Louis Tay
- Purdue University College of Health and Human Sciences, West Lafayette, IN, USA
42
Pongiglione B, Kern ML, Carpentieri JD, Schwartz HA, Gupta N, Goodman A. Do children's expectations about future physical activity predict their physical activity in adulthood? Int J Epidemiol 2020; 49:1749-1758. [PMID: 33011758 PMCID: PMC7746399 DOI: 10.1093/ije/dyaa131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2020] [Accepted: 06/26/2020] [Indexed: 11/20/2022] Open
Abstract
Background Much of the population fails to meet recommended physical activity (PA) levels, but there remains considerable individual variation. By understanding drivers of different trajectories, interventions can be better targeted and more effective. One such driver may be a person’s physical activity identity (PAI)—the extent to which a person perceives PA as central to who they are. Methods Using survey information and a unique body of essays written at age 11 from the National Child Development Study (N = 10,500), essays mentioning PA were automatically identified using the machine learning technique support vector classification and PA trajectories were estimated using latent class analysis. Analyses tested the extent to which childhood PAI correlated with activity levels from age 23 through 55 and with trajectories across adulthood. Results 42.2% of males and 33.5% of females mentioned PA in their essays, describing active and/or passive engagement. Active PAI in childhood was correlated with higher levels of activity for men but not women, and was correlated with consistently active PA trajectories for both genders. Passive PAI was not related to PA for either gender. Conclusions This study offers a novel approach for analysing large qualitative datasets to assess identity and behaviours. Findings suggest that, as young as 11 years old, the way a young person conceptualizes activity as part of their identity has a lasting association with behaviour. Still, an active identity may require a supportive sociocultural context to manifest in subsequent behaviour.
Affiliation(s)
- Benedetta Pongiglione
- Centre for Research on Health and Social Care Management, Bocconi University, Milan, Italy; UCL Institute of Education, University College London, London, UK
- Margaret L Kern
- Melbourne Graduate School of Education, University of Melbourne, Melbourne, VIC, Australia
- J D Carpentieri
- UCL Institute of Education, University College London, London, UK
- H Andrew Schwartz
- Computer Science Department, Stony Brook University, Stony Brook, NY, USA
- Neelaabh Gupta
- Computer Science Department, Stony Brook University, Stony Brook, NY, USA
- Alissa Goodman
- UCL Institute of Education, University College London, London, UK
43
Fleming MN. Considerations for the Ethical Implementation of Psychological Assessment Through Social Media via Machine Learning. Ethics & Behavior 2020; 31:181-192. [PMID: 34248317 DOI: 10.1080/10508422.2020.1817026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Abstract
The ubiquity of social media usage has led to exciting new technologies such as machine learning. Machine learning is poised to change many fields of health, including psychology. The wealth of information provided by each social media user in combination with machine learning technologies may pave the way for automated psychological assessment and diagnosis. Assessment of individuals' social media profiles using machine learning technologies for diagnosis and screening confers many benefits (e.g., time and cost efficiency, reduced recall bias, and information about an individual's emotions and functioning spanning months or years); however, the implementation of these technologies will pose unique challenges to the professional ethics of psychology. Namely, psychologists must understand the impact of these assessment technologies on privacy and confidentiality, informed consent, recordkeeping, bases for assessments, and diversity and justice. This paper offers a brief review of the current applications of machine learning technologies in psychology and public health, provides an overview of potential implementations in clinical settings, and introduces ethical considerations for professional psychologists. This paper presents considerations that may aid in the extension of the current Ethical Principles of Psychologists and Code of Conduct to address these important technological advancements in the field of clinical psychology.
Affiliation(s)
- Megan N Fleming
- Department of Psychological Sciences, University of Missouri - Columbia
44
Glenn JJ, Nobles AL, Barnes LE, Teachman BA. Can Text Messages Identify Suicide Risk in Real Time? A Within-Subjects Pilot Examination of Temporally Sensitive Markers of Suicide Risk. Clin Psychol Sci 2020; 8:704-722. [PMID: 35692890 PMCID: PMC9186807 DOI: 10.1177/2167702620906146] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 07/25/2023]
Abstract
Objective tools to assess suicide risk are needed to determine when someone is at imminent risk. This pilot laboratory investigation utilized a within-subjects design to identify patterns in text messaging (SMS) unique to high-risk periods preceding suicide attempts. Individuals reporting a history of suicide attempt (N=33) retrospectively identified past attempts and periods of lower risk (e.g., suicide ideation). Language analysis software scored 189,478 text messages to capture three psychological constructs: self-focus, sentiment, and social engagement. Mixed-effects models tested whether these constructs differed in general (means) and over time (slopes) two weeks before a suicide attempt, relative to lower-risk periods. Regarding mean differences, no language features uniquely differentiated suicide attempts from other episodes. However, when examining patterns over time, anger increased and positive emotion decreased to a greater extent as one approached a suicide attempt. Results suggest private electronic communication has the potential to provide real-time digital markers of suicide risk.
Affiliation(s)
- Jeffrey J. Glenn
- University of Virginia
- Durham Veterans Affairs Health Care System
- VA Mid-Atlantic Mental Illness Research, Education and Clinical Center (VISN 6 MIRECC)
45
Abstract
Research on extremist radicalization has gained new momentum through digital behavioral trace data, such as social media posts or publicly accessible media. Against the background that big data is regarded as an "epistemological revolution," this systematic literature review provides an overview of (i) which aims, data sources, and methods are chosen in trace-data studies in radicalization research, illustrates selected findings of these studies by example, and (ii) analyzes the commonalities and differences relative to traditional studies such as questionnaire or experimental studies. The review is based on 63 studies, of which only a small number (k = 18) used digital behavioral trace data, while the majority (k = 52) consists of traditional approaches. The results show that trace-data studies mostly aimed to identify individuals with radical attitudes and to predict the development of radical views. Overall, behavioral trace data open up previously untapped potential for analyzing personality profiles and examining the dynamic social interactions of those susceptible to extremist recruitment. A rough English translation of this article is available as Electronic Supplement 1.
Affiliation(s)
- Veronika Batzdorfer
- Leibniz-Zentrum für Psychologische Information und Dokumentation (ZPID), Trier
- Holger Steinmetz
- Leibniz-Zentrum für Psychologische Information und Dokumentation (ZPID), Trier
- Michael Bosnjak
- Leibniz-Zentrum für Psychologische Information und Dokumentation (ZPID), Trier
46
Hoover J, Portillo-Wightman G, Yeh L, Havaldar S, Davani AM, Lin Y, Kennedy B, Atari M, Kamel Z, Mendlen M, Moreno G, Park C, Chang TE, Chin J, Leong C, Leung JY, Mirinjian A, Dehghani M. Moral Foundations Twitter Corpus: A Collection of 35k Tweets Annotated for Moral Sentiment. SOCIAL PSYCHOLOGICAL AND PERSONALITY SCIENCE 2020. [DOI: 10.1177/1948550619876629] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Research has shown that accounting for moral sentiment in natural language can yield insight into a variety of on- and off-line phenomena such as message diffusion, protest dynamics, and social distancing. However, measuring moral sentiment in natural language is challenging, and the difficulty of this task is exacerbated by the limited availability of annotated data. To address this issue, we introduce the Moral Foundations Twitter Corpus, a collection of 35,108 tweets that have been curated from seven distinct domains of discourse and hand annotated by at least three trained annotators for 10 categories of moral sentiment. To facilitate investigations of annotator response dynamics, we also provide psychological and demographic metadata for each annotator. Finally, we report moral sentiment classification baselines for this corpus using a range of popular methodologies.
Affiliation(s)
- Joe Hoover
- University of Southern California, Los Angeles, CA, USA
- Leigh Yeh
- University of Southern California, Los Angeles, CA, USA
- Ying Lin
- Rensselaer Polytechnic Institute, Troy, NY, USA
- Zahra Kamel
- University of Southern California, Los Angeles, CA, USA
- Jenna Chin
- University of Southern California, Los Angeles, CA, USA
- Jun Yen Leung
- University of Southern California, Los Angeles, CA, USA
47
Oswald FL, Behrend TS, Putka DJ, Sinar E. Big Data in Industrial-Organizational Psychology and Human Resource Management: Forward Progress for Organizational Research and Practice. ANNUAL REVIEW OF ORGANIZATIONAL PSYCHOLOGY AND ORGANIZATIONAL BEHAVIOR 2020. [DOI: 10.1146/annurev-orgpsych-032117-104553] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Big data and artificial intelligence (AI) have become quite compelling—and relevant, ideally—to organizations and the consulting services that help manage them. Researchers and practitioners in industrial-organizational psychology (IOP) and human resource management (HRM) can add significant value to big data and AI by offering their substantive expertise in how workforce-relevant data are measured and analyzed and how big data results are professionally, legally, and ethically interpreted and implemented by organizational decision makers, employees, policymakers, and other stakeholders in the employment arena. This article provides a perspective and framework for big data relevant to IOP and HRM that include both micro issues (e.g., linking data sources, decisions about which data to include, big data analytics) and macro issues (e.g., changing nature of big data, developing big data teams, educating professionals and graduate students, ethical and legal considerations). Ultimately, we strongly believe that IOP and HRM researchers and practitioners will become increasingly valuable for their contributions to the substance, technologies, algorithms, and communities that address big data, AI, and machine learning problems and applications in organizations relevant to their expertise.
Affiliation(s)
- Frederick L. Oswald
- Department of Psychological Sciences, Rice University, Houston, Texas 77005, USA
- Tara S. Behrend
- Department of Organizational Sciences and Communication, George Washington University, Washington, DC 20052, USA
- Dan J. Putka
- Human Resources Research Organization, Alexandria, Virginia 22314, USA
- Evan Sinar
- BetterUp, Pittsburgh, Pennsylvania 15243, USA
48
Abstract
In the present research, we investigated whether people's everyday language contains sufficient signal to predict the future occurrence of mental illness. Language samples were collected from the social media website Reddit, drawing on posts to discussion groups focusing on different kinds of mental illness (clinical subreddits), as well as on posts to discussion groups focusing on nonmental health topics (nonclinical subreddits). As expected, words drawn from the clinical subreddits could be used to distinguish several kinds of mental illness (ADHD, anxiety, bipolar disorder, and depression). Interestingly, words drawn from the nonclinical subreddits (e.g., travel, cooking, cars) could also be used to distinguish different categories of mental illness, implying that the impact of mental illness spills over into topics unrelated to mental illness. Most importantly, words derived from the nonclinical subreddits predicted future postings to clinical subreddits, implying that everyday language contains signal about the likelihood of future mental illness, possibly before people are aware of their mental health condition. Finally, whereas models trained on clinical subreddits learned to focus on words indicating disorder-specific symptoms, models trained to predict future mental illness learned to focus on words indicating life stress, suggesting that kinds of features that are predictive of mental illness may change over time. Implications for the underlying causes of mental illness are discussed.
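The core classification idea in this abstract — distinguishing mental-illness categories from the words in users' posts — can be sketched as a simple bag-of-words text classifier. This is not the authors' pipeline; the tiny corpus, labels, and test post below are invented purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy posts standing in for text drawn from different subreddits.
posts = [
    "cannot focus at work, keep losing my keys and forgetting tasks",
    "racing thoughts all night, spent way too much money again",
    "constant worry about everything, heart pounding for no reason",
    "no energy at all, nothing feels worth doing anymore",
]
labels = ["adhd", "bipolar", "anxiety", "depression"]

# TF-IDF word features feeding a multiclass logistic regression.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(posts, labels)

pred = clf.predict(["I worry constantly and my heart races"])[0]
```

The study's more striking result — predicting *future* clinical-subreddit posting from nonclinical posts — would use the same machinery, but with the label defined as whether the user later posts to a clinical subreddit.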
49
Exploring the association between problem drinking and language use on Facebook in young adults. Heliyon 2019; 5:e02523. [PMID: 31667380 PMCID: PMC6812202 DOI: 10.1016/j.heliyon.2019.e02523] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2019] [Revised: 08/26/2019] [Accepted: 09/23/2019] [Indexed: 11/21/2022] Open
Abstract
Recent literature suggests that variations in both formal and content aspects of texts shared on social media tend to reflect user-level differences in demographic, psychosocial, and behavioral characteristics. In the present study, we examined associations between language use on Facebook and problematic alcohol use. We collected texts shared on Facebook by a sample of 296 adult social media users (66.9% females; mean age = 28.44 years (SD = 7.38)). Texts were mined using the closed-vocabulary approach based on the Linguistic Inquiry Word Count (LIWC) semantic dictionary, and an open-vocabulary approach performed via Latent Dirichlet Allocation (LDA). Then, we examined associations between emerging textual features and alcohol-drinking scores as assessed using the AUDIT-C questionnaire. As a final aim, we employed the Random Forest machine-learning algorithm to determine and compare the predictive accuracy of closed- and open-vocabulary features over users' AUDIT-C scores. We found use of words about family, school, and positive feelings and emotions to be negatively associated with alcohol use and problematic drinking, while words suggesting interest in sport events, politics and economics, nightlife, and use of coarse language were more frequent among problematic drinkers. Results coming from LIWC and LDA analyses were quite similar, but LDA added information that could not be retrieved only with LIWC analysis. Furthermore, open-vocabulary features outperformed closed-vocabulary features in terms of predictive power over participants’ AUDIT-C scores (r = .46 vs. r = .28, respectively). Emerging relationships between text features and offline behaviors may have important implications for alcohol screening purposes in the online environment.
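The open-vocabulary arm of the pipeline described above — LDA topic proportions as features for a Random Forest predicting AUDIT-C scores — can be sketched as follows. This is not the authors' code: the corpus, scores, and topic count are invented toy values for illustration only.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestRegressor

# Toy Facebook-style posts and invented AUDIT-C scores.
posts = [
    "family dinner tonight feeling grateful and happy",
    "studying for school exams all week with friends",
    "big night out again drinks and bars with the crew",
    "match day football then the pub until very late",
]
audit_c = np.array([1.0, 2.0, 7.0, 8.0])

# Open-vocabulary features: per-document topic proportions from LDA.
counts = CountVectorizer().fit_transform(posts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(counts)  # shape: (n_docs, n_topics)

# Random Forest regression of drinking scores on topic features.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(topics, audit_c)
preds = model.predict(topics)
```

In the actual study, the closed-vocabulary (LIWC) arm would instead supply dictionary-category proportions as the feature matrix, and predictive accuracy of the two feature sets would be compared on held-out users.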
50
Oldfield BJ, Becker WC. News Media Recommendations for Opioid Disposal: Keeping Flush with the Guidelines? PAIN MEDICINE 2019; 20:1645-1646. [PMID: 31216021 DOI: 10.1093/pm/pnz141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Affiliation(s)
- Benjamin J Oldfield
- Department of Medicine, Yale School of Medicine, New Haven, Connecticut.,Department of Pediatrics, Yale School of Medicine, New Haven, Connecticut
- William C Becker
- Department of Medicine, Yale School of Medicine, New Haven, Connecticut.,Pain Research, Informatics, Multimorbidities & Education (PRIME) Center, VA Connecticut Healthcare System, West Haven, Connecticut, USA