1
Xu D, García GL, O'Connor K, Holston H, Klein AZ, Amaro IF, Scotch M, Gonzalez-Hernandez G. Mining Social Media Data for Influenza Vaccine Effectiveness Using a Large Language Model and Chain-of-Thought Prompting. medRxiv 2025:2025.03.26.25324701. [PMID: 40196289 PMCID: PMC11974990 DOI: 10.1101/2025.03.26.25324701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2025]
Abstract
Influenza vaccine effectiveness (VE) estimation plays a critical role in public health decision-making by quantifying the real-world impact of vaccination campaigns and guiding policy adjustments. Current approaches to VE estimation are constrained by limited population representation, selection bias, and delayed reporting. To address some of these gaps, we propose leveraging large language models (LLMs) with few-shot chain-of-thought (CoT) prompting to mine social media data for real-time influenza VE estimation. We annotated over 4,000 tweets from the 2020-2021 flu season using structured guidelines, achieving high inter-annotator agreement. Our best prompting strategy achieves F1 scores above 87% for identifying influenza vaccination status and test outcomes, outperforming traditional supervised fine-tuning methods by large margins. These findings indicate that LLM-based prompting approaches effectively identify relevant social media information for influenza VE estimation, offering a valuable real-time surveillance tool that complements traditional epidemiological methods.
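The F1 score cited in this abstract is the harmonic mean of precision and recall. The sketch below shows how such a score is computed from classification counts; the function name `f1_score` and the counts are hypothetical illustrations and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall, from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one label (e.g. "vaccinated"); NOT from the paper.
true_positives, false_positives, false_negatives = 90, 12, 14
score = f1_score(true_positives, false_positives, false_negatives)
print(f"F1 = {score:.3f}")
```

With these illustrative counts the score is 180/206, which sits just above the 87% threshold mentioned in the abstract.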
Affiliation(s)
- Dongfang Xu
  - Cedars-Sinai Medical Center, Los Angeles, CA
- Ari Z Klein
  - University of Pennsylvania, Philadelphia, PA
2
Wiederhold BK. Parsing Platforms: Natural Language Processing and Public Mental Health. Cyberpsychol Behav Soc Netw 2024; 27:521-523. [PMID: 39021219 DOI: 10.1089/cyber.2024.0386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
3
Swilley-Martinez ME, Coles SA, Miller VE, Alam IZ, Fitch KV, Cruz TH, Hohl B, Murray R, Ranapurwala SI. "We adjusted for race": now what? A systematic review of utilization and reporting of race in American Journal of Epidemiology and Epidemiology, 2020-2021. Epidemiol Rev 2023; 45:15-31. [PMID: 37789703 PMCID: PMC12098948 DOI: 10.1093/epirev/mxad010] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 05/16/2022] [Revised: 07/31/2023] [Accepted: 09/28/2023] [Indexed: 10/05/2023] Open
Abstract
Race is a social construct, commonly used in epidemiologic research to adjust for confounding. However, adjustment of race may mask racial disparities, thereby perpetuating structural racism. We conducted a systematic review of articles published in Epidemiology and American Journal of Epidemiology between 2020 and 2021 to (1) understand how race, ethnicity, and similar social constructs were operationalized, used, and reported; and (2) characterize good and poor practices of utilization and reporting of race data on the basis of the extent to which they reveal or mask systemic racism. Original research articles were considered for full review and data extraction if race data were used in the study analysis. We extracted how race was categorized, used (as a descriptor, confounder, or for effect measure modification [EMM]), and reported if the authors discussed racial disparities and systemic bias-related mechanisms responsible for perpetuating the disparities. Of the 561 articles, 299 had race data available and 192 (34.2%) used race data in analyses. Among the 160 US-based studies, 81 different racial categorizations were used. Race was most often used as a confounder (52%), followed by effect measure modifier (33%), and descriptive variable (12%). Fewer than 1 in 4 articles (22.9%) exhibited good practices (EMM along with discussing disparities and mechanisms), 63.5% of the articles exhibited poor practices (confounding only or not discussing mechanisms), and 13.5% were considered neither poor nor good practices. We discuss implications and provide 13 recommendations for operationalization, utilization, and reporting of race in epidemiologic and public health research.
Affiliation(s)
- Monica E Swilley-Martinez
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599-7435, United States
  - Injury Prevention Research Center, University of North Carolina, Chapel Hill, NC 27599, United States
- Serita A Coles
  - Department of Health Behavior, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599-7440, United States
- Vanessa E Miller
  - Injury Prevention Research Center, University of North Carolina, Chapel Hill, NC 27599, United States
- Ishrat Z Alam
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599-7435, United States
  - Injury Prevention Research Center, University of North Carolina, Chapel Hill, NC 27599, United States
- Kate Vinita Fitch
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599-7435, United States
  - Injury Prevention Research Center, University of North Carolina, Chapel Hill, NC 27599, United States
- Theresa H Cruz
  - Prevention Research Center, Department of Pediatrics, Health Sciences Center, University of New Mexico, Albuquerque, NM 87131, United States
- Bernadette Hohl
  - Penn Injury Science Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6021, United States
- Regan Murray
  - Center for Public Health and Technology, Department of Health, Human Performance and Recreation, University of Arkansas, Fayetteville, AR 72701, United States
- Shabbar I Ranapurwala
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599-7435, United States
  - Injury Prevention Research Center, University of North Carolina, Chapel Hill, NC 27599, United States
4
Stangl FJ, Riedl R, Kiemeswenger R, Montag C. Negative psychological and physiological effects of social networking site use: The example of Facebook. Front Psychol 2023; 14:1141663. [PMID: 37599719 PMCID: PMC10435997 DOI: 10.3389/fpsyg.2023.1141663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 05/03/2023] [Indexed: 08/22/2023] Open
Abstract
Social networking sites (SNS), with Facebook as a prominent example, have become an integral part of our daily lives and more than four billion people worldwide use SNS. However, the (over-)use of SNS also poses both psychological and physiological risks. In the present article, we review the scientific literature on the risk of Facebook (over-)use. Addressing this topic is critical because evidence indicates the development of problematic Facebook use ("Facebook addiction") due to excessive and uncontrolled use behavior with various psychological and physiological effects. We conducted a review to examine the scope, range, and nature of prior empirical research on the negative psychological and physiological effects of Facebook use. Our literature search process revealed a total of 232 papers showing that Facebook use is associated with eight major psychological effects (perceived anxiety, perceived depression, perceived loneliness, perceived eating disorders, perceived self-esteem, perceived life satisfaction, perceived insomnia, and perceived stress) and three physiological effects (physiological stress, human brain alteration, and affective experience state). The review also describes how Facebook use is associated with these effects and provides additional details on the reviewed literature, including research design, sample, age, and measures. Please note that the term "Facebook use" represents an umbrella term in the present work, and in the respective sections it will be made clear what kind of Facebook use is associated with a myriad of investigated psychological variables. Overall, findings indicate that certain kinds of Facebook use may come along with significant risks, both psychologically and physiologically. Based on our review, we also identify potential avenues for future research.
Affiliation(s)
- Fabian J. Stangl
  - Digital Business Institute, School of Business and Management, University of Applied Sciences Upper Austria, Steyr, Austria
- René Riedl
  - Digital Business Institute, School of Business and Management, University of Applied Sciences Upper Austria, Steyr, Austria
  - Institute of Business Informatics – Information Engineering, Johannes Kepler University Linz, Linz, Austria
- Roman Kiemeswenger
  - Institute of Business Informatics – Information Engineering, Johannes Kepler University Linz, Linz, Austria
- Christian Montag
  - Department of Molecular Psychology, Institute of Psychology and Education, Ulm University, Ulm, Germany
5
Guzman AA, Brecht ML, Doering LV, Macey PM, Mentes JC. Social Media Use and Depression in Older Adults: A Systematic Review. Res Gerontol Nurs 2023; 16:97-104. [PMID: 36944173 DOI: 10.3928/19404921-20230220-05] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2023]
Abstract
Social media has become an integral part of everyday life and revolutionized how older adults communicate and interact with others. The aim of the current review was to identify and synthesize quantitative studies addressing the potential relationship between social media use and depression in older adults. Medline, CINAHL, and PsycINFO databases were used to identify studies performed up to July 2020. Keywords identified were depression, social media use, and older adults. A nuanced relationship was revealed between social media use and depression in older adults. There were noted differences in the conceptualization of social media use. The reviewed studies lacked exploration of structural characteristics, examination of content, and quality of interactions in older adults' social media use. Health variables, social factors, and age cohort differences could influence the relationship between social media use and depression. Further studies are needed to enhance the understanding and explore the benefits and potential disadvantages of social media use in older adults. [Research in Gerontological Nursing, 16(2), 97-104.].
6
James P, Trudel-Fitzgerald C, Lee HH, Koga HK, Kubzansky LD, Grodstein F. Linking Individual-Level Facebook Posts With Psychological and Health Data in an Epidemiological Cohort: Feasibility Study. JMIR Form Res 2022; 6:e32423. [PMID: 35389368 PMCID: PMC9030896 DOI: 10.2196/32423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2021] [Revised: 12/04/2021] [Accepted: 12/19/2021] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND: Psychological factors (eg, depression) and related biological and behavioral responses are associated with numerous physical health outcomes. Most research in this area relies on self-reported assessments of psychological factors, which are difficult to scale because they may be expensive and time-consuming. Investigators are increasingly interested in using social media as a novel and convenient platform for obtaining information rapidly in large populations.
OBJECTIVE: We evaluated the feasibility of obtaining Facebook data from a large ongoing cohort study of midlife and older women, which may be used to assess psychological functioning efficiently with low cost.
METHODS: This study was conducted with participants in the Nurses' Health Study II (NHSII), which was initiated in 1989 with biennial follow-ups. Facebook does not share data readily; therefore, we developed procedures to enable women to download and transfer their Facebook data to cohort servers (for linkage with other study data they have provided). Since privacy is a critical concern when collecting individual-level data, we partnered with a third-party software developer, Digi.me, to enable participants to obtain their own Facebook data and to send it securely to our research team. In 2020, we invited a subset of the 18,519 NHSII participants (aged 56-73 years) via email to participate. Women were selected if they reported on the 2017-2018 questionnaire that they regularly posted on Facebook and were still active cohort participants. We included an exit survey for those who chose not to participate in order to gauge the reasons for nonparticipation.
RESULTS: We invited 309 women to participate. Few women signed the consent form (n=52), and only 3 used the Digi.me app to download and transfer their Facebook data. This low participation rate was observed despite modifying our protocol between waves of recruitment, including by (1) excluding active health care workers, who might be less available to participate due to the pandemic, (2) developing a Frequently Asked Questions factsheet to provide more information regarding the protocol, and (3) simplifying the instructions for using the Digi.me app. On our exit survey, the reasons most commonly reported for not participating were concerns regarding data privacy and hesitation sharing personal Facebook posts. The low participation rate suggests that obtaining individual-level Facebook data in a cohort of middle-aged and older women may be challenging.
CONCLUSIONS: In this cohort of midlife and older women who were actively participating for over three decades, we were largely unable to obtain permission to access individual-level data from participants' Facebook accounts. Despite working with a third-party developer to customize an app to implement safeguards for privacy, data privacy remained a key concern in these women. Future studies aiming to leverage individual-level social media data should explore alternate populations or means of sharing social media data.
Affiliation(s)
- Peter James
  - Division of Chronic Disease Research Across the Lifecourse, Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, United States
  - Department of Environmental Health, Harvard TH Chan School of Public Health, Boston, MA, United States
- Claudia Trudel-Fitzgerald
  - Department of Social and Behavioral Sciences, Harvard TH Chan School of Public Health, Boston, MA, United States
  - Lee Kum Sheung Center for Health and Happiness, Harvard TH Chan School of Public Health, Boston, MA, United States
- Harold H Lee
  - Department of Social and Behavioral Sciences, Harvard TH Chan School of Public Health, Boston, MA, United States
  - Lee Kum Sheung Center for Health and Happiness, Harvard TH Chan School of Public Health, Boston, MA, United States
- Hayami K Koga
  - Department of Social and Behavioral Sciences, Harvard TH Chan School of Public Health, Boston, MA, United States
- Laura D Kubzansky
  - Department of Social and Behavioral Sciences, Harvard TH Chan School of Public Health, Boston, MA, United States
  - Lee Kum Sheung Center for Health and Happiness, Harvard TH Chan School of Public Health, Boston, MA, United States
- Francine Grodstein
  - Department of Internal Medicine, Rush Medical College, Chicago, IL, United States
  - Rush Alzheimer's Disease Center, Rush University, Chicago, IL, United States
7
Quialheiro A, Figueiró TH, Rech CR, Marques LP, Paiva KMD, Xavier AJ, d'Orsi E. Can internet use reduce the incidence of cognitive impairment? Analysis of the EpiFloripa Aging Cohort Study (2009-2019). Prev Med 2022; 154:106904. [PMID: 34863810 DOI: 10.1016/j.ypmed.2021.106904] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/15/2021] [Revised: 11/19/2021] [Accepted: 11/28/2021] [Indexed: 02/06/2023]
Abstract
This study aims to estimate the effect of internet use on the incidence of cognitive impairment in older adults. Data are from the EpiFloripa Aging Cohort Study which has been following a population-based sample of older adults (60+) residing in Florianópolis, southern Brazil, for ten years. The outcome was the incidence of cognitive decline in follow-up waves measured by the Mini-Mental State Examination using cutoff points according to education. The exposure was internet use according to wave (yes/no). We excluded individuals with cognitive impairment from Wave 1 (n = 453). We used a longitudinal analysis model (Generalized Estimating Equations) to estimate incidence rate ratios (IRR) with 95% confidence intervals. We estimated the risk of cognitive impairment in Wave 2 or Wave 3 according to internet use in the previous wave. The incidence of cognitive impairment was 13.4% in Wave 2 and 13.3% in Wave 3. Despite the aging of this cohort, the prevalence of internet users increased from 26.4% in Wave 1 to 32.8% in Wave 2 and 46.8% in Wave 3. The risk of cognitive impairment in Wave 2 or Wave 3 was 70% lower for older adults who used the internet in the previous wave, adjusted for sex, age, years of education, household income, and self-reported comorbidities (IRR = 0.30; 95% CI: 0.15-0.61; p = 0.001). Internet use was associated with a decline in the incidence of cognitive impairment among older adults living in the urban areas of southern Brazil after a period of ten years.
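The "70% lower" figure in this abstract follows directly from the reported incidence rate ratio: a ratio of 0.30 corresponds to a (1 - 0.30) x 100 = 70% lower rate. The sketch below reproduces that arithmetic using only the IRR and confidence interval given above; the function name `percent_reduction` is an illustrative assumption, not something from the study.

```python
def percent_reduction(rate_ratio: float) -> float:
    """Convert a rate ratio below 1 into a percent reduction in incidence."""
    return (1.0 - rate_ratio) * 100.0


# Values as reported in the abstract: IRR = 0.30; 95% CI: 0.15-0.61.
irr, ci_lower, ci_upper = 0.30, 0.15, 0.61

point_reduction = percent_reduction(irr)
# The bounds swap: the upper ratio bound gives the smallest reduction.
smallest_reduction = percent_reduction(ci_upper)
largest_reduction = percent_reduction(ci_lower)

print(f"Estimated reduction: {point_reduction:.0f}%")
print(f"95% CI for the reduction: {smallest_reduction:.0f}% to {largest_reduction:.0f}%")
```

Read this way, the reported interval of 0.15 to 0.61 is compatible with a reduction anywhere from about 39% to about 85%.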
Affiliation(s)
- Anna Quialheiro
  - Life and Health Sciences Research Institute, Medical School, University of Minho, Portugal
- Thamara Hubler Figueiró
  - Post-Graduate Program in Public Health, Federal University of Santa Catarina, Florianópolis, SC, Brazil
- Cassiano Ricardo Rech
  - Post-Graduate Program in Physical Education, Federal University of Santa Catarina, Florianópolis, SC, Brazil
- Larissa Pruner Marques
  - Sergio Arouca National School of Public Health, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- Karina Mary de Paiva
  - Department of Speech, Language and Hearing, Federal University of Santa Catarina, Florianópolis, SC, Brazil
- André Junqueira Xavier
  - Post-Graduate Program in Public Health, Federal University of Santa Catarina, Florianópolis, SC, Brazil
  - Medicine Course, University of Southern Santa Catarina, Palhoça, Brazil
- Eleonora d'Orsi
  - Post-Graduate Program in Public Health, Federal University of Santa Catarina, Florianópolis, SC, Brazil
8
Baclic O, Tunis M, Young K, Doan C, Swerdfeger H, Schonfeld J. Challenges and opportunities for public health made possible by advances in natural language processing. Can Commun Dis Rep 2020; 46:161-168. [PMID: 32673380 PMCID: PMC7343054 DOI: 10.14745/ccdr.v46i06a02] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Indexed: 12/13/2022]
Abstract
Natural language processing (NLP) is a subfield of artificial intelligence devoted to understanding and generation of language. The recent advances in NLP technologies are enabling rapid analysis of vast amounts of text, thereby creating opportunities for health research and evidence-informed decision making. The analysis and data extraction from scientific literature, technical reports, health records, social media, surveys, registries and other documents can support core public health functions including the enhancement of existing surveillance systems (e.g. through faster identification of diseases and risk factors/at-risk populations), disease prevention strategies (e.g. through more efficient evaluation of the safety and effectiveness of interventions) and health promotion efforts (e.g. by providing the ability to obtain expert-level answers to any health related question). NLP is emerging as an important tool that can assist public health authorities in decreasing the burden of health inequality/inequity in the population. The purpose of this paper is to provide some notable examples of both the potential applications and challenges of NLP use in public health.
Affiliation(s)
- Oliver Baclic
  - Centre for Immunization and Respiratory Infectious Disease, Public Health Agency of Canada, Ottawa, ON
- Matthew Tunis
  - Centre for Immunization and Respiratory Infectious Disease, Public Health Agency of Canada, Ottawa, ON
- Kelsey Young
  - Centre for Immunization and Respiratory Infectious Disease, Public Health Agency of Canada, Ottawa, ON
- Coraline Doan
  - Data, Partnerships and Innovation Hub, Public Health Agency of Canada, Ottawa, ON
- Howard Swerdfeger
  - Data, Partnerships and Innovation Hub, Public Health Agency of Canada, Ottawa, ON
- Justin Schonfeld
  - National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB
9
Jaidka K, Giorgi S, Schwartz HA, Kern ML, Ungar LH, Eichstaedt JC. Estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods. Proc Natl Acad Sci U S A 2020; 117:10165-10171. [PMID: 32341156 PMCID: PMC7229753 DOI: 10.1073/pnas.1906364117] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Indexed: 11/18/2022] Open
Abstract
Researchers and policy makers worldwide are interested in measuring the subjective well-being of populations. When users post on social media, they leave behind digital traces that reflect their thoughts and feelings. Aggregation of such digital traces may make it possible to monitor well-being at large scale. However, social media-based methods need to be robust to regional effects if they are to produce reliable estimates. Using a sample of 1.53 billion geotagged English tweets, we provide a systematic evaluation of word-level and data-driven methods for text analysis for generating well-being estimates for 1,208 US counties. We compared Twitter-based county-level estimates with well-being measurements provided by the Gallup-Sharecare Well-Being Index survey through 1.73 million phone surveys. We find that word-level methods (e.g., Linguistic Inquiry and Word Count [LIWC] 2015 and Language Assessment by Mechanical Turk [LabMT]) yielded inconsistent county-level well-being measurements due to regional, cultural, and socioeconomic differences in language use. However, removing as few as three of the most frequent words led to notable improvements in well-being prediction. Data-driven methods provided robust estimates, approximating the Gallup data at up to r = 0.64. We show that the findings generalized to county socioeconomic and health outcomes and were robust when poststratifying the samples to be more representative of the general US population. Regional well-being estimation from social media data seems to be robust when supervised data-driven methods are used.
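The agreement figure quoted in this abstract (r = 0.64) is a Pearson correlation between language-based estimates and survey measurements across counties. The sketch below shows how such a coefficient is computed; the function name `pearson_r` and the five county values are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical county-level scores; NOT data from the study.
language_estimates = [6.1, 6.4, 5.8, 6.9, 6.2]
survey_scores = [6.0, 6.6, 6.1, 6.7, 6.0]
print(f"r = {pearson_r(language_estimates, survey_scores):.2f}")
```

A value near 1 would mean the language-based estimates rank and space the counties almost exactly as the survey does; the study reports agreement of up to 0.64 for its data-driven methods.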
Affiliation(s)
- Kokil Jaidka
  - Department of Communications and New Media, National University of Singapore, Singapore 117416
  - Centre for Trusted Internet and Community, National University of Singapore, Singapore 117416
- Salvatore Giorgi
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104
- H Andrew Schwartz
  - Department of Computer Science, Stony Brook University, Stony Brook, NY 11794
- Margaret L Kern
  - Melbourne Graduate School of Education, The University of Melbourne, Parkville, VIC 3010, Australia
- Lyle H Ungar
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104
- Johannes C Eichstaedt
  - Department of Psychology, Stanford University, Stanford, CA 94305
  - Institute for Human-Centered Artificial Intelligence, Stanford University, Stanford, CA 94305