1
Ma L, Chen R, Ge W, Rogers P, Lyn-Cook B, Hong H, Tong W, Wu N, Zou W. AI-powered topic modeling: comparing LDA and BERTopic in analyzing opioid-related cardiovascular risks in women. Exp Biol Med (Maywood) 2025; 250:10389. [PMID: 40093658 PMCID: PMC11906279 DOI: 10.3389/ebm.2025.10389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Accepted: 01/16/2025] [Indexed: 03/19/2025]
Abstract
Topic modeling is a crucial technique in natural language processing (NLP), enabling the extraction of latent themes from large text corpora. Although traditional topic models such as Latent Dirichlet Allocation (LDA) have been widely applied in text mining, they are limited in capturing semantic relationships within text documents. BERTopic, introduced in 2022, leverages advances in deep learning to capture contextual relationships between words. In this work, we integrated artificial intelligence (AI) modules into LDA and BERTopic and comprehensively compared the two methods in analyzing prescription opioid-related cardiovascular risks in women. Opioid use can increase the risk of cardiovascular problems in women, such as arrhythmia and hypotension. A total of 1,837 abstracts were retrieved from PubMed as of April 2024 using three Medical Subject Headings (MeSH) terms: "opioid," "cardiovascular," and "women." The Machine Learning for Language Toolkit (MALLET) was used to implement LDA, and BioBERT was used for document embedding in BERTopic. The optimal number of topics was 18 for MALLET and 23 for BERTopic. ChatGPT-4-Turbo was integrated to interpret and compare the results. The short topic descriptions that ChatGPT generated for LDA and BERTopic were highly correlated, and expert manual review of the abstracts, grouped by their predominant topics, found that LDA and BERTopic performed with similar accuracy. Plots produced with t-distributed stochastic neighbor embedding (t-SNE) showed that BERTopic clusters were more compact and better separated, indicating improved coherence and distinctiveness between topics. Our findings indicate that AI algorithms can augment both traditional and contemporary topic modeling techniques.
In addition, BERTopic provides a built-in connection to ChatGPT-4-Turbo and other large language models for automatic interpretation, whereas LDA results must be interpreted manually and require special procedures for data pre-processing and stop-word exclusion. Therefore, while LDA remains valuable for large-scale text analysis under resource constraints, AI-assisted BERTopic offers significant advantages in interpretability and semantic coherence when extracting insights from textual data.
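BERTopic describes each topic with a class-based TF-IDF (c-TF-IDF). As presented in the BERTopic documentation, all documents assigned to a topic are merged into one "class document", and term t in class c is weighted as tf(t, c) · log(1 + A / f(t)), where tf is the term count normalized by the class's word count, A is the average number of words per class, and f(t) is the term's frequency across all classes. The following is a minimal plain-Python sketch of that formula, not BERTopic's own implementation (which operates on sparse matrices); the tokenized abstracts are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_topic):
    """Class-based TF-IDF. docs_by_topic maps a topic id to a list of tokenized documents."""
    # Merge every document in a topic into one bag of words (the "class document").
    class_counts = {topic: Counter(tok for doc in docs for tok in doc)
                    for topic, docs in docs_by_topic.items()}
    # f(t): how often each term occurs across all classes combined.
    total_freq = Counter()
    for counts in class_counts.values():
        total_freq.update(counts)
    # A: average number of words per class.
    avg_words = sum(sum(c.values()) for c in class_counts.values()) / len(class_counts)
    weights = {}
    for topic, counts in class_counts.items():
        n_words = sum(counts.values())
        weights[topic] = {term: (count / n_words) * math.log(1 + avg_words / total_freq[term])
                          for term, count in counts.items()}
    return weights


def top_terms(weights, topic, k=3):
    # Highest weight first; ties broken alphabetically for a stable order.
    return sorted(weights[topic], key=lambda term: (-weights[topic][term], term))[:k]


# Hypothetical tokenized abstracts already grouped by topic.
topics = {
    0: [["opioid", "arrhythmia", "risk"], ["opioid", "arrhythmia"]],
    1: [["opioid", "hypotension"], ["hypotension", "women"]],
}
w = c_tf_idf(topics)
print(top_terms(w, 0), top_terms(w, 1))  # ['arrhythmia', 'opioid', 'risk'] ['hypotension', 'women', 'opioid']
```

"opioid" appears in both topics and ranks last in topic 1 despite having the same count as "women": the log term down-weights words spread across many classes, which is what makes the top terms distinctive.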
Affiliation(s)
- Li Ma
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Department of Information Science, University of Arkansas at Little Rock, Little Rock, AR, United States
- Ru Chen
- Office of New Drugs, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, United States
- Weigong Ge
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Paul Rogers
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Beverly Lyn-Cook
- Division of Biochemical Toxicology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Ningning Wu
- Department of Information Science, University of Arkansas at Little Rock, Little Rock, AR, United States
- Wen Zou
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
2
Kim J, Cai ZR, Chen ML, Rezaei SJ, Onyeka S, Rodriguez CI, Hernandez-Boussard T, Filkov V, Whitmer RA, Linos E, Choi YK. Mental health care needs of caregivers of people with Alzheimer's disease from online forum analysis. npj Mental Health Research 2024; 3:54. [PMID: 39537826 PMCID: PMC11561247 DOI: 10.1038/s44184-024-00100-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 11/01/2024] [Indexed: 11/16/2024]
Abstract
Informal caregivers of people with Alzheimer's disease and related dementias (ADRD) are at risk of poor mental health. This study aimed to investigate the feasibility and validity of studying caregivers' mental stressors using online caregiving forum data (March 2018-February 2022) and natural language processing and machine learning (NLP/ML). NLP/ML topic modeling generated eight prominent topics, which we compared with qualitatively defined themes and the existing caregiving framework to assess validity. Of a total of 60,182 posts, 5,848 were related to mental distress. The topics concerned the ADRD patients (symptoms, medication, relocation, care duty share, diagnosis, conversation strategy) and the caregivers (caregiving burden and support). Although some NLP/ML-defined topics were novel, most aligned with the existing framework. Feasibility was assessed through qualitative title screening. The findings shed new light on the potential of NLP/ML text analysis of online forums for informal caregivers to inform tailored support for this vulnerable population.
Affiliation(s)
- Jiyeong Kim
- Department of Medicine, Stanford Center for Digital Health, Stanford University, Stanford, CA, USA
- Department of Dermatology, School of Medicine, Stanford University, Stanford, CA, USA
- Zhuo Ran Cai
- Department of Medicine, Stanford Center for Digital Health, Stanford University, Stanford, CA, USA
- Department of Dermatology, School of Medicine, Stanford University, Stanford, CA, USA
- Michael L Chen
- Department of Medicine, Stanford Center for Digital Health, Stanford University, Stanford, CA, USA
- Department of Dermatology, School of Medicine, Stanford University, Stanford, CA, USA
- Shawheen J Rezaei
- Department of Medicine, Stanford Center for Digital Health, Stanford University, Stanford, CA, USA
- Department of Dermatology, School of Medicine, Stanford University, Stanford, CA, USA
- Sonia Onyeka
- Department of Medicine, Stanford Center for Digital Health, Stanford University, Stanford, CA, USA
- Department of Dermatology, School of Medicine, Stanford University, Stanford, CA, USA
- Carolyn I Rodriguez
- Department of Psychiatry and Behavioral Sciences, School of Medicine, Stanford University, Stanford, CA, USA
- Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, USA
- Vladimir Filkov
- Department of Computer Science, College of Engineering, University of California Davis, Davis, CA, USA
- Rachel A Whitmer
- Department of Public Health Sciences, School of Medicine, University of California, Davis, Davis, CA, USA
- Eleni Linos
- Department of Medicine, Stanford Center for Digital Health, Stanford University, Stanford, CA, USA
- Department of Dermatology, School of Medicine, Stanford University, Stanford, CA, USA
- Yong K Choi
- Department of Health Information Management, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA, USA
3
Pérez-Pérez M, Fernandez Gonzalez M, Rodriguez-Rajo FJ, Fdez-Riverola F. Tracking the Spread of Pollen on Social Media Using Pollen-Related Messages From Twitter: Retrospective Analysis. J Med Internet Res 2024; 26:e58309. [PMID: 39432897 PMCID: PMC11535798 DOI: 10.2196/58309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 05/27/2024] [Accepted: 09/10/2024] [Indexed: 10/23/2024]
Abstract
BACKGROUND Allergy disorders caused by biological particles, such as the proteins in some airborne pollen grains, are currently considered one of the most common chronic diseases, and forecasts by the European Academy of Allergy and Clinical Immunology indicate that within 15 years 50% of Europeans will have some kind of allergy as a consequence of urbanization, industrialization, pollution, and climate change. OBJECTIVE The aim of this study was to monitor and analyze the dissemination of information about pollen symptoms from December 2006 to January 2022. By comprehensively evaluating public comments and trends on Twitter, the research sought to provide insights into the impact of pollen on sensitive individuals, ultimately enhancing our understanding of how pollen-related information spreads and its implications for public health awareness. METHODS Using a blend of large language models, dimensionality reduction, unsupervised clustering, and term frequency-inverse document frequency, alongside visual representations such as word clouds and semantic interaction graphs, our study analyzed Twitter data to uncover insights into respiratory allergies. This methodology enabled the extraction of significant themes and patterns, offering a detailed view of public knowledge and discussions surrounding respiratory allergies on Twitter. RESULTS The months between March and August had the highest volume of messages. The percentage of patient tweets appeared to increase notably in later years, and the prevalence of reported symptoms, mainly in the morning hours, may also have increased, indicating a potential rise in pollen allergies and related discussions on social media. While pollen allergy is a global issue, specific sociocultural, political, and economic contexts mean that patients experience symptoms at a localized level, requiring appropriate localized responses.
CONCLUSIONS Interpreting tweet information is a valuable tool for taking preventive measures that mitigate the impact of pollen allergy on sensitive patients, achieve equity in living conditions, and enhance access to health information and services.
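The seasonal (March to August) and time-of-day (morning) patterns reported above rest on counting messages per time bucket. A minimal sketch, assuming each message carries an ISO 8601 timestamp already converted to the poster's local time; the timestamps, the morning window (05:00 to 11:59), and the function names are hypothetical and not taken from the study.

```python
from collections import Counter
from datetime import datetime


def volume_profile(iso_timestamps):
    """Count messages per calendar month and per hour of day.

    Assumes ISO 8601 timestamps already in local time; raw platform timestamps
    are often UTC, which would shift the hour counts.
    """
    parsed = [datetime.fromisoformat(ts) for ts in iso_timestamps]
    by_month = Counter(dt.month for dt in parsed)
    by_hour = Counter(dt.hour for dt in parsed)
    return by_month, by_hour


def share_in_hours(by_hour, start, end):
    """Fraction of messages posted in hours start..end-1 (0.0 if there are none)."""
    total = sum(by_hour.values())
    return sum(n for h, n in by_hour.items() if start <= h < end) / total if total else 0.0


# Hypothetical tweet timestamps.
tweets = ["2021-04-03T08:15:00", "2021-04-20T07:50:00",
          "2021-05-01T09:00:00", "2021-11-11T22:00:00"]
by_month, by_hour = volume_profile(tweets)
print(by_month.most_common(1), share_in_hours(by_hour, 5, 12))  # [(4, 2)] 0.75
```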
Affiliation(s)
- Martín Pérez-Pérez
- CINBIO, Universidade de Vigo (University of Vigo), Vigo, Spain
- Department of Computer Science, School of Computer Engineering, Universidade de Vigo (University of Vigo), Ourense, Spain
- Next Generation Computer Systems Group, School of Computer Engineering, Galicia Sur Health Research Institute, Galician Health Service, SERGAS-UVIGO, Ourense, Spain
- María Fernandez Gonzalez
- Department of Plant Biology and Soil Sciences, Faculty of Sciences, Universidade de Vigo (University of Vigo), Ourense, Spain
- Francisco Javier Rodriguez-Rajo
- Department of Plant Biology and Soil Sciences, Faculty of Sciences, Universidade de Vigo (University of Vigo), Ourense, Spain
- Florentino Fdez-Riverola
- CINBIO, Universidade de Vigo (University of Vigo), Vigo, Spain
- Department of Computer Science, School of Computer Engineering, Universidade de Vigo (University of Vigo), Ourense, Spain
- Next Generation Computer Systems Group, School of Computer Engineering, Galicia Sur Health Research Institute, Galician Health Service, SERGAS-UVIGO, Ourense, Spain
4
Mapundu MT, Kabudula CW, Musenge E, Olago V, Celik T. Text mining of verbal autopsy narratives to extract mortality causes and most prevalent diseases using natural language processing. PLoS One 2024; 19:e0308452. [PMID: 39298425 DOI: 10.1371/journal.pone.0308452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Accepted: 07/24/2024] [Indexed: 09/21/2024]
Abstract
Verbal autopsy (VA) narratives play a crucial role in understanding and documenting the causes of mortality, especially in regions lacking robust medical infrastructure. In this study, we propose a comprehensive approach that uses advanced text mining techniques to extract mortality causes and identify prevalent diseases from VA narratives, in order to better understand the underlying health issues leading to death. Our methodology integrates n-gram-based language processing, Latent Dirichlet Allocation (LDA), and BERTopic, offering a multi-faceted analysis that enhances the accuracy and depth of information extraction. This retrospective study is a secondary data analysis of 16,338 observations collected between 1993 and 2015 at the Agincourt Health and Demographic Surveillance Site (HDSS). Our text mining steps comprised data acquisition, pre-processing, feature extraction, topic segmentation, and knowledge discovery. The results suggest that members of the HDSS population may have died from mortality causes such as vomiting, chest/stomach pain, fever, coughing, weight loss, low energy, and headache. Additionally, the most prevalent diseases included human immunodeficiency virus (HIV), tuberculosis (TB), diarrhoea, cancer, neurological disorders, malaria, diabetes, high blood pressure, chronic ailments (kidney, heart, lung, liver), and maternal and accident-related deaths. This study provides valuable insights into mortality causes and the most prevalent diseases using novel text mining approaches. These results can be integrated into the diagnosis pipeline to ease human annotation and interpretation, supporting informed intervention programmes that can improve primary health care systems and chronic care delivery, thus increasing life expectancy.
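The abstract lists n-gram-based processing as a first stage but does not describe it. A minimal sketch of one common approach, counting the most frequent word n-grams across narratives after lowercasing and stop-word removal; the narratives, the stop-word list, and the function names are invented for illustration.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token sequences; n=2 gives bigrams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(narratives, n=2, k=5, stop_words=frozenset()):
    counts = Counter()
    for text in narratives:
        # Lowercase, keep alphabetic tokens only, and drop stop words.
        tokens = [tok for tok in re.findall(r"[a-z]+", text.lower()) if tok not in stop_words]
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


# Hypothetical VA narrative snippets and stop-word list.
narratives = ["She had chest pain and fever",
              "Chest pain started at night",
              "Fever and chest pain"]
stop = {"she", "had", "and", "started", "at", "night"}
print(top_ngrams(narratives, n=2, k=1, stop_words=stop))  # [(('chest', 'pain'), 3)]
```

Because stop words are removed before n-grams are formed, words on either side of a removed word become adjacent ("pain fever" from "pain and fever"); forming n-grams first and filtering afterwards avoids that at the cost of more noise.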
Affiliation(s)
- Michael Tonderai Mapundu
- Department of Epidemiology and Biostatistics, School of Public Health, University of the Witwatersrand, Johannesburg, South Africa
- Chodziwadziwa Whiteson Kabudula
- Department of Epidemiology and Biostatistics, School of Public Health, University of the Witwatersrand, Johannesburg, South Africa
- MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), Johannesburg, South Africa
- Eustasius Musenge
- Department of Epidemiology and Biostatistics, School of Public Health, University of the Witwatersrand, Johannesburg, South Africa
- Victor Olago
- National Health Laboratory Service (NHLS), National Cancer Registry, Johannesburg, South Africa
- Turgay Celik
- Wits Institute of Data Science, University of The Witwatersrand, Johannesburg, South Africa
- School of Electrical and Information Engineering, University of The Witwatersrand, Johannesburg, South Africa
5
Zhang Y, Folarin AA, Dineley J, Conde P, de Angel V, Sun S, Ranjan Y, Rashid Z, Stewart C, Laiou P, Sankesara H, Qian L, Matcham F, White K, Oetzmann C, Lamers F, Siddi S, Simblett S, Schuller BW, Vairavan S, Wykes T, Haro JM, Penninx BWJH, Narayan VA, Hotopf M, Dobson RJB, Cummins N. Identifying depression-related topics in smartphone-collected free-response speech recordings using an automatic speech recognition system and a deep learning topic model. J Affect Disord 2024; 355:40-49. [PMID: 38552911 DOI: 10.1016/j.jad.2024.03.106] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/26/2023] [Revised: 03/18/2024] [Accepted: 03/22/2024] [Indexed: 04/01/2024]
Abstract
BACKGROUND Prior research has associated spoken language use with depression, yet studies often involve small or non-clinical samples and face challenges in the manual transcription of speech. This paper aimed to automatically identify depression-related topics in speech recordings collected from clinical samples. METHODS The data included 3919 English free-response speech recordings collected via smartphones from 265 participants with a depression history. We transcribed speech recordings via automatic speech recognition (Whisper tool, OpenAI) and identified principal topics from transcriptions using a deep learning topic model (BERTopic). To identify depression risk topics and understand the context, we compared participants' depression severity and behavioral (extracted from wearable devices) and linguistic (extracted from transcribed texts) characteristics across identified topics. RESULTS Of the 29 topics identified, 6 were depression risk topics: 'No Expectations', 'Sleep', 'Mental Therapy', 'Haircut', 'Studying', and 'Coursework'. Participants mentioning depression risk topics exhibited higher sleep variability, later sleep onset, and fewer daily steps, and their speech recordings contained fewer words, more negative language, and fewer leisure-related words. LIMITATIONS Our findings were derived from a depressed cohort with a specific speech task, potentially limiting the generalizability to non-clinical populations or other speech tasks. Additionally, some topics had small sample sizes, necessitating further validation in larger datasets. CONCLUSION This study demonstrates that specific speech topics can indicate depression severity. The employed data-driven workflow provides a practical approach for analyzing large-scale speech data collected from real-world settings.
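The abstract compares linguistic characteristics of transcripts across BERTopic topics but does not name the tools used. A minimal sketch of the grouping step, assuming each transcript already carries a topic label: word count mirrors the reported "fewer words" finding, and the negative-word ratio uses a tiny hypothetical lexicon standing in for whatever dictionary the authors applied. The records are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon; a real analysis would use a validated dictionary.
NEGATIVE_WORDS = {"tired", "sad", "hopeless", "alone", "bad"}


def linguistic_features(transcript):
    """Word count and share of words found in the negative lexicon."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    n = len(words)
    negative = sum(w in NEGATIVE_WORDS for w in words)
    return {"word_count": n, "negative_ratio": negative / n if n else 0.0}


def features_by_topic(records):
    """records: iterable of (topic_label, transcript); returns mean features per topic."""
    grouped = defaultdict(list)
    for topic, text in records:
        grouped[topic].append(linguistic_features(text))
    return {topic: {key: mean(f[key] for f in feats) for key in feats[0]}
            for topic, feats in grouped.items()}


records = [("Sleep", "I feel tired and sad"),
           ("Sleep", "Bad night."),
           ("Holiday", "We went to the beach and swam all day")]
print(features_by_topic(records))
```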
Affiliation(s)
- Yuezhou Zhang
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Amos A Folarin
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; University College London, London, UK; South London and Maudsley NHS Foundation Trust, London, UK; Health Data Research UK London, University College London, London, UK
- Judith Dineley
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; University of Augsburg, Augsburg, Germany
- Pauline Conde
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Valeria de Angel
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Shaoxiong Sun
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Department of Computer Science, University of Sheffield, Sheffield, UK
- Yatharth Ranjan
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Zulqarnain Rashid
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Callum Stewart
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Petroula Laiou
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Heet Sankesara
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Linglong Qian
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Faith Matcham
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; School of Psychology, University of Sussex, Falmer, East Sussex, UK
- Katie White
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Carolin Oetzmann
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Femke Lamers
- Department of Psychiatry, Amsterdam Public Health Research Institute and Amsterdam Neuroscience, Amsterdam University Medical Centre, Vrije Universiteit and GGZ InGeest, Amsterdam, the Netherlands
- Sara Siddi
- Parc Sanitari Sant Joan de Déu, Fundació Sant Joan de Déu, CIBERSAM, Universitat de Barcelona, Barcelona, Spain
- Sara Simblett
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Björn W Schuller
- University of Augsburg, Augsburg, Germany; GLAM - Group on Language, Audio, & Music, Imperial College London, London, UK
- Til Wykes
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; South London and Maudsley NHS Foundation Trust, London, UK
- Josep Maria Haro
- Parc Sanitari Sant Joan de Déu, Fundació Sant Joan de Déu, CIBERSAM, Universitat de Barcelona, Barcelona, Spain
- Brenda W J H Penninx
- Department of Psychiatry, Amsterdam Public Health Research Institute and Amsterdam Neuroscience, Amsterdam University Medical Centre, Vrije Universiteit and GGZ InGeest, Amsterdam, the Netherlands
- Matthew Hotopf
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; South London and Maudsley NHS Foundation Trust, London, UK
- Richard J B Dobson
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; University College London, London, UK; South London and Maudsley NHS Foundation Trust, London, UK; Health Data Research UK London, University College London, London, UK
- Nicholas Cummins
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
6
Jensen RE, Rohde JA, Muro AH, Schweppe CA, Vanderpool RC. Analysis of Telehealth Discussion Trends on Reddit (2019-2022). Telemed J E Health 2024; 30:e1790-e1797. [PMID: 38394136 PMCID: PMC11386991 DOI: 10.1089/tmj.2023.0651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2024]
Abstract
Introduction: Use of telehealth increased during the COVID-19 pandemic and continues to be a popular health resource. This study analyzed the frequency and sentiment of telehealth discussions on Reddit. Methods: The data set included 13,071 publicly available Reddit submissions containing keywords related to telehealth over a 3-year period. We identified 173 unique subreddit communities, which were coded into mutually exclusive categories: (1) general telehealth, (2) individual care, (3) professional, (4) news, and (5) COVID-19. The VADER lexicon-based sentiment analyzer was used to assign sentiment scores. Results: The most common subreddit categories were individual care (n = 112), professional (n = 26), and news (n = 22). The frequency of submissions increased during the first 2 months of the pandemic and dropped in June 2020 but remained stable through October 2022. Most Reddit submissions were positive in sentiment (56%). Conclusion: Findings show a mostly positive view of telehealth among Reddit users and an increase in telehealth-related discussions since the COVID-19 pandemic.
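VADER sums human-rated word valences (on a -4 to +4 scale), adjusts them with rules for negation, intensifiers, punctuation, and capitalization, and squashes the total into a compound score in (-1, 1) using x / sqrt(x² + α) with α = 15; the VADER documentation suggests labeling compound ≥ 0.05 as positive and ≤ -0.05 as negative. The sketch keeps only the summation, normalization, and thresholds, with a hypothetical five-word lexicon: it is not VADER and will not reproduce its scores, and the abstract does not state which thresholds the authors used.

```python
import math

# Hypothetical valences on VADER's -4..+4 scale, for illustration only.
TOY_LEXICON = {"great": 3.1, "helpful": 1.8, "easy": 1.9, "bad": -2.5, "frustrating": -2.1}


def normalize(score, alpha=15.0):
    """Squash an unbounded valence sum into (-1, 1) with x / sqrt(x^2 + alpha)."""
    return score / math.sqrt(score * score + alpha)


def compound_score(text):
    # No negation, intensifier, or punctuation rules: plain lexicon lookup and sum.
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    return normalize(sum(TOY_LEXICON.get(t, 0.0) for t in tokens))


def label(compound, threshold=0.05):
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for post in ["Telehealth was great and easy!", "The portal was frustrating.", "Booked a video visit."]:
    print(label(compound_score(post)), round(compound_score(post), 3))
```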
Affiliation(s)
- Roxanne E Jensen
- Outcomes Research Branch, Healthcare Delivery Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, Maryland, USA
- Jacob A Rohde
- Consumer Behavior Research Program, Center for Communication & Media Impact, RTI International, Durham, North Carolina, USA
- Abigail H Muro
- Health Communication and Informatics Research Branch, Behavioral Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, Maryland, USA
- Catherine A Schweppe
- Gastrointestinal and Other Cancers Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland, USA
- Robin C Vanderpool
- Health Communication and Informatics Research Branch, Behavioral Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, Maryland, USA
7
Zhu J, Jin R, Kenne DR, Phan N, Ku WS. User Dynamics and Thematic Exploration in r/Depression During the COVID-19 Pandemic: Insights From Overlapping r/SuicideWatch Users. J Med Internet Res 2024; 26:e53968. [PMID: 38767953 PMCID: PMC11129781 DOI: 10.2196/53968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 01/16/2024] [Accepted: 03/23/2024] [Indexed: 05/22/2024]
Abstract
BACKGROUND In 2023, the United States experienced its highest recorded number of suicides, exceeding 50,000 deaths. Among psychiatric disorders, major depressive disorder is the most common, affecting 15% to 17% of the population and carrying a notable suicide risk of approximately 15%. However, not everyone with depression has suicidal thoughts. While "suicidal depression" is not a clinical diagnosis, it may be observed in daily life, emphasizing the need for awareness. OBJECTIVE This study aims to examine the dynamics, emotional tones, and topics discussed in posts within the r/Depression subreddit, with a specific focus on users who had also engaged in the r/SuicideWatch community. The objective was to use natural language processing techniques and models to better understand the complexities of depression among users with potential suicidal ideation, with the goal of improving intervention and prevention strategies for suicide. METHODS Archived English-language posts from 2019 to 2022 were extracted from the r/Depression and r/SuicideWatch Reddit communities, resulting in a final data set of over 150,000 posts contributed by approximately 25,000 unique overlapping users. A broad mix of methods, including trend and survival analysis, was applied to these posts to explore the dynamics of users in the 2 subreddits. Models from the BERT family were used to extract features for sentiment and thematic analysis. RESULTS On August 16, 2020, the post count in r/SuicideWatch surpassed that of r/Depression. The transition from r/Depression to r/SuicideWatch in 2020 was the shortest, lasting only 26 days. Sadness emerged as the most prevalent emotion among overlapping users in the r/Depression community. In addition, physical activity changes, negative self-view, and suicidal thoughts were identified as the most common depression symptoms, all showing strong positive correlations with the emotional tone of disappointment.
Furthermore, the topic "struggles with depression and motivation in school and work" (12%) emerged as the most discussed topic aside from suicidal thoughts, categorizing users based on their inclination toward suicidal ideation. CONCLUSIONS Our study underscores the effectiveness of using natural language processing techniques to explore language markers and patterns associated with mental health challenges in online communities like r/Depression and r/SuicideWatch. These insights offer novel perspectives distinct from previous research. Future work could further refine and optimize machine classification with these techniques, which could lead to more effective intervention and prevention strategies.
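The abstract mentions survival analysis of user transitions without naming the estimator. One standard choice is Kaplan-Meier, which accounts for users who never post in r/SuicideWatch before the data end (right-censoring) instead of dropping them. A minimal sketch, assuming per-user first-post dates in each subreddit and a study end date; all dates, user ids, and function names are hypothetical.

```python
from datetime import date


def transition_durations(first_depression, first_suicidewatch, study_end):
    """Build (days, event) pairs, one per user who posted in r/Depression.

    event=1: the user's first r/SuicideWatch post came `days` days later.
    event=0: no r/SuicideWatch post before study_end, so the user is right-censored.
    """
    data = []
    for user, start in first_depression.items():
        end = first_suicidewatch.get(user)
        if end is not None and end < start:
            continue  # posted in r/SuicideWatch first: not a Depression -> SuicideWatch transition
        if end is not None:
            data.append(((end - start).days, 1))
        else:
            data.append(((study_end - start).days, 0))
    return data


def kaplan_meier(data):
    """Kaplan-Meier estimate of S(t) = P(no transition by day t) at each event time."""
    survival, curve = 1.0, []
    for t in sorted({d for d, e in data if e == 1}):
        at_risk = sum(1 for d, _ in data if d >= t)
        events = sum(1 for d, e in data if d == t and e == 1)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


def median_time(curve):
    """First time at which estimated survival drops to 0.5 or below (None if never)."""
    return next((t for t, s in curve if s <= 0.5), None)


first_dep = {"u1": date(2020, 1, 1), "u2": date(2020, 1, 1), "u3": date(2020, 3, 1)}
first_sw = {"u1": date(2020, 1, 15), "u3": date(2020, 2, 1)}
data = transition_durations(first_dep, first_sw, study_end=date(2020, 12, 31))
print(data, median_time(kaplan_meier(data)))  # [(14, 1), (365, 0)] 14
```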
Affiliation(s)
- Jianfeng Zhu
- Department of Computer Science, Kent State University, Kent, OH, United States
- Ruoming Jin
- Department of Computer Science, Kent State University, Kent, OH, United States
- Deric R Kenne
- Center for Public Policy and Health, College of Public Health, Kent State University, Kent, OH, United States
- NhatHai Phan
- Department of Data Science, New Jersey Institute of Technology, Newark, NJ, United States
- Wei-Shinn Ku
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
8
Baird A, Cheng Y, Xia Y. Determinants of outpatient substance use disorder treatment length-of-stay and completion: the case of a treatment program in the southeast U.S. Sci Rep 2023; 13:13961. [PMID: 37633996 PMCID: PMC10460408 DOI: 10.1038/s41598-023-41350-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 08/24/2023] [Indexed: 08/28/2023]
Abstract
Successful outcomes of outpatient substance use disorder treatment result from many client factors, including intersections between individual characteristics, choices made, and social determinants. However, which of these factors to address and support, and in what combination, remains an open and complex question. Therefore, we ask: What factors are associated with outpatient substance use disorder clients remaining in treatment for > 90 days and successfully completing treatment? To answer this question, we apply a virtual twins machine learning (ML) model to de-identified data for a census of clients who received outpatient substance use disorder treatment services from 2018 to 2021 from one treatment program in the Southeast U.S. We find that the primary predictors of outcome success are: (1) attending self-help groups while in treatment, and (2) setting goals for treatment. Secondary predictors are: (1) being linked to a primary care provider (PCP) during treatment, (2) being linked to the supplemental nutrition assistance program (SNAP), and (3) attending 6 or more self-help group sessions during treatment. These findings can help treatment programs guide client decision-making and set priorities for social determinant support. Further, the ML method applied can explain intersections between individual and social predictors, as well as outcome heterogeneity associated with subgroup differences.
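The abstract names a virtual twins model without specifying it. In the method's usual formulation, (1) a flexible model, typically a random forest, predicts the outcome from covariates, a binary treatment or exposure indicator, and their interactions; (2) each subject's outcome is predicted with the indicator set to 1 and to 0, and the difference Z is that subject's estimated effect; (3) a regression tree fit to Z reveals subgroups with larger or smaller effects. The sketch swaps in per-cell means for the forest and a single split for the tree so it runs on the standard library alone. The covariates, the exposure (which could be, say, attending self-help groups), and the data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_cell_means(rows):
    """Step 1 (simplified): estimate P(y=1 | x, t) as the mean outcome in each (x, t) cell.

    rows: iterable of (x, t, y) with x a tuple of binary covariates, t in {0, 1}, y in {0, 1}.
    """
    cells, arms = defaultdict(list), defaultdict(list)
    for x, t, y in rows:
        cells[(x, t)].append(y)
        arms[t].append(y)
    cell_mean = {key: mean(ys) for key, ys in cells.items()}
    arm_mean = {t: mean(ys) for t, ys in arms.items()}

    def predict(x, t):
        # Fall back to the arm-level mean when covariate pattern x was never seen with t.
        return cell_mean.get((x, t), arm_mean[t])
    return predict


def twin_effects(rows, predict):
    """Step 2: predict each client's outcome with t=1 and t=0; Z is the difference."""
    return [(x, predict(x, 1) - predict(x, 0)) for x, _, _ in rows]


def best_split(effects, n_features):
    """Step 3 (simplified): one split on a binary covariate that maximizes the gap in mean Z.

    Returns (feature index, gap, mean Z when feature=0, mean Z when feature=1).
    """
    best = None
    for j in range(n_features):
        z0 = [z for x, z in effects if x[j] == 0]
        z1 = [z for x, z in effects if x[j] == 1]
        if not z0 or not z1:
            continue
        gap = abs(mean(z1) - mean(z0))
        if best is None or gap > best[1]:
            best = (j, gap, mean(z0), mean(z1))
    return best


# Synthetic data: the exposure t helps only clients whose covariate 0 equals 1.
rows = [((a, b), t, int(t == 1 and a == 1)) for a in (0, 1) for b in (0, 1) for t in (0, 1)]
predict = fit_cell_means(rows)
print(best_split(twin_effects(rows, predict), n_features=2))
```

Cell means only work when every covariate pattern is observed under both arms; the random forest in the standard method exists precisely to generalize across sparse, high-dimensional covariates.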
Affiliation(s)
- Aaron Baird
- Institute for Insight, Robinson College of Business, Georgia State University, 55 Park Place, Atlanta, GA, 30303, USA
- Yichen Cheng
- Institute for Insight, Robinson College of Business, Georgia State University, 55 Park Place, Atlanta, GA, 30303, USA
- Yusen Xia
- Institute for Insight, Robinson College of Business, Georgia State University, 55 Park Place, Atlanta, GA, 30303, USA
9
Pollack C, Gilbert-Diamond D, Onega T, Vosoughi S, O'Malley AJ, Emond JA. Obesity-Related Discourse on Facebook and Instagram Throughout the COVID-19 Pandemic: Comparative Longitudinal Evaluation. JMIR Infodemiology 2023; 3:e40005. [PMID: 37191990 PMCID: PMC10203886 DOI: 10.2196/40005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 01/30/2023] [Accepted: 03/27/2023] [Indexed: 05/17/2023]
Abstract
BACKGROUND: COVID-19 severity is amplified among individuals with obesity, which may have influenced mainstream media coverage of the disease by both improving understanding of the condition and increasing weight-related stigma.
OBJECTIVE: We aimed to measure obesity-related conversations on Facebook and Instagram around key dates during the first year of the COVID-19 pandemic.
METHODS: Public Facebook and Instagram posts were extracted for 29-day windows in 2020 around January 28 (the first US COVID-19 case), March 11 (when COVID-19 was declared a global pandemic), May 19 (when obesity and COVID-19 were linked in mainstream media), and October 2 (when former US president Trump contracted COVID-19 and obesity was mentioned most frequently in the mainstream media). Trends in daily posts and corresponding interactions were evaluated using interrupted time series. The 10 most frequent obesity-related topics on each platform were also examined.
RESULTS: On Facebook, there was a temporary increase in 2020 in obesity-related posts and interactions on May 19 (posts +405, 95% CI 166 to 645; interactions +294,930, 95% CI 125,986 to 463,874) and October 2 (posts +639, 95% CI 359 to 883; interactions +182,814, 95% CI 160,524 to 205,105). On Instagram, there were temporary increases in 2020 only in interactions on May 19 (+226,017, 95% CI 107,323 to 344,708) and October 2 (+156,974, 95% CI 89,757 to 224,192). Similar trends were not observed in controls. Five of the most frequent topics overlapped (COVID-19, bariatric surgery, weight loss stories, pediatric obesity, and sleep); additional topics specific to each platform included diet fads, food groups, and clickbait.
CONCLUSIONS: Social media conversations surged in response to obesity-related public health news. Conversations contained both clinical and commercial content of possibly dubious accuracy. Our findings support the idea that major public health announcements may coincide with the spread of health-related content (truthful or otherwise) on social media.
Affiliation(s)
- Catherine Pollack
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, United States
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, NH, United States
- Diane Gilbert-Diamond
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, NH, United States
- Department of Pediatrics, Geisel School of Medicine at Dartmouth, Hanover, NH, United States
- Department of Medicine, Geisel School of Medicine at Dartmouth, Hanover, NH, United States
- Tracy Onega
- Department of Population Health Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, United States
- Soroush Vosoughi
- Department of Computer Science, Dartmouth College, Hanover, NH, United States
- A James O'Malley
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, United States
- The Dartmouth Institute for Health Policy and Clinical Practice, Hanover, NH, United States
- Jennifer A Emond
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, United States
- Department of Pediatrics, Geisel School of Medicine at Dartmouth, Hanover, NH, United States
10
End-to-End Transformer-Based Models in Textual-Based NLP. AI 2023. [DOI: 10.3390/ai4010004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]
Abstract
Transformer architectures are highly expressive because they use self-attention mechanisms to encode long-range dependencies in the input sequences. In this paper, we present a literature review on Transformer-based (TB) models, providing a detailed overview of each model in comparison to the Transformer’s standard architecture. This survey focuses on TB models used in the field of Natural Language Processing (NLP) for textual-based tasks. We begin with an overview of the fundamental concepts at the heart of the success of these models. Then, we classify them based on their architecture and training mode. We compare the advantages and disadvantages of popular techniques in terms of architectural design and experimental value. Finally, we discuss open research directions and potential future work to help address current challenges in applying TB models to NLP.
11
Pap IA, Oniga S. A Review of Converging Technologies in eHealth Pertaining to Artificial Intelligence. International Journal of Environmental Research and Public Health 2022; 19:11413. [PMID: 36141685 PMCID: PMC9517043 DOI: 10.3390/ijerph191811413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Revised: 08/31/2022] [Accepted: 09/06/2022] [Indexed: 06/16/2023]
Abstract
Over the last couple of years, in the context of the COVID-19 pandemic, many healthcare issues have been exacerbated, highlighting the paramount need to provide reliable and affordable health services to remote locations by using the latest technologies, such as video conferencing, data management, the secure transfer of patient information, and efficient data analysis tools such as machine learning algorithms. In the constant struggle to offer healthcare to everyone, many modern technologies find applicability in eHealth, mHealth, telehealth, or telemedicine. In this paper, we provide an overview of the technologies used in various healthcare applications, ranging from remote patient monitoring in cardio-oncology to machine learning analysis of EEG signals for seizure prediction, focusing on the role of artificial intelligence in eHealth.
Affiliation(s)
- Iuliu Alexandru Pap
- Department of Electric, Electronic and Computer Engineering, Technical University of Cluj-Napoca, North University Center of Baia Mare, 430083 Baia Mare, Romania
- Stefan Oniga
- Department of Electric, Electronic and Computer Engineering, Technical University of Cluj-Napoca, North University Center of Baia Mare, 430083 Baia Mare, Romania
- Department of IT Systems and Networks, Faculty of Informatics, University of Debrecen, 4032 Debrecen, Hungary