1
Riad R, Denais M, de Gennes M, Lesage A, Oustric V, Cao XN, Mouchabac S, Bourla A. Automated Speech Analysis for Risk Detection of Depression, Anxiety, Insomnia, and Fatigue: Algorithm Development and Validation Study. J Med Internet Res 2024; 26:e58572. [PMID: 39324329 PMCID: PMC11565087 DOI: 10.2196/58572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 09/07/2024] [Accepted: 09/25/2024] [Indexed: 09/27/2024]
Abstract
BACKGROUND While speech analysis holds promise for mental health assessment, research often focuses on single symptoms, despite symptom co-occurrences and interactions. In addition, predictive models in mental health do not properly assess limitations of speech-based systems that matter for safe clinical deployment, such as uncertainty and fairness. OBJECTIVE We investigated the predictive potential of mobile-collected speech data for detecting and estimating depression, anxiety, fatigue, and insomnia in the general population, focusing on factors beyond mere accuracy. METHODS We included 865 healthy adults and recorded their answers about their perceived mental and sleep states: we asked how they felt and whether they had slept well lately. Clinically validated questionnaires measuring depression, anxiety, insomnia, and fatigue severity were also administered. We developed a novel speech and machine learning pipeline involving voice activity detection, feature extraction, and model training. We automatically modeled speech with deep learning models pretrained on a large, open, and free database, and we selected the best one on the validation set. Using the best speech modeling approach, we evaluated clinical threshold detection, individual score prediction, model uncertainty estimation, and performance fairness across demographics (age, sex, and education). We used a train-validation-test split for all evaluations: the training set to develop our models, the validation set to select the best ones, and the held-out test set to assess generalizability. RESULTS The best model was Whisper M with max pooling and oversampling.
Our methods achieved good detection performance for all symptoms: depression (Patient Health Questionnaire-9: area under the curve [AUC]=0.76, F1-score=0.49; Beck Depression Inventory: AUC=0.78, F1-score=0.65), anxiety (Generalized Anxiety Disorder 7-item scale: AUC=0.77, F1-score=0.50), insomnia (Athens Insomnia Scale: AUC=0.73, F1-score=0.62), and fatigue (Multidimensional Fatigue Inventory total score: AUC=0.68, F1-score=0.88). The system also performed well when it had to abstain from making predictions, as shown by low abstention rates in depression detection with the Beck Depression Inventory and in fatigue detection, with risk-coverage AUCs below 0.4. Individual symptom scores were accurately predicted (all correlations were significant, with Pearson coefficients between 0.31 and 0.49). Fairness analysis revealed that models were consistent across sex (average disparity ratio [DR] 0.86, SD 0.13), less so across education levels (average DR 0.47, SD 0.30), and least consistent across age groups (average DR 0.33, SD 0.30). CONCLUSIONS This study demonstrates the potential of speech-based systems for multifaceted mental health assessment in the general population, not only for detecting clinical thresholds but also for estimating symptom severity. Addressing fairness and incorporating uncertainty estimation with selective classification are key contributions that can enhance the clinical utility and responsible implementation of such systems.
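This abstract reports two evaluation quantities that are easy to misread: the risk-coverage AUC used for selective classification and the disparity ratio (DR) used for fairness. The sketch below shows one common way to compute each; it is not the authors' code. It assumes that each prediction carries a confidence score, that risk is the 0/1 error rate among retained predictions, and that DR is the worst-group metric divided by the best-group metric. The study's exact definitions may differ, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def risk_coverage_auc(confidence: Sequence[float], correct: Sequence[bool]) -> float:
    """Area under the risk-coverage curve; lower is better.

    Predictions are ranked from most to least confident. At coverage k/n the
    model keeps only the k most confident predictions and abstains on the
    rest; selective risk is the 0/1 error rate over the kept predictions.
    """
    if len(confidence) != len(correct) or len(confidence) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    order = sorted(range(len(confidence)), key=lambda i: confidence[i], reverse=True)
    errors = 0
    risks = []
    for k, i in enumerate(order, start=1):
        if not correct[i]:
            errors += 1
        risks.append(errors / k)  # selective risk at coverage k/n
    # Each coverage step has width 1/n, so the area equals the mean risk.
    return sum(risks) / len(risks)


def disparity_ratio(metric_by_group: Dict[str, float]) -> float:
    """ASSUMED definition: worst-group metric divided by best-group metric.

    1.0 means identical performance across groups; values near 0 mean the
    model serves one group far worse than another.
    """
    values = list(metric_by_group.values())
    if not values or max(values) <= 0:
        raise ValueError("need at least one group with a positive metric")
    return min(values) / max(values)


# Toy numbers for illustration only (not the study's data).
conf = [0.95, 0.90, 0.80, 0.60, 0.55]
ok = [True, True, False, True, False]
print(round(risk_coverage_auc(conf, ok), 3))  # 0.197
print(round(disparity_ratio({"group_a": 0.6, "group_b": 0.8}), 2))  # 0.75
```

Under these assumptions a lower risk-coverage AUC is better, because the remaining errors are concentrated among low-confidence predictions on which the system can abstain, whereas a DR closer to 1 indicates more even performance across groups.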
Affiliation(s)
- Stéphane Mouchabac
- Department of Psychiatry, Saint-Antoine Hospital, Sorbonne University, Assistance publique - Hôpitaux de Paris, Paris, France
- Infrastructure for Clinical Research in Neurosciences, Paris Brain Institute, Paris, France
- Alexis Bourla
- Department of Psychiatry, Saint-Antoine Hospital, Sorbonne University, Assistance publique - Hôpitaux de Paris, Paris, France
- Infrastructure for Clinical Research in Neurosciences, Paris Brain Institute, Paris, France
- Medical Strategy and Innovation Department, Clariane, Paris, France
- NeuroStim Psychiatry Practice, Paris, France
2
Soleimani L, Ouyang Y, Cho S, Kia A, Beeri MS, Lin H, Ravona‐Springer R, Ramsingh N, Liberman MY, Grossman M, Nevler N. Speech markers of depression dimensions across cognitive status. Alzheimers Dement (Amst) 2024; 16:e12604. [PMID: 39092182 PMCID: PMC11292393 DOI: 10.1002/dad2.12604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 04/20/2024] [Accepted: 04/24/2024] [Indexed: 08/04/2024]
Abstract
Introduction Depression and its components significantly impact dementia prediction and severity, necessitating reliable objective measures to quantify them. Methods We investigated associations between emotion-based speech measures (valence, arousal, and dominance) during picture descriptions and depression dimensions derived from the Geriatric Depression Scale (GDS): dysphoria, withdrawal-apathy-vigor (WAV), anxiety, hopelessness, and subjective memory complaint. Results Higher WAV was associated with more negative valence (estimate = -0.133, p = 0.030). While interactions of apolipoprotein E (APOE) ε4 status with depression dimensions on emotional valence did not reach significance, there was a trend toward more negative valence with higher dysphoria in those with at least one APOE4 allele (estimate = -0.404, p = 0.0846). Associations were similar irrespective of dementia severity. Discussion Our study underscores the potential utility of speech biomarkers in characterizing depression dimensions. In future research, using emotionally charged stimuli may enhance the elicitation of emotional measures. The role of APOE in the interaction between speech markers and depression dimensions warrants further exploration with larger sample sizes. Highlights Participants reporting higher apathy used more negative words to describe a neutral picture. Those with higher dysphoria and at least one APOE4 allele also tended to use more negative words. Our results suggest the potential use of speech biomarkers in characterizing depression dimensions.
Affiliation(s)
- Yuxia Ouyang
- Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Sunghye Cho
- Linguistic Data Consortium, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Arash Kia
- Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Hung‐Mo Lin
- Department of Anesthesiology, Yale School of Medicine, New Haven, Connecticut, USA
- Ramit Ravona‐Springer
- The Joseph Sagol Neuroscience Center, Sheba Medical Center, Tel‐Hashomer, Israel
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Nadia Ramsingh
- Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mark Y Liberman
- Linguistic Data Consortium, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Murray Grossman
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Naomi Nevler
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
3
Li W, Tang LM, Montayre J, Harris CB, West S, Antoniou M. Investigating Health and Well-Being Challenges Faced by an Aging Workforce in the Construction and Nursing Industries: Computational Linguistic Analysis of Twitter Data. J Med Internet Res 2024; 26:e49450. [PMID: 38838308 PMCID: PMC11187510 DOI: 10.2196/49450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Revised: 11/07/2023] [Accepted: 02/09/2024] [Indexed: 06/07/2024]
Abstract
BACKGROUND Construction and nursing are critical industries. Although both careers involve physically and mentally demanding work, the risks to workers during the COVID-19 pandemic are not well understood. Nurses (both younger and older) are more likely to experience the ill effects of burnout and stress than construction workers, likely due to accelerated work demands and increased pressure on nurses during the COVID-19 pandemic. In this study, we analyzed a large social media data set using advanced natural language processing techniques to explore indicators of the mental status of workers across both industries before and during the COVID-19 pandemic. OBJECTIVE This social media analysis aims to fill a knowledge gap by comparing the tweets of younger and older construction workers and nurses to obtain insights into any potential risks to their mental health due to work health and safety issues. METHODS We analyzed 1,505,638 tweets published on Twitter (subsequently rebranded as X) by younger and older (aged <45 vs >45 years) construction workers and nurses. The study period spanned 54 months, from January 2018 to June 2022, which equates to approximately 27 months before and 27 months after the World Health Organization declared COVID-19 a global pandemic on March 11, 2020. The tweets were analyzed using big data analytics and computational linguistic analyses. RESULTS Text analyses revealed that nurses made greater use of hashtags and keywords (both unigrams and bigrams) associated with burnout, health issues, and mental health than construction workers did. The COVID-19 pandemic had a pronounced effect on nurses' tweets, and this was especially noticeable in younger nurses. Tweets about health and well-being contained more first-person singular pronouns and affect words, and health-related tweets contained more affect words.
Sentiment analyses revealed that, overall, nurses had a higher proportion of positive sentiment in their tweets than construction workers. However, this changed markedly during the COVID-19 pandemic. Since early 2020, sentiment switched, and negative sentiment dominated the tweets of nurses. No such crossover was observed in the tweets of construction workers. CONCLUSIONS The social media analysis revealed that younger nurses had language use patterns consistent with someone experiencing the ill effects of burnout and stress. Older construction workers had more negative sentiments than younger workers, who were more focused on communicating about social and recreational activities rather than work matters. More broadly, these findings demonstrate the utility of large data sets enabled by social media to understand the well-being of target populations, especially during times of rapid societal change.
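The RESULTS of this abstract compare hashtag and keyword frequencies (unigrams and bigrams) and the rate of first-person singular pronouns across groups of workers. The following is a minimal, illustrative sketch of such counts, not the study's pipeline. The tokenizer, the five-word pronoun list, the function names, and the example tweets are all assumptions; the abstract does not say which lexicon or tokenization the authors used.

```python
import re
from collections import Counter
from typing import List, Tuple

# HYPOTHETICAL five-word list; real analyses would use a validated lexicon.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}

# Word tokens; hashtags keep their leading '#'. Contractions are split
# ("i'm" -> "i", "m"), which is crude but keeps the sketch short.
TOKEN_RE = re.compile(r"#?\w+")


def tokenize(tweet: str) -> List[str]:
    return TOKEN_RE.findall(tweet.lower())


def ngram_counts(tweets: List[str]) -> Tuple[Counter, Counter]:
    """Count unigrams and bigrams; bigrams never span two tweets."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def first_person_rate(tweets: List[str]) -> float:
    """Share of all tokens that are first-person singular pronouns."""
    tokens = [tok for tweet in tweets for tok in tokenize(tweet)]
    if not tokens:
        return 0.0
    return sum(tok in FIRST_PERSON_SINGULAR for tok in tokens) / len(tokens)


# Made-up example tweets, not drawn from the study's data set.
tweets = ["Another double shift #burnout", "I need sleep, my back hurts"]
unigrams, bigrams = ngram_counts(tweets)
print(unigrams["#burnout"], bigrams[("double", "shift")])  # 1 1
print(first_person_rate(tweets))  # 0.2
```

Comparing such counts between groups (for example nurses vs construction workers, or before vs after March 2020) requires normalizing by each group's total token count, which `first_person_rate` does for the pronoun measure.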
Affiliation(s)
- Weicong Li
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, Australia
- Liyaning Maggie Tang
- School of Architecture and Built Environment, The University of Newcastle, Callaghan, Australia
- Jed Montayre
- Centre of Evidence-based Practice for Health Care Policy, The Hong Kong Polytechnic University, Hung Hom, China (Hong Kong)
- School of Nursing and Midwifery, Western Sydney University, Penrith, Australia
- Celia B Harris
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, Australia
- Sancia West
- Centre for Work Health and Safety, New South Wales Government, Gosford, Australia
- Mark Antoniou
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, Australia
4
Khoo LS, Lim MK, Chong CY, McNaney R. Machine Learning for Multimodal Mental Health Detection: A Systematic Review of Passive Sensing Approaches. Sensors (Basel) 2024; 24:348. [PMID: 38257440 PMCID: PMC10820860 DOI: 10.3390/s24020348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 12/14/2023] [Accepted: 12/18/2023] [Indexed: 01/24/2024]
Abstract
As mental health (MH) disorders become increasingly prevalent, their multifaceted symptoms and comorbidities with other conditions complicate diagnosis and pose a risk of underdiagnosis. While machine learning (ML) has been explored to mitigate these challenges, we hypothesized that multiple data modalities support more comprehensive detection and that non-intrusive collection approaches better capture natural behaviors. To understand current trends, we systematically reviewed 184 studies to assess the feature extraction, feature fusion, and ML methodologies applied to detect MH disorders from passively sensed multimodal data, including audio and video recordings, social media, smartphones, and wearable devices. Our findings revealed that correlations of modality-specific features varied across individualized contexts, potentially influenced by demographics and personalities. We also observed the growing adoption of neural network architectures, both for model-level fusion and as the ML algorithms themselves; these have shown promising efficacy in handling high-dimensional features while modeling within- and cross-modality relationships. This work provides future researchers with a clear taxonomy of methodological approaches to multimodal detection of MH disorders to inspire further methodological advances. The comprehensive analysis also helps researchers make informed decisions when selecting a data source suited to their use case and the MH disorder of interest.
Affiliation(s)
- Lin Sze Khoo
- Department of Human-Centered Computing, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Mei Kuan Lim
- School of Information Technology, Monash University Malaysia, Subang Jaya 46150, Malaysia
- Chun Yong Chong
- School of Information Technology, Monash University Malaysia, Subang Jaya 46150, Malaysia
- Roisin McNaney
- Department of Human-Centered Computing, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
5
Olah J, Diederen K, Gibbs-Dean T, Kempton MJ, Dobson R, Spencer T, Cummins N. Online speech assessment of the psychotic spectrum: Exploring the relationship between overlapping acoustic markers of schizotypy, depression and anxiety. Schizophr Res 2023; 259:11-19. [PMID: 37080802 DOI: 10.1016/j.schres.2023.03.044] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/04/2022] [Revised: 03/22/2023] [Accepted: 03/23/2023] [Indexed: 04/22/2023]
Abstract
BACKGROUND Remote assessment of acoustic alterations in speech holds promise for increasing scalability and validity in research across the psychosis spectrum. A feasible first step in establishing a procedure for online assessments is to assess acoustic alterations in psychometric schizotypy. However, to date, the complex relationship between speech alterations related to schizotypy and those related to comorbid conditions, such as symptoms of depression and anxiety, has not been investigated. This study tested (1) whether depression, generalized anxiety, and high psychometric schizotypy have similar voice characteristics, (2) which acoustic markers of online-collected speech are the strongest predictors of psychometric schizotypy, and (3) whether including generalized anxiety and depression symptoms in the model can improve the prediction of schizotypy. METHODS We collected cross-sectional, online-recorded speech data from 441 participants and assessed demographics, symptoms of depression and generalized anxiety, and psychometric schizotypy. RESULTS Speech samples collected online predicted psychometric schizotypy, depression, and anxiety symptoms with weak to moderate predictive power, and with moderate to good predictive power when basic demographic variables were added to the models. The most influential features of these models largely overlapped. The predictive power of speech marker-based models of schizotypy improved significantly after symptom scores of depression and generalized anxiety were included in the models (from R² = 0.296 to R² = 0.436). CONCLUSIONS Acoustic features of online-collected speech are predictive of psychometric schizotypy as well as of generalized anxiety and depression symptoms. The acoustic characteristics of schizotypy, depression, and anxiety symptoms significantly overlap. Speech models designed to predict schizotypy or symptoms of the schizophrenia spectrum might therefore benefit from controlling for symptoms of depression and anxiety.
Affiliation(s)
- Julianna Olah
- Institute of Psychiatry, Psychology and Neuroscience, Department of Psychosis Studies, King's College London, London SE5 8AF, UK
- Kelly Diederen
- Institute of Psychiatry, Psychology and Neuroscience, Department of Psychosis Studies, King's College London, London SE5 8AF, UK
- Toni Gibbs-Dean
- Institute of Psychiatry, Psychology and Neuroscience, Department of Psychosis Studies, King's College London, London SE5 8AF, UK
- Matthew J Kempton
- Institute of Psychiatry, Psychology and Neuroscience, Department of Psychosis Studies, King's College London, London SE5 8AF, UK
- Richard Dobson
- Institute of Psychiatry, Psychology and Neuroscience, Department of Biostatistics & Health Informatics, King's College London, London SE5 8AF, UK
- Thomas Spencer
- Institute of Psychiatry, Psychology and Neuroscience, Department of Psychosis Studies, King's College London, London SE5 8AF, UK
- Nicholas Cummins
- Institute of Psychiatry, Psychology and Neuroscience, Department of Biostatistics & Health Informatics, King's College London, London SE5 8AF, UK
6
Teferra BG, Rose J. Predicting Generalized Anxiety Disorder from Impromptu Speech Transcripts Using Context-Aware Transformer-Based Neural Networks: Model Evaluation Study (Preprint). JMIR Ment Health 2022; 10:e44325. [PMID: 36976636 PMCID: PMC10131846 DOI: 10.2196/44325] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/15/2022] [Revised: 02/21/2023] [Accepted: 02/23/2023] [Indexed: 02/25/2023]
Abstract
BACKGROUND The ability to automatically detect anxiety disorders from speech could be useful as a screening tool. Prior studies have shown that individual words in textual transcripts of speech are associated with anxiety severity. Transformer-based neural networks are models that have recently been shown to have powerful predictive capabilities based on the context of more than one input word. Transformers detect linguistic patterns and can be separately trained to make specific predictions based on these patterns. OBJECTIVE This study aimed to determine whether a transformer-based language model can be used to screen for generalized anxiety disorder from impromptu speech transcripts. METHODS A total of 2000 participants provided an impromptu speech sample in response to a modified version of the Trier Social Stress Test (TSST). They also completed the Generalized Anxiety Disorder 7-item (GAD-7) scale. A transformer-based neural network model (pretrained on large textual corpora) was fine-tuned on the speech transcripts and GAD-7 scores to predict whether a participant was above or below a GAD-7 screening threshold. We reported the area under the receiver operating characteristic curve (AUROC) on the test data and compared the results with a baseline logistic regression model using Linguistic Inquiry and Word Count (LIWC) features as input. Using the integrated gradients method to identify specific words that strongly affect the predictions, we inferred the linguistic patterns that influence them. RESULTS The baseline LIWC-based logistic regression model had an AUROC of 0.58. The fine-tuned transformer model achieved an AUROC of 0.64. The influence of words often implicated in the predictions also depended on their context.
For example, the first-person singular pronoun "I" pushed toward an anxious prediction 88% of the time and toward a nonanxious prediction 12% of the time, depending on the context. Silent pauses in speech, also often implicated in predictions, pushed toward an anxious prediction 20% of the time and toward a nonanxious prediction 80% of the time. CONCLUSIONS There is evidence that a transformer-based neural network model has greater predictive power than the single word-based LIWC model. We also showed that the use of specific words in a specific context (a linguistic pattern) is part of the reason for the better prediction. This suggests that such transformer-based models could play a useful role in anxiety screening systems.
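The comparison above (AUROC 0.58 for the LIWC baseline vs 0.64 for the transformer) scores continuous model outputs against a binary label derived from the GAD-7. A minimal sketch of that evaluation step follows, written from the standard definition of AUROC rather than from the study's code. The function names, the toy scores, and the illustrative threshold are assumptions; the cut-off is kept as an argument because the abstract does not state it.

```python
from typing import List, Sequence


def binarize(gad7_scores: Sequence[int], threshold: int) -> List[int]:
    """1 if a GAD-7 total meets the screening threshold, else 0.

    The threshold is an argument because the abstract does not state the
    cut-off the study used.
    """
    return [1 if score >= threshold else 0 for score in gad7_scores]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties count one half. This pairwise O(P*N) form equals the Mann-Whitney U
    statistic divided by P*N; it is fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: the scores stand in for a model's predicted probability of the
# anxious class; threshold=10 is chosen purely for illustration.
labels = binarize([3, 12, 8, 15, 5, 10], threshold=10)  # [0, 1, 0, 1, 0, 1]
scores = [0.2, 0.7, 0.6, 0.4, 0.1, 0.6]
print(round(auroc(labels, scores), 3))  # 0.833
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the 0.58 and 0.64 results should be read.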
Affiliation(s)
- Bazen Gashaw Teferra
- The Edward S Rogers Sr Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
- Jonathan Rose
- The Edward S Rogers Sr Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
- The Centre for Addiction and Mental Health, Toronto, ON, Canada
7
Loch AA, Lopes-Rocha AC, Ara A, Gondim JM, Cecchi GA, Corcoran CM, Mota NB, Argolo FC. Ethical Implications of the Use of Language Analysis Technologies for the Diagnosis and Prediction of Psychiatric Disorders. JMIR Ment Health 2022; 9:e41014. [PMID: 36318266 PMCID: PMC9667377 DOI: 10.2196/41014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/13/2022] [Revised: 09/09/2022] [Accepted: 10/04/2022] [Indexed: 11/05/2022]
Abstract
Recent developments in artificial intelligence technologies have reached a point where machine learning algorithms can infer mental status from people's photos and texts posted on social media. Moreover, these algorithms can predict future mental illness with a reasonable degree of accuracy. They potentially represent an important advance in mental health care for prevention and early diagnosis initiatives, and for aiding professionals in the follow-up and prognosis of their patients. However, important issues, namely privacy and the stigma related to mental disorders, call for major caution in the use of such technologies. In this paper, we discuss the bioethical implications of using such technologies to diagnose and predict future mental illness, given the swift growth of technologies that analyze human language and the online availability of personal information shared on social media. We also suggest future directions to minimize the misuse of such important technologies.
Affiliation(s)
- Alexandre Andrade Loch
- Institute of Psychiatry, University of Sao Paulo, Sao Paulo, Brazil
- Instituto Nacional de Biomarcadores em Neuropsiquiatria, Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brasília, Brazil
- Anderson Ara
- Departamento de Estatística, Universidade Federal do Paraná, Curitiba, Brazil
- Guillermo A Cecchi
- IBM Thomas J. Watson Research Center, Yorktown Heights, NY, United States
- Natália Bezerra Mota
- Instituto de Psiquiatria, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Research Department at Motrix Lab, Motrix, Rio de Janeiro, Brazil
- Felipe C Argolo
- Institute of Psychiatry, University of Sao Paulo, Sao Paulo, Brazil