1. Weisenburger RL, Mullarkey MC, Labrada J, Labrousse D, Yang MY, MacPherson AH, Hsu KJ, Ugail H, Shumake J, Beevers CG. Conversational assessment using artificial intelligence is as clinically useful as depression scales and preferred by users. J Affect Disord 2024; 351:489-498. PMID: 38290584. DOI: 10.1016/j.jad.2024.01.212.
Abstract
BACKGROUND: Depression is prevalent, chronic, and burdensome. Due to limited screening access, depression often remains undiagnosed. Artificial intelligence (AI) models based on spoken responses to interview questions may offer an effective, efficient alternative to other screening methods. OBJECTIVE: The primary aim was to use a demographically diverse sample to validate an AI model, previously trained on human-administered interviews, on novel bot-administered interviews, and to check for algorithmic biases related to age, sex, race, and ethnicity. METHODS: Using the Aiberry app, adults recruited via social media (N = 393) completed a brief bot-administered interview and a depression self-report form. An AI model was used to predict form scores based on interview responses alone. For all meaningful discrepancies between model inference and form score, clinicians performed a masked review to determine which one they preferred. RESULTS: There was strong concurrent validity between the model predictions and raw self-report scores (r = 0.73, MAE = 3.3). 90% of AI predictions either agreed with self-report, or agreed with clinical expert opinion when AI contradicted self-report. There was no differential model performance across age, sex, race, or ethnicity. LIMITATIONS: Limitations include access restrictions (English-speaking ability and access to a smartphone or computer with broadband internet) and potential self-selection of participants more favorably predisposed toward AI technology. CONCLUSION: The Aiberry model made accurate predictions of depression severity based on remotely collected spoken responses to a bot-administered interview. This study shows promising results for the use of AI as a mental health screening tool on par with self-report measures.
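As a back-of-the-envelope illustration, the concurrent-validity statistics reported here (Pearson r and mean absolute error between model predictions and self-report scores) can be computed in a few lines of NumPy. The data below are synthetic stand-ins, not the study's:

```python
import numpy as np

def concurrent_validity(predicted, observed):
    """Pearson correlation and mean absolute error between
    model-predicted and self-reported depression scores."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    r = float(np.corrcoef(predicted, observed)[0, 1])
    mae = float(np.mean(np.abs(predicted - observed)))
    return r, mae

# Synthetic stand-in data: noisy predictions of 0-27 self-report scores
rng = np.random.default_rng(0)
truth = rng.integers(0, 28, size=100).astype(float)
preds = truth + rng.normal(0.0, 4.0, size=100)
r, mae = concurrent_validity(preds, truth)
```

With noise of this size the sketch yields a strong positive r and an MAE of a few points, the same shape of result the study reports.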
Affiliation(s)
- Rachel L Weisenburger
- Department of Psychology and Institute for Mental Health Research, The University of Texas at Austin, United States of America
- Daniel Labrousse
- Department of Psychiatry, Georgetown University Medical Center, United States of America
- Michelle Y Yang
- Department of Psychiatry, Georgetown University Medical Center, United States of America
- Allison Huff MacPherson
- Department of Family and Community Medicine, College of Medicine, University of Arizona, United States of America
- Kean J Hsu
- Department of Psychiatry, Georgetown University Medical Center, United States of America; Department of Psychology, National University of Singapore, Singapore
- Hassan Ugail
- Centre for Visual Computing, University of Bradford, United Kingdom of Great Britain and Northern Ireland
- Christopher G Beevers
- Department of Psychology and Institute for Mental Health Research, The University of Texas at Austin, United States of America
2. Larsen E, Murton O, Song X, Joachim D, Watts D, Kapczinski F, Venesky L, Hurowitz G. Validating the efficacy and value proposition of mental fitness vocal biomarkers in a psychiatric population: prospective cohort study. Front Psychiatry 2024; 15:1342835. PMID: 38505797. PMCID: PMC10948552. DOI: 10.3389/fpsyt.2024.1342835.
Abstract
Background: The utility of vocal biomarkers for mental health assessment has gained increasing attention. This study aims to further this line of research by introducing a novel vocal scoring system designed to provide mental fitness tracking insights to users in real-world settings. Methods: A prospective cohort study with 104 outpatient psychiatric participants was conducted to validate the "Mental Fitness Vocal Biomarker" (MFVB) score. The MFVB score was derived from eight vocal features, selected based on literature review. Participants' mental health symptom severity was assessed using the M3 Checklist, which serves as a transdiagnostic tool for measuring depression, anxiety, post-traumatic stress disorder, and bipolar symptoms. Results: The MFVB demonstrated an ability to stratify individuals by their risk of elevated mental health symptom severity. Continuous observation enhanced the MFVB's efficacy, with risk ratios improving from 1.53 (1.09-2.14, p=0.0138) for single 30-second voice samples to 2.00 (1.21-3.30, p=0.0068) for data aggregated over two weeks. A higher risk ratio of 8.50 (2.31-31.25, p=0.0013) was observed in participants who used the MFVB 5-6 times per week, underscoring the utility of frequent and continuous observation. Participant feedback confirmed the user-friendliness of the application and its perceived benefits. Conclusions: The MFVB is a promising tool for objective mental health tracking in real-world conditions, with potential to be a cost-effective, scalable, and privacy-preserving adjunct to traditional psychiatric assessments. User feedback suggests that vocal biomarkers can offer personalized insights and support clinical therapy and other beneficial activities that are associated with improved mental health risks and outcomes.
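The risk ratios and 95% confidence intervals quoted above are the standard epidemiological quantities for a 2x2 exposure/outcome table. A minimal sketch using the log-scale (Katz) interval, with hypothetical counts rather than the study's data:

```python
import math

def risk_ratio(a, b, c, d):
    """Risk ratio of the outcome for exposed vs. unexposed, from a 2x2
    table: a/b = exposed with/without outcome, c/d = unexposed with/without."""
    rr = (a / (a + b)) / (c / (c + d))
    # 95% CI on the log scale (Katz method)
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    return rr, lo, hi

# Hypothetical counts: 30/50 high-risk-score users vs. 15/50 others
rr, lo, hi = risk_ratio(30, 20, 15, 35)
```

Here the point estimate is 2.0, comparable in form (not in data) to the two-week aggregated risk ratio the study reports.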
Affiliation(s)
- Devon Watts
- Neuroscience Graduate Program, Department of Health Sciences, McMaster University, Hamilton, ON, Canada
- St. Joseph’s Healthcare Hamilton, Hamilton, ON, Canada
- Flavio Kapczinski
- Neuroscience Graduate Program, Department of Health Sciences, McMaster University, Hamilton, ON, Canada
- Department of Psychiatry, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
3. Wang L, Liu R, Wang Y, Xu X, Zhang R, Wei Y, Zhu R, Zhang X, Wang F. Effectiveness of a Biofeedback Intervention Targeting Mental and Physical Health Among College Students Through Speech and Physiology as Biomarkers Using Machine Learning: A Randomized Controlled Trial. Appl Psychophysiol Biofeedback 2024; 49:71-83. PMID: 38165498. DOI: 10.1007/s10484-023-09612-3.
Abstract
Biofeedback therapy is mainly based on the analysis of physiological features to improve an individual's affective state, but there are insufficient objective indicators to assess symptom improvement after biofeedback. In addition to psychological and physiological features, speech features can precisely convey information about emotions, and their use can improve the objectivity of psychiatric assessments. Biofeedback evaluated with subjective symptom scales together with objective speech and physiological features therefore offers a new approach for early screening and treatment of emotional problems in college students. A 4-week, randomized, controlled, parallel biofeedback therapy study was conducted with college students with symptoms of anxiety or depression. Speech samples, physiological samples, and clinical symptoms were collected at baseline and at the end of treatment, and the extracted speech and physiological features were used for between-group comparisons and correlation analyses between the biofeedback and wait-list groups. Based on the speech features that differed between the biofeedback intervention and wait-list groups, an artificial neural network was used to predict the therapeutic effect and response after biofeedback therapy. Through biofeedback therapy, improvements in depression (p = 0.001), anxiety (p = 0.001), insomnia (p = 0.013), and stress (p = 0.004) severity were observed in college students (n = 52). The speech and physiological features in the biofeedback group also changed significantly compared to the wait-list group (n = 52) and were related to the change in symptoms. The energy parameters and Mel-frequency cepstral coefficients (MFCC) of the speech features can predict whether biofeedback intervention effectively improves anxiety and insomnia symptoms and treatment response. The accuracy of the classification model built using the artificial neural network (ANN) for treatment response and non-response was approximately 60%. The results of this study provide valuable information about biofeedback in improving the mental health of college students. The study identified speech features, such as the energy parameters and MFCC, as more accurate and objective indicators for tracking biofeedback therapy response and predicting efficacy. Trial Registration: ClinicalTrials.gov ChiCTR2100045542.
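The paper does not detail its network architecture, so the following is only a generic sketch of a one-hidden-layer classifier of the kind used for treatment-response prediction, trained on invented "speech feature" data rather than the study's:

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.3, epochs=500, seed=0):
    """One-hidden-layer network (tanh + sigmoid) trained with
    full-batch gradient descent on binary cross-entropy."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                      # hidden activations
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))      # response probability
        dz2 = (p - y) / len(y)                        # BCE gradient at output
        dh = np.outer(dz2, W2) * (1 - h ** 2)         # backprop through tanh
        W2 -= lr * (h.T @ dz2)
        b2 -= lr * dz2.sum()
        W1 -= lr * (X.T @ dh)
        b1 -= lr * dh.sum(0)
    return W1, b1, W2, b2

def mlp_predict(X, params):
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    return (1.0 / (1.0 + np.exp(-(h @ W2 + b2))) > 0.5).astype(int)

# Invented features for non-responders (class 0) vs. responders (class 1)
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-1, 1, (40, 5)), rng.normal(1, 1, (40, 5))])
y = np.array([0] * 40 + [1] * 40)
params = train_mlp(X, y)
acc = (mlp_predict(X, params) == y).mean()
```

On well-separated toy data this sketch fits easily; the study's roughly 60% accuracy reflects the much harder real prediction problem.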
Affiliation(s)
- Lifei Wang
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, People's Republic of China
- Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, People's Republic of China
- Rongxun Liu
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, People's Republic of China
- Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, People's Republic of China
- Henan Key Laboratory of Immunology and Targeted Drugs, School of Laboratory Medicine, Xinxiang Medical University, Xinxiang, People's Republic of China
- Yang Wang
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, People's Republic of China
- Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, People's Republic of China
- Psychology Institute, Inner Mongolia Normal University, Hohhot, Inner Mongolia, People's Republic of China
- Xiao Xu
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, China
- Ran Zhang
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, People's Republic of China
- Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, People's Republic of China
- Yange Wei
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, People's Republic of China
- Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, People's Republic of China
- Rongxin Zhu
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, People's Republic of China
- Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, People's Republic of China
- Xizhe Zhang
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, China
- Fei Wang
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, People's Republic of China
- Functional Brain Imaging Institute of Nanjing Medical University, Nanjing, People's Republic of China
- Department of Mental Health, School of Public Health, Nanjing Medical University, Nanjing, China
4. Silva WJ, Lopes L, Galdino MKC, Almeida AA. Voice Acoustic Parameters as Predictors of Depression. J Voice 2024; 38:77-85. PMID: 34353686. DOI: 10.1016/j.jvoice.2021.06.018.
Abstract
OBJECTIVE: To analyze whether voice acoustic parameters are discriminant and predictive in patients with and without depression. METHODS: Observational case-control study. The following instruments were administered to the participants: Self-Reporting Questionnaire (SRQ-20), Beck Depression Inventory-Second Edition (BDI-II), Voice Symptom Scale (VoiSS) and voice collection for subsequent extraction of the following acoustic parameters: mean, mode and standard deviation (SD) of the fundamental frequency (F0); jitter; shimmer; glottal to noise excitation ratio (GNE); cepstral peak prominence-smoothed (CPPS); and spectral tilt. A total of 144 individuals participated in the study: 54 patients diagnosed with depression (case group) and 90 without a diagnosis of depression (control group). RESULTS: The means of the acoustic parameters showed differences between the groups: F0 (SD), jitter, and shimmer values were high, while values for GNE, CPPS and spectral tilt were lower in the case group than in the control group. There was a significant association between BDI-II and jitter, shimmer, CPPS, and spectral tilt and between CPPS and the class of antidepressants used. The multiple linear regression model showed that jitter and CPPS were predictors of depression, as measured by the BDI-II. CONCLUSION: Acoustic parameters were able to discriminate between patients with and without depression and were associated with BDI-II scores. The class of antidepressants used was associated with CPPS, and the jitter and CPPS parameters were able to predict the presence of depression, as measured by the BDI-II clinical score.
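The study's multiple linear regression (BDI-II predicted from jitter and CPPS) corresponds to ordinary least squares with two predictors. A sketch on invented data, where the coefficient signs (positive for jitter, negative for CPPS) mirror the group differences reported above but every number is made up for illustration:

```python
import numpy as np

# Hypothetical acoustic measures per speaker: jitter (%) and CPPS (dB),
# with synthetic BDI-II scores generated from an assumed linear relation.
rng = np.random.default_rng(1)
n = 80
jitter = rng.uniform(0.2, 2.0, n)
cpps = rng.uniform(4.0, 16.0, n)
bdi = 10 + 6.0 * jitter - 0.8 * cpps + rng.normal(0, 3, n)

# Ordinary least squares: BDI-II ~ intercept + jitter + CPPS
X = np.column_stack([np.ones(n), jitter, cpps])
beta, *_ = np.linalg.lstsq(X, bdi, rcond=None)
predicted = X @ beta
```

The fitted `beta` recovers the assumed signs: higher jitter predicts higher BDI-II, higher CPPS predicts lower BDI-II.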
Affiliation(s)
- Wegina Jordana Silva
- Department of Speech Therapy, Federal University of Paraíba (UFPB) and Federal University of Rio Grande do Norte (UFRN), João Pessoa, Paraíba, Brazil
- Leonardo Lopes
- Department of Speech Therapy, Federal University of Paraíba (UFPB), Graduate Program in Speech Therapy, Federal University of Paraíba (UFPB) and Federal University of Rio Grande do Norte (UFRN - PPgFon), Graduate Program in Decision and Health Models (PPgMDS), and Graduate Program in Linguistic (PROLING) of UFPB, João Pessoa, Paraíba, Brazil
- Melyssa Kellyane Cavalcanti Galdino
- Department of Psychology, Federal University of Paraíba (UFPB), Graduate Program in Cognitive Neuroscience and Behavior (PPgNeC) of UFPB, João Pessoa, Paraíba, Brazil
- Anna Alice Almeida
- Department of Speech Therapy, Federal University of Paraíba (UFPB), Graduate Program in Speech Therapy, Federal University of Paraíba (UFPB) and Federal University of Rio Grande do Norte (UFRN - PPgFon), Graduate Program in Decision and Health Models (PPgMDS), and Graduate Program in Cognitive Neuroscience and Behavior (PPgNeC) of UFPB, João Pessoa, Paraíba, Brazil
5. Tracey B, Volfson D, Glass J, Haulcy R, Kostrzebski M, Adams J, Kangarloo T, Brodtmann A, Dorsey ER, Vogel A. Towards interpretable speech biomarkers: exploring MFCCs. Sci Rep 2023; 13:22787. PMID: 38123603. PMCID: PMC10733367. DOI: 10.1038/s41598-023-49352-2.
Abstract
While speech biomarkers of disease have attracted increased interest in recent years, a challenge is that features derived from signal processing or machine learning approaches may lack clinical interpretability. As an example, Mel frequency cepstral coefficients (MFCCs) have been identified in several studies as a useful marker of disease, but are regarded as uninterpretable. Here we explore correlations between MFCC coefficients and more interpretable speech biomarkers. In particular we quantify the MFCC2 endpoint, which can be interpreted as a weighted ratio of low- to high-frequency energy, a concept which has been previously linked to disease-induced voice changes. By exploring MFCC2 in several datasets, we show how its sensitivity to disease can be increased by adjusting computation parameters.
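The reading of MFCC2 as a weighted low- vs. high-frequency energy contrast follows from its definition as the second DCT-II coefficient of the log mel spectrum: the cosine basis function is positive over the low-frequency bands and negative over the high-frequency bands. A simplified sketch with synthetic log-energies (no audio pipeline, just the cosine weighting):

```python
import numpy as np

def mfcc2_like(log_mel_energies):
    """Second cepstral coefficient (DCT-II, index 1): a cosine-weighted
    contrast of low- vs. high-frequency log energy."""
    e = np.asarray(log_mel_energies, dtype=float)
    n = len(e)
    k = 1  # "MFCC2" in 1-based numbering
    weights = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)
    return float(np.sum(weights * e))

# Synthetic log mel energies over 20 bands:
low_tilt = np.linspace(2.0, 0.0, 20)   # energy concentrated at low bands
high_tilt = np.linspace(0.0, 2.0, 20)  # energy concentrated at high bands
```

A low-frequency-tilted spectrum gives a positive value and a high-frequency-tilted spectrum a negative one, which is the interpretability argument the paper develops.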
Affiliation(s)
- Brian Tracey
- Takeda Pharmaceuticals, Data Science Institute, Cambridge, MA, 02142, USA
- Dmitri Volfson
- Takeda Pharmaceuticals, Data Science Institute, Cambridge, MA, 02142, USA
- James Glass
- Massachusetts Institute of Technology, CSAIL, Cambridge, MA, 02139, USA
- R'mani Haulcy
- Massachusetts Institute of Technology, CSAIL, Cambridge, MA, 02139, USA
- Melissa Kostrzebski
- Center for Health + Technology (CHeT), University of Rochester Medical Center, Rochester, NY, USA
- Jamie Adams
- Center for Health + Technology (CHeT), University of Rochester Medical Center, Rochester, NY, USA
- Tairmae Kangarloo
- Takeda Pharmaceuticals, Data Science Institute, Cambridge, MA, 02142, USA
- Amy Brodtmann
- Monash University, Melbourne, VIC, Australia
- University of Melbourne, Parkville, VIC, 3010, Australia
- E Ray Dorsey
- Center for Health + Technology (CHeT), University of Rochester Medical Center, Rochester, NY, USA
- Adam Vogel
- University of Melbourne, Parkville, VIC, 3010, Australia
- Redenlab Inc, Melbourne, VIC, 3010, Australia
6. Seifpanahi MS, Ghaemi T, Ghaleiha A, Sobhani-Rad D, Zarabian MK. The Association between Depression Severity, Prosody, and Voice Acoustic Features in Women with Depression. ScientificWorldJournal 2023; 2023:9928446. PMID: 38089742. PMCID: PMC10715859. DOI: 10.1155/2023/9928446.
Abstract
The aim was to define the association between the severity of depression, prosody, and voice acoustic features in women with depression and to compare them with nondepressed people. Prosody and acoustic features were investigated in a cross-sectional study of 30 women with major depression hospitalized in a psychiatric ward and 30 healthy women. To define the severity of depression, the Hamilton Rating Scale for Depression (HRS-D) was applied. Acoustic parameters such as jitter, shimmer, cepstral peak prominence (CPP), standard deviation of fundamental frequency (SD F0), harmonic-to-noise ratio, and F0, as well as speech prosodic features including the speed of speech, mean switching pause duration, and durations of sentences produced with different modalities, were measured quantitatively. Six raters also judged the patients' prosody qualitatively. SPSS V.28 was used for all statistical analyses (p < 0.05). There was a significant correlation between HRS-D and jitter, SD F0, speed of speech, and mean switching pauses (p ≤ 0.05). The means of CPP and the duration of producing emotional sentences differed between the depression and control groups. The HRS-D scores were significantly correlated with switching pauses in patients (Pearson coefficient = 0.47, p=0.05). The perceptual evaluation of prosody showed 85% agreement among the six raters (p ≤ 0.001). Some acoustic and prosodic parameters differ between healthy women and those with depression (e.g., CPP and duration of emotional sentences) and may also be associated with the severity of depression (e.g., jitter, SD F0, speed of speech, and mean switching pauses) in women with depression. The results also indicated that exclamatory sentences were the best sentence modality for assessing prosody in patients with depression, compared with declarative and interrogative sentences.
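Switching-pause durations of the kind measured here are commonly derived by thresholding a short-time energy envelope. A simplified sketch (the threshold, frame length, and envelope below are invented for illustration and are not the authors' procedure):

```python
import numpy as np

def switching_pauses(envelope, threshold=0.05, frame_ms=10, min_ms=150):
    """Durations (ms) of silent stretches in a short-time energy envelope.
    Runs of frames below `threshold` lasting at least `min_ms` count as pauses."""
    silent = np.asarray(envelope) < threshold
    pauses, run = [], 0
    for s in silent:
        if s:
            run += 1
        else:
            if run * frame_ms >= min_ms:
                pauses.append(run * frame_ms)
            run = 0
    if run * frame_ms >= min_ms:
        pauses.append(run * frame_ms)
    return pauses

# Toy envelope: speech, a 200 ms pause, speech, then a 50 ms dip (too short)
env = [0.5] * 30 + [0.01] * 20 + [0.5] * 30 + [0.01] * 5 + [0.5] * 10
```

Running `switching_pauses(env)` on this toy envelope returns a single 200 ms pause; the brief 50 ms dip falls under the minimum-duration criterion and is ignored.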
Affiliation(s)
- Mohammad-Sadegh Seifpanahi
- Department of Speech and Language Pathology, Autism Spectrum Disorders Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
- Tina Ghaemi
- Department of Linguistics, University of Konstanz, Konstanz, Germany
- Ali Ghaleiha
- Department of Psychiatry, Research Center for Behavioral Disorders and Substance Abuse, Hamadan University of Medical Sciences, Hamadan, Iran
- Davood Sobhani-Rad
- Department of Speech Therapy, School of Paramedical Science, Mashhad University of Medical Sciences, Mashhad, Iran
- Mohammad-Kazem Zarabian
- Research Center for Behavioral Disorders and Substance Abuse, Hamadan University of Medical Sciences, Hamadan, Iran
7. Gomez-Zaragoza L, Marin-Morales J, Vargas EP, Giglioli IAC, Raya MA. An Online Attachment Style Recognition System Based on Voice and Machine Learning. IEEE J Biomed Health Inform 2023; 27:5576-5587. PMID: 37566508. DOI: 10.1109/jbhi.2023.3304369.
Abstract
Attachment styles are known to have significant associations with mental and physical health. Specifically, insecure attachment leads individuals to higher risk of suffering from mental disorders and chronic diseases. The aim of this study is to develop an attachment recognition model that can distinguish between secure and insecure attachment styles from voice recordings, exploring the importance of acoustic features while also evaluating gender differences. A total of 199 participants recorded their responses to four open questions intended to trigger their attachment system using a web-based interrogation system. The recordings were processed to obtain the standard acoustic feature set eGeMAPS, and recursive feature elimination was applied to select the relevant features. Different supervised machine learning models were trained to recognize attachment styles using both gender-dependent and gender-independent approaches. The gender-independent model achieved a test accuracy of 58.88%, whereas the gender-dependent models obtained 63.88% and 83.63% test accuracy for women and men respectively, indicating a strong influence of gender on attachment style recognition and the need to consider them separately in further studies. These results also demonstrate the potential of acoustic properties for remote assessment of attachment style, enabling fast and objective identification of this health risk factor, and thus supporting the implementation of large-scale mobile screening systems.
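Recursive feature elimination, as applied here to the eGeMAPS set, iteratively fits a model and drops the least informative feature. A minimal sketch using a plain linear model on standardized synthetic features (the study used the real eGeMAPS features and different supervised estimators):

```python
import numpy as np

def rfe_rank(X, y, n_keep):
    """Minimal recursive feature elimination: repeatedly fit a linear
    model on standardized features and drop the smallest |coefficient|."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        Xs = X[:, keep]
        Xs = (Xs - Xs.mean(0)) / Xs.std(0)
        beta, *_ = np.linalg.lstsq(
            np.column_stack([np.ones(len(y)), Xs]), y, rcond=None)
        weakest = int(np.argmin(np.abs(beta[1:])))
        keep.pop(weakest)
    return keep

# Synthetic "acoustic feature" matrix: only features 0 and 3 carry signal
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + 0.8 * X[:, 3] + rng.normal(0, 0.5, 200) > 0).astype(float)
selected = rfe_rank(X, y, n_keep=2)
```

On this toy problem the procedure keeps exactly the two informative features, which is the behaviour feature elimination is meant to provide before the classifier is trained.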
8. Duey AH, Rana A, Siddi F, Hussein H, Onnela JP, Smith TR. Daily Pain Prediction Using Smartphone Speech Recordings of Patients With Spine Disease. Neurosurgery 2023; 93:670-677. PMID: 36995101. DOI: 10.1227/neu.0000000000002474.
Abstract
BACKGROUND: Pain evaluation remains largely subjective in neurosurgical practice, but machine learning provides the potential for objective pain assessment tools. OBJECTIVE: To predict daily pain levels using speech recordings from personal smartphones of a cohort of patients with diagnosed neurological spine disease. METHODS: Patients with spine disease were enrolled through a general neurosurgical clinic with approval from the institutional ethics committee. At-home pain surveys and speech recordings were administered at regular intervals through the Beiwe smartphone application. Praat audio features were extracted from the speech recordings to be used as input to a K-nearest neighbors (KNN) machine learning model. The pain scores were transformed from a 0 to 10 scale to low and high pain for better discriminative capacity. RESULTS: A total of 60 patients were enrolled, and 384 observations were used to train and test the prediction model. Using the KNN prediction model, an accuracy of 71% with a positive predictive value of 0.71 was achieved in classifying pain intensity into high and low. The model showed 0.71 precision for high pain and 0.70 precision for low pain. Recall of high pain was 0.74, and recall of low pain was 0.67. The overall F1 score was 0.73. CONCLUSION: Our study uses a KNN to model the relationship between speech features and pain levels collected from personal smartphones of patients with spine disease. The proposed model is a stepping stone for the development of objective pain assessment in neurosurgery clinical practice.
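A k-nearest-neighbours classifier of the kind described can be written directly in NumPy. The clusters below are synthetic stand-ins for the Praat acoustic features, not the study's data:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """k-nearest-neighbours majority vote using Euclidean distance."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)     # distance to every sample
        nearest = y_train[np.argsort(d)[:k]]        # labels of k closest
        preds.append(int(round(nearest.mean())))    # majority vote (k odd)
    return np.array(preds)

# Synthetic "acoustic feature" clusters standing in for low/high pain days
rng = np.random.default_rng(3)
low = rng.normal(0.0, 1.0, size=(50, 4))
high = rng.normal(2.5, 1.0, size=(50, 4))
X = np.vstack([low, high])
y = np.array([0] * 50 + [1] * 50)
preds = knn_predict(X, y, X, k=5)
accuracy = (preds == y).mean()
```

In practice features would be scaled and evaluated on held-out data (as the study does with a train/test split); this sketch only shows the voting mechanism.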
Affiliation(s)
- Akiro H Duey
- Department of Neurosurgery, Computational Neuroscience Outcomes Center, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Aakanksha Rana
- Department of Neurosurgery, Computational Neuroscience Outcomes Center, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Francesca Siddi
- Department of Neurosurgery, Computational Neuroscience Outcomes Center, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Department of Neurosurgery, Leiden University Medical Center, Leiden, The Netherlands
- Helweh Hussein
- Department of Neurosurgery, Computational Neuroscience Outcomes Center, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Jukka-Pekka Onnela
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
- Timothy R Smith
- Department of Neurosurgery, Computational Neuroscience Outcomes Center, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
9. Sara JDS, Orbelo D, Maor E, Lerman LO, Lerman A. Guess What We Can Hear-Novel Voice Biomarkers for the Remote Detection of Disease. Mayo Clin Proc 2023; 98:1353-1375. PMID: 37661144. PMCID: PMC10043966. DOI: 10.1016/j.mayocp.2023.03.007.
Abstract
The advancement of digital biomarkers and the provision of remote health care greatly progressed during the coronavirus disease 2019 global pandemic. Combining voice/speech data with artificial intelligence and machine-based learning offers a novel solution to the growing demand for telemedicine. Voice biomarkers, obtained from the extraction of characteristic acoustic and linguistic features, are associated with a variety of diseases and even coronavirus disease 2019. In the current review, we (1) describe the basis on which digital voice biomarkers could facilitate "telemedicine," (2) discuss potential mechanisms that may explain the association between voice biomarkers and disease, (3) offer a novel classification system to conceptualize voice biomarkers depending on different methods for recording and analyzing voice/speech samples, (4) outline evidence revealing an association between voice biomarkers and a number of disease states, and (5) describe the process of developing a voice biomarker from recording, storing voice samples, and extracting acoustic and linguistic features relevant to training and testing deep and machine-based learning algorithms to detect disease. We further explore several important future considerations in this area of research, including the necessity for clinical trials and the importance of safeguarding data and individual privacy. To this end, we searched PubMed and Google Scholar to identify studies evaluating the relationship between voice/speech features and biomarkers and various diseases. Search terms included digital biomarker, telemedicine, voice features, voice biomarker, speech features, speech biomarkers, acoustics, linguistics, cardiovascular disease, neurologic disease, psychiatric disease, and infectious disease. The search was limited to studies published in English in peer-reviewed journals between 1980 and the present. To identify potential studies not captured by our database search strategy, we also searched studies listed in the bibliography of relevant publications and reviews.
Affiliation(s)
- Diana Orbelo
- Division of Otolaryngology, Mayo Clinic College of Medicine and Science, Rochester, MN; Chaim Sheba Medical Center, Tel HaShomer, Israel
- Elad Maor
- Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Lilach O Lerman
- Division of Nephrology and Hypertension, Mayo Clinic, Rochester, MN
- Amir Lerman
- Department of Cardiovascular Medicine, Mayo Clinic College of Medicine and Science, Rochester, MN
10. Pan W, Deng F, Wang X, Hang B, Zhou W, Zhu T. Exploring the ability of vocal biomarkers in distinguishing depression from bipolar disorder, schizophrenia, and healthy controls. Front Psychiatry 2023; 14:1079448. PMID: 37575564. PMCID: PMC10415910. DOI: 10.3389/fpsyt.2023.1079448.
Abstract
Background: Vocal features have been exploited to distinguish depression from healthy controls. While there have been some claims of success, the degree to which changes in vocal features are specific to depression has not been systematically studied. Hence, we examined the performance of vocal features in differentiating depression from bipolar disorder (BD), schizophrenia, and healthy controls, as well as in pairwise classifications of the three disorders. Methods: We sampled 32 bipolar disorder patients, 106 depression patients, 114 healthy controls, and 20 schizophrenia patients. We extracted i-vectors from Mel-frequency cepstrum coefficients (MFCCs), and built logistic regression models with ridge regularization and 5-fold cross-validation on the training set, then applied the models to the test set. There were seven classification tasks: any disorder versus healthy controls; depression versus healthy controls; BD versus healthy controls; schizophrenia versus healthy controls; depression versus BD; depression versus schizophrenia; BD versus schizophrenia. Results: The area under the curve (AUC) score for classifying depression and bipolar disorder was 0.5 (F-score = 0.44). For other comparisons, the AUC scores ranged from 0.75 to 0.92, and the F-scores ranged from 0.73 to 0.91. The model performance (AUC) for classifying depression and bipolar disorder was significantly worse than that for classifying bipolar disorder and schizophrenia (corrected p < 0.05). There were no significant differences among the remaining pairwise comparisons of the seven classification tasks. Conclusion: Vocal features showed discriminatory potential in classifying depression against healthy controls, as well as against other mental disorders. Future research should systematically examine the mechanisms by which voice features distinguish depression from other mental disorders and develop more sophisticated machine learning models so that voice can better assist clinical diagnosis.
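The modelling pipeline here, logistic regression with ridge (L2) regularization scored by AUC, can be sketched in NumPy. The features below are simulated stand-ins for i-vectors, and plain gradient descent stands in for whatever solver the authors used:

```python
import numpy as np

def fit_ridge_logistic(X, y, lam=1.0, lr=0.1, steps=500):
    """Logistic regression with an L2 (ridge) penalty, plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) / len(y) + lam * w / len(y)
        w -= lr * grad
    return w

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Simulated i-vector-like features: two partially overlapping groups
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1, (60, 8)), rng.normal(0.8, 1, (60, 8))])
y = np.array([0] * 60 + [1] * 60)
w = fit_ridge_logistic(X, y)
score = auc(X @ w, y)
```

An AUC near 0.5 (as the study found for depression vs. BD) would indicate the fitted scores rank the two groups no better than chance, while values toward 1.0 indicate good separation.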
Affiliation(s)
- Wei Pan
- Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Fusong Deng
- Wuhan Wuchang Hospital, Wuchang Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Xianbin Wang
- Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Bowen Hang
- Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Wenwei Zhou
- Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Tingshao Zhu
- Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
11
Liu L, Peng D, Zheng WL, Lu BL. Objective Depression Detection Using EEG and Eye Movement Signals Induced by Oil Paintings. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2023; 2023:1-4. [PMID: 38083413 DOI: 10.1109/embc40787.2023.10341095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/18/2023]
Abstract
Depression is a mental disorder characterized by persistent sadness and loss of interest, and it has become one of the leading causes of disability worldwide. There are currently no objective diagnostic standards for depression in clinical practice. Previous studies have shown that depression causes both brain abnormalities and behavioral disorders. In this study, both electroencephalography (EEG) and eye movement signals were used to objectively detect depression. By presenting 40 carefully selected oil paintings (20 positive and 20 negative) as stimuli, we successfully evoked emotions in 48 depressed patients (DPs) and 40 healthy controls (HCs) from three centers. We then used Transformer, a deep learning model, to conduct emotion recognition and depression detection. The experimental results demonstrate that: a) Transformer achieves the best accuracies of 89.21% and 92.19% in emotion recognition and depression detection, respectively; b) the HC group has higher accuracies than the DP group in emotion recognition for both subject-dependent and subject-independent experiments; c) neural pattern differences exist between DPs and HCs, with consistent asymmetry of the neural patterns in DPs; d) for depression detection, using a single oil painting achieves the best accuracies, and negative oil paintings yield higher accuracies than positive ones. These findings suggest that EEG and eye movement signals induced by oil paintings can be used to objectively identify depression.
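The classifier named here is a Transformer; its core operation, scaled dot-product self-attention, can be sketched in NumPy. The sequence length and feature dimension below are invented placeholders for windowed EEG/eye-movement feature vectors, not the paper's configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of feature vectors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])  # (T, T) pairwise similarities
    A = softmax(scores, axis=-1)            # each row is an attention distribution
    return A @ V, A

rng = np.random.default_rng(1)
T, d = 10, 16   # e.g. 10 time windows of a hypothetical 16-dim EEG/eye feature
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # attended sequence, same shape as the input
```

A full Transformer stacks this with multiple heads, residual connections, and feed-forward layers; the attention map `A` is also what makes such models partially inspectable.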
12
Wang Y, Liang L, Zhang Z, Xu X, Liu R, Fang H, Zhang R, Wei Y, Liu Z, Zhu R, Zhang X, Wang F. Fast and accurate assessment of depression based on voice acoustic features: a cross-sectional and longitudinal study. Front Psychiatry 2023; 14:1195276. [PMID: 37415683 PMCID: PMC10320390 DOI: 10.3389/fpsyt.2023.1195276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/28/2023] [Accepted: 06/02/2023] [Indexed: 07/08/2023] Open
Abstract
Background Depression is a widespread mental disorder that affects a significant portion of the population. However, the assessment of depression is often subjective, relying on standard questions or interviews. Acoustic features have been suggested as a reliable and objective alternative for depression assessment. Therefore, in this study, we aimed to identify and explore voice acoustic features that can effectively and rapidly predict the severity of depression, and to investigate the potential correlation between specific treatment options and voice acoustic features. Methods We utilized voice acoustic features correlated with depression scores to train a prediction model based on an artificial neural network. Leave-one-out cross-validation was performed to evaluate the performance of the model. We also conducted a longitudinal study to analyze the correlation between improvement in depression and changes in voice acoustic features after an Internet-based cognitive-behavioral therapy (ICBT) program consisting of 12 sessions. Results Our study showed that the neural network model, trained on the 30 voice acoustic features significantly correlated with HAMD scores, can accurately predict the severity of depression, with a mean absolute error of 3.137 and a correlation coefficient of 0.684. Furthermore, four of the 30 features decreased significantly after ICBT (p < 0.05), indicating their potential correlation with specific treatment options and with significant improvement in depression. Conclusion Voice acoustic features can effectively and rapidly predict the severity of depression, providing a low-cost and efficient method for screening patients with depression on a large scale. Our study also identified potential acoustic features that may be significantly related to specific treatment options for depression.
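The evaluation protocol described (leave-one-out cross-validation reporting a mean absolute error and a correlation coefficient against questionnaire scores) can be illustrated as follows. As a hedge: the learner here is closed-form ridge regression standing in for the paper's artificial neural network, and the 40-by-30 data matrix with HAMD-like scores is synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in data: 40 speakers, 30 acoustic features,
# HAMD-like scores linearly related to the features plus noise.
n, d = 40, 30
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d) * 0.5
y = X @ true_w + rng.normal(scale=2.0, size=n) + 12.0

def ridge_fit_predict(Xtr, ytr, Xte, lam=1.0):
    """Closed-form ridge regression (a simple stand-in for the paper's ANN)."""
    Xtr1 = np.c_[Xtr, np.ones(len(Xtr))]  # append a bias column
    A = Xtr1.T @ Xtr1 + lam * np.eye(Xtr1.shape[1])
    w = np.linalg.solve(A, Xtr1.T @ ytr)
    return np.c_[Xte, np.ones(len(Xte))] @ w

# Leave-one-out cross-validation: each sample is predicted by a model
# trained on the other n-1 samples.
preds = np.array([
    ridge_fit_predict(np.delete(X, i, 0), np.delete(y, i), X[i:i + 1])[0]
    for i in range(n)
])
mae = float(np.mean(np.abs(preds - y)))
r = float(np.corrcoef(preds, y)[0, 1])
print(round(mae, 2), round(r, 2))
```

Reporting both MAE (in score units) and r matters: a model can correlate well while being systematically biased, and vice versa.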
Affiliation(s)
- Yang Wang
- Psychology Institute, Inner Mongolia Normal University, Hohhot, Inner Mongolia, China
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, China
- Functional Brain Imaging Institute, Nanjing Medical University, Nanjing, China
- Lijuan Liang
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, China
- Functional Brain Imaging Institute, Nanjing Medical University, Nanjing, China
- Laboratory of Psychology, The First Affiliated Hospital of Hainan Medical University, Haikou, Hainan, China
- Zhongguo Zhang
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, China
- Functional Brain Imaging Institute, Nanjing Medical University, Nanjing, China
- The Fourth People’s Hospital of Yancheng, Yancheng, Jiangsu, China
- Xiao Xu
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Rongxun Liu
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, China
- Functional Brain Imaging Institute, Nanjing Medical University, Nanjing, China
- College of Medical Engineering, Xinxiang Medical University, Xinxiang, Henan, China
- Hanzheng Fang
- School of Computer Science and Engineering, Northeastern University, Shenyang, Liaoning, China
- Ran Zhang
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, China
- Functional Brain Imaging Institute, Nanjing Medical University, Nanjing, China
- Yange Wei
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, China
- Functional Brain Imaging Institute, Nanjing Medical University, Nanjing, China
- Zhongchun Liu
- Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan, Hubei, China
- Rongxin Zhu
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, China
- Functional Brain Imaging Institute, Nanjing Medical University, Nanjing, China
- Xizhe Zhang
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Fei Wang
- Early Intervention Unit, Department of Psychiatry, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing, China
- Functional Brain Imaging Institute, Nanjing Medical University, Nanjing, China
13
Martin JC, Clark SR, Schubert KO. Towards a Neurophenomenological Understanding of Self-Disorder in Schizophrenia Spectrum Disorders: A Systematic Review and Synthesis of Anatomical, Physiological, and Neurocognitive Findings. Brain Sci 2023; 13:845. [PMID: 37371325 DOI: 10.3390/brainsci13060845] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Revised: 05/18/2023] [Accepted: 05/18/2023] [Indexed: 06/29/2023] Open
Abstract
The concept of anomalous self-experience, also termed Self-Disorder, has attracted both clinical and research interest, as empirical studies suggest such experiences specifically aggregate in and are a core feature of schizophrenia spectrum disorders. A comprehensive neurophenomenological understanding of Self-Disorder may improve diagnostic and therapeutic practice. This systematic review aims to evaluate anatomical, physiological, and neurocognitive correlates of Self-Disorder (SD), considered a core feature of Schizophrenia Spectrum Disorders (SSDs), towards developing a neurophenomenological understanding. A search of the PubMed database retrieved 285 articles, which were evaluated for inclusion using PRISMA guidelines. Non-experimental studies, studies with no validated measure of Self-Disorder, or those with no physiological variable were excluded. In total, 21 articles were included in the review. Findings may be interpreted in the context of triple-network theory and support a core dysfunction of signal integration within two anatomical components of the Salience Network (SN), the anterior insula and dorsal anterior cingulate cortex, which may mediate connectivity across both the Default Mode Network (DMN) and Fronto-Parietal Network (FPN). We propose a theoretical Triple-Network Model of Self-Disorder characterized by increased connectivity between the Salience Network (SN) and the DMN, increased connectivity between the SN and FPN, decreased connectivity between the DMN and FPN, and increased connectivity within both the DMN and FPN. We go on to describe translational opportunities for clinical practice and provide suggestions for future research.
Affiliation(s)
- James C Martin
- Discipline of Psychiatry, Adelaide Medical School, The University of Adelaide, Adelaide, SA 5000, Australia
- Scott R Clark
- Discipline of Psychiatry, Adelaide Medical School, The University of Adelaide, Adelaide, SA 5000, Australia
- Basil Hetzel Institute, Woodville, SA 5011, Australia
- K Oliver Schubert
- Discipline of Psychiatry, Adelaide Medical School, The University of Adelaide, Adelaide, SA 5000, Australia
- Division of Mental Health, Northern Adelaide Local Health Network, SA Health, Adelaide, SA 5000, Australia
- Headspace Early Psychosis, Sonder, Adelaide, SA 5000, Australia
14
Wasserzug Y, Degani Y, Bar-Shaked M, Binyamin M, Klein A, Hershko S, Levkovitch Y. Development and validation of a machine learning-based vocal predictive model for major depressive disorder. J Affect Disord 2023; 325:627-632. [PMID: 36586600 DOI: 10.1016/j.jad.2022.12.117] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/20/2022] [Revised: 08/25/2022] [Accepted: 12/23/2022] [Indexed: 12/29/2022]
Abstract
BACKGROUND Variations in speech intonation are known to be associated with changes in mental state over time. Behavioral vocal analysis is an algorithmic method of determining individuals' behavioral and emotional characteristics from their vocal patterns. It can provide biomarkers for use in psychiatric assessment and monitoring, especially when remote assessment is needed, as in the COVID-19 pandemic. The objective of this study was to design and validate an effective prototype of automatic speech analysis, based on algorithms for classifying MDD-related speech features, using a remote assessment system that combines a mobile app for speech recording with central cloud processing of prosodic vocal patterns. METHODS Machine learning was used to compare the vocal patterns of 40 patients diagnosed with MDD to those of 104 non-clinical participants. The vocal patterns of the 40 patients in the acute phase were also compared to those of 14 of these patients in the remission phase of MDD. RESULTS A vocal depression predictive model was successfully generated. The vocal depression scores of MDD patients were significantly higher than those of the non-patient participants (p < 0.0001). The vocal depression scores of the MDD patients were significantly higher in the acute phase than in remission (p < 0.02). LIMITATIONS The main limitation of this study is its relatively small sample size, since machine learning validity improves with larger datasets. CONCLUSIONS Computerized analysis of prosodic changes may be used to generate biomarkers for the early detection of MDD, remote monitoring, and the evaluation of responses to treatment.
Affiliation(s)
- Yael Wasserzug
- Merhavim Beer Yaakov-Ness Ziona Mental Health Center, Israel
- Mili Bar-Shaked
- Merhavim Beer Yaakov-Ness Ziona Mental Health Center, Israel
- Milana Binyamin
- Merhavim Beer Yaakov-Ness Ziona Mental Health Center, Israel
15
Tan EJ, Neill E, Kleiner JL, Rossell SL. Depressive symptoms are specifically related to speech pauses in schizophrenia spectrum disorders. Psychiatry Res 2023; 321:115079. [PMID: 36716551 DOI: 10.1016/j.psychres.2023.115079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/17/2021] [Revised: 01/03/2023] [Accepted: 01/25/2023] [Indexed: 01/28/2023]
Abstract
Depression is a common and debilitating mental illness associated with sadness and negativity and is often comorbid with other psychiatric conditions, such as schizophrenia. Depressive symptoms are presently assessed primarily through clinical interviews; however, other behavioural indicators are being investigated as more objective methods of depressive symptom assessment. The present study aimed to evaluate the utility of assessing depression using quantitative speech parameters by comparing speech among 23 schizophrenia/schizoaffective patients with clinically significant depressive symptoms (DP), 19 schizophrenia/schizoaffective patients without depressive symptoms (NDP), and 22 healthy controls with no psychiatric history (HC). Participant audio recordings were transcribed and analyzed to extract five types of speech variables: utterances, words, speaking rate, formulation errors, and pauses. The results indicated that DP patients produced significantly more pauses within utterances and had more utterances with pauses compared to NDP patients and HCs (p < .05), who performed similarly to each other. Word, speaking rate, and formulation error variables did not differ significantly between the patient groups (p > .05). The findings suggest that depressive symptoms may have a specific relationship to speech pauses, and support the potential future use of speech pause assessments as an alternative and objective depression rating and monitoring tool.
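The pause variables described (pauses within utterances, and utterances containing pauses) can be computed from time-aligned voiced segments. The sketch below is a hedged illustration of that kind of variable, not the authors' pipeline; the `min_pause` threshold and the toy timings are assumptions.

```python
# Each utterance is a list of (start, end) times in seconds for voiced
# stretches; a gap of at least `min_pause` between stretches counts as a pause.

def pause_metrics(utterances, min_pause=0.25):
    total_pauses = 0
    utterances_with_pauses = 0
    pause_time = 0.0
    for segments in utterances:
        n_before = total_pauses
        for (s0, e0), (s1, e1) in zip(segments, segments[1:]):
            gap = s1 - e0
            if gap >= min_pause:
                total_pauses += 1
                pause_time += gap
        if total_pauses > n_before:
            utterances_with_pauses += 1
    return {
        "pauses": total_pauses,
        "utterances_with_pauses": utterances_with_pauses,
        "mean_pause_s": pause_time / total_pauses if total_pauses else 0.0,
    }

# Two toy utterances: the first has one 0.8 s pause and one 0.1 s gap
# (below threshold); the second is fluent.
utts = [
    [(0.0, 1.2), (2.0, 3.0), (3.1, 4.0)],
    [(0.0, 2.5)],
]
m = pause_metrics(utts)
print(m)
```

In practice the segment boundaries would come from a voice activity detector or a forced aligner, and the pause threshold is a design choice that strongly affects the counts.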
Affiliation(s)
- Eric J Tan
- Centre for Mental Health and Brain Sciences, Swinburne University of Technology, Melbourne, Australia; Department of Psychiatry, St Vincent's Hospital, Melbourne, Australia
- Erica Neill
- Centre for Mental Health and Brain Sciences, Swinburne University of Technology, Melbourne, Australia; Department of Psychiatry, St Vincent's Hospital, Melbourne, Australia
- Jacqui L Kleiner
- Centre for Mental Health and Brain Sciences, Swinburne University of Technology, Melbourne, Australia
- Susan L Rossell
- Centre for Mental Health and Brain Sciences, Swinburne University of Technology, Melbourne, Australia; Department of Psychiatry, St Vincent's Hospital, Melbourne, Australia
16
Du M, Liu S, Wang T, Zhang W, Ke Y, Chen L, Ming D. Depression recognition using a proposed speech chain model fusing speech production and perception features. J Affect Disord 2023; 323:299-308. [PMID: 36462607 DOI: 10.1016/j.jad.2022.11.060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/15/2022] [Revised: 10/22/2022] [Accepted: 11/20/2022] [Indexed: 12/05/2022]
Abstract
BACKGROUND The increasing number of patients with depression puts great pressure on clinical diagnosis. Audio-based diagnosis is a helpful auxiliary tool for early mass screening. However, current methods consider only speech perception features, ignoring patients' vocal tract changes, which may partly explain their poor recognition performance. METHODS This work proposes a novel machine speech chain model for depression recognition (MSCDR) that can capture text-independent depressive speech representations from the speaker's mouth to the listener's ear to improve recognition performance. In the proposed MSCDR, linear predictive coding (LPC) and Mel-frequency cepstral coefficient (MFCC) features are extracted to describe the processes of speech production and speech perception, respectively. Then, a one-dimensional convolutional neural network and a long short-term memory network sequentially capture intra- and inter-segment dynamic depressive features for classification. RESULTS We tested the MSCDR on two public datasets with different languages and paradigms, namely, the Distress Analysis Interview Corpus-Wizard of Oz and the Multi-modal Open Dataset for Mental-disorder Analysis. The accuracies of the MSCDR on the two datasets were 0.77 and 0.86, and the average F1 scores were 0.75 and 0.86, better than the other existing methods. This improvement reveals the complementarity of speech production and perception features in carrying depressive information. LIMITATIONS The sample size was relatively small, which may limit clinical translation to some extent. CONCLUSION This experiment demonstrates the good generalization ability and superiority of the proposed MSCDR and suggests that the vocal tract changes in patients with depression deserve attention in audio-based depression diagnosis.
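Of the two feature families the MSCDR fuses, the production side is LPC; a standard way to compute LPC coefficients is the Levinson-Durbin recursion on the signal's autocorrelation. This is a generic sketch with a synthetic AR(1) sanity check, not the paper's implementation.

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients via the Levinson-Durbin recursion on the
    autocorrelation sequence of the signal."""
    n = len(x)
    r = np.array([x[:n - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_new = a.copy()
        a_new[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a_new[i] = k
        a = a_new
        err *= (1.0 - k * k)                # residual prediction error
    return a, err

# Sanity check on a synthetic AR(1) signal x[t] = 0.9 x[t-1] + noise:
# an order-2 fit should recover coefficients close to [1, -0.9, 0].
rng = np.random.default_rng(3)
e = rng.normal(size=4000)
x = np.empty_like(e)
x[0] = e[0]
for t in range(1, len(e)):
    x[t] = 0.9 * x[t - 1] + e[t]
a, err = lpc(x, 2)
print(np.round(a, 2))
```

For speech, the recursion is applied per windowed frame with an order of roughly 8-16, and the resulting all-pole model approximates the vocal tract filter, which is the motivation for pairing LPC with perception-oriented MFCCs.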
Affiliation(s)
- Minghao Du
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Shuang Liu
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Tao Wang
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Wenquan Zhang
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Yufeng Ke
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Long Chen
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Dong Ming
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China; Lab of Neural Engineering & Rehabilitation, Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin, China
17
Omiya Y, Mizuguchi D, Tokuno S. Distinguish the Severity of Illness Associated with Novel Coronavirus (COVID-19) Infection via Sustained Vowel Speech Features. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:3415. [PMID: 36834110 PMCID: PMC9960121 DOI: 10.3390/ijerph20043415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/06/2022] [Revised: 02/10/2023] [Accepted: 02/13/2023] [Indexed: 06/18/2023]
Abstract
The authors are currently conducting research on methods to estimate psychiatric and neurological disorders from the voice by focusing on speech features. It is empirically known that numerous psychosomatic symptoms appear in voice biomarkers; in this study, we examined the effectiveness of speech features in distinguishing changes in the symptoms associated with novel coronavirus infection. Multiple speech features were extracted from the voice recordings and, as a countermeasure against overfitting, we selected features using statistical analysis and feature selection methods utilizing pseudo data, then built and validated machine learning models using LightGBM. Applying 5-fold cross-validation and using three types of sustained vowel sounds (/Ah/, /Eh/, and /Uh/), we achieved high performance (accuracy and AUC over 88%) in distinguishing "asymptomatic or mild illness (symptoms)" from "moderate illness 1 (symptoms)". Accordingly, the results suggest that the proposed voice-based index (speech features) can likely be used to distinguish the symptoms associated with novel coronavirus infection.
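The overfitting countermeasure described (feature selection using pseudo data) resembles the common probe-feature trick: append random features and keep only the real features that outscore the best probe. The sketch below uses a simple correlation score in place of LightGBM importances, and all data, coefficients, and the threshold rule are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: 100 recordings, 20 real acoustic features, binary label.
# Only features 0 and 1 actually carry signal.
n, d = 100, 20
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.3, size=n) > 0).astype(float)

# Append random "pseudo" probe features; any real feature whose score fails
# to beat the best probe is treated as noise.
probes = rng.normal(size=(n, d))
Z = np.c_[X, probes]
scores = np.abs([np.corrcoef(Z[:, j], y)[0, 1] for j in range(Z.shape[1])])
threshold = scores[d:].max()            # best score any pure-noise probe got
selected = np.flatnonzero(scores[:d] > threshold)
print(selected)
```

The appeal of the probe threshold is that it is calibrated to the sample at hand: with fewer recordings, noise correlations are larger, so the bar for keeping a feature rises automatically.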
Affiliation(s)
- Yasuhiro Omiya
- PST Inc., Yokohama 231-0023, Japan
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, Tokyo 113-8656, Japan
- Shinichi Tokuno
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, Tokyo 113-8656, Japan
- Graduate School of Health Innovation, Kanagawa University of Human Services, Yokosuka 210-0821, Japan
18
Eysenbach G, Jang EH, Lee SH, Choi KY, Park JG, Shin HC. Automatic Depression Detection Using Smartphone-Based Text-Dependent Speech Signals: Deep Convolutional Neural Network Approach. J Med Internet Res 2023; 25:e34474. [PMID: 36696160 PMCID: PMC9909514 DOI: 10.2196/34474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2021] [Revised: 05/20/2022] [Accepted: 12/18/2022] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Automatic diagnosis of depression based on speech can complement mental health treatment methods in the future. Previous studies have reported that acoustic properties can be used to identify depression. However, few studies have attempted a large-scale differential diagnosis of patients with depressive disorders using the acoustic characteristics of non-English speakers. OBJECTIVE This study proposes a framework for automatic depression detection using large-scale acoustic characteristics based on the Korean language. METHODS We recruited 153 patients who met the criteria for major depressive disorder and 165 healthy controls without current or past mental illness. Participants' voices were recorded on a smartphone while they read predefined text-based sentences. Three approaches to detecting depression from the text-dependent read-speech data sets were evaluated and compared: conventional machine learning models based on acoustic features, a proposed model that trains and classifies log-Mel spectrograms using a deep convolutional neural network (CNN) with a relatively small number of parameters, and models that train and classify log-Mel spectrograms using well-known pretrained networks. RESULTS Depression was automatically detected from the acoustic characteristics of the predefined text-based sentence reading using the proposed CNN model. The highest accuracy achieved with the proposed CNN on the speech data was 78.14%. Our results show that the deep-learned acoustic characteristics lead to better performance than the conventional approach and the pretrained models. CONCLUSIONS Monitoring the mood of patients with major depressive disorder and checking the consistency of objective descriptions are important research topics. This study suggests that the analysis of speech data recorded while reading text-dependent sentences could help predict depression status automatically by capturing the characteristics of depression. Our method is smartphone based, easily accessible, and can contribute to the automatic identification of depressive states.
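The model input named here is a log-Mel spectrogram, which can be computed in plain NumPy as STFT power passed through a triangular Mel filterbank, then a log. Frame, hop, and filter counts below are generic defaults, not the study's settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(x, sr, n_fft=512, hop=256, n_mels=40):
    """Log-Mel spectrogram: windowed STFT power -> Mel filterbank -> log."""
    window = np.hanning(n_fft)
    frames = np.array([x[i:i + n_fft] * window
                       for i in range(0, len(x) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (T, n_fft//2 + 1)

    # Triangular filters centered at points equally spaced on the Mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fb[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    return np.log(power @ fb.T + 1e-10)                # (T, n_mels)

sr = 8000
t = np.arange(sr) / sr                  # one second of a 440 Hz test tone
S = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t), sr)
print(S.shape)
```

The resulting (time, mel) matrix is what a 2-D CNN consumes, exactly as it would an image.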
Affiliation(s)
- Eun Hye Jang
- Medical Information Research Section, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea
- Seung-Hwan Lee
- Clinical Emotion and Cognition Research Laboratory, Inje University, Goyang, Republic of Korea; Department of Psychiatry, Inje University, Ilsan-Paik Hospital, Goyang, Republic of Korea; Bwave Inc, Goyang, Republic of Korea
- Kwang-Yeon Choi
- Department of Psychiatry, College of Medicine, Chungnam National University, Daejeon, Republic of Korea
- Jeon Gue Park
- Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea; Tutorus Labs Inc, Seoul, Republic of Korea
- Hyun-Chool Shin
- Department of Electronics Engineering, Soongsil University, Seoul, Republic of Korea
19
Ettore E, Müller P, Hinze J, Benoit M, Giordana B, Postin D, Lecomte A, Lindsay H, Robert P, König A. Digital Phenotyping for Differential Diagnosis of Major Depressive Episode: Narrative Review. JMIR Ment Health 2023; 10:e37225. [PMID: 36689265 PMCID: PMC9903183 DOI: 10.2196/37225] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/11/2022] [Revised: 09/02/2022] [Accepted: 09/30/2022] [Indexed: 01/25/2023] Open
Abstract
BACKGROUND Major depressive episode (MDE) is a common clinical syndrome. It can be found in different pathologies such as major depressive disorder (MDD), bipolar disorder (BD), and posttraumatic stress disorder (PTSD), or can occur in the context of psychological trauma. However, only 1 syndrome is described in international classifications (Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition [DSM-5]/International Classification of Diseases 11th Revision [ICD-11]), which do not take into account the underlying pathology at the origin of the MDE. Clinical interviews are currently the best source of information for obtaining the etiological diagnosis of MDE. Nevertheless, they do not allow early diagnosis, and the clinical information they yield lacks objective measures. To remedy this, digital tools correlated with clinical symptomatology could be useful. OBJECTIVE We aimed to review the current application of digital tools for MDE diagnosis while highlighting shortcomings for further research. In addition, our work focused on digital devices that are easy to use during the clinical interview and on mental health issues in which depression is common. METHODS We conducted a narrative review of the use of digital tools during clinical interviews for MDE by searching papers published in the PubMed/MEDLINE, Web of Science, and Google Scholar databases since February 2010. The search was conducted from June to September 2021. Potentially relevant papers were compared against a checklist for relevance and reviewed independently for inclusion, with a focus on 4 allocated topics: (1) automated voice analysis, (2) behavior analysis by video, and physiological measures of (3) heart rate variability (HRV) and (4) electrodermal activity (EDA). For this purpose, we were interested in 4 frequently encountered clinical conditions in which MDE can occur: (1) MDD, (2) BD, (3) PTSD, and (4) psychological trauma. RESULTS A total of 74 relevant papers were qualitatively analyzed and the information was synthesized. A digital phenotype of MDE seems to emerge, consisting of modifications in speech features (namely temporal, prosodic, spectral, source, and formant features) and in speech content, modifications in nonverbal behavior (head, hand, body and eye movement, facial expressivity, and gaze), and a decrease in physiological measurements (HRV and EDA). We found not only similarities but also differences when MDE occurs in MDD, BD, PTSD, or psychological trauma. However, comparative studies were rare for the BD and PTSD conditions, which does not allow us to identify clear and distinct digital phenotypes. CONCLUSIONS Our search identified markers from several modalities that hold promise for helping with a more objective diagnosis of MDE. To validate their potential, further longitudinal and prospective studies are needed.
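Among the physiological measures reviewed, HRV is conventionally summarized by time-domain indices such as SDNN and RMSSD computed over RR intervals. A minimal sketch follows; the RR series is invented for illustration.

```python
import math

def sdnn(rr):
    """Standard deviation of RR intervals (sample SD), in the same units as rr."""
    mean = sum(rr) / len(rr)
    return math.sqrt(sum((x - mean) ** 2 for x in rr) / (len(rr) - 1))

def rmssd(rr):
    """Root mean square of successive RR-interval differences."""
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

rr = [812, 790, 845, 830, 780, 805, 820]  # hypothetical RR intervals (ms)
print(round(sdnn(rr), 1), round(rmssd(rr), 1))
```

SDNN reflects overall variability while RMSSD emphasizes beat-to-beat (vagally mediated) variability, which is why both are typically reported together.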
Affiliation(s)
- Eric Ettore
- Department of Psychiatry and Memory Clinic, University Hospital of Nice, Nice, France
- Philipp Müller
- Research Department Cognitive Assistants, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Saarbrücken, Germany
- Jonas Hinze
- Department of Psychiatry and Psychotherapy, Saarland University Medical Center, Hombourg, Germany
- Michel Benoit
- Department of Psychiatry, Hopital Pasteur, University Hospital of Nice, Nice, France
- Bruno Giordana
- Department of Psychiatry, Hopital Pasteur, University Hospital of Nice, Nice, France
- Danilo Postin
- Department of Psychiatry, School of Medicine and Health Sciences, Carl von Ossietzky University of Oldenburg, Bad Zwischenahn, Germany
- Amandine Lecomte
- Research Department Sémagramme Team, Institut national de recherche en informatique et en automatique, Nancy, France
- Hali Lindsay
- Research Department Cognitive Assistants, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Saarbrücken, Germany
- Philippe Robert
- Research Department, Cognition-Behaviour-Technology Lab, University Côte d'Azur, Nice, France
- Alexandra König
- Research Department Stars Team, Institut national de recherche en informatique et en automatique, Sophia Antipolis - Valbonne, France
20
Ostermann TA, Fuchs M, Hinz A, Engel C, Berger T. Associations of Personality, Physical and Mental Health with Voice Range Profiles. J Voice 2023:S0892-1997(22)00377-0. [PMID: 36599716 DOI: 10.1016/j.jvoice.2022.11.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2022] [Revised: 11/17/2022] [Accepted: 11/18/2022] [Indexed: 01/03/2023]
Abstract
OBJECTIVES There is evidence in the literature that voice characteristics are linked to mental and physical health. The aim of this exploratory study was to determine associations between voice parameters measured by a voice range profile (VRP) and personality, mental health, and physical health. STUDY DESIGN Cross-sectional population-based study. METHODS As part of the LIFE-Adult-Study, 2639 individuals aged 18-80 years, randomly sampled from the general population, completed both speaking and singing voice tasks and answered questionnaires on depression, anxiety, life satisfaction, personality, and quality of life. The voice parameters used were fundamental frequency, sound pressure level, their ranges, and maximum phonation time. Associations were examined using correlation and regression analyses. RESULTS Wider ranges between the lowest and highest frequency and between the lowest and highest sound pressure level, as well as longer maximum phonation time, were significantly correlated with extraversion and quality of life in both sexes, and with openness and agreeableness in women. Smaller ranges and shorter maximum phonation time were significantly correlated with depression. Neuroticism in men was inversely correlated with maximum phonation time. In the speaking VRP, the associations were more pronounced for sound pressure level than for fundamental frequency; the reverse was true for the singing VRP. Few associations were found for anxiety, life satisfaction, and conscientiousness. CONCLUSIONS Weak associations between voice parameters derived from the VRP and mental and physical health, as well as personality, were seen in this exploratory study. The results indicate that VRP measurements in a clinical context are not significantly affected by these factors and thus provide a robust measurement method for voice parameters.
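Two of the VRP-derived variables described (the fundamental frequency range, conventionally expressed in semitones, and the sound pressure level range) and the correlation analysis can be sketched as follows. All participant numbers below are invented for illustration.

```python
import math

def semitone_range(f_low, f_high):
    """Frequency range expressed in semitones (12 semitones per octave)."""
    return 12.0 * math.log2(f_high / f_low)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Hypothetical per-participant VRP extremes (Hz), SPL ranges (dB),
# and depression questionnaire scores.
f0_ranges = [semitone_range(lo, hi)
             for lo, hi in [(98, 440), (110, 392), (87, 523), (131, 349)]]
spl_ranges = [45.0, 40.5, 52.0, 33.0]
dep_scores = [9, 12, 4, 15]
print(round(pearson_r(f0_ranges, dep_scores), 2))
print(round(pearson_r(spl_ranges, dep_scores), 2))
```

The semitone transform matters: a 98-440 Hz range and a 196-880 Hz range are the same vocal achievement (about 26 semitones), which a raw Hz difference would obscure.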
Affiliation(s)
- Thomas A Ostermann
- Phoniatrics and Audiology, Department of Otorhinolaryngology, University of Leipzig, Leipzig, Germany; LIFE Leipzig Research Centre for Civilization Diseases, University of Leipzig, Leipzig, Germany; Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig, Germany
- Michael Fuchs
- Phoniatrics and Audiology, Department of Otorhinolaryngology, University of Leipzig, Leipzig, Germany; LIFE Leipzig Research Centre for Civilization Diseases, University of Leipzig, Leipzig, Germany
- Andreas Hinz
- LIFE Leipzig Research Centre for Civilization Diseases, University of Leipzig, Leipzig, Germany; Department of Medical Psychology and Medical Sociology, University of Leipzig, Leipzig, Germany
- Christoph Engel
- LIFE Leipzig Research Centre for Civilization Diseases, University of Leipzig, Leipzig, Germany; Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig, Germany
- Thomas Berger
- Phoniatrics and Audiology, Department of Otorhinolaryngology, University of Leipzig, Leipzig, Germany; LIFE Leipzig Research Centre for Civilization Diseases, University of Leipzig, Leipzig, Germany.
|
21
|
Applications of Speech Analysis in Psychiatry. Harv Rev Psychiatry 2023; 31:1-13. [PMID: 36608078 DOI: 10.1097/hrp.0000000000000356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
ABSTRACT The need for objective measurement in psychiatry has stimulated interest in alternative indicators of the presence and severity of illness. Speech may offer a source of information that bridges the subjective and objective in the assessment of mental disorders. We systematically reviewed the literature for articles exploring speech analysis for psychiatric applications. The utility of speech analysis depends on how accurately speech features represent clinical symptoms within and across disorders. We identified four domains of the application of speech analysis in the literature: diagnostic classification, assessment of illness severity, prediction of onset of illness, and prognosis and treatment outcomes. We discuss the findings in each of these domains, with a focus on how types of speech features characterize different aspects of psychopathology. Models that bring together multiple speech features can distinguish speakers with psychiatric disorders from healthy controls with high accuracy. Differentiating between types of mental disorders and symptom dimensions are more complex problems that expose the transdiagnostic nature of speech features. Convergent progress in speech research and computer sciences opens avenues for implementing speech analysis to enhance objectivity of assessment in clinical practice. Application of speech analysis will need to address issues of ethics and equity, including the potential to perpetuate discriminatory bias through models that learn from clinical assessment data. Methods that mitigate bias are available and should play a key role in the implementation of speech analysis.
|
22
|
Liu D, Liu B, Lin T, Liu G, Yang G, Qi D, Qiu Y, Lu Y, Yuan Q, Shuai SC, Li X, Liu O, Tang X, Shuai J, Cao Y, Lin H. Measuring depression severity based on facial expression and body movement using deep convolutional neural network. Front Psychiatry 2022; 13:1017064. [PMID: 36620657 PMCID: PMC9810804 DOI: 10.3389/fpsyt.2022.1017064] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/22/2022] [Accepted: 12/02/2022] [Indexed: 12/24/2022] Open
Abstract
Introduction Real-time evaluations of the severity of depressive symptoms are of great significance for the diagnosis and treatment of patients with major depressive disorder (MDD). In clinical practice, the evaluation approaches are mainly based on psychological scales and doctor-patient interviews, which are time-consuming and labor-intensive. Also, the accuracy of results mainly depends on the subjective judgment of the clinician. With the development of artificial intelligence (AI) technology, more and more machine learning methods are used to diagnose depression by appearance characteristics. Most of the previous research focused on the study of single-modal data; however, in recent years, many studies have shown that multi-modal data has better prediction performance than single-modal data. This study aimed to develop a measurement of depression severity from expression and action features and to assess its validity among the patients with MDD. Methods We proposed a multi-modal deep convolutional neural network (CNN) to evaluate the severity of depressive symptoms in real-time, which was based on the detection of patients' facial expression and body movement from videos captured by ordinary cameras. We established behavioral depression degree (BDD) metrics, which combines expression entropy and action entropy to measure the depression severity of MDD patients. Results We found that the information extracted from different modes, when integrated in appropriate proportions, can significantly improve the accuracy of the evaluation, which has not been reported in previous studies. This method presented an over 74% Pearson similarity between BDD and self-rating depression scale (SDS), self-rating anxiety scale (SAS), and Hamilton depression scale (HAMD). In addition, we tracked and evaluated the changes of BDD in patients at different stages of a course of treatment and the results obtained were in agreement with the evaluation from the scales. 
Discussion The BDD can effectively measure the current state of patients' depression and its changing trend according to the patient's expression and action features. Our model may provide an automatic auxiliary tool for the diagnosis and treatment of MDD.
Affiliation(s)
- Dongdong Liu
- Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Bowen Liu
- Department of Psychiatry, National Clinical Research Center for Mental Disorders, The Second Xiangya Hospital of Central South University, Changsha, China
- Department of Psychiatry, Baoan Mental Health Center, Shenzhen Baoan Center for Chronic Disease Control, Shenzhen, China
- Tao Lin
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Guangya Liu
- Integrated Chinese and Western Therapy of Depression Ward, Hunan Brain Hospital, Changsha, China
- Guoyu Yang
- Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Dezhen Qi
- Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Ye Qiu
- Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Yuer Lu
- Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Qinmei Yuan
- Department of Psychiatry, National Clinical Research Center for Mental Disorders, The Second Xiangya Hospital of Central South University, Changsha, China
- Stella C. Shuai
- Department of Biological Sciences, Northwestern University, Evanston, IL, United States
- Xiang Li
- Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Ou Liu
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Xiangdong Tang
- Sleep Medicine Center, Mental Health Center, Department of Respiratory and Critical Care Medicine, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Jianwei Shuai
- Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
- Yuping Cao
- Department of Psychiatry, National Clinical Research Center for Mental Disorders, The Second Xiangya Hospital of Central South University, Changsha, China
- Hai Lin
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
|
23
|
Lai T, Guan Y, Men S, Shang H, Zhang H. ResNet for recognition of Qi-deficiency constitution and balanced constitution based on voice. Front Psychol 2022; 13:1043955. [PMID: 36544461 PMCID: PMC9762153 DOI: 10.3389/fpsyg.2022.1043955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2022] [Accepted: 11/15/2022] [Indexed: 12/12/2022] Open
Abstract
Background According to traditional Chinese medicine theory, a Qi-deficiency constitution is characterized by a lower voice frequency, shortness of breath, reluctance to speak, an introverted personality, emotional instability, and timidity. People with a Qi-deficiency constitution are prone to repeated colds and have a higher probability of chronic diseases and depression. However, a person with a Balanced constitution is relatively healthy in all physical and psychological aspects. At present, the determination of whether one has a Qi-deficiency constitution or a Balanced constitution is mostly based on a scale, which is easily affected by subjective factors. As an objective method of diagnosis, the human voice is worthy of research. Therefore, the purpose of this study is to improve the objectivity of determining Qi-deficiency constitution and Balanced constitution through one's voice and to explore the feasibility of deep learning in TCM constitution recognition. Methods The voices of 48 subjects were collected, and the constitution classification results were obtained from the classification and determination of TCM constitutions. Then, the constitution was classified according to the ResNet residual neural network model. Results A total of 720 voice data points were collected from 48 subjects. The classification accuracy rate for the Qi-deficiency constitution and Balanced constitution was 81.5% according to ResNet. The loss values of the model training and test sets gradually decreased to 0, while the ACC values of the training and test sets tended to increase, and the ACC values of the training set approached 1. The ROC curve shows an AUC value of 0.85. Conclusion The Qi-deficiency constitution and Balanced constitution determination method based on the ResNet residual neural network model proposed in this study can improve the efficiency of constitution recognition and provide decision support for clinical practice.
Affiliation(s)
- Tong Lai
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Yutong Guan
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Shaoyang Men
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China
- Hongcai Shang
- Key Laboratory of Chinese Internal Medicine of Ministry of Education and Beijing, Dongzhimen Hospital Affiliated to Beijing University of Chinese Medicine, Beijing, China
- Honglai Zhang
- School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China. Correspondence: Honglai Zhang
|
24
|
Kaur B, Rathi S, Agrawal RK. Enhanced depression detection from speech using Quantum Whale Optimization Algorithm for feature selection. Comput Biol Med 2022; 150:106122. [PMID: 36182759 DOI: 10.1016/j.compbiomed.2022.106122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2022] [Revised: 08/27/2022] [Accepted: 09/17/2022] [Indexed: 11/03/2022]
Abstract
There is an urgent need to detect depression using a non-intrusive approach that is reliable and accurate. In this paper, a simple and efficient unimodal depression detection approach based on speech is proposed, which is non-invasive, cost-effective and computationally inexpensive. A set of spectral, temporal and spectro-temporal features is derived from the speech signal of healthy and depressed subjects. To select a minimal subset of the relevant and non-redundant speech features to detect depression, a two-phase approach based on the nature-inspired wrapper-based feature selection Quantum-based Whale Optimization Algorithm (QWOA) is proposed. Experiments are performed on the publicly available Distress Analysis Interview Corpus Wizard-of-Oz (DAICWOZ) dataset and compared with three established univariate filtering techniques for feature selection and four well-known evolutionary algorithms. The proposed model outperforms all the univariate filter feature selection techniques and the evolutionary algorithms. It has low computational complexity in comparison to traditional wrapper-based evolutionary methods. The performance of the proposed approach is superior in comparison to existing unimodal and multimodal automated depression detection models. The combination of spectral, temporal and spectro-temporal speech features gave the best result with the LDA classifier. The performance achieved with the proposed approach, in terms of F1-score for the depressed class and the non-depressed class and error is 0.846, 0.932 and 0.094 respectively. Statistical tests demonstrate that the acoustic features selected using the proposed approach are non-redundant and discriminatory. Statistical tests also establish that the performance of the proposed approach is significantly better than that of the traditional wrapper-based evolutionary methods.
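The full Quantum-based Whale Optimization Algorithm is too involved for a short sketch, but the wrapper idea it builds on (scoring each candidate feature subset with the downstream classifier itself) can be illustrated with a plain greedy forward search. Everything below is a hypothetical stand-in on synthetic data, using the LDA classifier the abstract mentions; it is not the authors' method or code.

```python
# Synthetic illustration only: a greedy forward wrapper search stands in for
# the paper's QWOA. The shared idea is wrapper-based feature selection:
# evaluate candidate feature subsets with the downstream classifier.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 12))            # 12 hypothetical speech features
y = rng.integers(0, 2, size=120)          # 1 = depressed, 0 = control (synthetic)
X[y == 1, 0] += 1.5                       # feature 0 carries a strong signal
X[y == 1, 3] += 1.0                       # feature 3 carries a weaker one

selected, remaining = [], list(range(X.shape[1]))
for _ in range(3):                        # grow the subset one feature at a time
    scores = {f: cross_val_score(LinearDiscriminantAnalysis(),
                                 X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    best = max(scores, key=scores.get)    # keep the feature that helps most
    selected.append(best)
    remaining.remove(best)

print("selected features:", selected)
```

QWOA replaces this exhaustive-greedy loop with a population-based search over subsets, which is what keeps the wrapper approach computationally tractable at higher feature counts.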
Affiliation(s)
- Swati Rathi
- School of Computer and Systems Sciences, Jawaharlal Nehru University, Delhi, India.
- R K Agrawal
- School of Computer and Systems Sciences, Jawaharlal Nehru University, Delhi, India.
|
25
|
Higuchi M, Nakamura M, Shinohara S, Omiya Y, Takano T, Mizuguchi D, Sonota N, Toda H, Saito T, So M, Takayama E, Terashi H, Mitsuyoshi S, Tokuno S. Detection of Major Depressive Disorder Based on a Combination of Voice Features: An Exploratory Approach. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:11397. [PMID: 36141675 PMCID: PMC9517353 DOI: 10.3390/ijerph191811397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/23/2022] [Revised: 09/03/2022] [Accepted: 09/07/2022] [Indexed: 06/16/2023]
Abstract
In general, it is common knowledge that people's feelings are reflected in their voice and facial expressions. This research work focuses on developing techniques for diagnosing depression based on acoustic properties of the voice. In this study, we developed a composite index of vocal acoustic properties that can be used for depression detection. Voice recordings were collected from patients undergoing outpatient treatment for major depressive disorder at a hospital or clinic following a physician's diagnosis. Numerous features were extracted from the collected audio data using openSMILE software. Furthermore, qualitatively similar features were combined using principal component analysis. The resulting components were incorporated as parameters in a logistic regression based classifier, which achieved a diagnostic accuracy of ~90% on the training set and ~80% on the test set. Lastly, the proposed metric could serve as a new measure for evaluation of major depressive disorder.
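A rough sketch of the pipeline this abstract describes: acoustic feature vectors (here random stand-ins for openSMILE output) are compressed with principal component analysis, and the components feed a logistic-regression classifier. Sizes, labels, and the injected group difference are illustrative assumptions, not the study's data.

```python
# Hypothetical sketch: PCA-compressed acoustic features -> logistic regression.
# Random numbers stand in for openSMILE features; all quantities are invented.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_speakers, n_features = 200, 20
X = rng.normal(size=(n_speakers, n_features))
y = rng.integers(0, 2, size=n_speakers)   # 1 = MDD, 0 = control (synthetic)
X[y == 1, :5] += 3.0                      # inject a clear group difference

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
model = make_pipeline(PCA(n_components=5), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
print(f"held-out accuracy: {model.score(X_te, y_te):.2f}")
```

Combining correlated acoustic features into a few components before classification, as the study does, limits overfitting when the feature count approaches the number of recordings.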
Affiliation(s)
- Masakazu Higuchi
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, Tokyo 113-8656, Japan
- Mitsuteru Nakamura
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, Tokyo 113-8656, Japan
- Shuji Shinohara
- School of Science and Engineering, Tokyo Denki University, Saitama 350-0394, Japan
- Noriaki Sonota
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, Tokyo 113-8656, Japan
- Hiroyuki Toda
- Department of Psychiatry, School of Medicine, National Defense Medical College, Saitama 359-8513, Japan
- Taku Saito
- Department of Psychiatry, School of Medicine, National Defense Medical College, Saitama 359-8513, Japan
- Mirai So
- Department of Neuropsychiatry, Tokyo Dental College, Tokyo 101-0061, Japan
- Eiji Takayama
- Department of Oral Biochemistry, Asahi University School of Dentistry, Gifu 501-0296, Japan
- Hiroo Terashi
- Department of Neurology, Tokyo Medical University, Tokyo 160-8402, Japan
- Shunji Mitsuyoshi
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, Tokyo 113-8656, Japan
- Shinichi Tokuno
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, Tokyo 113-8656, Japan
- Graduate School of Health Innovation, Kanagawa University of Human Services, Yokosuka 210-0821, Japan
|
26
|
Predicting frailty in older adults using vocal biomarkers: a cross-sectional study. BMC Geriatr 2022; 22:549. [PMID: 35778699 PMCID: PMC9248103 DOI: 10.1186/s12877-022-03237-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2021] [Accepted: 06/17/2022] [Indexed: 11/26/2022] Open
Abstract
Background Frailty is a common issue in the aging population. Given that frailty syndrome is little discussed in the literature on the aging voice, the current study aims to examine the relationship between frailty and vocal biomarkers in older people. Methods Participants aged ≥ 60 years visiting geriatric outpatient clinics were recruited. They underwent frailty assessment (Cardiovascular Health Study [CHS] index; Study of Osteoporotic Fractures [SOF] index; and Fatigue, Resistance, Ambulation, Illness, and Loss of weight [FRAIL] index) and were asked to pronounce a sustained vowel /a/ for approximately 1 s. Four voice parameters were assessed: average number of zero crossings (A1), variations in local peaks and valleys (A2), variations in first and second formant frequencies (A3), and spectral energy ratio (A4). Results Among 277 older adults, increased A1 was associated with a lower likelihood of frailty as defined by SOF (odds ratio [OR] 0.84, 95% confidence interval [CI] 0.74–0.96). Participants with larger A2 values were more likely to be frail, as defined by FRAIL and CHS (FRAIL: OR 1.41, 95% CI 1.12–1.79; CHS: OR 1.38, 95% CI 1.10–1.75). Sex differences were observed across the three frailty indices. In male participants, an increase in A3 by 10 points increased the odds of frailty by almost 7% (SOF: OR 1.07, 95% CI 1.02–1.12), 6% (FRAIL: OR 1.06, 95% CI 1.02–1.11), or 6% (CHS: OR 1.06, 95% CI 1.01–1.11). In female participants, an increase in A4 by 0.1 conferred a significant 2.8-fold (SOF: OR 2.81, 95% CI 1.71–4.62), 2.3-fold (FRAIL: OR 2.31, 95% CI 1.45–3.68), or 2.8-fold (CHS: OR 2.82, 95% CI 1.76–4.51) increased odds of frailty. Conclusions Vocal biomarkers, especially spectral-domain voice parameters, might have potential as a non-invasive, instantaneous, objective, and cost-effective tool for estimating frailty, and demonstrate sex differences relevant to individualised treatment of frailty.
Supplementary Information The online version contains supplementary material available at 10.1186/s12877-022-03237-7.
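Of the four parameters, A1 (the average number of zero crossings) has the most standard definition, and a minimal sketch is easy to give. The 16 kHz sampling rate and 120 Hz fundamental below are illustrative assumptions, with a pure tone standing in for a real /a/ recording.

```python
# Minimal sketch of a zero-crossing measure (an A1-style parameter) on a
# synthetic 1 s vowel. Sampling rate and fundamental are assumed values.
import numpy as np

def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose sign differs."""
    signs = np.signbit(signal)
    return float(np.mean(signs[1:] != signs[:-1]))

sr = 16_000
t = np.arange(sr) / sr                    # 1 s of audio, like the /a/ task
vowel = np.sin(2 * np.pi * 120.0 * t)     # pure tone standing in for a vowel

# A pure tone at f0 Hz crosses zero about 2*f0 times per second.
crossings_per_second = zero_crossing_rate(vowel) * sr
print(round(crossings_per_second))        # ≈ 240 for a 120 Hz tone
```

Higher-frequency (or noisier) voice signals cross zero more often, which is why a simple count like this can carry spectral information.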
|
27
|
Lin RF, Leung TK, Liu YP, Hu KR. Disclosing Critical Voice Features for Discriminating between Depression and Insomnia—A Preliminary Study for Developing a Quantitative Method. Healthcare (Basel) 2022; 10:healthcare10050935. [PMID: 35628071 PMCID: PMC9142030 DOI: 10.3390/healthcare10050935] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2022] [Revised: 05/09/2022] [Accepted: 05/16/2022] [Indexed: 02/06/2023] Open
Abstract
Background: Depression and insomnia are highly related: insomnia is a common symptom among depression patients, and insomnia can result in depression. Although depression patients and insomnia patients should be treated with different approaches, the lack of practical biological markers makes it difficult to discriminate between depression and insomnia effectively. Purpose: This study aimed to disclose critical vocal features for discriminating between depression and insomnia. Methods: Four groups of patients, comprising six severe-depression patients, four moderate-depression patients, ten insomnia patients, and four patients with chronic pain disorder (CPD), participated in this preliminary study, in which their speaking voices were recorded. The open-source software openSMILE was applied to extract 384 voice features. Analysis of variance was used to analyze the effects of the four patient statuses on these voice features. Results: Statistical analyses showed significant relationships between patient status and voice features. Patients with severe depression, moderate depression, insomnia, and CPD reacted differently to certain voice features. Critical voice features were reported based on these statistical relationships. Conclusions: This preliminary study shows the potential of developing discriminating models of depression and insomnia using voice features. Future studies should recruit an adequate number of patients to confirm these voice features and increase the amount of data for developing a quantitative method.
Affiliation(s)
- Ray F. Lin
- Department of Industrial Engineering and Management, Yuan Ze University, Taoyuan 32003, Taiwan
- Corresponding author
- Ting-Kai Leung
- Department of Radiology, Taoyuan General Hospital, Ministry of Health and Welfare, No. 1492, Zhongshan Rd., Taoyuan City 33004, Taiwan
- Graduate Institute of Biomedical Materials and Tissue Engineering, College of Biomedical Engineering, Taipei Medical University, Taipei 11031, Taiwan
- Yung-Ping Liu
- Department of Industrial Engineering and Management, Chaoyang University of Technology, Taichung 413310, Taiwan
- Kai-Rong Hu
- Department of Industrial Engineering and Management, Yuan Ze University, Taoyuan 32003, Taiwan
|
28
|
Nahar JK, Lopez-Jimenez F. Utilizing Conversational Artificial Intelligence, Voice, and Phonocardiography Analytics in Heart Failure Care. Heart Fail Clin 2022; 18:311-323. [DOI: 10.1016/j.hfc.2021.11.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
|
29
|
Abstract
Background
Depression is characterized not only by cognitive, emotional, social, and psychomotor impairments but also by specific vocal features. To date, only a few studies have examined these in clinical contexts and in comparison with healthy individuals.
Objective
The association between depression severity and paraverbal features was examined in depressed patients and healthy participants.
Method
In a multi-step procedure, the audio content of anamnestic interviews with depressed patients (n = 15) and healthy individuals (n = 15) was annotated and transcribed with software. The paraverbal features fundamental frequency of the voice, vocal range, speech rate, and pause duration were determined automatically. Hierarchical linear models were used to analyze the influence of group membership, depression severity, anxiety, and mental and physical health on the paraverbal features.
Results
An association was found between depression severity and speech rate. Trend-level associations were found between the range of the fundamental frequency, pause duration, and depression severity. Compared with healthy individuals, depressed patients are characterized by monotonous speech, a low speech rate, and longer pauses. Speech rate and pause duration were also associated with anxiety.
Discussion
Speech rate, pause duration, and the range of the fundamental frequency appear to be relevant indicators of depression and, possibly, anxiety. The range of the fundamental frequency is more depression-specific, whereas pause duration and speech rate are associated with both depression and anxiety. Future studies should examine these associations in larger samples across different clinical disorders.
|
30
|
Hansen L, Zhang YP, Wolf D, Sechidis K, Ladegaard N, Fusaroli R. A generalizable speech emotion recognition model reveals depression and remission. Acta Psychiatr Scand 2022; 145:186-199. [PMID: 34850386 DOI: 10.1111/acps.13388] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/01/2021] [Revised: 11/24/2021] [Accepted: 11/25/2021] [Indexed: 12/12/2022]
Abstract
OBJECTIVE Affective disorders are associated with atypical voice patterns; however, automated voice analyses suffer from small sample sizes and untested generalizability on external data. We investigated a generalizable approach to aid clinical evaluation of depression and remission from voice using transfer learning: We train machine learning models on easily accessible non-clinical datasets and test them on novel clinical data in a different language. METHODS A Mixture of Experts machine learning model was trained to infer happy/sad emotional state using three publicly available emotional speech corpora in German and US English. We examined the model's predictive ability to classify the presence of depression on Danish speaking healthy controls (N = 42), patients with first-episode major depressive disorder (MDD) (N = 40), and the subset of the same patients who entered remission (N = 25) based on recorded clinical interviews. The model was evaluated on raw, de-noised, and speaker-diarized data. RESULTS The model showed separation between healthy controls and depressed patients at the first visit, obtaining an AUC of 0.71. Further, speech from patients in remission was indistinguishable from that of the control group. Model predictions were stable throughout the interview, suggesting that 20-30 s of speech might be enough to accurately screen a patient. Background noise (but not speaker diarization) heavily impacted predictions. CONCLUSION A generalizable speech emotion recognition model can effectively reveal changes in speaker depressive states before and after remission in patients with MDD. Data collection settings and data cleaning are crucial when considering automated voice analysis for clinical purposes.
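The AUC of 0.71 reported above summarizes how well the model's scores separate patients from controls. A minimal illustration with synthetic scores follows; the score distributions are invented, and only the group sizes echo the abstract.

```python
# Synthetic illustration of the AUC metric: how well per-speaker model scores
# separate depressed patients (N = 40) from healthy controls (N = 42).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
controls = rng.normal(0.35, 0.15, size=42)    # model score per control speaker
patients = rng.normal(0.55, 0.15, size=40)    # shifted upward for patients
scores = np.concatenate([controls, patients])
labels = np.concatenate([np.zeros(42), np.ones(40)])

auc = roc_auc_score(labels, scores)
print(f"AUC = {auc:.2f}")                     # 0.5 = chance, 1.0 = perfect
```

The AUC is threshold-free: it equals the probability that a randomly chosen patient receives a higher score than a randomly chosen control, which makes it a natural summary for a screening model whose decision cutoff has not yet been fixed.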
Affiliation(s)
- Lasse Hansen
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark; Department of Affective Disorders, Aarhus University Hospital - Psychiatry, Aarhus, Denmark; Center for Humanities Computing Aarhus, Aarhus University, Aarhus, Denmark; Roche Pharmaceutical Research & Early Development Informatics, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Yan-Ping Zhang
- Roche Pharmaceutical Research & Early Development Informatics, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Detlef Wolf
- Roche Pharmaceutical Research & Early Development Informatics, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Nicolai Ladegaard
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark; Department of Affective Disorders, Aarhus University Hospital - Psychiatry, Aarhus, Denmark
- Riccardo Fusaroli
- Cognitive Science, School of Communication and Culture, Aarhus University, Aarhus, Denmark; The Interacting Minds Centre, Aarhus University, Aarhus, Denmark
|
31
|
Calić G, Petrović-Lazić M, Mentus T, Babac S. Acoustic features of voice in adults suffering from depression. PSIHOLOSKA ISTRAZIVANJA 2022. [DOI: 10.5937/psistra25-39224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Abstract
In order to examine the differences in people suffering from depression (EG, N=18) compared to healthy controls (CG1, N=24) and people with a diagnosed psychogenic voice disorder (CG2, N=9), nine acoustic features of voice were assessed among the total of 51 participants using the MDVP software programme ("Kay Elemetrics" Corp., model 4300). Nine acoustic parameters were analysed on the basis of the sustained phonation of the vowel /a/. The results revealed that the mean values of all acoustic parameters differed in the EG compared to both the CG1 and CG2 as follows: the parameters which indicate frequency variability (Jitt, PPQ), amplitude variability (Shim, vAm, APQ) and noise and tremor parameters (NHR, VTI) were higher; only the parameters of fundamental frequency (F0) and soft phonation index (SPI) were lower (F0 compared to CG1, and SPI compared to CG1 and CG2). Only the PPQ parameter was not significant. vAm and APQ had the highest discriminant value for depression. The acoustic features of voice, analysed in this study with regard to the sustained phonation of a vowel, were different and discriminant in the EG compared to CG1 and CG2. In voice analysis, the parameters vAm and APQ could potentially be markers indicative of depression. The results of this research point to the importance of the voice, that is, its acoustic indicators, in recognizing depression. Important parameters that could help create a programme for the automatic recognition of depression are those from the domain of voice intensity variation.
|
32
|
Zhao Q, Fan HZ, Li YL, Liu L, Wu YX, Zhao YL, Tian ZX, Wang ZR, Tan YL, Tan SP. Vocal Acoustic Features as Potential Biomarkers for Identifying/Diagnosing Depression: A Cross-Sectional Study. Front Psychiatry 2022; 13:815678. [PMID: 35573349 PMCID: PMC9095973 DOI: 10.3389/fpsyt.2022.815678] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/15/2021] [Accepted: 03/30/2022] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND At present, there is no established biomarker for the diagnosis of depression, while studies show that acoustic features convey emotional information. This study therefore explored differences in acoustic characteristics between depressed patients and healthy individuals to investigate whether these characteristics can identify depression. METHODS Participants included 71 patients diagnosed with depression from a regional hospital in Beijing, China, and 62 normal controls from the greater community. We assessed the clinical symptoms of depression of all participants using the Hamilton Depression Scale (HAMD), Hamilton Anxiety Scale (HAMA), and Patient Health Questionnaire (PHQ-9), and recorded each participant's voice as they read positive, neutral, and negative texts. openSMILE was used to analyze the voices and extract acoustic characteristics from the recordings. RESULTS There were significant differences between the depression and control groups in all acoustic characteristics (p < 0.05). Several mel-frequency cepstral coefficients (MFCCs), including MFCC2, MFCC3, MFCC8, and MFCC9, differed significantly between emotion tasks; MFCC4 and MFCC7 correlated positively with PHQ-9 scores, and these correlations were stable across all emotion tasks. The zero-crossing rate in the positive-emotion task correlated positively with the HAMA total score and HAMA somatic anxiety score (r = 0.31 and r = 0.34, respectively), and MFCC9 in the neutral-emotion task correlated negatively with the HAMD anxiety/somatization score (r = -0.34). In linear regression, MFCC7 in the negative-emotion task predicted the PHQ-9 score (β = 0.90, p = 0.01) and MFCC9 in the neutral-emotion task predicted the HAMD anxiety/somatization score (β = -0.45, p = 0.049). Logistic regression discriminated the two groups well, with an accuracy of 89.66%. CONCLUSION The acoustic expression of emotion among patients with depression differs from that of normal controls. Some acoustic characteristics are related to the severity of depressive symptoms and may serve as objective biomarkers of depression. A systematic method of assessing vocal acoustic characteristics could provide an accurate and discreet means of screening for depression, used instead of, or in conjunction with, traditional screening methods, since it is not subject to the limitations of self-report, in which subjects may give socially acceptable rather than truthful responses.
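Of the features above, the zero-crossing rate is the simplest to state: the fraction of consecutive sample pairs in a frame whose sign differs. A minimal sketch (openSMILE's exact framing and sign convention may differ):

```python
def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose sign differs.
    `frame` is a sequence of audio samples; 0 is treated as non-negative."""
    pairs = list(zip(frame, frame[1:]))
    crossings = sum(1 for a, b in pairs if (a >= 0) != (b >= 0))
    return crossings / len(pairs)

print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))  # 1.0 (crosses every sample)
print(zero_crossing_rate([0.2, 0.5, 0.9]))         # 0.0 (never crosses)
```

In practice the rate is computed per short frame (e.g. 20-30 ms) and then aggregated over the utterance.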
Affiliation(s)
- Qing Zhao, Hong-Zhen Fan, Yan-Li Li, Lei Liu, Ya-Xue Wu, Yan-Li Zhao, Zhan-Xiao Tian, Zhi-Ren Wang, Yun-Long Tan, Shu-Ping Tan: Peking University HuiLongGuan Clinical Medical School, Beijing Huilongguan Hospital, Beijing, China
33
Moragrega I, Bridler R, Mohr C, Possenti M, Rochat D, Parramon JS, Stassen HH. Monitoring the effects of therapeutic interventions in depression through self-assessments. Research in Psychotherapy (Milano) 2021; 24:548. [PMID: 35047425 PMCID: PMC8715262 DOI: 10.4081/ripppo.2021.548]
Abstract
The treatment of major psychiatric disorders is an arduous and thorny path for the patients concerned, characterized by polypharmacy, massive adverse side effects, modest prospects of success, and constantly declining response rates. All the more important is the early detection of psychiatric disorders prior to the development of clinically relevant symptoms, so that people can benefit from early interventions. A well-proven approach to monitoring mental health relies on voice analysis. This method has been used successfully with psychiatric patients to 'objectively' document the progress of improvement or the onset of relapse. Studies with psychiatric patients over 2-4 weeks demonstrated that daily voice assessments have a notable therapeutic effect in themselves; daily voice assessments therefore appear to be a low-threshold form of therapeutic means that may be realized through self-assessments. To evaluate the performance and reliability of this approach, we carried out a longitudinal study of 82 university students in 3 different countries with daily assessments over 2 weeks. The sample included 41 males (mean age 24.2±3.83 years) and 41 females (mean age 21.6±2.05 years). Unlike other research in the field, this study was not concerned with classifying individuals into diagnostic categories; the focus lay on the monitoring aspect and the extent to which the effects of therapeutic interventions or of behavioural changes are visible in the results of self-assessed voice analyses. The test persons showed over-proportionally good adherence to the daily voice analysis scheme, and the accumulated data were of generally high quality: sufficiently high signal levels, very few movement artifacts, and little to no interfering background noise. The method was sufficiently sensitive to detect: i) habituation effects as test persons became used to the daily procedure; and ii) short-term fluctuations that exceeded prespecified thresholds and reached significance. Results are directly interpretable and provide information about what is going well, what is going less well, and where there is a need for action. The proposed self-assessment approach was found to be well suited to serve as a health-monitoring tool for subjects with an elevated vulnerability to psychiatric disorders or to stress-induced mental health problems. Daily voice assessments are in fact a low-threshold form of therapeutic means that can be realized through self-assessments, requires only little effort, can be carried out in the test person's own home, and has the potential to strengthen resilience and to induce positive behavioural changes.
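The threshold-exceedance monitoring described above can be illustrated with a toy rule: flag any day whose voice measure deviates from the person's own running baseline by more than a preset amount. This is only a sketch of the idea; the study's actual statistics and thresholds are not specified here.

```python
def flag_fluctuations(daily_values, threshold):
    """Return one flag per day: True when that day's value deviates from
    the mean of all preceding days by more than `threshold`."""
    if not daily_values:
        return []
    flags = [False]  # first day has no baseline yet
    for i in range(1, len(daily_values)):
        baseline = sum(daily_values[:i]) / i
        flags.append(abs(daily_values[i] - baseline) > threshold)
    return flags

# Three stable days, then a jump that exceeds the threshold of 3:
print(flag_fluctuations([10, 10, 11, 16], 3))  # [False, False, False, True]
```

A production monitor would use a robust baseline (e.g. a rolling median) and a per-person calibrated threshold rather than a fixed constant.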
Affiliation(s)
- Ines Moragrega: Department of Psychobiology, University of Valencia, Valencia, Spain
- Christine Mohr: Department of Psychology, University of Lausanne, Lausanne, Switzerland
- Michela Possenti: Department of Psychology, University of Milano Bicocca, Milano, Italy
- Deborah Rochat: Department of Psychology, University of Lausanne, Lausanne, Switzerland
- Hans H. Stassen: Institute for Response-Genetics, Department of Psychiatry, Psychotherapy and Psychosomatics, Psychiatric University Hospital, Zurich, Switzerland
34
DeSouza DD, Robin J, Gumus M, Yeung A. Natural Language Processing as an Emerging Tool to Detect Late-Life Depression. Front Psychiatry 2021; 12:719125. [PMID: 34552519 PMCID: PMC8450440 DOI: 10.3389/fpsyt.2021.719125]
Abstract
Late-life depression (LLD) is a major public health concern. Despite the availability of effective treatments for depression, barriers to screening and diagnosis still exist. The use of current standardized depression assessments can lead to underdiagnosis or misdiagnosis due to subjective symptom reporting and the distinct cognitive, psychomotor, and somatic features of LLD. To overcome these limitations, there has been growing interest in the development of objective measures of depression using artificial intelligence (AI) technologies such as natural language processing (NLP). NLP approaches focus on the analysis of acoustic and linguistic aspects of human language derived from text and speech and can be integrated with machine learning approaches to classify depression and its severity. In this review, we provide a rationale for the use of NLP methods to study depression using speech, summarize previous research using NLP in LLD, compare findings with those in younger adults with depression and older adults with other clinical conditions, and discuss future directions, including the use of complementary AI strategies to fully capture the spectrum of LLD.
Affiliation(s)
- Anthony Yeung: Department of Psychiatry, University of Toronto, Toronto, ON, Canada
35
Aydemir E, Tuncer T, Dogan S, Gururajan R, Acharya UR. Automated major depressive disorder detection using melamine pattern with EEG signals. Appl Intell 2021. [DOI: 10.1007/s10489-021-02426-y]
36
Lee S, Suh SW, Kim T, Kim K, Lee KH, Lee JR, Han G, Hong JW, Han JW, Lee K, Kim KW. Screening major depressive disorder using vocal acoustic features in the elderly by sex. J Affect Disord 2021; 291:15-23. [PMID: 34022551 DOI: 10.1016/j.jad.2021.04.098]
Abstract
BACKGROUND Vocal acoustic features are potential biomarkers of elderly depression. Previous automated diagnostic tests for depression have employed unstandardized speech samples, and few studies have considered differences in voice reactivity. We aimed to develop a voice-based screening test for depression that measures vocal acoustic features of elderly Koreans while they read a series of mood-inducing sentences (MIS). METHODS In this case-control study, we recruited 61 individuals with major depressive disorder and 143 healthy controls (mean age [SD]: 72 [6]; female, 70%) from the community-dwelling elderly population. Participants were asked to read the MIS; the variation patterns of their acoustic features, represented by the correlation distance between pairs of MIS, were analysed as input features using a univariate feature-selection technique and subsequently classified by AdaBoost. RESULTS Acoustic features showing significant discriminatory performance were spectral and energy-related features for males (sensitivity 0.95, specificity 0.88, accuracy 0.86) and prosody-related features for females (sensitivity 0.73, specificity 0.86, accuracy 0.77). The correlation distance between negative and positive MIS was significantly shorter in the depressed group than in the healthy controls (F = 18.574, P < 0.001). LIMITATIONS The small sample size and relatively homogeneous clinical profile of depression could limit generalizability. CONCLUSIONS While reading MIS, spectral and energy-related acoustic features for males and prosody-related features for females are good discriminators for major depressive disorder. These features may be used as biomarkers of depression in the elderly.
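The correlation distance between the feature vectors of two mood-inducing sentences, as used above, is one minus the Pearson correlation: identical variation patterns give distance 0, opposite patterns give distance 2. A minimal sketch with illustrative vectors:

```python
def correlation_distance(x, y):
    """1 - Pearson correlation between two equal-length feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return 1.0 - cov / (sx * sy)

# Perfectly parallel variation patterns -> distance ~0;
# perfectly opposite patterns -> distance ~2.
print(correlation_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(correlation_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

The study's finding of a shorter negative-to-positive MIS distance in the depressed group corresponds to depressed voices varying less between emotional contexts.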
Affiliation(s)
- Subin Lee: Music and Audio Research Group, Seoul National University, Seoul, Korea
- Seung Wan Suh: Department of Psychiatry, Kangdong Sacred Heart Hospital, Hallym University College of Medicine, Seoul, Korea
- Taehyun Kim, Kayoung Kim, Kyoung Hwan Lee, Ju Ri Lee, Guehee Han, Jong Woo Hong, Ji Won Han: Department of Neuropsychiatry, Seoul National University Bundang Hospital, Seongnam, Korea
- Kyogu Lee: Music and Audio Research Group, Seoul National University, Seoul, Korea; Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea
- Ki Woong Kim: Department of Neuropsychiatry, Seoul National University Bundang Hospital, Seongnam, Korea; Department of Psychiatry, Seoul National University College of Medicine, Seoul, Korea; Department of Brain and Cognitive Sciences, Seoul National University College of Natural Sciences, Seoul, Korea
37
Niu M, Liu B, Tao J, Li Q. A time-frequency channel attention and vectorization network for automatic depression level prediction. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.04.056]
38
Shin D, Cho WI, Park CHK, Rhee SJ, Kim MJ, Lee H, Kim NS, Ahn YM. Detection of Minor and Major Depression through Voice as a Biomarker Using Machine Learning. J Clin Med 2021; 10:3046. [PMID: 34300212 PMCID: PMC8303477 DOI: 10.3390/jcm10143046]
Abstract
Both minor and major depression are highly prevalent and are important causes of social burden worldwide; however, there is still no objective indicator for detecting minor depression. This study aimed to examine whether voice could be used as a biomarker to detect minor and major depression. Ninety-three subjects were classified into three groups based on current depressive status as a dimension: the not depressed group (n = 33), the minor depressive episode group (n = 26), and the major depressive episode group (n = 34). Twenty-one voice features were extracted from semi-structured interview recordings, and a three-group comparison was performed through analysis of variance. Seven voice indicators differed between the three groups, even after adjusting for age, BMI, and drugs taken for non-psychiatric disorders. Among the machine learning methods, the best performance was obtained using the multi-layer processing method, with an AUC of 65.9%, sensitivity of 65.6%, and specificity of 66.2%. This study thus revealed voice differences across depressive episodes and confirmed that the not depressed group and participants with minor and major depression could be distinguished through machine learning. Although this study is limited by a small sample size, it is the first study of voice change in minor depression and suggests the possibility of detecting minor depression through voice.
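The AUC reported above has a simple rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counting half. A minimal sketch using hypothetical classifier scores (not the study's data):

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as P(score of a positive > score of a negative), ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

print(roc_auc([0.9, 0.8], [0.1, 0.2]))  # 1.0 (perfect separation)
print(roc_auc([0.5, 0.5], [0.5, 0.5]))  # 0.5 (chance level)
```

An AUC of 0.659, as in the study, therefore means a depressed participant outscores a non-depressed one about two times in three, which is why it is reported alongside sensitivity and specificity.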
Affiliation(s)
- Daun Shin, Hyunju Lee: Department of Psychiatry, Seoul National University College of Medicine, Seoul 03080, Korea; Department of Neuropsychiatry, Seoul National University Hospital, Seoul 13620, Korea
- Won Ik Cho, Nam Soo Kim: Department of Electrical and Computer Engineering and INMC, Seoul National University College of Engineering, Seoul 08826, Korea
- Sang Jin Rhee, Min Ji Kim: Department of Neuropsychiatry, Seoul National University Hospital, Seoul 13620, Korea
- Yong Min Ahn: Department of Psychiatry, Seoul National University College of Medicine, Seoul 03080, Korea; Department of Neuropsychiatry, Seoul National University Hospital, Seoul 13620, Korea; Institute of Human Behavioral Medicine, Seoul National University Medical Research Center, Seoul 03087, Korea (correspondence; Fax: +82-2-744-7241)
39
Little B, Alshabrawy O, Stow D, Ferrier IN, McNaney R, Jackson DG, Ladha K, Ladha C, Ploetz T, Bacardit J, Olivier P, Gallagher P, O'Brien JT. Deep learning-based automated speech detection as a marker of social functioning in late-life depression. Psychol Med 2021; 51:1441-1450. [PMID: 31944174 PMCID: PMC8311821 DOI: 10.1017/s0033291719003994]
Abstract
BACKGROUND Late-life depression (LLD) is associated with poor social functioning. However, previous research has used bias-prone self-report scales to measure social functioning, and a more objective measure has been lacking. We tested a novel wearable device that measures the speech participants encounter as an indicator of social interaction. METHODS Twenty-nine participants with LLD and 29 age-matched controls wore a wrist-worn device continuously for seven days, which recorded their acoustic environment. Acoustic data were automatically analysed using deep learning models that had been developed and validated on an independent speech dataset. Total speech activity and the proportion of speech produced by the device wearer were both detected while maintaining participants' privacy. Participants underwent a neuropsychological test battery and clinical and self-report scales measuring severity of depression and general and social functioning. RESULTS Compared to controls, participants with LLD showed poorer self-reported social and general functioning. Total speech activity was much lower for participants with LLD than for controls, with no overlap between groups. The proportion of speech produced by the participant was smaller for LLD than for controls. In LLD, both speech measures correlated with attention and psychomotor speed performance but not with depression severity or self-reported social functioning. CONCLUSIONS Using this device, LLD was associated with lower levels of speech than in controls, and speech activity was related to psychomotor retardation. We demonstrated that speech activity measured by wearable technology differentiated LLD from controls with high precision and, in this study, provided an objective measure of an aspect of real-world social functioning in LLD.
Affiliation(s)
- Bethany Little, I. Nicol Ferrier, Peter Gallagher: Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK
- Ossama Alshabrawy: Interdisciplinary Computing and Complex BioSystems (ICOS) group, School of Computing, Newcastle University, Newcastle upon Tyne, UK; Faculty of Science, Damietta University, New Damietta, Egypt
- Daniel Stow: Institute of Health and Society, Newcastle University, Newcastle upon Tyne, UK
- Daniel G. Jackson, Karim Ladha: Open Lab, School of Computing, Newcastle University, Newcastle upon Tyne, UK
- Thomas Ploetz: School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA, USA
- Jaume Bacardit: Interdisciplinary Computing and Complex BioSystems (ICOS) group, School of Computing, Newcastle University, Newcastle upon Tyne, UK
- Patrick Olivier: Faculty of Information Technology, Monash University, Melbourne, Australia
- John T. O'Brien: Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK; Department of Psychiatry, University of Cambridge, Cambridge, UK
40
Shinohara S, Toda H, Nakamura M, Omiya Y, Higuchi M, Takano T, Saito T, Tanichi M, Boku S, Mitsuyoshi S, So M, Yoshino A, Tokuno S. Evaluation of emotional arousal level and depression severity using voice-derived sound pressure change acceleration. Sci Rep 2021; 11:13615. [PMID: 34193915 PMCID: PMC8245525 DOI: 10.1038/s41598-021-92982-7]
Abstract
In this research, we propose a new index of emotional arousal level based on sound pressure change acceleration, called the emotional arousal level voice index (EALVI), and investigate the relationship between this index and depression severity. First, EALVI values were calculated from various speech recordings in the interactive emotional dyadic motion capture database, and the correlation with the emotional arousal level of each voice was examined; the resulting correlation coefficient was 0.52 (n = 10,039, p < 2.2 × 10⁻¹⁶). We then collected a total of 178 datasets, each comprising 10 speech phrases and the Hamilton Rating Scale for Depression (HAM-D) score, from outpatients with major depression at the Ginza Taimei Clinic (GTC) and the National Defense Medical College (NDMC) Hospital. The correlation coefficients between the EALVI and HAM-D scores were -0.33 (n = 88, p = 1.8 × 10⁻³) at the GTC and -0.43 (n = 90, p = 2.2 × 10⁻⁵) at the NDMC. Next, the dataset was divided into "no depression" (HAM-D < 8) and "depression" (HAM-D ≥ 8) groups. The numbers of patients in the "no depression" and "depression" groups were 10 and 78 in the GTC data and 65 and 25 in the NDMC data, respectively. There was a significant difference in mean EALVI values between the two groups in both the GTC and NDMC data (p = 8.9 × 10⁻³, Cliff's delta = 0.51 and p = 1.6 × 10⁻³, Cliff's delta = 0.43, respectively). The area under the receiver operating characteristic curve when discriminating between the two groups by EALVI was 0.76 in the GTC data and 0.72 in the NDMC data. Indirectly, these data suggest that there is some relationship between emotional arousal level and depression severity.
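Cliff's delta, the effect size reported above, is the difference between the probability that a value from one group exceeds a value from the other and the reverse probability, computed over all cross-group pairs. A minimal sketch with illustrative values:

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

print(cliffs_delta([3.0, 4.0], [1.0, 2.0]))  # 1.0 (complete separation)
print(cliffs_delta([1.0, 2.0], [1.0, 2.0]))  # 0.0 (identical groups)
```

Deltas of 0.51 and 0.43, as in the study, indicate a moderate-to-large tendency for EALVI values to be higher in one group than the other.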
Affiliation(s)
- Shuji Shinohara, Mitsuteru Nakamura, Masakazu Higuchi, Shunji Mitsuyoshi, Shinichi Tokuno: Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
- Hiroyuki Toda, Taku Saito, Masaaki Tanichi, Aihide Yoshino: Department of Psychiatry, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama, 359-8513, Japan
- Yasuhiro Omiya, Takeshi Takano: PST Inc., Industry & Trade Center Building 905, 2 Yamashita-cho, Naka-ku, Yokohama, Kanagawa, 231-0023, Japan
- Shuken Boku: Department of Neuropsychiatry, Faculty of Life Sciences, Kumamoto University, 1-1-1 Honjo, Chuo-ku, Kumamoto, Kumamoto, 860-8556, Japan
- Mirai So: Department of Psychiatry, Tokyo Dental College, 2-9-18, Misakicho, Chiyoda-ku, Tokyo, 101-0061, Japan
41
Thomas JA, Burkhardt HA, Chaudhry S, Ngo AD, Sharma S, Zhang L, Au R, Hosseini Ghomi R. Assessing the Utility of Language and Voice Biomarkers to Predict Cognitive Impairment in the Framingham Heart Study Cognitive Aging Cohort Data. J Alzheimers Dis 2021; 76:905-922. [PMID: 32568190 DOI: 10.3233/jad-190783]
Abstract
BACKGROUND There is a need for fast, accessible, low-cost, and accurate diagnostic methods for early detection of cognitive decline. Dementia diagnoses are usually made years after symptom onset, missing a window of opportunity for early intervention. OBJECTIVE To evaluate the use of recorded voice features as proxies for cognitive function by using neuropsychological test measures and existing dementia diagnoses. METHODS This study analyzed 170 audio recordings, transcripts, and paired neuropsychological test results from 135 participants selected from the Framingham Heart Study (FHS), which includes 97 recordings of cognitively normal participants and 73 recordings of cognitively impaired participants. Acoustic and linguistic features of the voice samples were correlated with cognitive performance measures to verify their association. RESULTS Language and voice features, when combined with demographic variables, performed with an AUC of 0.942 (95% CI 0.929-0.983) in predicting cognitive status. Features with good predictive power included the acoustic features mean spectral slope in the 500-1500 Hz band, variation in the F2 bandwidth, and variation in the Mel-Frequency Cepstral Coefficient (MFCC) 1; the demographic features employment, education, and age; and the text features of number of words, number of compound words, number of unique nouns, and number of proper names. CONCLUSION Several linguistic and acoustic biomarkers show correlations and predictive power with regard to neuropsychological testing results and cognitive impairment diagnoses, including dementia. This initial study paves the way for a follow-up comprehensive study incorporating the entire FHS cohort.
Affiliation(s)
- Rhoda Au: Boston University, Boston, MA, USA
42
Depressive Mood Assessment Method Based on Emotion Level Derived from Voice: Comparison of Voice Features of Individuals with Major Depressive Disorders and Healthy Controls. Int J Environ Res Public Health 2021; 18:5435. [PMID: 34069609 PMCID: PMC8161232 DOI: 10.3390/ijerph18105435]
Abstract
Background: In many developed countries, mood disorders have become problematic, and the economic loss due to treatment costs and interference with work is immeasurable. Therefore, a simple technique to determine an individual's depressive state and stress level is desired. Methods: We developed a method to assess the specific psychological state of individuals with major depressive disorders using emotional components contained in their voice. We propose two indices: vitality, a short-term index, and mental activity, a long-term index capturing trends in vitality. To evaluate our method, we used the voices of healthy individuals (n = 14) and patients with major depression (n = 30). The patients were also assessed by specialists using the Hamilton Rating Scale for Depression (HAM-D). Results: A significant negative correlation existed between the vitality extracted from the voices and HAM-D scores (r = −0.33, p < 0.05). Furthermore, the vitality indicator discriminated between the voice data of healthy individuals and patients with depression with high accuracy (p = 0.0085, area under the receiver operating characteristic curve = 0.76).
43
Albuquerque L, Valente ARS, Teixeira A, Figueiredo D, Sa-Couto P, Oliveira C. Association between acoustic speech features and non-severe levels of anxiety and depression symptoms across lifespan. PLoS One 2021; 16:e0248842. [PMID: 33831018 PMCID: PMC8031302 DOI: 10.1371/journal.pone.0248842]
Abstract
BACKGROUND Several studies have investigated the acoustic effects of diagnosed anxiety and depression. Anxiety and depression are not characteristics of the typical aging process, but minimal or mild symptoms can appear and evolve with age. However, the knowledge about the association between speech and anxiety or depression is scarce for minimal/mild symptoms, typical of healthy aging. As longevity and aging are still a new phenomenon worldwide, posing also several clinical challenges, it is important to improve our understanding of non-severe mood symptoms' impact on acoustic features across lifetime. The purpose of this study was to determine if variations in acoustic measures of voice are associated with non-severe anxiety or depression symptoms in adult population across lifetime. METHODS Two different speech tasks (reading vowels in disyllabic words and describing a picture) were produced by 112 individuals aged 35-97. To assess anxiety and depression symptoms, the Hospital Anxiety Depression Scale (HADS) was used. The association between the segmental and suprasegmental acoustic parameters and HADS scores were analyzed using the linear multiple regression technique. RESULTS The number of participants with presence of anxiety or depression symptoms is low (>7: 26.8% and 10.7%, respectively) and non-severe (HADS-A: 5.4 ± 2.9 and HADS-D: 4.2 ± 2.7, respectively). Adults with higher anxiety symptoms did not present significant relationships associated with the acoustic parameters studied. Adults with increased depressive symptoms presented higher vowel duration, longer total pause duration and short total speech duration. Finally, age presented a positive and significant effect only for depressive symptoms, showing that older participants tend to have more depressive symptoms. CONCLUSIONS Non-severe depression symptoms can be related to some acoustic parameters and age. 
Depression symptoms can be explained by acoustic parameters even among individuals without severe symptom levels.
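As an illustrative aside, the kind of multiple linear regression described in this abstract can be sketched on synthetic data. All variable names, coefficients, and values below are hypothetical, not the authors' data or code:

```python
import numpy as np

# Synthetic illustration: ordinary least squares linking two hypothetical
# acoustic predictors to a HADS-D-style depression score.
rng = np.random.default_rng(0)
n = 112                                   # sample size borrowed from the abstract
vowel_dur = rng.normal(0.12, 0.02, n)     # mean vowel duration (s), hypothetical
pause_dur = rng.normal(8.0, 2.0, n)       # total pause duration (s), hypothetical
score = 2.0 + 15.0 * vowel_dur + 0.3 * pause_dur + rng.normal(0.0, 1.5, n)

X = np.column_stack([np.ones(n), vowel_dur, pause_dur])  # design matrix + intercept
beta, *_ = np.linalg.lstsq(X, score, rcond=None)         # OLS fit
pred = X @ beta
r2 = 1.0 - np.sum((score - pred) ** 2) / np.sum((score - score.mean()) ** 2)
print(f"b0={beta[0]:.2f}, b_vowel={beta[1]:.2f}, b_pause={beta[2]:.2f}, R^2={r2:.2f}")
```

On real data, one would additionally inspect coefficient standard errors and residuals before interpreting the acoustic associations.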
Affiliation(s)
- Luciana Albuquerque
- Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal
- Center of Health Technology and Services Research, University of Aveiro, Aveiro, Portugal
- Department of Electronics Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- Department of Education and Psychology, University of Aveiro, Aveiro, Portugal
- Ana Rita S. Valente
- Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal
- Department of Electronics Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- António Teixeira
- Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal
- Department of Electronics Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- Daniela Figueiredo
- Center of Health Technology and Services Research, University of Aveiro, Aveiro, Portugal
- School of Health Science, University of Aveiro, Aveiro, Portugal
- Pedro Sa-Couto
- Center for Research and Development in Mathematics and Applications, University of Aveiro, Aveiro, Portugal
- Department of Mathematics, University of Aveiro, Aveiro, Portugal
- Catarina Oliveira
- Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal
- School of Health Science, University of Aveiro, Aveiro, Portugal
44
Fagherazzi G, Fischer A, Ismael M, Despotovic V. Voice for Health: The Use of Vocal Biomarkers from Research to Clinical Practice. Digit Biomark 2021; 5:78-88. [PMID: 34056518 PMCID: PMC8138221 DOI: 10.1159/000515346] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2021] [Accepted: 02/18/2021] [Indexed: 12/17/2022] Open
Abstract
Diseases can affect organs such as the heart, lungs, brain, muscles, or vocal folds, which can in turn alter an individual's voice. Voice analysis using artificial intelligence therefore opens new opportunities for healthcare. This review provides an overview of the applications of voice for health-related purposes, from vocal biomarkers for diagnosis and risk prediction to remote monitoring of various clinical outcomes and symptoms. We discuss the potential of this rapidly evolving field from research, patient, and clinical perspectives, as well as the key challenges that must be overcome in the near future for substantial and efficient use of voice in healthcare.
Affiliation(s)
- Guy Fagherazzi
- Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, Strassen, Luxembourg
- Aurélie Fischer
- Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, Strassen, Luxembourg
- Muhannad Ismael
- IT for Innovation in Services Department (ITIS), Luxembourg Institute of Science and Technology (LIST), Esch-sur-Alzette, Luxembourg
- Vladimir Despotovic
- Department of Computer Science, Faculty of Science, Technology and Medicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
45
Wang CT, Han JY, Fang SH, Lai YH. Ambulatory Phonation Monitoring With Wireless Microphones Based on the Speech Energy Envelope: Algorithm Development and Validation. JMIR Mhealth Uhealth 2020; 8:e16746. [PMID: 33270033 PMCID: PMC7746501 DOI: 10.2196/16746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2019] [Revised: 06/03/2020] [Accepted: 10/03/2020] [Indexed: 11/13/2022] Open
Abstract
Background Voice disorders mainly result from chronic vocal overuse or abuse, particularly in occupational voice users such as teachers. Previous studies proposed a contact microphone attached to the anterior neck for ambulatory voice monitoring; however, the inconvenience associated with taping and wiring, along with the lack of real-time processing, has limited its clinical application. Objective This study aims to (1) propose an automatic speech detection system using wireless microphones for real-time ambulatory voice monitoring, (2) examine detection accuracy under both controlled and noisy conditions, and (3) report phonation ratios in practical scenarios. Methods We designed an adaptive threshold function to detect the presence of speech based on the energy envelope. We invited 10 teachers to participate in this study and tested the performance of the proposed automatic speech detection system in terms of detection accuracy and phonation ratio. Moreover, we investigated whether an unsupervised noise reduction algorithm (ie, log minimum mean square error) can overcome the influence of environmental noise in the proposed system. Results The proposed system exhibited an average speech detection accuracy of 89.9%, ranging from 81.0% (67,357/83,157 frames) to 95.0% (199,201/209,685 frames). Subsequent analyses revealed a phonation ratio between 44.0% (33,019/75,044 frames) and 78.0% (68,785/88,186 frames) during teaching sessions of 40-60 minutes; the durations of most phonation segments were less than 10 seconds. The presence of background noise reduced the accuracy of the automatic speech detection system, and an adjuvant noise reduction function effectively improved the accuracy, especially under stable noise conditions. Conclusions This study demonstrated an average detection accuracy of 89.9% for the proposed automatic speech detection system with wireless microphones.
The preliminary results for the phonation ratio were comparable to those of previous studies. Although the wireless microphones are susceptible to background noise, an additional noise reduction function can alleviate this limitation. These results indicate that the proposed system can be applied for ambulatory voice monitoring in occupational voice users.
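As an illustrative aside, energy-envelope speech detection of the general kind this abstract describes can be sketched as follows. The threshold rule and all signal parameters here are assumptions for illustration, not the authors' algorithm:

```python
import numpy as np

# Sketch: frame-level energy envelope with an adaptive threshold, then a
# phonation ratio (fraction of frames flagged as speech).
def detect_speech(signal, frame_len=400):
    """Return a per-frame boolean speech mask from short-time energy."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)        # energy envelope
    noise_floor = np.percentile(energy, 10)    # assume the quietest frames are noise
    threshold = max(noise_floor * 10.0, 1e-8)  # adaptive margin above the floor
    return energy > threshold

fs = 16000
t = np.arange(fs) / fs
silence = 0.005 * np.random.default_rng(1).standard_normal(fs)  # 1 s near-silence
voiced = 0.5 * np.sin(2 * np.pi * 200 * t)                      # 1 s voiced-like tone
mask = detect_speech(np.concatenate([silence, voiced]))
print(f"phonation ratio: {mask.mean():.2f}")  # half the frames are voiced -> 0.50
```

A deployed system would also need the noise-floor estimate to adapt over time, which is where the noise reduction step discussed in the abstract becomes relevant.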
Affiliation(s)
- Chi-Te Wang
- Department of Otolaryngology Head and Neck Surgery, Far Eastern Memorial Hospital, New Taipei City, Taiwan; Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan; Department of Special Education, University of Taipei, Taipei, Taiwan
- Ji-Yan Han
- Department of Biomedical Engineering, National Yang-Ming University, Taipei, Taiwan
- Shih-Hau Fang
- Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan; Ministry of Science and Technology Joint Research Center for Artificial Intelligence Technology and All Vista Healthcare, Taoyuan, Taiwan
- Ying-Hui Lai
- Department of Biomedical Engineering, National Yang-Ming University, Taipei, Taiwan
46
Stegmann GM, Hahn S, Liss J, Shefner J, Rutkove SB, Kawabata K, Bhandari S, Shelton K, Duncan CJ, Berisha V. Repeatability of Commonly Used Speech and Language Features for Clinical Applications. Digit Biomark 2020; 4:109-122. [PMID: 33442573 DOI: 10.1159/000511671] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2020] [Accepted: 09/16/2020] [Indexed: 12/17/2022] Open
Abstract
Introduction Changes in speech have the potential to provide important information on the diagnosis and progression of various neurological diseases. Many researchers have relied on open-source speech features to develop algorithms for measuring speech changes in clinical populations, as they are convenient and easy to use. However, the repeatability of open-source features in the context of neurological diseases has not been studied. Methods We used a longitudinal sample of healthy controls, individuals with amyotrophic lateral sclerosis, and individuals with suspected frontotemporal dementia, and we evaluated the repeatability of acoustic and language features separately on these 3 data sets. Results Repeatability was evaluated using the intraclass correlation (ICC) and the within-subjects coefficient of variation (WSCV). Across the 3 sets of tasks, median ICCs ranged from 0.02 to 0.55, and median WSCVs ranged from 29% to 79%. Conclusion Our results demonstrate that the repeatability of speech features extracted using open-source tool kits is low. Researchers should exercise caution when developing digital health models with open-source speech features. We provide a detailed summary of feature-by-feature repeatability results (ICC, WSCV, SE of measurement, limits of agreement for WSCV, and minimal detectable change) in the online supplementary material so that researchers may incorporate repeatability information into the models they develop.
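As an illustrative aside, the two repeatability metrics this abstract reports can be computed as follows. This sketch assumes a one-way ICC formulation on synthetic test-retest data; the study's exact models and data differ:

```python
import numpy as np

# Sketch of two repeatability metrics for test-retest feature data.
def icc_oneway(x):
    """One-way ICC(1,1) for an (n_subjects, k_repeats) array."""
    n, k = x.shape
    msb = k * np.var(x.mean(axis=1), ddof=1)  # between-subject mean square
    msw = np.var(x, axis=1, ddof=1).mean()    # within-subject mean square
    return (msb - msw) / (msb + (k - 1) * msw)

def wscv(x):
    """Within-subjects coefficient of variation (fraction of the grand mean)."""
    within_sd = np.sqrt(np.var(x, axis=1, ddof=1).mean())
    return within_sd / x.mean()

rng = np.random.default_rng(42)
trait = rng.normal(100.0, 15.0, size=(30, 1))   # stable per-subject feature level
x = trait + rng.normal(0.0, 5.0, size=(30, 2))  # 30 subjects x 2 sessions, with noise
print(f"ICC = {icc_oneway(x):.2f}, WSCV = {100 * wscv(x):.1f}%")
```

With between-subject SD 15 and session noise SD 5, the ICC lands near 0.9; shrinking the between-subject spread toward the noise level drives it toward the low values the study reports for open-source features.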
Affiliation(s)
- Gabriela M Stegmann
- Arizona State University, Phoenix, Arizona, USA; Aural Analytics, Scottsdale, Arizona, USA
- Shira Hahn
- Arizona State University, Phoenix, Arizona, USA; Aural Analytics, Scottsdale, Arizona, USA
- Julie Liss
- Arizona State University, Phoenix, Arizona, USA; Aural Analytics, Scottsdale, Arizona, USA
- Seward B Rutkove
- Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
- Visar Berisha
- Arizona State University, Phoenix, Arizona, USA; Aural Analytics, Scottsdale, Arizona, USA
47
Villongco C, Khan F. "Sorry I Didn't Hear You." The Ethics of Voice Computing and AI in High Risk Mental Health Populations. AJOB Neurosci 2020; 11:105-112. [PMID: 32228383 DOI: 10.1080/21507740.2020.1740355] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
Abstract
This article examines the ethical and policy implications of using voice computing and artificial intelligence to screen for mental health conditions in low-income and minority populations. The burden of mental illness is unequally distributed across these groups, a disparity further exacerbated by barriers to psychiatric care. Advances in voice computing and artificial intelligence promise increased screening and more sensitive diagnostic assessments. Machine learning algorithms have the capacity to identify vocal features that can screen for depression. However, to screen for mental health pathology, computer algorithms must first be able to account for fundamental differences in vocal characteristics between low-income minority populations and other groups. While researchers have envisioned this technology as a beneficent tool, it could be repurposed to scale up discrimination or exploitation. Studies on the use of big data and predictive analytics demonstrate that low-income minority populations already face significant discrimination. This article urges researchers developing AI tools for vulnerable populations to consider the full ethical, legal, and social impact of their work. Without a national, coherent framework of legal regulations and ethical guidelines to protect vulnerable populations, it will be difficult to limit AI applications to solely beneficial uses. Without such protections, vulnerable populations will rightfully be wary of participating in such studies, which will in turn undermine the robustness of the resulting tools. Thus, for research involving AI tools like voice computing, it is in the research community's interest to demand more guidance and regulatory oversight from the federal government.
48
Robin J, Harrison JE, Kaufman LD, Rudzicz F, Simpson W, Yancheva M. Evaluation of Speech-Based Digital Biomarkers: Review and Recommendations. Digit Biomark 2020; 4:99-108. [PMID: 33251474 DOI: 10.1159/000510820] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2020] [Accepted: 08/11/2020] [Indexed: 12/23/2022] Open
Abstract
Speech represents a promising novel biomarker by providing a window into brain health, as shown by its disruption in various neurological and psychiatric diseases. As with many novel digital biomarkers, however, rigorous evaluation is currently lacking and is required for these measures to be used effectively and safely. This paper outlines and provides examples from the literature of evaluation steps for speech-based digital biomarkers, based on the recent V3 framework (Goldsack et al., 2020). The V3 framework describes 3 components of evaluation for digital biomarkers: verification, analytical validation, and clinical validation. Verification includes assessing the quality of speech recordings and comparing the effects of hardware and recording conditions on the integrity of the recordings. Analytical validation includes checking the accuracy and reliability of data processing and computed measures, including understanding test-retest reliability, demographic variability, and comparing measures to reference standards. Clinical validation involves verifying the correspondence of a measure to clinical outcomes, which can include diagnosis, disease progression, or response to treatment. For each of these sections, we provide recommendations for the types of evaluation necessary for speech-based biomarkers and review published examples. The examples in this paper focus on speech-based biomarkers, but they can be used as a template for digital biomarker development more generally.
Affiliation(s)
- John E Harrison
- Metis Cognition Ltd., Park House, Kilmington Common, Warminster, United Kingdom; Alzheimer Center, AUmc, Amsterdam, The Netherlands; Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- Frank Rudzicz
- Li Ka Shing Knowledge Institute, St Michael's Hospital, Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- William Simpson
- Winterlight Labs, Toronto, Ontario, Canada; Department of Psychiatry and Behavioural Neuroscience, McMaster University, Hamilton, Ontario, Canada
49
Cychosz M, Romeo R, Soderstrom M, Scaff C, Ganek H, Cristia A, Casillas M, de Barbaro K, Bang JY, Weisleder A. Longform recordings of everyday life: Ethics for best practices. Behav Res Methods 2020; 52:1951-1969. [PMID: 32103465 PMCID: PMC7483614 DOI: 10.3758/s13428-020-01365-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
Abstract
Recent advances in large-scale data storage and processing offer unprecedented opportunities for behavioral scientists to collect and analyze naturalistic data, including from underrepresented groups. Audio data, particularly real-world audio recordings, are of particular interest to behavioral scientists because they provide high-fidelity access to subtle aspects of daily life and social interactions. However, these methodological advances pose novel risks to research participants and communities. In this article, we outline the benefits and challenges associated with collecting, analyzing, and sharing multi-hour audio recording data. Guided by the principles of autonomy, privacy, beneficence, and justice, we propose a set of ethical guidelines for the use of longform audio recordings in behavioral research. This article is also accompanied by an Open Science Framework Ethics Repository that includes informed consent resources such as frequent participant concerns and sample consent forms.
Affiliation(s)
- Margaret Cychosz
- Department of Linguistics, University of California, 1203 Dwinelle Hall, Berkeley, CA, 94720, USA.
- Rachel Romeo
- Boston Children's Hospital and Massachusetts Institute of Technology, Boston, MA, USA
- Camila Scaff
- Human Ecology Group, Institute of Evolutionary Medicine, University of Zurich, Zürich, Switzerland
- Alejandrina Cristia
- Laboratoire de Sciences Cognitives et de Psycholinguistique, Département d'études cognitives, ENS, EHESS, CNRS, PSL University, Paris, France
- Marisa Casillas
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Kaya de Barbaro
- Department of Psychology, The University of Texas at Austin, Austin, TX, USA
- Janet Y Bang
- Department of Psychology, Stanford University, Stanford, CA, USA
- Adriana Weisleder
- Department of Communication Sciences and Disorders, Northwestern University, 2240 Campus Dr., Frances Searle Building, Room 3-358, Evanston, IL, 60208, USA.
50
Shinohara S, Toda H, Nakamura M, Omiya Y, Higuchi M, Takano T, Saito T, Tanichi M, Boku S, Mitsuyoshi S, So M, Yoshino A, Tokuno S. Evaluation of the Severity of Major Depression Using a Voice Index for Emotional Arousal. SENSORS (BASEL, SWITZERLAND) 2020; 20:E5041. [PMID: 32899881 PMCID: PMC7570922 DOI: 10.3390/s20185041] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/31/2020] [Revised: 09/03/2020] [Accepted: 09/03/2020] [Indexed: 11/16/2022]
Abstract
Recently, the relationship between emotional arousal and depression has been studied. Focusing on this relationship, we first developed an arousal level voice index (ALVI) to measure arousal levels using the Interactive Emotional Dyadic Motion Capture database. Then, we calculated the ALVI from the voices of depressed patients from two hospitals (Ginza Taimei Clinic (H1) and National Defense Medical College hospital (H2)) and compared it with the severity of depression as measured by the Hamilton Rating Scale for Depression (HAM-D). Depending on the HAM-D score, the datasets were classified into a no-depression (HAM-D < 8) and a depression group (HAM-D ≥ 8) for each hospital. Mean ALVI was compared between the groups using the Wilcoxon rank-sum test; differences were significant at the 10% level at H1 (p = 0.094) and at the 1% level at H2 (p = 0.0038). The area under the receiver operating characteristic curve (AUC) for discriminating between the two groups was 0.66 for H1 and 0.70 for H2. The relationship between arousal level and depression severity was thus indirectly supported via the ALVI.
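As an illustrative aside, the AUC statistic reported here can be computed from raw index scores via its equivalence with the Mann-Whitney U statistic. All index values below are hypothetical, not the study's data:

```python
# Sketch: AUC of a scalar voice index separating two groups, computed
# directly from pairwise comparisons (Mann-Whitney U relationship).
def auc_from_scores(pos, neg):
    """P(randomly chosen positive scores above a random negative); ties count 0.5."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

depressed = [0.62, 0.55, 0.71, 0.48, 0.66]      # hypothetical index values
non_depressed = [0.41, 0.52, 0.38, 0.57, 0.45]
print(round(auc_from_scores(depressed, non_depressed), 2))  # -> 0.88
```

An AUC of 0.5 would mean the index carries no group information, which is why the reported 0.66 and 0.70 indicate modest but real discrimination.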
Affiliation(s)
- Shuji Shinohara
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan; (M.N.); (M.H.); (S.M.); (S.T.)
- Hiroyuki Toda
- Department of Psychiatry, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan; (H.T.); (T.S.); (M.T.); (A.Y.)
- Mitsuteru Nakamura
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan; (M.N.); (M.H.); (S.M.); (S.T.)
- Yasuhiro Omiya
- PST Inc., Industry & Trade Center Building 905, 2 Yamashita-cho, Naka-ku, Yokohama, Kanagawa 231-0023, Japan; (Y.O.); (T.T.)
- Masakazu Higuchi
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan; (M.N.); (M.H.); (S.M.); (S.T.)
- Takeshi Takano
- PST Inc., Industry & Trade Center Building 905, 2 Yamashita-cho, Naka-ku, Yokohama, Kanagawa 231-0023, Japan; (Y.O.); (T.T.)
- Taku Saito
- Department of Psychiatry, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan; (H.T.); (T.S.); (M.T.); (A.Y.)
- Masaaki Tanichi
- Department of Psychiatry, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan; (H.T.); (T.S.); (M.T.); (A.Y.)
- Shuken Boku
- Department of Neuropsychiatry, Faculty of Life Sciences, Kumamoto University, 1-1-1 Honjo, Chuo-ku, Kumamoto, Kumamoto 860-8556, Japan;
- Shunji Mitsuyoshi
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan; (M.N.); (M.H.); (S.M.); (S.T.)
- Mirai So
- Department of Psychiatry, Tokyo Dental College, 2-9-18, Misakicho, Chiyoda-ku, Tokyo 101-0061, Japan;
- Aihide Yoshino
- Department of Psychiatry, National Defense Medical College, 3-2 Namiki, Tokorozawa, Saitama 359-8513, Japan; (H.T.); (T.S.); (M.T.); (A.Y.)
- Shinichi Tokuno
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan; (M.N.); (M.H.); (S.M.); (S.T.)