1
Xu X, Li J, Zhu Z, Zhao L, Wang H, Song C, Chen Y, Zhao Q, Yang J, Pei Y. A Comprehensive Review on Synergy of Multi-Modal Data and AI Technologies in Medical Diagnosis. Bioengineering (Basel) 2024; 11:219. [PMID: 38534493] [DOI: 10.3390/bioengineering11030219] [Received: 12/29/2023] [Revised: 02/15/2024] [Accepted: 02/21/2024] [Open Access]
Abstract
Disease diagnosis represents a critical and arduous endeavor within the medical field. Artificial intelligence (AI) techniques, spanning from machine learning and deep learning to large model paradigms, stand poised to significantly augment physicians in rendering more evidence-based decisions, thus presenting a pioneering solution for clinical practice. The amalgamation of diverse medical data modalities (e.g., image, text, speech, genetic data, physiological signals) is imperative for comprehensive disease analysis, and has become a topic of burgeoning interest among both researchers and clinicians in recent years. Hence, there exists a pressing need to synthesize the latest strides in multi-modal data and AI technologies in the realm of medical diagnosis. In this paper, we narrow our focus to five specific disorders (Alzheimer's disease, breast cancer, depression, heart disease, epilepsy), elucidating advanced endeavors in their diagnosis and treatment through the lens of artificial intelligence. Our survey not only delineates detailed diagnostic methodologies across varying modalities but also underscores commonly utilized public datasets, the intricacies of feature engineering, prevalent classification models, and envisaged challenges for future endeavors. In essence, our research endeavors to contribute to the advancement of diagnostic methodologies, furnishing invaluable insights for clinical decision making.
Affiliation(s)
- Xi Xu
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Jianqiang Li
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Zhichao Zhu
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Linna Zhao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Huina Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Changwei Song
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Yining Chen
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Qing Zhao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Jijiang Yang
- Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
- Yan Pei
- School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Japan
2
Huang XS, Liao YL, Zhang WJ, Zhang L. [A research on depression recognition based on voice pre-training model]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi (Journal of Biomedical Engineering) 2024; 41:9-16. [PMID: 38403599] [PMCID: PMC10894747] [DOI: 10.7507/1001-5515.202304008] [Received: 04/05/2023] [Revised: 11/24/2023]
Abstract
To address the increasing number of patients with depression, this paper proposes an artificial intelligence method for effectively identifying depression from voice signals, with the aim of improving the efficiency of diagnosis and treatment. First, the wav2vec 2.0 pre-trained model is fine-tuned to encode and contextualize speech, thereby obtaining high-quality voice features. The model is applied to the publicly available Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ) dataset. The results demonstrate a precision of 93.96%, a recall of 94.87%, and an F1 score of 94.41% for the binary depression-recognition task, with an overall classification accuracy of 96.48%. For the four-class task evaluating depression severity, the precision is above 92.59%, the recall above 92.89%, and the F1 score above 93.12% for every class, with an overall classification accuracy of 94.80%. The findings indicate that the proposed method effectively enhances classification accuracy in scenarios with limited data, performing strongly in both depression identification and severity evaluation. In the future, this method has the potential to serve as a valuable supportive tool for depression diagnosis.
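The precision, recall, F1, and accuracy figures reported above all follow from a confusion matrix. A minimal sketch of how they are computed (the labels below are toy values, not the paper's data):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Precision, recall, F1, and accuracy for the positive (depressed) class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = float(np.mean(y_true == y_pred))
    return precision, recall, f1, accuracy

# Toy labels for 10 utterances: one false positive and one false negative.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
p, r, f1, acc = binary_metrics(y_true, y_pred)  # each 0.8 here
```

Note that overall accuracy and per-class F1 can diverge when classes are imbalanced, which is why the abstract reports both.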
Affiliation(s)
- Xiangsheng Huang
- School of Biomedical Engineering, South-Central Minzu University, Wuhan 430074, P. R. China
- Yilong Liao
- School of Biomedical Engineering, South-Central Minzu University, Wuhan 430074, P. R. China
- Wenjin Zhang
- School of Biomedical Engineering, South-Central Minzu University, Wuhan 430074, P. R. China
- Li Zhang
- School of Biomedical Engineering, South-Central Minzu University, Wuhan 430074, P. R. China
3
Mao K, Wu Y, Chen J. A systematic review on automated clinical depression diagnosis. NPJ Mental Health Research 2023; 2:20. [PMID: 38609509] [PMCID: PMC10955993] [DOI: 10.1038/s44184-023-00040-z] [Received: 03/10/2023] [Accepted: 09/27/2023]
Abstract
Assessing mental health disorders and determining treatment can be difficult for a number of reasons, including access to healthcare providers. Assessments and treatments may not be continuous and can be limited by the unpredictable nature of psychiatric symptoms. Machine-learning models using data collected in a clinical setting can improve diagnosis and treatment. Studies have used speech, text, and facial expression analysis to identify depression. Still, more research is needed to address challenges such as the need for multimodality machine-learning models for clinical use. We conducted a review of studies from the past decade that utilized speech, text, and facial expression analysis to detect depression, as defined by the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline. We provide information on the number of participants, techniques used to assess clinical outcomes, speech-eliciting tasks, machine-learning algorithms, metrics, and other important discoveries for each study. A total of 544 studies were examined, 264 of which satisfied the inclusion criteria. A database has been created containing the query results and a summary of how different features are used to detect depression. While machine learning shows its potential to enhance mental health disorder evaluations, some obstacles must be overcome, especially the requirement for more transparent machine-learning models for clinical purposes. Considering the variety of datasets, feature extraction techniques, and metrics used in this field, guidelines have been provided to collect data and train machine-learning models to guarantee reproducibility and generalizability across different contexts.
Affiliation(s)
- Kaining Mao
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Yuqi Wu
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, T6G 2R3, Canada
- Jie Chen
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, T6G 2R3, Canada
4
Yang W, Liu J, Cao P, Zhu R, Wang Y, Liu JK, Wang F, Zhang X. Attention guided learnable time-domain filterbanks for speech depression detection. Neural Netw 2023; 165:135-149. [PMID: 37285730] [DOI: 10.1016/j.neunet.2023.05.041] [Received: 12/25/2022] [Revised: 05/13/2023] [Accepted: 05/20/2023]
Abstract
Depression, a global mental health problem, lacks effective screening methods that could aid early detection and treatment. This paper aims to facilitate the large-scale screening of depression by focusing on the speech depression detection (SDD) task. Currently, direct modeling on the raw signal yields a large number of parameters, and existing deep learning-based SDD models mainly use fixed Mel-scale spectral features as input. However, these features are not designed for depression detection, and the manual settings limit the exploration of fine-grained feature representations. In this paper, we learn effective representations of the raw signals from an interpretable perspective. Specifically, we present a joint learning framework with attention-guided learnable time-domain filterbanks for depression classification (DALF), which combines a depression filterbank feature learning (DFBL) module with a multi-scale spectral attention learning (MSSA) module. DFBL produces biologically meaningful acoustic features by employing learnable time-domain filters, and MSSA guides the learnable filters to better retain useful frequency sub-bands. We collect a new dataset, the Neutral Reading-based Audio Corpus (NRAC), to facilitate research in depression analysis, and we evaluate the performance of DALF on the NRAC and the public DAIC-WOZ datasets. The experimental results demonstrate that our method outperforms state-of-the-art SDD methods with an F1 of 78.4% on the DAIC-WOZ dataset. In particular, DALF achieves F1 scores of 87.3% and 81.7% on the two parts of the NRAC dataset. By analyzing the filter coefficients, we find that the most important frequency range identified by our method is 600-700 Hz, which corresponds to the Mandarin vowels /e/ and /ê/ and can be considered an effective biomarker for the SDD task. Taken together, our DALF model provides a promising approach to depression detection.
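DALF's filters are learned end-to-end, which cannot be reproduced in a few lines. As an illustration of the underlying idea of a time-domain filterbank, the sketch below builds fixed windowed-sinc band-pass filters (an assumption in the spirit of sinc-parameterized filter learning, not the paper's implementation) and shows how energy in a band such as the reported 600-700 Hz range is isolated:

```python
import numpy as np

def sinc_bandpass(low_hz, high_hz, sr, taps=511):
    """Windowed-sinc band-pass FIR filter (difference of two low-passes)."""
    t = np.arange(taps) - (taps - 1) / 2
    def lowpass(fc):
        h = 2 * fc / sr * np.sinc(2 * fc / sr * t)
        return h * np.hamming(taps)
    return lowpass(high_hz) - lowpass(low_hz)

sr = 16000
# Nine 100 Hz-wide bands covering 100-1000 Hz, including the 600-700 Hz
# band highlighted by the filter-coefficient analysis above.
bands = [(100 * k, 100 * (k + 1)) for k in range(1, 10)]
bank = np.stack([sinc_bandpass(lo, hi, sr) for lo, hi in bands])

# A 650 Hz tone should concentrate its energy in the 600-700 Hz channel.
x = np.sin(2 * np.pi * 650 * np.arange(sr) / sr)
energies = np.array([np.sum(np.convolve(x, h, mode="same") ** 2) for h in bank])
best_band = bands[int(np.argmax(energies))]
```

In a learnable variant, the cutoff frequencies (here fixed at 100 Hz steps) would be trainable parameters updated by backpropagation.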
Affiliation(s)
- Wenju Yang
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, 110819, Liaoning, China
- Jiankang Liu
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, 110819, Liaoning, China
- Peng Cao
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, 110819, Liaoning, China
- Rongxin Zhu
- Early Intervention Unit, Department of Psychiatry, Affiliated Nanjing Brain Hospital, Nanjing Medical University, Nanjing, 210096, China
- Yang Wang
- Early Intervention Unit, Department of Psychiatry, Affiliated Nanjing Brain Hospital, Nanjing Medical University, Nanjing, 210096, China
- Jian K Liu
- School of Computing, University of Leeds, Leeds, LS2 9JT, United Kingdom
- Fei Wang
- Early Intervention Unit, Department of Psychiatry, Affiliated Nanjing Brain Hospital, Nanjing Medical University, Nanjing, 210096, China
- Xizhe Zhang
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China
5
Pan W, Deng F, Wang X, Hang B, Zhou W, Zhu T. Exploring the ability of vocal biomarkers in distinguishing depression from bipolar disorder, schizophrenia, and healthy controls. Front Psychiatry 2023; 14:1079448. [PMID: 37575564] [PMCID: PMC10415910] [DOI: 10.3389/fpsyt.2023.1079448] [Received: 10/25/2022] [Accepted: 06/30/2023] [Open Access]
Abstract
Background Vocal features have been exploited to distinguish depression from healthy controls. While there have been some claims of success, the degree to which changes in vocal features are specific to depression has not been systematically studied. Hence, we examined the performance of vocal features in differentiating depression from bipolar disorder (BD), schizophrenia, and healthy controls, as well as in pairwise classifications for the three disorders. Methods We sampled 32 bipolar disorder patients, 106 depression patients, 114 healthy controls, and 20 schizophrenia patients. We extracted i-vectors from Mel-frequency cepstral coefficients (MFCCs) and built logistic regression models with ridge regularization and 5-fold cross-validation on the training set, then applied the models to the test set. There were seven classification tasks: any disorder versus healthy controls; depression versus healthy controls; BD versus healthy controls; schizophrenia versus healthy controls; depression versus BD; depression versus schizophrenia; and BD versus schizophrenia. Results The area under the curve (AUC) score for classifying depression and bipolar disorder was 0.5 (F-score = 0.44). For the other comparisons, the AUC scores ranged from 0.75 to 0.92, and the F-scores ranged from 0.73 to 0.91. The model performance (AUC) for classifying depression and bipolar disorder was significantly worse than that for classifying bipolar disorder and schizophrenia (corrected p < 0.05), while there were no significant differences among the remaining pairwise comparisons of the seven classification tasks. Conclusion Vocal features showed discriminatory potential in classifying depression against healthy controls, as well as between depression and other mental disorders. Future research should systematically examine the mechanisms by which voice features distinguish depression from other mental disorders and develop more sophisticated machine learning models so that voice analysis can better assist clinical diagnosis.
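The i-vector extraction itself requires a trained universal background model, which is beyond a short sketch. As a simplified stand-in (simulated MFCC summary statistics in place of i-vectors), the classifier side of the pipeline described above, ridge-regularized (L2) logistic regression with 5-fold cross-validation scored by AUC, might look like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Simulated per-utterance feature vectors (e.g., mean and std of 20 MFCCs)
# standing in for the i-vectors used in the study.
n_samples = 150
y = rng.integers(0, 2, size=n_samples)           # 0 = control, 1 = patient
X = rng.normal(size=(n_samples, 40)) + 0.3 * y[:, None]

# Ridge regularization is the L2 penalty (scikit-learn's default);
# C controls its inverse strength.
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
auc_scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
mean_auc = float(auc_scores.mean())
```

An AUC near 0.5, as the study found for depression versus BD, means the classifier ranks a random positive above a random negative no better than chance.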
Affiliation(s)
- Wei Pan
- Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Fusong Deng
- Wuhan Wuchang Hospital, Wuchang Hospital Affiliated to Wuhan University of Science and Technology, Wuhan, China
- Xianbin Wang
- Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Bowen Hang
- Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Wenwei Zhou
- Key Laboratory of Adolescent Cyberpsychology and Behavior (CCNU), Ministry of Education, Wuhan, China
- School of Psychology, Central China Normal University, Wuhan, China
- Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan, China
- Tingshao Zhu
- Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
6
Tang SX, Hänsel K, Cong Y, Nikzad AH, Mehta A, Cho S, Berretta S, Behbehani L, Pradhan S, John M, Liberman MY. Latent Factors of Language Disturbance and Relationships to Quantitative Speech Features. Schizophr Bull 2023; 49:S93-S103. [PMID: 36946530] [PMCID: PMC10031730] [DOI: 10.1093/schbul/sbac145]
Abstract
BACKGROUND AND HYPOTHESIS Quantitative acoustic and textual measures derived from speech ("speech features") may provide valuable biomarkers for psychiatric disorders, particularly schizophrenia spectrum disorders (SSD). We sought to identify cross-diagnostic latent factors for speech disturbance with relevance for SSD and computational modeling. STUDY DESIGN Clinical ratings for speech disturbance were generated across 14 items for a cross-diagnostic sample (N = 334), including SSD (n = 90). Speech features were quantified using an automated pipeline for brief recorded samples of free speech. Factor models for the clinical ratings were generated using exploratory factor analysis, then tested with confirmatory factor analysis in the cross-diagnostic and SSD groups. The relationships between factor scores and computational speech features were examined for 202 of the participants. STUDY RESULTS We found a 3-factor model with a good fit in the cross-diagnostic group and an acceptable fit for the SSD subsample. The model identifies an impaired expressivity factor and 2 interrelated disorganized factors for inefficient and incoherent speech. Incoherent speech was specific to psychosis groups, while inefficient speech and impaired expressivity showed intermediate effects in people with nonpsychotic disorders. Each of the 3 factors had significant and distinct relationships with speech features, which differed for the cross-diagnostic vs SSD groups. CONCLUSIONS We report a cross-diagnostic 3-factor model for speech disturbance which is supported by good statistical measures, intuitive, applicable to SSD, and relatable to linguistic theories. It provides a valuable framework for understanding speech disturbance and appropriate targets for modeling with quantitative speech features.
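A minimal sketch of the factor-analysis step, using scikit-learn's FactorAnalysis on simulated clinical ratings with the study's dimensions (334 participants, 14 items, 3 factors); the data here are random, so the loadings are illustrative only:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)

# Simulate 334 participants x 14 speech-disturbance ratings generated
# from 3 latent factors plus item noise (dimensions as in the study).
n, n_items, n_factors = 334, 14, 3
latent = rng.normal(size=(n, n_factors))
loadings = rng.normal(size=(n_factors, n_items))
ratings = latent @ loadings + rng.normal(scale=0.5, size=(n, n_items))

fa = FactorAnalysis(n_components=n_factors, random_state=0)
factor_scores = fa.fit_transform(ratings)  # per-participant factor scores
item_loadings = fa.components_             # (n_factors, n_items) loadings
```

In the study, per-participant factor scores like these were then correlated with the automatically extracted speech features.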
Affiliation(s)
- Sunny X Tang
- Institute of Behavioral Science, Feinstein Institutes for Medical Research, Glen Oaks, USA
- Katrin Hänsel
- Department of Laboratory Medicine, Yale University, New Haven, USA
- Yan Cong
- Institute of Behavioral Science, Feinstein Institutes for Medical Research, Glen Oaks, USA
- Amir H Nikzad
- Institute of Behavioral Science, Feinstein Institutes for Medical Research, Glen Oaks, USA
- Aarush Mehta
- Institute of Behavioral Science, Feinstein Institutes for Medical Research, Glen Oaks, USA
- Sunghye Cho
- Linguistic Data Consortium, University of Pennsylvania, Philadelphia, USA
- Sarah Berretta
- Institute of Behavioral Science, Feinstein Institutes for Medical Research, Glen Oaks, USA
- Leily Behbehani
- Institute of Behavioral Science, Feinstein Institutes for Medical Research, Glen Oaks, USA
- Sameer Pradhan
- Linguistic Data Consortium, University of Pennsylvania, Philadelphia, USA
- Majnu John
- Institute of Behavioral Science, Feinstein Institutes for Medical Research, Glen Oaks, USA
- Mark Y Liberman
- Linguistic Data Consortium, University of Pennsylvania, Philadelphia, USA
7
Du M, Liu S, Wang T, Zhang W, Ke Y, Chen L, Ming D. Depression recognition using a proposed speech chain model fusing speech production and perception features. J Affect Disord 2023; 323:299-308. [PMID: 36462607] [DOI: 10.1016/j.jad.2022.11.060] [Received: 07/15/2022] [Revised: 10/22/2022] [Accepted: 11/20/2022]
Abstract
BACKGROUND The increasing number of patients with depression puts great pressure on clinical diagnosis. Audio-based diagnosis is a helpful auxiliary tool for early mass screening. However, current methods consider only speech perception features, ignoring patients' vocal tract changes, which may partly account for their poor recognition performance. METHODS This work proposes a novel machine speech chain model for depression recognition (MSCDR) that can capture text-independent depressive speech representations from the speaker's mouth to the listener's ear to improve recognition performance. In the proposed MSCDR, linear predictive coding (LPC) and Mel-frequency cepstral coefficient (MFCC) features are extracted to describe the processes of speech production and speech perception, respectively. Then, a one-dimensional convolutional neural network and a long short-term memory network sequentially capture intra- and inter-segment dynamic depressive features for classification. RESULTS We tested the MSCDR on two public datasets with different languages and paradigms, namely, the Distress Analysis Interview Corpus-Wizard of Oz and the Multi-modal Open Dataset for Mental-disorder Analysis. The accuracy of the MSCDR on the two datasets was 0.77 and 0.86, and the average F1 scores were 0.75 and 0.86, respectively, better than existing methods. This improvement reveals the complementarity of speech production and perception features in carrying depressive information. LIMITATIONS The sample size was relatively small, which may limit clinical translation to some extent. CONCLUSION This experiment demonstrates the good generalization ability and superiority of the proposed MSCDR and suggests that vocal tract changes in patients with depression deserve attention in audio-based depression diagnosis.
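Of the two feature families fused by the MSCDR, LPC is the production-side one: it fits an all-pole model of the vocal tract. A self-contained sketch of LPC via the autocorrelation (Levinson-Durbin) method, checked on a synthetic second-order autoregressive signal whose coefficients are known:

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients [1, a1, ..., ap] via the Levinson-Durbin recursion."""
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[1:i][::-1])
        k = -acc / err
        prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        err *= 1.0 - k * k
    return a

# Synthetic AR(2) "vocal tract" signal: x[n] = 0.75 x[n-1] - 0.5 x[n-2] + e[n].
rng = np.random.default_rng(3)
e = rng.normal(size=20000)
x = np.zeros(20000)
for n in range(2, 20000):
    x[n] = 0.75 * x[n - 1] - 0.5 * x[n - 2] + e[n]

a = lpc(x[2000:], order=2)  # drop the transient, then fit
# The whitening filter should recover a1 ~ -0.75 and a2 ~ 0.5.
```

In the paper's pipeline these per-frame coefficients (alongside MFCCs) would feed the 1D CNN; here they are only verified against the known generator.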
Affiliation(s)
- Minghao Du
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Shuang Liu
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Tao Wang
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Wenquan Zhang
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Yufeng Ke
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Long Chen
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Dong Ming
- Tianjin International Joint Research Center for Neural Engineering, Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China; Lab of Neural Engineering & Rehabilitation, Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin, China
8
Eysenbach G, Jang EH, Lee SH, Choi KY, Park JG, Shin HC. Automatic Depression Detection Using Smartphone-Based Text-Dependent Speech Signals: Deep Convolutional Neural Network Approach. J Med Internet Res 2023; 25:e34474. [PMID: 36696160] [PMCID: PMC9909514] [DOI: 10.2196/34474] [Received: 10/25/2021] [Revised: 05/20/2022] [Accepted: 12/18/2022] [Open Access]
Abstract
BACKGROUND Automatic diagnosis of depression based on speech can complement mental health treatment methods in the future. Previous studies have reported that acoustic properties can be used to identify depression. However, few studies have attempted a large-scale differential diagnosis of patients with depressive disorders using acoustic characteristics of non-English speakers. OBJECTIVE This study proposes a framework for automatic depression detection using large-scale acoustic characteristics based on the Korean language. METHODS We recruited 153 patients who met the criteria for major depressive disorder and 165 healthy controls without current or past mental illness. Participants' voices were recorded on a smartphone while they read predefined text-based sentences. Three approaches were evaluated and compared for detecting depression from the text-dependent read-speech data: conventional machine learning models based on acoustic features; a proposed model that trains and classifies log-Mel spectrograms with a deep convolutional neural network (CNN) with a relatively small number of parameters; and models that train and classify log-Mel spectrograms with well-known pretrained networks. RESULTS Depression was automatically detected from the acoustic characteristics of the read sentences using the proposed CNN model, which achieved a highest accuracy of 78.14% on the speech data. Our results show that the deep-learned acoustic characteristics lead to better performance than the conventional approach and the pretrained models. CONCLUSIONS Checking the mood of patients with major depressive disorder and detecting the consistency of objective descriptions are very important research topics. This study suggests that the analysis of speech data recorded while reading text-dependent sentences could help predict depression status automatically by capturing the characteristics of depression. Our method is smartphone based, easily accessible, and can contribute to the automatic identification of depressive states.
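A log-Mel spectrogram like the CNN input described above can be computed with plain NumPy; the frame, FFT, and mel-band settings below are illustrative assumptions, not the study's configuration:

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters mapping an FFT power spectrum to mel bands."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

sr, n_fft, hop, n_mels = 16000, 512, 160, 40
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s synthetic tone

# Frame, window, FFT, mel-project, log-compress: (frames, n_mels).
frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
power = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
```

The resulting time-by-mel matrix is what a 2D CNN would consume as a single-channel image.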
Affiliation(s)
- Eun Hye Jang
- Medical Information Research Section, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea
- Seung-Hwan Lee
- Clinical Emotion and Cognition Research Laboratory, Inje University, Goyang, Republic of Korea
- Department of Psychiatry, Inje University Ilsan Paik Hospital, Goyang, Republic of Korea
- Bwave Inc, Goyang, Republic of Korea
- Kwang-Yeon Choi
- Department of Psychiatry, College of Medicine, Chungnam National University, Daejeon, Republic of Korea
- Jeon Gue Park
- Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea
- Tutorus Labs Inc, Seoul, Republic of Korea
- Hyun-Chool Shin
- Department of Electronics Engineering, Soongsil University, Seoul, Republic of Korea
9
Koops S, Brederoo SG, de Boer JN, Nadema FG, Voppel AE, Sommer IE. Speech as a Biomarker for Depression. CNS & Neurological Disorders - Drug Targets 2023; 22:152-160. [PMID: 34961469] [DOI: 10.2174/1871527320666211213125847] [Received: 03/10/2021] [Revised: 10/10/2021] [Accepted: 10/10/2021]
Abstract
BACKGROUND Depression is a debilitating disorder that at present lacks a reliable biomarker to aid in diagnosis and early detection. Recent advances in computational analytic approaches have opened up new avenues in developing such a biomarker by taking advantage of the wealth of information that can be extracted from a person's speech. OBJECTIVE The current review provides an overview of the latest findings in the rapidly evolving field of computational language analysis for the detection of depression. We cover a wide range of both acoustic and content-related linguistic features, data types (i.e., spoken and written language), and data sources (i.e., lab settings, social media, and smartphone-based). We put special focus on the current methodological advances with regard to feature extraction and computational modeling techniques. Furthermore, we pay attention to potential hurdles in the implementation of automatic speech analysis. CONCLUSION Depressive speech is characterized by several anomalies, such as lower speech rate, less pitch variability and more self-referential speech. With current computational modeling techniques, such features can be used to detect depression with an accuracy of up to 91%. The performance of the models is optimized when machine learning techniques are implemented that suit the type and amount of data. Recent studies now work towards further optimization and generalizability of the computational language models to detect depression. Finally, privacy and ethical issues are of paramount importance to be addressed when automatic speech analysis techniques are further implemented in, for example, smartphones. Altogether, computational speech analysis is well underway towards becoming an effective diagnostic aid for depression.
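Two of the vocal anomalies noted in this review, lower pitch level and reduced pitch variability, can be quantified with a simple autocorrelation-based F0 estimator. The drifting-pitch signal below is purely synthetic and illustrative:

```python
import numpy as np

def estimate_f0(frame, sr, f_min=75.0, f_max=400.0):
    """Estimate F0 of a voiced frame from its autocorrelation peak."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / f_max), int(sr / f_min)  # lag search range
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

sr, frame_len = 16000, 640  # 40 ms frames
# Synthetic "voice" whose pitch drifts from 180 to 220 Hz over 50 frames.
true_f0 = np.linspace(180.0, 220.0, 50)
frames = [np.sin(2 * np.pi * f * np.arange(frame_len) / sr) for f in true_f0]
estimates = np.array([estimate_f0(fr, sr) for fr in frames])

mean_pitch = float(estimates.mean())
pitch_variability = float(estimates.std())  # reduced in depressed speech
```

Real systems add voicing detection and smoothing across frames, but the per-recording summary statistics (mean, standard deviation of F0) are typical of the features fed into the classifiers this review surveys.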
Affiliation(s)
- Sanne Koops
- Department of Biomedical Sciences of Cells & Systems, Cognitive Neurosciences, University of Groningen, University Medical Center Groningen (UMCG), Groningen, The Netherlands
- Sanne G Brederoo
- Department of Biomedical Sciences of Cells & Systems, Cognitive Neurosciences, University of Groningen, University Medical Center Groningen (UMCG), Groningen, The Netherlands
- University Center for Psychiatry, University Medical Center Groningen, Groningen, The Netherlands
- Janna N de Boer
- Department of Psychiatry, University Medical Center Utrecht, Utrecht University & Brain Center Rudolf Magnus, Utrecht, The Netherlands
- Femke G Nadema
- Department of Biomedical Sciences of Cells & Systems, Cognitive Neurosciences, University of Groningen, University Medical Center Groningen (UMCG), Groningen, The Netherlands
- Alban E Voppel
- Department of Biomedical Sciences of Cells & Systems, Cognitive Neurosciences, University of Groningen, University Medical Center Groningen (UMCG), Groningen, The Netherlands
- Iris E Sommer
- Department of Biomedical Sciences of Cells & Systems, Cognitive Neurosciences, University of Groningen, University Medical Center Groningen (UMCG), Groningen, The Netherlands
10
Liu D, Liu B, Lin T, Liu G, Yang G, Qi D, Qiu Y, Lu Y, Yuan Q, Shuai SC, Li X, Liu O, Tang X, Shuai J, Cao Y, Lin H. Measuring depression severity based on facial expression and body movement using deep convolutional neural network. Front Psychiatry 2022; 13:1017064. [PMID: 36620657] [PMCID: PMC9810804] [DOI: 10.3389/fpsyt.2022.1017064] [Received: 08/22/2022] [Accepted: 12/02/2022] [Open Access]
Abstract
INTRODUCTION Real-time evaluation of the severity of depressive symptoms is of great significance for the diagnosis and treatment of patients with major depressive disorder (MDD). In clinical practice, evaluation relies mainly on psychological scales and doctor-patient interviews, which are time-consuming and labor-intensive, and the accuracy of the results depends largely on the clinician's subjective judgment. With the development of artificial intelligence (AI), machine learning methods are increasingly used to diagnose depression from appearance characteristics. Most previous research focused on single-modal data; in recent years, however, many studies have shown that multi-modal data yield better prediction performance. This study aimed to develop a measure of depression severity from expression and action features and to assess its validity among patients with MDD. METHODS We proposed a multi-modal deep convolutional neural network (CNN) to evaluate the severity of depressive symptoms in real time, based on the detection of patients' facial expressions and body movements in videos captured by ordinary cameras. We established a behavioral depression degree (BDD) metric, which combines expression entropy and action entropy to measure the depression severity of MDD patients. RESULTS We found that information extracted from different modes, when integrated in appropriate proportions, can significantly improve the accuracy of the evaluation, which had not been reported in previous studies. The method achieved over 74% Pearson correlation between BDD and the self-rating depression scale (SDS), self-rating anxiety scale (SAS), and Hamilton depression scale (HAMD). In addition, we tracked BDD in patients at different stages of a course of treatment, and the observed changes agreed with the scale-based evaluations.
DISCUSSION The BDD effectively measures the current state of a patient's depression and its trend from expression and action features. Our model may provide an automatic auxiliary tool for the diagnosis and treatment of MDD.
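The BDD metric above combines expression entropy and action entropy, but the abstract does not give the exact formula. The sketch below is therefore a hypothetical illustration: Shannon entropy over categorical expression and action labels, mixed by an assumed weight `w_expr`.

```python
import numpy as np

def shannon_entropy(labels):
    """Shannon entropy (bits) of a sequence of categorical labels,
    e.g. per-frame facial expression or body-action classes."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def bdd_score(expressions, actions, w_expr=0.5):
    """Hypothetical BDD: a weighted sum of expression entropy and action
    entropy. The paper's actual weighting is not given; w_expr is an
    assumption for illustration only."""
    return (w_expr * shannon_entropy(expressions)
            + (1 - w_expr) * shannon_entropy(actions))
```

With equal weights, two uniformly distributed binary label streams give a BDD of 1.0 bit; lower entropy in either stream (flatter affect, less movement variety) lowers the score.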
Affiliation(s)
- Dongdong Liu
- Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Bowen Liu
- Department of Psychiatry, National Clinical Research Center for Mental Disorders, The Second Xiangya Hospital of Central South University, Changsha, China
- Department of Psychiatry, Baoan Mental Health Center, Shenzhen Baoan Center for Chronic Disease Control, Shenzhen, China
- Tao Lin
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Guangya Liu
- Integrated Chinese and Western Therapy of Depression Ward, Hunan Brain Hospital, Changsha, China
- Guoyu Yang
- Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Dezhen Qi
- Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Ye Qiu
- Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Yuer Lu
- Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Qinmei Yuan
- Department of Psychiatry, National Clinical Research Center for Mental Disorders, The Second Xiangya Hospital of Central South University, Changsha, China
- Stella C. Shuai
- Department of Biological Sciences, Northwestern University, Evanston, IL, United States
- Xiang Li
- Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Ou Liu
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Xiangdong Tang
- Sleep Medicine Center, Mental Health Center, Department of Respiratory and Critical Care Medicine, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Jianwei Shuai
- Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
- Yuping Cao
- Department of Psychiatry, National Clinical Research Center for Mental Disorders, The Second Xiangya Hospital of Central South University, Changsha, China
- Hai Lin
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
11.
Exploration of Despair Eccentricities Based on Scale Metrics with Feature Sampling Using a Deep Learning Algorithm. Diagnostics (Basel) 2022; 12:2844. [PMID: 36428903] [PMCID: PMC9689169] [DOI: 10.3390/diagnostics12112844]
Abstract
Many people worldwide struggle with depression as a result of the coronavirus pandemic, which has adversely affected mental health without warning. Even though most individuals are still protected, it is crucial to check for post-coronavirus symptoms if someone is feeling lethargic. The recommended approach is designed to identify post-coronavirus symptoms and attacks present in the human body. When a harmful virus spreads inside a human body, post-diagnosis symptoms are considerably more dangerous, and if they are not recognised at an early stage, the risks increase. Additionally, if the post-symptoms are severe and go untreated, they may harm one's mental health. To prevent someone from succumbing to depression, audio prediction technology is employed to recognise symptoms and potentially dangerous signs. Various vocal characteristics are combined with machine-learning algorithms to determine each person's mental state. A separate device that detects audio attribute outputs is designed to evaluate the effectiveness of the suggested technique; compared to the previous method, the performance metric is better by roughly 67%.
12.
Othmani A, Zeghina AO, Muzammel M. A Model of Normality Inspired Deep Learning Framework for Depression Relapse Prediction Using Audiovisual Data. Comput Methods Programs Biomed 2022; 226:107132. [PMID: 36183638] [DOI: 10.1016/j.cmpb.2022.107132]
Abstract
BACKGROUND Depression (major depressive disorder) is one of the most common mental illnesses; according to the World Health Organization, more than 300 million people worldwide are affected. A first depressive episode may resolve through spontaneous remission within 6 to 12 months. Depression has been shown to affect speech production and facial expressions. Although numerous studies have addressed depression recognition from audiovisual cues, depression relapse from audiovisual cues has not been studied. METHODS In this paper, we propose a deep learning-based approach for depression recognition and depression relapse prediction using audiovisual data. For versatility and reusability, the approach is based on a Model of Normality framework, in which depression relapse is defined by the closeness of a subject's audiovisual patterns after a symptom-free period to the audiovisual patterns of depressed subjects. A Model of Normality is a distance-based anomaly detection approach that computes a distance between the deep audiovisual encoding of a test sample and a representation learned from audiovisual encodings of anomaly-free data. RESULTS The proposed approach shows very promising results, with an accuracy of 87.4% and an F1-score of 82.3% for relapse/depression prediction using a Leave-One-Subject-Out training strategy on the DAIC-WOZ dataset. CONCLUSION The proposed Model of Normality framework is accurate in detecting depression and predicting depression relapse. A prospective monitoring system is proposed for assisting depressed patients. The framework is easily extensible, and other modalities will be integrated in future work.
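The Model of Normality idea, scoring a test encoding by its distance from a representation learned on anomaly-free data, can be sketched in a few lines. The centroid model, the Euclidean distance, and the `threshold` below are assumptions for illustration; the paper's deep audiovisual encoder is not reproduced here.

```python
import numpy as np

def fit_normal_model(encodings):
    """Learn a representation of 'normality' from anomaly-free encodings.
    Here it is simply the centroid of the embedding cloud (an assumption;
    the paper does not commit to this particular distance model)."""
    return np.asarray(encodings).mean(axis=0)

def normality_distance(sample, centroid):
    """Euclidean distance of a test encoding from the normal centroid."""
    return float(np.linalg.norm(np.asarray(sample) - centroid))

def flag_relapse(sample, centroid, threshold):
    """Flag relapse when the encoding drifts away from normality, i.e.
    far from the centroid (threshold is hypothetical, to be calibrated)."""
    return normality_distance(sample, centroid) > threshold
```

In practice the encodings would come from the audiovisual network's penultimate layer, and the threshold would be set on a validation split.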
Affiliation(s)
- Alice Othmani
- Université Paris-Est Créteil (UPEC), LISSI, Vitry sur Seine 94400, France
- Muhammad Muzammel
- Université Paris-Est Créteil (UPEC), LISSI, Vitry sur Seine 94400, France
13.
Cui Y, Li Z, Liu L, Zhang J, Liu J. Privacy-preserving Speech-based Depression Diagnosis via Federated Learning. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:1371-1374. [PMID: 36085955] [DOI: 10.1109/embc48229.2022.9871861]
Abstract
Mental health disorders such as depression affect a large and growing population worldwide and may cause severe emotional, behavioral, and physical health problems if left untreated. Because depression affects a patient's speech characteristics, recent studies have proposed deep-learning-powered speech analysis models for depression diagnosis, which typically require centralized learning on the collected voice data. However, centralized training that stores data at a server raises the risk of severe voice data breaches, and people may be unwilling to share their speech data with third parties due to privacy concerns. To address these issues, in this paper we demonstrate for the first time that speech-based depression diagnosis models can be trained in a privacy-preserving way using federated learning (FL), which enables collaborative model training while keeping the private speech data decentralized on clients' devices. To ensure the model's robustness under attack, we also integrate FL defenses into the system, such as norm bounding, differential privacy, and secure aggregation. Extensive experiments under various FL settings on the DAIC-WOZ dataset show that our FL model achieves high performance without sacrificing much utility compared with centralized-learning approaches while preserving users' speech data privacy. Clinical Relevance: the experiments were conducted on publicly available clinical datasets; no humans or animals were involved.
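The federated setup with norm bounding can be sketched as one aggregation round: clients send weight deltas (their speech data never leaves the device), and the server clips and averages them. This is a minimal illustration, not the paper's system; the speech model itself, the differential-privacy noise, and secure aggregation are omitted or simplified.

```python
import numpy as np

def clip_update(update, max_norm):
    """Norm bounding: scale a client update so its L2 norm <= max_norm,
    limiting the influence any single (possibly malicious) client has."""
    norm = np.linalg.norm(update)
    return update if norm <= max_norm else update * (max_norm / norm)

def federated_round(global_weights, client_updates, max_norm=1.0):
    """One FedAvg-style round: average the clipped client deltas and apply
    them to the global model. Gaussian noise for differential privacy
    could be added to the mean before applying (omitted here)."""
    clipped = [clip_update(np.asarray(u, dtype=float), max_norm)
               for u in client_updates]
    return np.asarray(global_weights, dtype=float) + np.mean(clipped, axis=0)
```

Each round, only model deltas cross the network, which is what keeps the raw voice recordings decentralized.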
14.
Saad E, Sadiq S, Jamil R, Rustam F, Mehmood A, Choi GS, Ashraf I. Novel extreme regression-voting classifier to predict death risk in vaccinated people using VAERS data. PLoS One 2022; 17:e0270327. [PMID: 35767542] [PMCID: PMC9242465] [DOI: 10.1371/journal.pone.0270327]
Abstract
COVID-19 vaccination raised serious concerns among the public, who were beset by rumors regarding resulting illness, adverse reactions, and death. Such rumors endanger the campaign against COVID-19 and should be addressed promptly. One prospective solution is to use machine learning-based models to predict the death risk for vaccinated people and clarify public perceptions of that risk. This study focuses on predicting the death risk for vaccinated people after a second dose, for two reasons: first, to build consensus among people to get vaccinated; second, to reduce fear regarding vaccines. The study utilizes the COVID-19 VAERS dataset, which records adverse events after COVID-19 vaccination as 'recovered', 'not recovered', and 'survived'. To obtain better prediction results, a novel voting classifier, the extreme regression-voting classifier (ER-VC), is introduced. ER-VC ensembles an extra trees classifier and logistic regression using a soft voting criterion. To avoid overfitting and improve results, two data balancing techniques, synthetic minority oversampling (SMOTE) and adaptive synthetic sampling (ADASYN), were applied, and three feature extraction techniques, term frequency-inverse document frequency (TF-IDF), bag of words (BoW), and global vectors (GloVe), were compared. Both machine learning and deep learning models were deployed in the experiments. Results reveal that the proposed model combined with TF-IDF shows robust results, with 0.85 accuracy when trained on the SMOTE-balanced dataset. In line with this, validation of the proposed voting classifier on binary classification shows state-of-the-art results with 0.98 accuracy. The results show that machine learning models can predict death risk with high accuracy and can assist the authorities in taking timely measures.
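A minimal sketch of the soft-voting idea behind ER-VC, assuming scikit-learn: an extra-trees classifier and logistic regression whose predicted class probabilities are averaged. Synthetic data stands in for the VAERS features; the paper's SMOTE/ADASYN balancing and TF-IDF extraction are omitted here.

```python
# Sketch only: illustrative data, not VAERS records.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Soft voting: average the two estimators' class probabilities, then argmax.
ervc = VotingClassifier(
    estimators=[("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="soft")
ervc.fit(X_tr, y_tr)
acc = ervc.score(X_te, y_te)
```

On real VAERS text, the `X` matrix would come from a vectorizer such as TF-IDF, and the training split would first be rebalanced with SMOTE or ADASYN.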
Affiliation(s)
- Eysha Saad
- Department of Computer Science, Khawaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Saima Sadiq
- Department of Computer Science, Khawaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Ramish Jamil
- Department of Computer Science, Khawaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Furqan Rustam
- Department of Software Engineering, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Arif Mehmood
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
- Gyu Sang Choi
- Information and Communication Engineering, Yeungnam University, Gyeongsan, Korea
- Imran Ashraf
- Information and Communication Engineering, Yeungnam University, Gyeongsan, Korea
15.
Wu P, Wang R, Lin H, Zhang F, Tu J, Sun M. Automatic depression recognition by intelligent speech signal processing: A systematic survey. CAAI Trans Intell Technol 2022. [DOI: 10.1049/cit2.12113]
Affiliation(s)
- Pingping Wu
- Jiangsu Key Laboratory of Public Project Audit, School of Engineering Audit, Nanjing Audit University, Nanjing, China
- Ruihao Wang
- School of Information Engineering, Nanjing Audit University, Nanjing, China
- Han Lin
- Jiangsu Key Laboratory of Public Project Audit, School of Engineering Audit, Nanjing Audit University, Nanjing, China
- Fanlong Zhang
- School of Information Engineering, Nanjing Audit University, Nanjing, China
- Juan Tu
- Key Laboratory of Modern Acoustics (MOE), School of Physics, Nanjing University, Nanjing, China
- Miao Sun
- Faculty of Electrical Engineering, Mathematics & Computer Science, Delft University of Technology, Delft, The Netherlands
16.
Wang K, Zhou YE, Xu JX, Yang G. RETRACTED ARTICLE: Reviewing big data based mental health education process for promoting education system. Curr Psychol 2022. [DOI: 10.1007/s12144-020-00800-6]
17.
Abstract
BACKGROUND Depression is one of the most prevalent mental disorders, affecting millions of individuals today. Its symptoms are heterogeneous and often coincide with other disorders such as bipolar disorder, Parkinson's disease, and schizophrenia. It is a serious mental illness that may lead to other health problems if left untreated. Currently, identifying individuals with depression relies entirely on the clinician's expertise. To assist clinicians in identifying the characteristics of depression and classifying depressed people, researchers in this field have incorporated different data modalities and machine learning techniques. This study aims to answer key questions about publication trends, data modalities, machine learning models, dataset usage, pre-processing techniques, and feature extraction and selection techniques, to guide future research on depression diagnosis. METHODS This systematic review was conducted using articles from two major databases: IEEE Xplore and PubMed. Studies from 2011 to April 2021 were retrieved, yielding 590 articles (53 from IEEE Xplore and 537 from PubMed). Articles satisfying the defined inclusion criteria were investigated further. RESULTS A total of 135 articles were identified and analysed. High growth in the number of publications has been observed in recent years, along with significant diversity in data modalities and machine learning classifiers. fMRI data with an SVM classifier was the most popular choice among researchers. In most studies, data scarcity and small sample sizes, particularly for neuroimaging data, are major concerns. The use of identical data pre-processing tools for similar data modalities is common. This study also provides a statistical analysis of the current framework with respect to modality, machine learning classifier, sample size, and accuracy, applying one-way ANOVA and the Tukey-Kramer test. CONCLUSION The results indicate that an effective fusion of machine learning techniques with a suitable data modality has a promising future for assisting clinicians in automatic depression diagnosis.
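The one-way ANOVA comparison the review describes is available directly in SciPy. The accuracy values below are illustrative stand-ins, not the review's data, and the Tukey-Kramer post hoc step is only noted in a comment.

```python
# Compare reported accuracies grouped by data modality (illustrative values).
from scipy.stats import f_oneway

fmri_acc = [0.84, 0.88, 0.81, 0.90]
eeg_acc = [0.76, 0.79, 0.74, 0.80]
speech_acc = [0.70, 0.73, 0.69, 0.75]

# H0: all modality groups share the same mean accuracy.
f_stat, p_value = f_oneway(fmri_acc, eeg_acc, speech_acc)
# A small p-value rejects H0; a Tukey-Kramer post hoc test would then
# locate which pairs of modalities actually differ.
```

With groups this well separated, the F statistic is large and the p-value small, which is the pattern that would justify a post hoc pairwise comparison.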
Affiliation(s)
- Sweta Bhadra
- Department of CS & IT, Cotton University, Guwahati, India
18.
Almaghrabi SA, Thewlis D, Thwaites S, Rogasch NC, Lau S, Clark SR, Baumert M. The reproducibility of bio-acoustic features is associated with sample duration, speech task and gender. IEEE Trans Neural Syst Rehabil Eng 2022; 30:167-175. [PMID: 35038295] [DOI: 10.1109/tnsre.2022.3143117]
Abstract
Bio-acoustic properties of speech show evolving value in analyzing psychiatric illnesses. Obtaining a sufficient speech sample length to quantify these properties is essential, but the impact of sample duration on the stability of bio-acoustic features has not been systematically explored. We aimed to evaluate the reproducibility of bio-acoustic features against changes in speech duration and task. We extracted source, spectral, formant, and prosodic features in 185 English-speaking adults (98 women, 87 men) for reading-a-story and counting tasks. Using intraclass correlation coefficients, we compared features at 25% of the total sample duration of the reading task to those obtained from non-overlapping, randomly selected sub-samples shortened to 75%, 50%, and 25% of the total duration. We also compared features extracted from entire recordings to those measured at 25% and 50% of the duration, and compared features extracted from the reading-a-story task to the counting task. Our results show that the number of reproducible features (out of 125) decreased stepwise as duration was reduced. Spectral shape, pitch, and formants reached excellent reproducibility. Mel-frequency cepstral coefficients (MFCCs), loudness, and zero-crossing rate achieved excellent reproducibility only at longer durations. Reproducibility of source features, MFCC derivatives, and voicing probability (VP) was poor. Significant gender differences existed in jitter, MFCC first derivatives, spectral skewness, pitch, VP, and formants. Around 97% of features in both genders were not reproducible across speech tasks, partly due to the short duration of the counting task. In conclusion, bio-acoustic features are less reproducible in shorter samples and are affected by gender.
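Reproducibility via intraclass correlation can be sketched with a minimal two-way, single-measure ICC(3,1); the abstract does not state which ICC form the authors used, so this variant is an assumption for illustration.

```python
import numpy as np

def icc_3_1(data):
    """Two-way mixed, single-measure ICC(3,1) for an (n_subjects, k)
    matrix, e.g. one bio-acoustic feature measured at k sample durations.
    Values near 1 mean the feature is reproducible across conditions."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * ((data.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((data.mean(axis=0) - grand) ** 2).sum()   # between conditions
    ss_total = ((data - grand) ** 2).sum()
    msr = ss_rows / (n - 1)                                  # subject mean square
    mse = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    return (msr - mse) / (msr + (k - 1) * mse)
```

A feature whose value per subject is identical across durations scores exactly 1; duration-sensitive features score lower, which is how the stepwise drop in reproducible features would be quantified.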
19.
Rejaibi E, Komaty A, Meriaudeau F, Agrebi S, Othmani A. MFCC-based Recurrent Neural Network for automatic clinical depression recognition and assessment from speech. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2021.103107]
20.
Ye J, Yu Y, Wang Q, Li W, Liang H, Zheng Y, Fu G. Multi-modal depression detection based on emotional audio and evaluation text. J Affect Disord 2021; 295:904-913. [PMID: 34706461] [DOI: 10.1016/j.jad.2021.08.090]
Abstract
BACKGROUND Early detection of depression is very important for patient treatment. Given the inefficiency of current screening methods, depression identification technology is a complex research problem with practical value. METHODS We propose a new experimental method for depression detection based on audio and text; 160 Chinese subjects were investigated. Notably, we propose a text-reading experiment designed to make subjects' emotions change rapidly, hereafter called the Segmental Emotional Speech Experiment (SESE). We extract 384-dimensional low-level audio features to find the differences between emotional changes in SESE. We also propose a multi-modal fusion method based on DeepSpectrum features and word vector features to detect depression using deep learning. RESULTS Our experiments show that SESE improves the recognition accuracy of depression and reveals differences in low-level audio features; results were verified across case and control groups, gender, and age. The multi-modal fusion model achieves an accuracy of 0.912 and an F1 score of 0.906. CONCLUSIONS Our contribution is twofold. First, we propose and verify SESE, which offers a new experimental design for follow-up researchers. Second, we propose a new, efficient multi-modal depression recognition model.
Affiliation(s)
- Jiayu Ye
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
- Yanhong Yu
- College of Traditional Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan 250355, China
- Qingxiang Wang
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
- Wentao Li
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
- Hu Liang
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
- Gang Fu
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
21.
Stasak B, Huang Z, Epps J, Joachim D. Depression Classification Using n-Gram Speech Errors from Manual and Automatic Stroop Color Test Transcripts. Annu Int Conf IEEE Eng Med Biol Soc 2021; 2021:1631-1635. [PMID: 34891598] [DOI: 10.1109/embc46164.2021.9629881]
Abstract
While the psychological Stroop color test has frequently been used to analyze response delays in temporal cognitive processing, minimal research has examined the differences in incorrect/correct verbal response patterns between healthy controls and clinically depressed populations. Further, speech error features that emphasize sequential Stroop test responses have been unexplored for automatic depression classification. In this study, which uses speech recorded via a smart device, an analysis of n-gram error sequence distributions shows that participants with clinical depression produce more Stroop color test errors, especially sequential errors, than healthy controls. Using n-gram error features derived from multisession manual transcripts, experiments show that trigram error features generate up to 95% depression classification accuracy, whereas an acoustic feature baseline achieves only around 75%. Moreover, n-gram error features using ASR transcripts produced up to 90% depression classification accuracy.
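Extracting n-gram error features from a correct/error-coded Stroop response sequence is straightforward to sketch; the 'C'/'E' coding below is illustrative, not the paper's exact feature definition.

```python
from collections import Counter

def error_ngrams(responses, n=3):
    """Count n-grams over a Stroop response sequence coded as 'C' (correct)
    or 'E' (error). Sequential errors show up as 'E'-heavy n-grams such as
    'EEC' or 'CEE', which is the pattern depressed speakers produced more."""
    grams = ["".join(responses[i:i + n]) for i in range(len(responses) - n + 1)]
    return Counter(grams)

# Example session: two isolated errors, one of them starting an error run.
feats = error_ngrams(list("CCECEECC"), n=3)
```

The resulting counts (optionally normalized per session) would form the trigram feature vector fed to a classifier.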
22.
Klangpornkun N, Ruangritchai M, Munthuli A, Onsuwan C, Jaisin K, Pattanaseri K, Lortrakul J, Thanakulakkarachai P, Anansiripinyo T, Amornlaksananon A, Laohawee S, Tantibundhit C. Classification of Depression and Other Psychiatric Conditions Using Speech Features Extracted from a Thai Psychiatric and Verbal Screening Test. Annu Int Conf IEEE Eng Med Biol Soc 2021; 2021:651-656. [PMID: 34891377] [DOI: 10.1109/embc46164.2021.9629571]
Abstract
Depression is a common and serious mental illness that negatively affects daily functioning. To prevent progression to severe or long-term consequences, early diagnosis is crucial. We developed an automated speech feature analysis application for depression and other psychiatric disorders derived from a Thai psychiatric and verbal screening test. The screening test includes the Thai versions of the Patient Health Questionnaire-9 (PHQ-9) and Hamilton Depression Rating Scale (HAM-D), plus 32 additional emotion-induced questions. A case-control study was conducted on speech features from 66 participants: 27 had depression (DP), 12 had other psychiatric disorders (OP), and 27 were normal controls (NC). Five-fold cross-validation across six settings of five classifiers, combining PHQ-9 and HAM-D scores with speech features, was examined. The multilayer perceptron (MLP) classifier performed best, yielding 83.33% sensitivity, 91.67% specificity, and 83.33% accuracy, with negative-emotional questions most effective for classification. The automated speech feature analysis showed promising results for screening patients with depression or other psychiatric disorders, and the application is accessible through a smartphone, making it a feasible and intuitive setup for low-resource countries such as Thailand.
23.
Muzammel M, Salam H, Othmani A. End-to-end multimodal clinical depression recognition using deep neural networks: A comparative analysis. Comput Methods Programs Biomed 2021; 211:106433. [PMID: 34614452] [DOI: 10.1016/j.cmpb.2021.106433]
Abstract
BACKGROUND AND OBJECTIVE Major depressive disorder is a highly prevalent and disabling mental health condition. Numerous studies have explored multimodal fusion systems combining visual, audio, and textual features via deep learning architectures for clinical depression recognition, yet no comparative analysis of multimodal depression analysis has been proposed in the literature. METHODS In this paper, an up-to-date literature overview of multimodal depression recognition is presented, and an extensive comparative analysis of deep learning architectures for depression recognition is performed. First, audio feature-based Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks are studied. Then, early-level and model-level fusion of deep audio features with visual and textual features through LSTM and CNN architectures is investigated. RESULTS The performance of the proposed architectures is tested using a hold-out strategy on the DAIC-WOZ dataset (80% training, 10% validation, 10% test) for binary and severity-level depression recognition. The experiments demonstrated that: (1) LSTM-based audio features perform slightly better than CNN-based ones, with an accuracy of 66.25% versus 65.60% for binary depression classes; (2) model-level fusion of deep audio and visual features using an LSTM network performed best, with an accuracy of 77.16%, a precision of 53% for the depressed class, and a precision of 83% for the non-depressed class. This network obtained a normalized root mean square error (RMSE) of 0.15 for depression severity prediction. Using a Leave-One-Subject-Out strategy, it achieved an accuracy of 95.38% for binary depression detection and a normalized RMSE of 0.1476 for severity prediction. Our best-performing architecture outperforms all state-of-the-art approaches on the DAIC-WOZ dataset.
CONCLUSIONS The results show that the proposed LSTM-based architectures surpass the CNN-based ones, allowing temporal dynamics of multimodal features to be learned. Furthermore, model-level fusion of audio and visual features using an LSTM network leads to the best performance. Our best-performing architecture successfully detects depression from a speech segment of less than 8 seconds, with an average prediction time under 6 ms, making it suitable for real-world clinical applications.
Affiliation(s)
- Muhammad Muzammel
- Université Paris-Est Créteil (UPEC), LISSI, Vitry sur Seine 94400, France
- Hanan Salam
- New York University, SMART Lab, Saadiyat Island, Abu Dhabi
- Alice Othmani
- Université Paris-Est Créteil (UPEC), LISSI, Vitry sur Seine 94400, France
24.
Zhao Y, Liang Z, Du J, Zhang L, Liu C, Zhao L. Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech. Front Neurorobot 2021; 15:684037. [PMID: 34512301] [PMCID: PMC8426553] [DOI: 10.3389/fnbot.2021.684037]
Abstract
Depression is a mental disorder that threatens people's health and normal life, so an effective way to detect it is essential. However, research on depression detection has mainly focused on combining parallel features from audio, video, and text for performance enhancement, without making full use of the inherent information in speech. To focus on the more emotionally salient regions of depressed speech, in this research we propose a multi-head time-dimension attention-based long short-term memory (LSTM) model. We first extract frame-level features to preserve the original temporal relationships of a speech sequence and analyze how they differ between depressed and healthy speech. We then study the performance of various features and use a modified feature set as the input to the LSTM layer. Instead of using the traditional LSTM output directly, multi-head time-dimension attention projects the output into different subspaces to obtain the time information most relevant to depression detection. The experimental results show the proposed model yields improvements of 2.3% and 10.3% over the LSTM model on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) and the Multi-modal Open Dataset for Mental-disorder Analysis (MODMA) corpus, respectively.
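Multi-head attention over the time dimension of LSTM outputs can be sketched as follows. The random per-head scoring vectors stand in for learned parameters, so this illustrates only the mechanism (per-head softmax weighting over frames, concatenated contexts), not the trained model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_time_attention(h, n_heads=4, seed=0):
    """Attention over the time dimension of LSTM outputs h with shape
    (T, d): each head scores every frame with its own projection (random
    here, learned in practice) and returns a weighted sum over frames."""
    rng = np.random.default_rng(seed)
    T, d = h.shape
    heads = []
    for _ in range(n_heads):
        w = rng.standard_normal(d)   # per-head frame-scoring vector
        alpha = softmax(h @ w)       # attention weights over the T frames
        heads.append(alpha @ h)      # (d,) context vector for this head
    return np.concatenate(heads)     # (n_heads * d,) utterance representation

context = multi_head_time_attention(
    np.random.default_rng(1).standard_normal((50, 8)))
```

Each head's softmax concentrates weight on different frames, so the concatenated vector can emphasize several emotionally salient regions of the utterance at once.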
Affiliation(s)
- Yan Zhao: Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing, China
- Zhenlin Liang: Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing, China
- Jing Du: Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing, China
- Li Zhang: Computational Intelligence Group, Northumbria University, Newcastle upon Tyne, United Kingdom; National Subsea Centre, Robert Gordon University, Aberdeen, United Kingdom
- Chengyu Liu: School of Instrument Science and Engineering, Southeast University, Nanjing, China
- Li Zhao: Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing, China

25
Little B, Alshabrawy O, Stow D, Ferrier IN, McNaney R, Jackson DG, Ladha K, Ladha C, Ploetz T, Bacardit J, Olivier P, Gallagher P, O'Brien JT. Deep learning-based automated speech detection as a marker of social functioning in late-life depression. Psychol Med 2021; 51:1441-1450. [PMID: 31944174 PMCID: PMC8311821 DOI: 10.1017/s0033291719003994]
Abstract
BACKGROUND Late-life depression (LLD) is associated with poor social functioning. However, previous research has relied on bias-prone self-report scales to measure social functioning, and a more objective measure is lacking. We tested a novel wearable device that measures the speech participants encounter as an indicator of social interaction. METHODS Twenty-nine participants with LLD and 29 age-matched controls wore a wrist-worn device continuously for seven days, which recorded their acoustic environment. Acoustic data were automatically analysed using deep learning models that had been developed and validated on an independent speech dataset. Total speech activity and the proportion of speech produced by the device wearer were both detected while maintaining participants' privacy. Participants underwent a neuropsychological test battery, and clinical and self-report scales measured depression severity and general and social functioning. RESULTS Compared to controls, participants with LLD showed poorer self-reported social and general functioning. Total speech activity was much lower for participants with LLD than for controls, with no overlap between groups. The proportion of speech produced by the participants was smaller for LLD than for controls. In LLD, both speech measures correlated with attention and psychomotor speed performance but not with depression severity or self-reported social functioning. CONCLUSIONS Using this device, LLD was associated with lower levels of speech than controls, and speech activity was related to psychomotor retardation. We have demonstrated that speech activity measured by wearable technology differentiated LLD from controls with high precision and, in this study, provided an objective measure of an aspect of real-world social functioning in LLD.
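The two outcome measures above — total speech activity and the proportion of speech produced by the wearer — reduce to simple statistics once a model has labelled each audio frame. The sketch below assumes a hypothetical per-frame labelling (0 = no speech, 1 = another speaker, 2 = wearer); the actual study's deep learning front end and frame coding are not specified here.

```python
import numpy as np

def speech_metrics(frame_labels):
    """Given per-frame codes over a recording period
    (0 = no speech, 1 = speech from another person, 2 = wearer's own speech),
    return (total speech activity, wearer's share of detected speech)."""
    frames = np.asarray(frame_labels)
    speech = frames > 0
    total_activity = speech.mean()                       # fraction of time any speech occurs
    wearer_share = (frames == 2).sum() / max(speech.sum(), 1)
    return total_activity, wearer_share

# ten illustrative frames: five contain speech, three of those from the wearer
labels = [0, 0, 1, 2, 2, 1, 0, 2, 0, 0]
activity, share = speech_metrics(labels)
```

Working from frame labels rather than raw audio is also what lets such a pipeline preserve privacy: only aggregate activity statistics need to leave the device.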
Affiliation(s)
- Bethany Little: Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK
- Ossama Alshabrawy: Interdisciplinary Computing and Complex BioSystems (ICOS) group, School of Computing, Newcastle University, Newcastle upon Tyne, UK; Faculty of Science, Damietta University, New Damietta, Egypt
- Daniel Stow: Institute of Health and Society, Newcastle University, Newcastle upon Tyne, UK
- I. Nicol Ferrier: Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK
- Daniel G. Jackson: Open Lab, School of Computing, Newcastle University, Newcastle upon Tyne, UK
- Karim Ladha: Open Lab, School of Computing, Newcastle University, Newcastle upon Tyne, UK
- Thomas Ploetz: School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA, USA
- Jaume Bacardit: Interdisciplinary Computing and Complex BioSystems (ICOS) group, School of Computing, Newcastle University, Newcastle upon Tyne, UK
- Patrick Olivier: Faculty of Information Technology, Monash University, Melbourne, Australia
- Peter Gallagher: Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK
- John T. O'Brien: Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK; Department of Psychiatry, University of Cambridge, Cambridge, UK

26
Muzammel M, Salam H, Hoffmann Y, Chetouani M, Othmani A. AudVowelConsNet: A phoneme-level based deep CNN architecture for clinical depression diagnosis. Machine Learning with Applications 2020. [DOI: 10.1016/j.mlwa.2020.100005]
27
A Game Theory-Based Model for Predicting Depression due to Frustration in Competitive Environments. Computational and Mathematical Methods in Medicine 2020; 2020:3573267. [PMID: 32565879 PMCID: PMC7290902 DOI: 10.1155/2020/3573267]
Abstract
A computational model based on game theory is proposed here to forecast the prevalence of depression caused by frustration in a competitive environment. The model comprises a spatially structured game in which individuals are socially connected. This game, equivalent to the well-known prisoner's dilemma, represents the payoffs individuals can receive in the labor market, whether or not they have invested in a formal academic education. An individual is assumed to become depressed when the difference between the average payoff earned by their neighbors in the game and their own payoff surpasses a critical threshold, which can differ for men and women. The transition to depression thus depends on two thresholds, whose values are tuned so that the model accurately predicts the percentage of individuals who become depressed due to a frustrating payoff. Here, this tuning uses data on young adults living in the United Kingdom in 2014-2016.
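The frustration criterion at the heart of this model — flip to depressed when the neighbors' average payoff exceeds your own by more than a threshold — is straightforward to sketch. The ring network, payoffs, and threshold value below are illustrative toys, not the paper's calibrated parameters, and the sketch omits the prisoner's-dilemma payoff dynamics that would generate the payoffs.

```python
import numpy as np

def depression_fraction(payoffs, adjacency, threshold):
    """Fraction of individuals who become depressed: individual i flips when
    the mean payoff of i's neighbors exceeds i's own payoff by more than
    `threshold` (the frustration criterion)."""
    n = len(payoffs)
    depressed = 0
    for i in range(n):
        neighbors = np.flatnonzero(adjacency[i])
        if neighbors.size == 0:
            continue  # isolated individuals feel no comparative frustration
        if payoffs[neighbors].mean() - payoffs[i] > threshold:
            depressed += 1
    return depressed / n

# toy ring of 6 socially connected individuals (each linked to 2 neighbors)
eye = np.eye(6, dtype=int)
A = np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1)
p = np.array([3.0, 1.0, 3.0, 3.0, 1.0, 3.0])  # illustrative labor-market payoffs
frac = depression_fraction(p, A, threshold=1.5)
```

In the paper's setup, two such thresholds (one per sex) are tuned until the predicted fraction matches the observed prevalence in the UK data.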
28
Kaonga NN, Morgan J. Common themes and emerging trends for the use of technology to support mental health and psychosocial well-being in limited resource settings: A review of the literature. Psychiatry Res 2019; 281:112594. [PMID: 31605874 DOI: 10.1016/j.psychres.2019.112594]
Abstract
There are significant disparities in access to mental health care. With the burgeoning of technologies for health, digital tools have been leveraged within mental health and psychosocial support programming (eMental health). A literature review was conducted to understand and identify how eMental health has been used in resource-limited settings. PubMed, Ovid Medline and Web of Science were searched. Six hundred and thirty full-text articles were identified and assessed for eligibility; of those, 67 met the inclusion criteria and were analyzed. The most common mental health use cases were depression (n = 25) and general mental health and well-being (n = 21). Roughly one-third used a website or Internet-enabled intervention (n = 23) and nearly one-third used an SMS intervention (n = 22). Technology was applied to enhance service delivery (n = 32), behavior change communication (n = 26) and data collection (n = 8), and specifically dealt with adherence (n = 7), ecological momentary assessments (n = 7), well-being promotion (n = 5), education (n = 8), telemedicine (n = 28), machine learning (n = 5) and games (n = 2). Wearables, predictive analytics, robots and virtual reality were identified as promising emerging areas. eMental health interventions that leverage low-tech tools can introduce, strengthen and expand mental health and psychosocial support services and can be a starting point for future, advanced tools.
Affiliation(s)
- Nadi Nina Kaonga: HealthEnabled, Cape Town, South Africa; Tufts University School of Medicine, Boston, MA, United States; Maine Medical Center, Portland, ME, United States
- Jonathan Morgan: Regional Psychosocial Support Initiative (REPSSI), Cape Town, South Africa

29
Abstract
Depression is a common and serious medical illness that affects the way we think, feel and act. Although relatively mild in its initial stages, it can cause serious problems if it goes undetected until a later stage. Advances in technology now make it possible to detect signs of depression, and various machine learning algorithms have been applied to identify its contributing factors. A person's speech is markedly affected by depression, and various vocal features can be used to classify it.