1
Liu L, Liu L, Wafa HA, Tydeman F, Xie W, Wang Y. Diagnostic accuracy of deep learning using speech samples in depression: a systematic review and meta-analysis. J Am Med Inform Assoc 2024; 31:2394-2404. [PMID: 39013193 DOI: 10.1093/jamia/ocae189] [Received: 03/24/2024] [Revised: 06/12/2024] [Accepted: 07/05/2024] [Indexed: 07/18/2024]
Abstract
OBJECTIVE This study aims to conduct a systematic review and meta-analysis of the diagnostic accuracy of deep learning (DL) using speech samples in depression. MATERIALS AND METHODS This review included studies reporting diagnostic results of DL algorithms in depression using speech data, published from inception to January 31, 2024, in the PubMed, Medline, Embase, PsycINFO, Scopus, IEEE, and Web of Science databases. Pooled accuracy, sensitivity, and specificity were obtained using random-effects models. The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool was used to assess the risk of bias. RESULTS A total of 25 studies met the inclusion criteria, 8 of which were used in the meta-analysis. The pooled estimates of accuracy, specificity, and sensitivity for depression detection models were 0.87 (95% CI, 0.81-0.93), 0.85 (95% CI, 0.78-0.91), and 0.82 (95% CI, 0.71-0.94), respectively. When stratified by model structure, the highest pooled diagnostic accuracy was 0.89 (95% CI, 0.81-0.97) in the handcrafted group. DISCUSSION To our knowledge, our study is the first meta-analysis of the diagnostic performance of DL for depression detection from speech samples. All studies included in the meta-analysis used convolutional neural network (CNN) models, which makes it difficult to assess the performance of other DL algorithms. The handcrafted model performed better than the end-to-end model in speech depression detection. CONCLUSIONS The application of DL to speech provides a useful tool for depression detection. CNN models with handcrafted acoustic features could help improve diagnostic performance. PROTOCOL REGISTRATION The study protocol was registered on PROSPERO (CRD42023423603).
Affiliation(s)
- Lidan Liu
- Department of Population Health Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London, SE1 1UL, United Kingdom
- Lu Liu
- Department of Population Health Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London, SE1 1UL, United Kingdom
- Hatem A Wafa
- Department of Population Health Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London, SE1 1UL, United Kingdom
- Florence Tydeman
- Department of Population Health Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London, SE1 1UL, United Kingdom
- Wanqing Xie
- Department of Intelligent Medical Engineering, School of Biomedical Engineering, Anhui Medical University, Hefei, 230032, China
- Department of Psychology, School of Mental Health and Psychological Sciences, Anhui Medical University, Hefei, 230032, China
- Beth Israel Deaconess Medical Center, Harvard Medical School, Harvard University, Boston, MA, 02115, United States
- Yanzhong Wang
- Department of Population Health Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London, SE1 1UL, United Kingdom
2
Berisha V, Liss JM. Responsible development of clinical speech AI: Bridging the gap between clinical research and technology. NPJ Digit Med 2024; 7:208. [PMID: 39122889 PMCID: PMC11316053 DOI: 10.1038/s41746-024-01199-1] [Received: 11/23/2023] [Accepted: 07/19/2024] [Indexed: 08/12/2024]
Abstract
This perspective article explores the challenges and potential of using speech as a biomarker in clinical settings, particularly when constrained by the small clinical datasets typically available in such contexts. We contend that by integrating insights from speech science and clinical research, we can reduce sample complexity in clinical speech AI models with the potential to decrease timelines to translation. Most existing models are based on high-dimensional feature representations trained with limited sample sizes and often do not leverage insights from speech science and clinical research. This approach can lead to overfitting, where the models perform exceptionally well on training data but fail to generalize to new, unseen data. Additionally, without incorporating theoretical knowledge, these models may lack interpretability and robustness, making them challenging to troubleshoot or improve post-deployment. We propose a framework for organizing health conditions based on their impact on speech and promote the use of speech analytics in diverse clinical contexts beyond cross-sectional classification. For high-stakes clinical use cases, we advocate for a focus on explainable and individually-validated measures and stress the importance of rigorous validation frameworks and ethical considerations for responsible deployment. Bridging the gap between AI research and clinical speech research presents new opportunities for more efficient translation of speech-based AI tools and advancement of scientific discoveries in this interdisciplinary space, particularly if limited to small or retrospective datasets.
Affiliation(s)
- Visar Berisha
- School of Electrical, Computer and Energy Engineering and College of Health Solutions, Arizona State University, Tempe, AZ, USA
- Julie M Liss
- College of Health Solutions, Arizona State University, Tempe, AZ, USA
3
Kenyon KH, Boonstra F, Noffs G, Morgan AT, Vogel AP, Kolbe S, Van Der Walt A. The characteristics and reproducibility of motor speech functional neuroimaging in healthy controls. Front Hum Neurosci 2024; 18:1382102. [PMID: 39171097 PMCID: PMC11335534 DOI: 10.3389/fnhum.2024.1382102] [Received: 02/05/2024] [Accepted: 07/22/2024] [Indexed: 08/23/2024]
Abstract
Introduction Functional magnetic resonance imaging (fMRI) can improve our understanding of neural processes subserving motor speech function, yet its reproducibility remains unclear. This study aimed to evaluate the reproducibility of fMRI using a word repetition task across two time points. Methods Imaging data from 14 healthy controls were analysed using a multi-level general linear model. Results Significant activation was observed during the task in the right hemispheric cerebellar lobules IV-V, right putamen, and bilateral sensorimotor cortices. Activation was moderately reproducible between timepoints in the cerebellum but not in other brain regions. Discussion Preliminary findings highlight the involvement of the cerebellum and connected cerebral regions during a motor speech task. More work is needed to determine the degree of reproducibility of speech fMRI before it can be used as a reliable marker of changes in brain activity.
Affiliation(s)
- Katherine H. Kenyon
- Department of Neuroscience, School of Translational Medicine, Melbourne, VIC, Australia
- Frederique Boonstra
- Department of Neuroscience, School of Translational Medicine, Melbourne, VIC, Australia
- Gustavo Noffs
- Department of Neuroscience, School of Translational Medicine, Melbourne, VIC, Australia
- Redenlab Inc., Melbourne, VIC, Australia
- Angela T. Morgan
- Murdoch Childrens Research Institute, Royal Children's Hospital, Melbourne, VIC, Australia
- Department of Audiology and Speech Pathology, Faculty of Medicine, Dentistry and Health Sciences, Melbourne School of Health Sciences, University of Melbourne, Carlton, VIC, Australia
- Adam P. Vogel
- Redenlab Inc., Melbourne, VIC, Australia
- Department of Audiology and Speech Pathology, Parkville, VIC, Australia
- Scott Kolbe
- Department of Neuroscience, School of Translational Medicine, Melbourne, VIC, Australia
- Anneke Van Der Walt
- Department of Neuroscience, School of Translational Medicine, Melbourne, VIC, Australia
- Department of Neurology, Royal Melbourne Hospital, Melbourne, VIC, Australia
4
Scheffer M, Bockting CL, Borsboom D, Cools R, Delecroix C, Hartmann JA, Kendler KS, van de Leemput I, van der Maas HLJ, van Nes E, Mattson M, McGorry PD, Nelson B. A Dynamical Systems View of Psychiatric Disorders-Practical Implications: A Review. JAMA Psychiatry 2024; 81:624-630. [PMID: 38568618 DOI: 10.1001/jamapsychiatry.2024.0228] [Indexed: 06/06/2024]
Abstract
Importance Dynamical systems theory is widely used to explain tipping points, cycles, and chaos in complex systems ranging from the climate to ecosystems. It has been suggested that the same theory may be used to explain the nature and dynamics of psychiatric disorders, which may come and go, with symptoms changing over a lifetime. Here we review evidence for the practical applicability of this theory and its quantitative tools in psychiatry. Observations Emerging results suggest that time series of mood and behavior may be used to monitor the resilience of patients using the same generic dynamical indicators that are now employed globally to monitor the risk of collapse of complex systems, such as tropical rainforests and tipping elements of the climate system. Other dynamical systems tools used in ecology and climate science open ways to infer personalized webs of causality for patients that may be used to identify targets for intervention. Meanwhile, experiences in ecological restoration help make sense of the occasional long-term success of short interventions. Conclusions and Relevance These observations, while promising, raise follow-up questions about how best to collect dynamic data, infer informative timescales, construct mechanistic models, and measure the effect of interventions on resilience. Done well, monitoring resilience to inform well-timed interventions may be integrated into approaches that give patients an active role in the lifelong challenge of managing their resilience and knowing when to seek professional help.
5
Yang T, Guo Z, Li J, Zhu H, Cao Y, Ding Y, Liu X. Abnormally decreased functional connectivity of the right nucleus basalis of Meynert in Alzheimer's disease patients with depression symptoms. Biol Psychol 2024; 188:108785. [PMID: 38527571 DOI: 10.1016/j.biopsycho.2024.108785] [Received: 09/26/2023] [Revised: 03/20/2024] [Accepted: 03/22/2024] [Indexed: 03/27/2024]
Abstract
Dysfunction of the basal forebrain is the main pathological feature in patients with Alzheimer's disease (AD). The aim of this study was to explore whether depressive symptoms cause changes in the functional network of the basal forebrain in AD patients. We collected MRI data from depressed AD patients (n = 24), nondepressed AD patients (n = 14) and healthy controls (n = 20). Resting-state functional magnetic resonance imaging data and functional connectivity analysis were used to study the characteristics of the basal forebrain functional network of the three groups of participants. The functional connectivity differences among the three groups were compared using ANCOVA and post hoc analyses. Compared to healthy controls, depressed AD patients showed reduced functional connectivity between the right nucleus basalis of Meynert and the left supramarginal gyrus and the supplementary motor area. These results increase our understanding of the neural mechanism of depressive symptoms in AD patients.
Affiliation(s)
- Ting Yang
- The Second Affiliated Hospital and Yuying Children's Hospital, Wenzhou Medical University, Wenzhou, Zhejiang 325027, China
- Zhongwei Guo
- Tongde Hospital of Zhejiang Province, Hangzhou, Zhejiang 310012, China
- Jiapeng Li
- Tongde Hospital of Zhejiang Province, Hangzhou, Zhejiang 310012, China
- Hong Zhu
- Tongde Hospital of Zhejiang Province, Hangzhou, Zhejiang 310012, China
- Yulin Cao
- Tongde Hospital of Zhejiang Province, Hangzhou, Zhejiang 310012, China
- Yanping Ding
- Air Force Health Care Center for Special Services, Hangzhou 310007, China
- Xiaozheng Liu
- The Second Affiliated Hospital and Yuying Children's Hospital, Wenzhou Medical University, Wenzhou, Zhejiang 325027, China
- Wenzhou Key Laboratory of Structural and Functional Imaging, Wenzhou 325027, China
6
Kaplan DM, Tidwell CA, Chung JM, Alisic E, Demiray B, Bruni M, Evora S, Gajewski-Nemes JA, Macbeth A, Mangelsdorf SN, Mascaro JS, Minor KS, Noga RN, Nugent NR, Polsinelli AJ, Rentscher KE, Resnikoff AW, Robbins ML, Slatcher RB, Tejeda-Padron AB, Mehl MR. Diversity, equity, and inclusivity in observational ambulatory assessment: Recommendations from two decades of Electronically Activated Recorder (EAR) research. Behav Res Methods 2024; 56:3207-3225. [PMID: 38066394 DOI: 10.3758/s13428-023-02293-0] [Accepted: 11/09/2023] [Indexed: 05/30/2024]
Abstract
Ambient audio sampling methods such as the Electronically Activated Recorder (EAR) have become increasingly prominent in clinical and social sciences research. These methods record snippets of naturalistically assessed audio from participants' daily lives, enabling novel observational research about the daily social interactions, identities, environments, behaviors, and speech of populations of interest. In practice, these scientific opportunities are matched by methodological challenges: researchers' own cultural backgrounds and identities can easily and unknowingly permeate the collection, coding, analysis, and interpretation of social data from daily life. Ambient audio sampling poses unique and significant challenges to cultural humility, diversity, equity, and inclusivity (DEI) in scientific research that require systematic attention. Motivated by this observation, an international consortium of 21 researchers who have used ambient audio sampling methodologies created a workgroup with the aim of improving upon existing published guidelines. We pooled formally and informally documented challenges pertaining to DEI in ambient audio sampling from our collective experience on 40+ studies (most of which used the EAR app) in clinical and healthy populations ranging from children to older adults. This article presents our resulting recommendations and argues for incorporating community-engaged research methods into future observational ambulatory assessment designs. We provide concrete recommendations across each stage typical of an ambient audio sampling study (recruiting and enrolling participants, developing coding systems, training coders, handling multilingual participants, data analysis and interpretation, and dissemination of results) as well as guiding questions that can be used to adapt these recommendations to project-specific constraints and needs.
Affiliation(s)
- Deanna M Kaplan
- Department of Family and Preventive Medicine, Emory University School of Medicine, Atlanta, GA, USA
- Colin A Tidwell
- Department of Psychology, University of Arizona, Tucson, USA
- Joanne M Chung
- Department of Psychology, University of Toronto Mississauga, Mississauga, Canada
- Eva Alisic
- Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, Australia
- Burcu Demiray
- Department of Psychology, University of Zurich, Zürich, Switzerland
- Michelle Bruni
- Department of Psychology, University of California-Riverside, Riverside, USA
- Selena Evora
- Center for Health Promotion and Health Equity, School of Public Health, Brown University, Providence, USA
- Jennifer S Mascaro
- Department of Family and Preventive Medicine, Emory University School of Medicine, Atlanta, GA, USA
- Kyle S Minor
- Department of Psychology, Indiana University - Purdue University Indianapolis, Indianapolis, USA
- Rebecca N Noga
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina Chapel Hill, Chapel Hill, USA
- Nicole R Nugent
- Department of Psychiatry and Human Behavior, Alpert Medical School of Brown University, Providence, USA
- Kelly E Rentscher
- Department of Psychiatry and Behavioral Medicine, Medical College of Wisconsin, Milwaukee, USA
- Megan L Robbins
- Department of Psychology, University of California-Riverside, Riverside, USA
- Matthias R Mehl
- Department of Psychology, University of Arizona, Tucson, USA
7
Trifu RN, Nemeș B, Herta DC, Bodea-Hategan C, Talaș DA, Coman H. Linguistic markers for major depressive disorder: a cross-sectional study using an automated procedure. Front Psychol 2024; 15:1355734. [PMID: 38510303 PMCID: PMC10953917 DOI: 10.3389/fpsyg.2024.1355734] [Received: 12/14/2023] [Accepted: 02/06/2024] [Indexed: 03/22/2024]
Abstract
Introduction The identification of language markers, referring to both form and content, for common mental health disorders such as major depressive disorder (MDD) can facilitate the development of innovative tools for early recognition and prevention. However, studies in this direction are only beginning and are difficult to implement due to linguistic variability and the influence of cultural contexts. Aim This study aims to identify language markers specific to MDD through an automated analysis process based on Ro-LIWC2015 (Linguistic Inquiry and Word Count). Materials and methods A sample of 62 medicated patients with MDD and a sample of 43 controls were assessed. Each participant provided a language sample describing something that was pleasant for them. Assessment tools (1) Screening tests for MDD (MADRS and DASS-21); (2) Ro-LIWC2015 (Linguistic Inquiry and Word Count), computerized text analysis software validated for the Romanian language that analyzes the morphology, syntax, and semantics of word use. Results Depressive patients structure their sentences differently and communicate in short sentences. This entails frequent use of the period, which implies directive communication with limited exchange of ideas. Participants with depression also mostly use impersonal pronouns and first-person plural rather than singular pronouns; they use a limited number of prepositions and an increased number of conjunctions, auxiliary verbs, negations, and past-tense verbs, with far fewer present-tense verbs; and they use more words expressing negative affect and anxiety and fewer words indicating positive affect. The favorite topics of patients with depression are leisure, time, and money. Conclusion Depressive patients use a language pattern significantly different from that of people without mood or behavioral disorders, in both form and content. These differences are sometimes associated with years of education and sex, and might also be explained by cultural differences.
Affiliation(s)
- Raluca Nicoleta Trifu
- Department of Neurosciences, Discipline of Medical Psychology and Psychiatry, Iuliu Haţieganu University of Medicine and Pharmacy, Cluj-Napoca, Romania
- Bogdan Nemeș
- Department of Neurosciences, Discipline of Medical Psychology and Psychiatry, Iuliu Haţieganu University of Medicine and Pharmacy, Cluj-Napoca, Romania
- Dana Cristina Herta
- Department of Neurosciences, Discipline of Medical Psychology and Psychiatry, Iuliu Haţieganu University of Medicine and Pharmacy, Cluj-Napoca, Romania
- Carolina Bodea-Hategan
- Special Education Department, Faculty of Psychology and Education Sciences, Babeș-Bolyai University, Cluj-Napoca, Romania
- Dorina Anca Talaș
- Special Education Department, Faculty of Psychology and Education Sciences, Babeș-Bolyai University, Cluj-Napoca, Romania
- Horia Coman
- Department of Neurosciences, Discipline of Medical Psychology and Psychiatry, Iuliu Haţieganu University of Medicine and Pharmacy, Cluj-Napoca, Romania
8
Shabber SM, Sumesh EP. AFM signal model for dysarthric speech classification using speech biomarkers. Front Hum Neurosci 2024; 18:1346297. [PMID: 38445096 PMCID: PMC10912169 DOI: 10.3389/fnhum.2024.1346297] [Received: 11/29/2023] [Accepted: 02/05/2024] [Indexed: 03/07/2024]
Abstract
Neurological disorders include various conditions affecting the brain, spinal cord, and nervous system, which result in reduced performance of different organs and muscles throughout the human body. Dysarthria is a neurological disorder that significantly impairs an individual's ability to communicate effectively through speech. Dysarthria is characterized by muscle weakness that results in slow, slurred, and less intelligible speech. Early identification of speech disorders helps doctors recommend appropriate medications. The classification of dysarthric speech plays a pivotal role as a diagnostic tool, enabling accurate differentiation between healthy speech patterns and those affected by dysarthria. A clear distinction between dysarthric speech and the speech of healthy individuals can be achieved through advanced machine learning techniques. In this work, we extracted features using the amplitude- and frequency-modulated (AFM) signal model, generating a comprehensive array of unique features. A Fourier-Bessel series expansion is employed to separate a complex speech signal into distinct components. The Discrete Energy Separation Algorithm is then used to extract the essential parameters of each component, namely the amplitude envelope and the instantaneous frequency. To ensure the robustness and applicability of our findings, we used data from several sources, including the TORGO, UA Speech, and Parkinson datasets. The performance of several classifiers (KNN, SVM, LDA, NB, and Boosted Tree) was evaluated using multiple measures, including the area under the curve, F1-score, sensitivity, and accuracy. Our analyses yielded classification accuracies ranging from 85% to 97.8% and F1-scores ranging from 0.90 to 0.97.
9
Greco C, Raimo G, Amorese T, Cuciniello M, Mcconvey G, Cordasco G, Faundez-Zanuy M, Vinciarelli A, Callejas-Carrion Z, Esposito A. Discriminative Power of Handwriting and Drawing Features in Depression. Int J Neural Syst 2024; 34:2350069. [PMID: 38009869 DOI: 10.1142/s0129065723500697] [Indexed: 11/29/2023]
Abstract
This study contributes knowledge on the detection of depression through handwriting/drawing features, with the aim of identifying quantitative and noninvasive indicators of the disorder for implementing automatic detection algorithms. For this purpose, an original online approach was adopted to provide a dynamic evaluation of the handwriting/drawing performance of healthy participants with no history of any psychiatric disorder and of patients with a clinical diagnosis of depression. Both groups were asked to complete seven tasks requiring either writing or drawing on paper, while five categories of handwriting/drawing features (i.e. pressure on the paper, time, ductus, space among characters, and pen inclination) were recorded using a digitizing tablet. The collected records were statistically analyzed. Results showed that all the considered features except pressure successfully discriminated between depressed and nondepressed subjects. In addition, depression was observed to affect different writing/drawing functionalities. These findings suggest adopting writing/drawing tasks in clinical practice as tools to support current depression detection methods. This would have important implications for reducing diagnostic times and for treatment formulation.
Affiliation(s)
- Claudia Greco
- Department of Psychology, Università della Campania "Luigi Vanvitelli", Viale Ellittico 31 Caserta, 81000, Italy
- Gennaro Raimo
- Department of Psychology, Università della Campania "Luigi Vanvitelli", Viale Ellittico 31 Caserta, 81000, Italy
- Terry Amorese
- Department of Psychology, Università della Campania "Luigi Vanvitelli", Viale Ellittico 31 Caserta, 81000, Italy
- Marialucia Cuciniello
- Department of Psychology, Università della Campania "Luigi Vanvitelli", Viale Ellittico 31 Caserta, 81000, Italy
- Gavin Mcconvey
- Action Mental Health, 27 Jubilee Rd, BT23 4YH, Newtownards, UK
- Gennaro Cordasco
- Department of Psychology, Università della Campania "Luigi Vanvitelli", Viale Ellittico 31 Caserta, 81000, Italy
- Marcos Faundez-Zanuy
- Tecnocampus Universitat Pompeu Fabra, Carrer d'Ernest Lluch 32 Mataro, Barcelona 08302, Spain
- Alessandro Vinciarelli
- University of Glasgow, School of Computing Science, 18 Lilybank Gardens, Glasgow, G12 8RZ, Scotland
- Zoraida Callejas-Carrion
- Department of Languages and Computer Systems, Universidad de Granada, Periodista Daniel Saucedo Aranda Granada, 18071, Spain
- Anna Esposito
- Department of Psychology, Università della Campania "Luigi Vanvitelli", Viale Ellittico 31 Caserta, 81000, Italy
10
Caulley D, Alemu Y, Burson S, Cárdenas Bautista E, Abebe Tadesse G, Kottmyer C, Aeschbach L, Cheungvivatpant B, Sezgin E. Objectively Quantifying Pediatric Psychiatric Severity Using Artificial Intelligence, Voice Recognition Technology, and Universal Emotions: Pilot Study for Artificial Intelligence-Enabled Innovation to Address Youth Mental Health Crisis. JMIR Res Protoc 2023; 12:e51912. [PMID: 37870890 PMCID: PMC10628686 DOI: 10.2196/51912] [Received: 08/17/2023] [Revised: 09/14/2023] [Accepted: 09/18/2023] [Indexed: 10/24/2023]
Abstract
BACKGROUND Providing psychotherapy, particularly for youth, is a pressing challenge in the health care system. Traditional methods are resource-intensive, and there is a need for objective benchmarks to guide therapeutic interventions. Automated emotion detection from speech using artificial intelligence is an emerging approach to address these challenges. Speech can carry vital information about emotional states, which can be used to improve mental health care services, especially when a person is suffering. OBJECTIVE This study aims to develop and evaluate automated methods for detecting the intensity of emotions (anger, fear, sadness, and happiness) in audio recordings of patients' speech. We also demonstrate the viability of deploying the models. Our model was validated in a previous publication by Alemu et al with limited voice samples; this follow-up study used significantly more voice samples to validate the previous model. METHODS We used audio recordings of patients, specifically children with high adverse childhood experience (ACE) scores. The average ACE score was 5 or higher, which places these children at the highest risk for chronic disease and social or emotional problems, whereas only 1 in 6 have a score of 4 or above. The patients' structured voice samples were collected by having them read a fixed script. In total, 4 highly trained therapists classified audio segments by scoring each of the 4 emotions and its intensity level. We experimented with various preprocessing methods, including denoising, voice-activity detection, and diarization. Additionally, we explored various model architectures, including convolutional neural networks (CNNs) and transformers. We trained emotion-specific transformer-based models and a generalized CNN-based model to predict emotion intensities.
RESULTS The emotion-specific transformer-based model achieved a test-set precision and recall of 86% and 79%, respectively, for binary emotional intensity classification (high or low). In contrast, the CNN-based model, generalized to predict the intensity of 4 different emotions, achieved a test-set precision and recall of 83% each. CONCLUSIONS Automated emotion detection from patients' speech using artificial intelligence models is feasible and achieves a high level of accuracy. The transformer-based model performed better in emotion-specific detection, while the CNN-based model showed promise in generalized emotion detection. These models can serve as valuable decision-support tools for pediatricians and mental health providers to triage youth to appropriate levels of mental health care services. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) RR1-10.2196/51912.
Affiliation(s)
- Desmond Caulley
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, United States
- Yared Alemu
- TQIntelligence, Inc, Atlanta, GA, United States
- Department of Psychiatry and Behavioral Sciences, Computational Psych Program, Morehouse School of Medicine, Atlanta, GA, United States
- Elizabeth Cárdenas Bautista
- TQIntelligence, Inc, Atlanta, GA, United States
- Department of Psychiatry and Behavioral Sciences, Computational Psych Program, Morehouse School of Medicine, Atlanta, GA, United States
- Christopher Kottmyer
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, United States
- Laurent Aeschbach
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, United States
- Bryan Cheungvivatpant
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, United States
- Emre Sezgin
- Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, United States
11
Berardi M, Brosch K, Pfarr JK, Schneider K, Sültmann A, Thomas-Odenthal F, Wroblewski A, Usemann P, Philipsen A, Dannlowski U, Nenadić I, Kircher T, Krug A, Stein F, Dietrich M. Relative importance of speech and voice features in the classification of schizophrenia and depression. Transl Psychiatry 2023; 13:298. [PMID: 37726285 PMCID: PMC10509176 DOI: 10.1038/s41398-023-02594-0] [Received: 02/03/2023] [Revised: 08/10/2023] [Accepted: 09/08/2023] [Indexed: 09/21/2023]
Abstract
Speech is a promising biomarker for schizophrenia spectrum disorder (SSD) and major depressive disorder (MDD). This proof-of-principle study investigates previously studied speech acoustics in combination with a novel application of voice pathology features as objective and reproducible classifiers for depression, schizophrenia, and healthy controls (HC). Speech and voice features for classification were calculated from 240 recorded picture descriptions (20 participants with SSD, 20 with MDD, and 20 HC, each contributing 4 samples). Binary support vector machine (SVM) models classified each pair of groups (the two disorder groups and HC). For each feature, the permutation feature importance was calculated, and the top 25% most important features were used to compare the disorder groups with HC, including correlations between the important features and symptom severity scores. Multiple SVM kernels were tested, and the pairwise models with the best-performing kernel (third-degree polynomial) were highly accurate for each classification: 0.947 for HC vs. SSD, 0.920 for HC vs. MDD, and 0.932 for SSD vs. MDD. The features of greatest relative importance were measures of articulation coordination, number of pauses per minute, and speech variability. There were moderate correlations between important features and positive symptoms in SSD. The important features suggest that speech characteristics relating to psychomotor slowing, alogia, and flat affect differ between HC, SSD, and MDD.
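Permutation feature importance, as named in the abstract, scores a feature by how much a fitted model's accuracy drops when that feature's values are shuffled across samples, which breaks the feature's link to the labels while leaving its distribution intact. The dependency-free Python sketch below illustrates the idea and the "top 25%" selection under stated assumptions: the model is any `predict` function (the hypothetical `toy_predict` stands in for the study's polynomial-kernel SVM), accuracy is the score, and the toy data are invented. It is not the authors' code.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[Row]], List[int]],
    X: List[Row],
    y: Sequence[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when one feature column at a time is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled copy.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def top_fraction(importances: Sequence[float], fraction: float = 0.25) -> List[int]:
    """Indices of the most important features (top `fraction`, at least one)."""
    k = max(1, round(len(importances) * fraction))
    ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: feature 0 separates the classes, features 1-3 are noise.
X = [[0.9, 0.1, 0.5, 0.3], [0.8, 0.7, 0.2, 0.9], [0.7, 0.4, 0.8, 0.1], [0.95, 0.3, 0.6, 0.5],
     [0.1, 0.2, 0.4, 0.8], [0.2, 0.9, 0.1, 0.2], [0.3, 0.5, 0.7, 0.6], [0.05, 0.6, 0.3, 0.4]]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def toy_predict(rows: List[Row]) -> List[int]:
    """Hypothetical stand-in for a fitted classifier; it only reads feature 0."""
    return [1 if r[0] > 0.5 else 0 for r in rows]


imp = permutation_importance(toy_predict, X, y)
print(top_fraction(imp))
```

Because `toy_predict` ignores features 1 to 3, shuffling them never changes its accuracy, so their importance is exactly zero and only feature 0 survives the top-25% cut. In practice, scikit-learn provides `sklearn.inspection.permutation_importance`, which accepts a fitted estimator such as `SVC(kernel="poly", degree=3)` together with `n_repeats` and `random_state` arguments; the hand-rolled version above only makes the mechanism explicit.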
Affiliation(s)
- Mark Berardi
  - Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
- Katharina Brosch
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Julia-Katharina Pfarr
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Katharina Schneider
  - Institute for Linguistics: General Linguistics, University of Mainz, Mainz, Germany
- Angela Sültmann
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Florian Thomas-Odenthal
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Adrian Wroblewski
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Paula Usemann
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Alexandra Philipsen
  - Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
- Udo Dannlowski
  - Institute for Translational Psychiatry, University of Münster, Münster, Germany
- Igor Nenadić
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Tilo Kircher
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Axel Krug
  - Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
- Frederike Stein
  - Department of Psychiatry and Psychotherapy, University of Marburg, Marburg, Germany
  - Center for Mind, Brain and Behavior, University of Marburg, Marburg, Germany
- Maria Dietrich
  - Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
12
Digital mental health care: five lessons from Act 1 and a preview of Acts 2-5. NPJ Digit Med 2023; 6:9. [PMID: 36702920 PMCID: PMC9879995 DOI: 10.1038/s41746-023-00760-8] [Received: 12/06/2022] [Accepted: 01/19/2023] [Indexed: 01/27/2023]
13
Stoyanov DS. Endophenotypes and Pathway Phenotypes in Neuro-psychiatry: Crossdisciplinary Implications for Diagnosis. CNS Neurol Disord Drug Targets 2023; 22:150-151. [PMID: 36482720 DOI: 10.2174/187152732202220914125530] [Indexed: 12/13/2022]
14
König A, Tröger J, Mallick E, Mina M, Linz N, Wagnon C, Karbach J, Kuhn C, Peter J. Detecting subtle signs of depression with automated speech analysis in a non-clinical sample. BMC Psychiatry 2022; 22:830. [PMID: 36575442 PMCID: PMC9793349 DOI: 10.1186/s12888-022-04475-0] [Received: 06/15/2022] [Accepted: 12/14/2022] [Indexed: 12/28/2022]
Abstract
BACKGROUND Automated speech analysis has gained increasing attention as an aid to diagnosing depression. Most previous studies, however, focused on comparing speech in patients with major depressive disorder to that in healthy volunteers. An alternative may be to associate speech with depressive symptoms in a non-clinical sample, as this may help to find early and sensitive markers in those at risk of depression. METHODS We included n = 118 healthy young adults (mean age: 23.5 ± 3.7 years; 77% women) and asked them to talk about a positive and a negative event in their life. We then assessed the level of depressive symptoms with a self-report questionnaire, with scores ranging from 0 to 60. We transcribed the speech data and extracted acoustic as well as linguistic features. Next, we tested whether individuals below or above the cut-off for clinically relevant depressive symptoms differed in speech features. We then predicted whether someone would be below or above that cut-off, as well as their individual score on the depression questionnaire. Since depression is associated with cognitive slowing or attentional deficits, we finally correlated depression scores with performance on the Trail Making Test. RESULTS In our sample, n = 93 individuals scored below and n = 25 scored above the cut-off for clinically relevant depressive symptoms. Most speech features did not differ significantly between the two groups, but individuals above the cut-off spoke more than those below it in both the positive and the negative story. In addition, higher depression scores in that group were associated with slower completion time on the Trail Making Test. We were able to predict with 93% accuracy who would be below or above the cut-off. We were also able to predict individual depression scores with a low mean absolute error (3.90), with the best performance achieved by a support vector machine.
CONCLUSIONS Our results indicate that, even in a sample without a clinical diagnosis of depression, changes in speech relate to higher depression scores. This relationship should be investigated in more detail in the future. A longitudinal study could test whether the speech features found in our study represent early and sensitive markers of subsequent depression in individuals at risk.
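Two quantities help put the reported figures in context. Because 93 of the 118 participants scored below the cut-off, a trivial classifier that always predicts "below" would already be correct 93/118 ≈ 78.8% of the time, which is the natural reference point for the reported 93% accuracy. The mean absolute error of 3.90 is expressed in questionnaire points on the 0 to 60 scale. The minimal Python sketch below shows both calculations; the cut-off value of 20 and the example scores are hypothetical, since the abstract does not state them, and whether the cut-off is inclusive is likewise an assumption.

```python
from typing import List, Sequence


def majority_baseline_accuracy(n_below: int, n_above: int) -> float:
    """Accuracy of a classifier that always predicts the larger group."""
    return max(n_below, n_above) / (n_below + n_above)


def classify_by_cutoff(scores: Sequence[float], cutoff: float) -> List[int]:
    """Label 1 when a score reaches the cut-off (assumed inclusive), else 0."""
    return [int(s >= cutoff) for s in scores]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Group sizes reported in the abstract: 93 below and 25 above the cut-off.
baseline = majority_baseline_accuracy(93, 25)
print(f"majority-class baseline: {baseline:.3f}")  # majority-class baseline: 0.788

# Hypothetical questionnaire scores and a hypothetical cut-off of 20.
print(classify_by_cutoff([5, 20, 30], cutoff=20))  # [0, 1, 1]
print(mean_absolute_error([10, 20, 30], [12, 18, 33]))  # 2.3333333333333335
```

With this class imbalance, metrics such as balanced accuracy or per-group sensitivity and specificity would say more about the classifier than raw accuracy alone.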
Affiliation(s)
- Alexandra König
  - Institut National de Recherche en Informatique et en Automatique (INRIA), Sophia Antipolis, Stars Team, Valbonne, France
- Carole Wagnon
  - University Hospital of Old Age Psychiatry and Psychotherapy, University of Bern, Bolligenstrasse 111, CH-3000, Bern 60, Switzerland
- Julia Karbach
  - Department of Psychology, University of Koblenz-Landau, Koblenz, Germany
- Caroline Kuhn
  - Department of Psychology, Clinical Neuropsychology, University of Saarland, Saarbrücken, Germany
- Jessica Peter
  - University Hospital of Old Age Psychiatry and Psychotherapy, University of Bern, Bolligenstrasse 111, CH-3000, Bern 60, Switzerland