1
|
Brain Structural Network Connectivity of Formal Thought Disorder Dimensions in Affective and Psychotic Disorders. Biol Psychiatry 2024; 95:629-638. [PMID: 37207935 DOI: 10.1016/j.biopsych.2023.05.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 04/14/2023] [Accepted: 05/04/2023] [Indexed: 05/21/2023]
Abstract
BACKGROUND The psychopathological syndrome of formal thought disorder (FTD) is not only present in schizophrenia (SZ), but also highly prevalent in major depressive disorder and bipolar disorder. It remains unknown how alterations in the structural white matter connectome of the brain correlate with psychopathological FTD dimensions across affective and psychotic disorders. METHODS Using FTD items of the Scale for the Assessment of Positive Symptoms and Scale for the Assessment of Negative Symptoms, we performed exploratory and confirmatory factor analyses in 864 patients with major depressive disorder (n = 689), bipolar disorder (n = 108), or SZ (n = 67) to identify psychopathological FTD dimensions. We used T1- and diffusion-weighted magnetic resonance imaging to reconstruct the structural connectome of the brain. To investigate the association of FTD subdimensions and global structural connectome measures, we employed linear regression models. We used network-based statistic to identify subnetworks of white matter fiber tracts associated with FTD symptomatology. RESULTS Three psychopathological FTD dimensions were delineated, i.e., disorganization, emptiness, and incoherence. Disorganization and incoherence were associated with global dysconnectivity. Network-based statistics identified subnetworks associated with the FTD dimensions disorganization and emptiness but not with the FTD dimension incoherence. Post hoc analyses on subnetworks did not reveal diagnosis × FTD dimension interaction effects. Results remained stable after correcting for medication and disease severity. Confirmatory analyses showed a substantial overlap of nodes from both subnetworks with cortical brain regions previously associated with FTD in SZ. CONCLUSIONS We demonstrated white matter subnetwork dysconnectivity in major depressive disorder, bipolar disorder, and SZ associated with FTD dimensions that predominantly comprise brain regions implicated in speech.
Results open an avenue for transdiagnostic, psychopathology-informed, dimensional studies in pathogenetic research.
|
2
|
Patients' Demographics and Risk Factors in Voice Disorders: An Umbrella Review of Systematic Reviews. J Voice 2024:S0892-1997(24)00080-8. [PMID: 38556378 DOI: 10.1016/j.jvoice.2024.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 03/09/2024] [Accepted: 03/11/2024] [Indexed: 04/02/2024]
Abstract
OBJECTIVES This study aimed to provide a comprehensive overview of the systematic reviews that focus on the prevalence of voice disorders (VDs), associated risk factors, and the demographic characteristics of patients with dysphonia. An umbrella review was conducted to identify general research themes in voice literature that might guide future research initiatives and contribute to the classification of VDs as a worldwide health concern. STUDY DESIGN Umbrella review of systematic reviews. METHODS PubMed/MEDLINE and Embase were searched for eligible systematic reviews by two authors independently. Extracted data items included the study publication details, study design, characteristics of the target population, sample size, region/country, and incidence and/or prevalence of the VD(s) of interest. RESULTS Forty systematic reviews were included. Sixteen reported a meta-analysis. Great heterogeneity in methods was found. The included studies covered a total of 277,035 patients, with prevalence ranging from 0% to 90%. The best-represented countries were the United States and Brazil, with 13 studies each. Aging, occupational voice use, lifestyle choices, and specific comorbidities, such as obesity or hormonal disorders, seem to be associated with an increased prevalence of dysphonia. CONCLUSIONS This review underscores the influence of VDs on distinct patient groups and the general population. A variety of modifiable and non-modifiable risk factors, with varying degrees of impact on voice quality, have been identified. The overall effect of VDs is probably underestimated due to factors such as sample size, patient selection, underreporting of symptoms, and asymptomatic cases. Employing systematic reviews with consistent methodologies and criteria for diagnosing VDs would enhance the ability to determine the prevalence of VDs and their impact.
|
3
|
Acoustic and Text Features Analysis for Adult ADHD Screening: A Data-Driven Approach Utilizing DIVA Interview. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE 2024; 12:359-370. [PMID: 38606391 PMCID: PMC11008805 DOI: 10.1109/jtehm.2024.3369764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/09/2024] [Accepted: 02/15/2024] [Indexed: 04/13/2024]
Abstract
Attention Deficit Hyperactivity Disorder (ADHD) is a neurodevelopmental disorder commonly seen in childhood that leads to behavioural changes in social development and communication patterns. It often continues undiagnosed into adulthood due to a global shortage of psychiatrists, resulting in delayed diagnoses with lasting consequences for individuals' well-being and for society. Recently, machine learning methodologies have been incorporated into healthcare systems to facilitate the diagnosis and enhance the potential prediction of treatment outcomes for mental health conditions. In ADHD detection, previous research has focused on functional magnetic resonance imaging (fMRI) or electroencephalography (EEG) signals, which require costly equipment and trained personnel for data collection. In recent years, speech and text modalities have garnered increasing attention due to their cost-effectiveness and non-wearable sensing in data collection. In this research, conducted in collaboration with the Cumbria, Northumberland, Tyne and Wear NHS Foundation Trust, we gathered audio data from both ADHD patients and normal controls based on the clinically popular Diagnostic Interview for ADHD in adults (DIVA). Subsequently, we transcribed the speech data into text using the Google Cloud Speech API. We extracted both acoustic and text features from the data, encompassing traditional acoustic features (e.g., MFCC), specialized feature sets (e.g., eGeMAPS), as well as deep-learned linguistic and semantic features derived from pre-trained deep learning models. These features are used with a support vector machine for ADHD classification, yielding promising outcomes in the use of audio and text data for effective adult ADHD screening.
Clinical impact: This research introduces a transformative approach in ADHD diagnosis, employing speech and text analysis to facilitate early and more accessible detection, particularly beneficial in areas with limited psychiatric resources. Clinical and Translational Impact Statement: The successful application of machine learning techniques in analyzing audio and text data for ADHD screening represents a significant advancement in mental health diagnostics, paving the way for its integration into clinical settings and potentially improving patient outcomes on a broader scale.
|
4
|
Neurophysiological explorations across the spectrum of psychosis, autism, and depression, during wakefulness and sleep: protocol of a prospective case-control transdiagnostic multimodal study (DEMETER). BMC Psychiatry 2023; 23:860. [PMID: 37990173 PMCID: PMC10662684 DOI: 10.1186/s12888-023-05347-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 11/03/2023] [Indexed: 11/23/2023] Open
Abstract
BACKGROUND Quantitative electroencephalography (EEG) analysis offers the opportunity to study high-level cognitive processes across psychiatric disorders. In particular, EEG microstates translate the temporal dynamics of neuronal networks throughout the brain. Their alteration may reflect transdiagnostic anomalies in neurophysiological functions that are impaired in mood, psychosis, and autism spectrum disorders, such as sensorimotor integration, speech, sleep, and sense of self. The main questions this study aims to answer are as follows: 1) Are EEG microstate anomalies associated with clinical and functional prognosis, both in resting conditions and during sleep, across psychiatric disorders? 2) Are EEG microstate anomalies associated with differences in sensorimotor integration, speech, sense of self, and sleep? 3) Can the dynamic of EEG microstates be modulated by a non-drug intervention such as light hypnosis? METHODS This prospective cohort will include a population of adolescents and young adults, aged 15 to 30 years old, with ultra-high-risk of psychosis (UHR), first-episode psychosis (FEP), schizophrenia (SCZ), autism spectrum disorder (ASD), and major depressive disorder (MDD), as well as healthy controls (CTRL) (N = 21 × 6), who will be assessed at baseline and after one year of follow-up. Participants will undergo deep phenotyping based on psychopathology, neuropsychological assessments, 64-channel EEG recordings, and biological sampling at the two timepoints. At baseline, the EEG recording will also be coupled to a sensorimotor task and a recording of the characteristics of their speech (prosody and turn-taking), a one-night polysomnography, a self-reference effect task in virtual reality (only in UHR, FEP, and CTRL). An interventional ancillary study will involve only healthy controls, in order to assess whether light hypnosis can modify the EEG microstate architecture in a direction opposite to what is seen in disease. 
DISCUSSION This transdiagnostic longitudinal case-control study will provide a multimodal neurophysiological assessment of clinical dimensions (sensorimotor integration, speech, sleep, and sense of self) that are disrupted across mood, psychosis, and autism spectrum disorders. It will further test the relevance of EEG microstates as dimensional functional biomarkers. TRIAL REGISTRATION ClinicalTrials.gov Identifier NCT06045897.
|
5
|
Unlocking the Emotional World of Visual Media: An Overview of the Science, Research, and Impact of Understanding Emotion: Drawing Insights From Psychology, Engineering, and the Arts, This Article Provides a Comprehensive Overview of the Field of Emotion Analysis in Visual Media and Discusses the Latest Research, Systems, Challenges, Ethical Implications, and Potential Impact of Artificial Emotional Intelligence on Society. PROCEEDINGS OF THE IEEE. INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS 2023; 111:1236-1286. [PMID: 37859667 PMCID: PMC10586271 DOI: 10.1109/jproc.2023.3273517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2023]
Abstract
The emergence of artificial emotional intelligence technology is revolutionizing the fields of computers and robotics, allowing for a new level of communication and understanding of human behavior that was once thought impossible. While recent advancements in deep learning have transformed the field of computer vision, automated understanding of evoked or expressed emotions in visual media remains in its infancy. This foundering stems from the absence of a universally accepted definition of "emotion," coupled with the inherently subjective nature of emotions and their intricate nuances. In this article, we provide a comprehensive, multidisciplinary overview of the field of emotion analysis in visual media, drawing on insights from psychology, engineering, and the arts. We begin by exploring the psychological foundations of emotion and the computational principles that underpin the understanding of emotions from images and videos. We then review the latest research and systems within the field, accentuating the most promising approaches. We also discuss the current technological challenges and limitations of emotion analysis, underscoring the necessity for continued investigation and innovation. We contend that this represents a "Holy Grail" research problem in computing and delineate pivotal directions for future inquiry. Finally, we examine the ethical ramifications of emotion-understanding technologies and contemplate their potential societal impacts. Overall, this article endeavors to equip readers with a deeper understanding of the domain of emotion analysis in visual media and to inspire further research and development in this captivating and rapidly evolving field.
|
6
|
Computerized text and voice analysis of patients with chronic schizophrenia in art therapy. Sci Rep 2023; 13:16062. [PMID: 37749186 PMCID: PMC10520069 DOI: 10.1038/s41598-023-43069-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Accepted: 09/19/2023] [Indexed: 09/27/2023] Open
Abstract
This explorative study of patients with chronic schizophrenia aimed to clarify whether group art therapy followed by a therapist-guided picture review could influence patients' communication behaviour. Data on voice and speech characteristics were obtained via objective technological instruments, and these characteristics were selected as indicators of communication behaviour. Seven patients were recruited to participate in weekly group art therapy over a period of 6 months. Three days after each group meeting, they talked about their last picture during a standardized interview that was digitally recorded. The audio recordings were evaluated using validated computer-assisted procedures: the transcribed texts were evaluated using the German version of the LIWC2015 program, and the voice recordings were evaluated using the audio analysis software VocEmoApI. The dual methodological approach was intended to form an internal control of the study results. An exploratory factor analysis of the complete sets of output parameters was carried out with the expectation of obtaining typical speech and voice characteristics that map barriers to communication in patients with schizophrenia. The parameters of both methods were thus processed into five factors each, i.e., into a quantitative digitized classification of the texts and voices. The factor scores were subjected to a linear regression analysis to capture possible process-related changes. Most patients continued to participate in the study. This resulted in high-quality datasets for statistical analysis. To answer the study question, two results were summarized: First, a text analysis factor called Presence proved to be a potential surrogate parameter for positive language development. Second, quantitative changes in vocal emotional factors were detected, demonstrating differentiated activation patterns of emotions. These results can be interpreted as an expression of a cathartic healing process.
The methods presented in this study make a potentially significant contribution to quantitative research into the effectiveness and mode of action of art therapy.
|
7
|
Relative importance of speech and voice features in the classification of schizophrenia and depression. Transl Psychiatry 2023; 13:298. [PMID: 37726285 PMCID: PMC10509176 DOI: 10.1038/s41398-023-02594-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 08/10/2023] [Accepted: 09/08/2023] [Indexed: 09/21/2023] Open
Abstract
Speech is a promising biomarker for schizophrenia spectrum disorder (SSD) and major depressive disorder (MDD). This proof of principle study investigates previously studied speech acoustics in combination with a novel application of voice pathology features as objective and reproducible classifiers for depression, schizophrenia, and healthy controls (HC). Speech and voice features for classification were calculated from recordings of picture descriptions from 240 speech samples (20 participants with SSD, 20 with MDD, and 20 HC each with 4 samples). Binary classification support vector machine (SVM) models classified the disorder groups and HC. For each feature, the permutation feature importance was calculated, and the top 25% most important features were used to compare differences between the disorder groups and HC including correlations between the important features and symptom severity scores. Multiple kernels for SVM were tested and the pairwise models with the best performing kernel (3-degree polynomial) were highly accurate for each classification: 0.947 for HC vs. SSD, 0.920 for HC vs. MDD, and 0.932 for SSD vs. MDD. The relatively most important features were measures of articulation coordination, number of pauses per minute, and speech variability. There were moderate correlations between important features and positive symptoms for SSD. The important features suggest that speech characteristics relating to psychomotor slowing, alogia, and flat affect differ between HC, SSD, and MDD.
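The abstract above ranks features by permutation feature importance. As a rough illustration of that idea only, and not the authors' pipeline, the sketch below shuffles one feature column at a time and records the resulting drop in accuracy. The names `accuracy`, `permutation_importance`, and `toy_predict`, as well as the toy data, are hypothetical, and a simple threshold rule stands in for the SVM used in the study.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in classifier: it looks at feature 0 and ignores feature 1.
def toy_predict(row):
    return 1 if row[0] > 0.5 else 0


X = [[0.1, 5.0], [0.9, 1.0], [0.2, 3.0], [0.8, 2.0], [0.3, 4.0], [0.7, 0.0]]
y = [0, 1, 0, 1, 0, 1]
imp = permutation_importance(toy_predict, X, y)
```

A feature the model never uses gets an importance of zero, because shuffling it cannot change any prediction. Keeping only the top-ranked quarter of features, as the abstract describes, would then be a sort on these values.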
|
8
|
Speech characteristics yield important clues about motor function: Speech variability in individuals at clinical high-risk for psychosis. SCHIZOPHRENIA (HEIDELBERG, GERMANY) 2023; 9:60. [PMID: 37717025 PMCID: PMC10505148 DOI: 10.1038/s41537-023-00382-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 07/24/2023] [Indexed: 09/18/2023]
Abstract
BACKGROUND AND HYPOTHESIS Motor abnormalities are predictive of psychosis onset in individuals at clinical high risk (CHR) for psychosis and are tied to its progression. We hypothesize that these motor abnormalities also disrupt their speech production (a highly complex motor behavior) and predict CHR individuals will produce more variable speech than healthy controls, and that this variability will relate to symptom severity, motor measures, and psychosis-risk calculator risk scores. STUDY DESIGN We measure variability in speech production (variability in consonants, vowels, speech rate, and pausing/timing) in N = 58 CHR participants and N = 67 healthy controls. Three different tasks are used to elicit speech: diadochokinetic speech (rapidly-repeated syllables e.g., papapa…, pataka…), read speech, and spontaneously-generated speech. STUDY RESULTS Individuals in the CHR group produced more variable consonants and exhibited greater speech rate variability than healthy controls in two of the three speech tasks (diadochokinetic and read speech). While there were no significant correlations between speech measures and remotely-obtained motor measures, symptom severity, or conversion risk scores, these comparisons may be under-powered (in part due to challenges of remote data collection during the COVID-19 pandemic). CONCLUSION This study provides a thorough and theory-driven first look at how speech production is affected in this at-risk population and speaks to the promise and challenges facing this approach moving forward.
|
9
|
Identifying Medications Underlying Communication Atypicalities in Psychotic and Affective Disorders: A Pharmacovigilance Study Within the FDA Adverse Event Reporting System. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2023; 66:3242-3259. [PMID: 37524118 DOI: 10.1044/2023_jslhr-22-00739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/02/2023]
Abstract
PURPOSE Communication atypicalities are considered promising markers of a broad range of clinical conditions. However, little is known about the mechanisms and confounders underlying them. Medications might have a crucial, relatively unknown role both as potential confounders and offering an insight on the mechanisms at work. The integration of regulatory documents with disproportionality analyses provides a more comprehensive picture to account for in future investigations of communication-related markers. The aim of this study was to identify a list of drugs potentially associated with communicative atypicalities within psychotic and affective disorders. METHOD We developed a query using the Medical Dictionary for Regulatory Activities to search for communicative atypicalities within the FDA Adverse Event Reporting System (updated June 2021). A Bonferroni-corrected disproportionality analysis (reporting odds ratio) was separately performed on spontaneous reports involving psychotic, affective, and non-neuropsychiatric disorders, to account for the confounding role of different underlying conditions. Drug-adverse event associations not already reported in the Side Effect Resource database of labeled adverse drug reactions (unexpected) were subjected to further robustness analyses to account for expected biases. RESULTS A list of 291 expected and 91 unexpected potential confounding medications was identified, including drugs that may irritate (inhalants) or desiccate (anticholinergics) the larynx, impair speech motor control (antipsychotics), or induce nodules (acitretin) or necrosis (vascular endothelial growth factor receptor inhibitors) on vocal cords; sedatives and stimulants; neurotoxic agents (anti-infectives); and agents acting on neurotransmitter pathways (dopamine agonists). CONCLUSIONS We provide a list of medications to account for in future studies of communication-related markers in affective and psychotic disorders. 
The current test case illustrates rigorous procedures for digital phenotyping, and the methodological tools implemented for large-scale disproportionality analyses can be considered a road map for investigations of communication-related markers in other clinical populations. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.23721345.
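The disproportionality measure named in the abstract, the reporting odds ratio (ROR), can be computed from a 2 × 2 table of spontaneous reports. The sketch below is a minimal illustration under stated assumptions: the function name, the counts, and the choice to apply the Bonferroni correction by widening the confidence interval (`alpha / n_tests`) are assumptions, since the abstract does not say how the correction was implemented.

```python
import math
from statistics import NormalDist


def reporting_odds_ratio(a, b, c, d, alpha=0.05, n_tests=1):
    """ROR with a confidence interval from a 2x2 table of report counts.

    a: reports with the drug and the event of interest
    b: reports with the drug and any other event
    c: reports with other drugs and the event of interest
    d: reports with other drugs and any other event
    n_tests: number of comparisons for the (assumed) Bonferroni adjustment
    """
    ror = (a * d) / (b * c)
    # Standard error of log(ROR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided critical value at the Bonferroni-adjusted level.
    z = NormalDist().inv_cdf(1 - (alpha / n_tests) / 2)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper


# Hypothetical counts, for illustration only.
ror, lower, upper = reporting_odds_ratio(20, 80, 100, 9800)
ror_b, lower_b, upper_b = reporting_odds_ratio(20, 80, 100, 9800, n_tests=100)
```

A common convention is to treat a drug-event pair as a signal when the lower bound exceeds 1. Correcting for more comparisons widens the interval, so fewer pairs pass that threshold.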
|
10
|
Alogia and pressured speech do not fall on a continuum of speech production using objective speech technologies. Schizophr Res 2023; 259:121-126. [PMID: 35864001 DOI: 10.1016/j.schres.2022.07.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Revised: 07/02/2022] [Accepted: 07/04/2022] [Indexed: 10/17/2022]
Abstract
Speech production is affected in a variety of serious mental illnesses (SMI; e.g., schizophrenia, unipolar depression, bipolar disorders) and at its extremes can be observed in the gross reduction of speech (e.g., alogia) or increase of speech (e.g., pressured speech). The present study evaluated whether clinically-rated alogia and pressured speech represent antithetical constructs when analyzed using objective metrics of speech production. We examined natural speech using acoustic and natural language processing features from two archival studies using several different speaking tasks and a combined 107 patients meeting criteria for SMI. Contrary to expectations, we did not find that alogia and pressured speech presented as opposing ends of a speech production continuum. Objective speech markers were associated with clinically rated alogia but not pressured speech, and these results were consistent across speaking tasks and studies. Implications for our understanding of speech production symptoms in SMI are discussed, as well as implications for Natural Language Processing and digital phenotyping efforts more generally.
|
11
|
Speech disturbances in schizophrenia: Assessing cross-linguistic generalizability of NLP automated measures of coherence. Schizophr Res 2023; 259:59-70. [PMID: 35927097 DOI: 10.1016/j.schres.2022.07.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/15/2022] [Revised: 06/29/2022] [Accepted: 07/01/2022] [Indexed: 11/22/2022]
Abstract
INTRODUCTION Language disorders - disorganized and incoherent speech in particular - are distinctive features of schizophrenia. Natural language processing (NLP) offers automated measures of incoherent speech as promising markers for schizophrenia. However, the scientific and clinical impact of NLP markers depends on their generalizability across contexts, samples, and languages, which we systematically assessed in the present study relying on a large, novel, cross-linguistic corpus. METHODS We collected a Danish (DK), German (GE), and Chinese (CH) cross-linguistic dataset involving transcripts from 187 participants with schizophrenia (111DK, 25GE, 51CH) and 200 matched controls (129DK, 29GE, 42CH) performing the Animated Triangles Task. Fourteen previously published NLP coherence measures were calculated, and between-groups differences and association with symptoms were tested for cross-linguistic generalizability. RESULTS One coherence measure, i.e. second-order coherence, robustly generalized across samples and languages. We found several language-specific effects, some of which partially replicated previous findings (lower coherence in German and Chinese patients), while others did not (higher coherence in Danish patients). We found several associations between symptoms and measures of coherence, but the effects were generally inconsistent across languages and rating scales. CONCLUSIONS Using a cumulative approach, we have shown that NLP findings of reduced semantic coherence in schizophrenia have limited generalizability across different languages, samples, and measures. We argue that several factors such as sociodemographic and clinical heterogeneity, cross-linguistic variation, and the different NLP measures reflecting different clinical aspects may be responsible for this variability. Future studies should take this variability into account in order to develop effective clinical applications targeting different patient populations.
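The abstract does not define its coherence measures. In the NLP literature on schizophrenia, first-order coherence is commonly computed as the mean cosine similarity between embeddings of adjacent text units, and second-order coherence as the same quantity for units separated by one intervening unit. The sketch below assumes that definition; the function names and the toy two-dimensional vectors are hypothetical, and a real analysis would use sentence or word-window embeddings from a language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def coherence(vectors, order=1):
    """Mean cosine similarity between units that are `order` positions apart.

    order=1 compares adjacent units (first-order coherence);
    order=2 skips one intervening unit (second-order coherence).
    """
    if len(vectors) <= order:
        raise ValueError("need more text units than the requested order")
    sims = [cosine(vectors[i], vectors[i + order])
            for i in range(len(vectors) - order)]
    return sum(sims) / len(sims)


# Toy embeddings: the speaker alternates between two unrelated topics.
units = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
first_order = coherence(units, order=1)
second_order = coherence(units, order=2)
```

In this toy case adjacent units are orthogonal while units two apart are identical, which shows how the two measures can diverge for the same transcript.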
|
12
|
Exploring the ability of vocal biomarkers in distinguishing depression from bipolar disorder, schizophrenia, and healthy controls. Front Psychiatry 2023; 14:1079448. [PMID: 37575564 PMCID: PMC10415910 DOI: 10.3389/fpsyt.2023.1079448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 06/30/2023] [Indexed: 08/15/2023] Open
Abstract
Background Vocal features have been exploited to distinguish depression from healthy controls. While there have been some claims for success, the degree to which changes in vocal features are specific to depression has not been systematically studied. Hence, we examined the performance of vocal features in differentiating depression from bipolar disorder (BD), schizophrenia and healthy controls, as well as pairwise classifications for the three disorders. Methods We sampled 32 bipolar disorder patients, 106 depression patients, 114 healthy controls, and 20 schizophrenia patients. We extracted i-vectors from Mel-frequency cepstrum coefficients (MFCCs), and built logistic regression models with ridge regularization and 5-fold cross-validation on the training set, then applied models to the test set. There were seven classification tasks: any disorder versus healthy controls; depression versus healthy controls; BD versus healthy controls; schizophrenia versus healthy controls; depression versus BD; depression versus schizophrenia; BD versus schizophrenia. Results The area under curve (AUC) score for classifying depression and bipolar disorder was 0.5 (F-score = 0.44). For other comparisons, the AUC scores ranged from 0.75 to 0.92, and the F-scores ranged from 0.73 to 0.91. The model performance (AUC) of classifying depression and bipolar disorder was significantly worse than that of classifying bipolar disorder and schizophrenia (corrected p < 0.05). There were no significant differences in the remaining pairwise comparisons of the 7 classification tasks. Conclusion Vocal features showed discriminatory potential in classifying between depression and healthy controls, as well as between depression and other mental disorders. Future research should systematically examine the mechanisms of voice features in distinguishing depression from other mental disorders and develop more sophisticated machine learning models so that voice can assist clinical diagnosis better.
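To put the reported AUC of 0.5 for depression versus bipolar disorder in context: the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, so 0.5 corresponds to chance-level ranking. The sketch below computes the AUC directly from that pairwise definition. The function name and the example labels and scores are hypothetical and are not data from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores: three of the four positive/negative pairs are ranked correctly.
informative = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# A model that gives every case the same score cannot rank at all.
uninformative = auc([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5])
```

The pairwise form is quadratic in the number of cases, which is fine for samples of the size reported here; rank-based formulas give the same value more efficiently on large datasets.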
|
13
|
Voice acoustics allow classifying autism spectrum disorder with high accuracy. Transl Psychiatry 2023; 13:250. [PMID: 37422467 DOI: 10.1038/s41398-023-02554-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 06/28/2023] [Accepted: 06/30/2023] [Indexed: 07/10/2023] Open
Abstract
Early identification of children on the autism spectrum is crucial for early intervention with long-term positive effects on symptoms and skills. The need for improved objective autism detection tools is emphasized by the poor diagnostic power of current tools. Here, we aim to evaluate the classification performance of acoustic features of the voice in children with autism spectrum disorder (ASD) with respect to a heterogeneous control group (composed of neurotypical children, children with Developmental Language Disorder [DLD] and children with sensorineural hearing loss with Cochlear Implant [CI]). This retrospective diagnostic study was conducted at the Child Psychiatry Unit of Tours University Hospital (France). A total of 108 children, including 38 diagnosed with ASD (8.5 ± 0.25 years), 24 typically developing (TD; 8.2 ± 0.32 years) and 46 children with atypical development (DLD and CI; 7.9 ± 0.36 years) were enrolled in our studies. The acoustic properties of speech samples produced by children in the context of a nonword repetition task were measured. We used a Monte Carlo cross-validation with an ROC (Receiver Operating Characteristic) supervised k-Means clustering algorithm to develop a classification model that can differentially classify a child with an unknown disorder. We showed that voice acoustics classified autism diagnosis with an overall accuracy of 91% [CI95%, 90.40%-91.65%] against TD children, and of 85% [CI95%, 84.5%-86.6%] against a heterogeneous group of non-autistic children. Accuracy reported here with multivariate analysis combined with Monte Carlo cross-validation is higher than in previous studies. Our findings demonstrate that easy-to-measure voice acoustic parameters could be used as a diagnostic aid tool, specific to ASD.
|
14
|
Reading and lexical-semantic retrieval tasks outperforms single task speech analysis in the screening of mild cognitive impairment and Alzheimer's disease. Sci Rep 2023; 13:9728. [PMID: 37322073 PMCID: PMC10272227 DOI: 10.1038/s41598-023-36804-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 06/12/2023] [Indexed: 06/17/2023] Open
Abstract
Age-related cognitive impairment has increased dramatically in recent years, which has raised interest in developing screening tools for mild cognitive impairment and Alzheimer's disease. Speech analysis makes it possible to exploit the behavioral consequences of cognitive deficits on the patient's vocal performance and thereby to identify pathologies affecting speech production, such as dementia. Previous studies have further shown that the speech task used determines how the speech parameters are altered. We aim to combine the impairments in several speech production tasks in order to improve the accuracy of screening through speech analysis. The sample consists of 72 participants divided into three equal groups of healthy older adults, people with mild cognitive impairment, or Alzheimer's disease, matched by age and education. A complete neuropsychological assessment and two voice recordings were performed. The tasks required the participants to read a text and to complete a sentence with semantic information. A stepwise linear discriminant analysis was performed to select speech parameters with discriminative power. The discriminative functions obtained an accuracy of 83.3% in simultaneous classifications of several levels of cognitive impairment. This approach would therefore be a promising screening tool for dementia.
|
15
|
Combining automatic speech recognition with semantic natural language processing in schizophrenia. Psychiatry Res 2023; 325:115252. [PMID: 37236098 DOI: 10.1016/j.psychres.2023.115252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 04/21/2023] [Accepted: 05/13/2023] [Indexed: 05/28/2023]
Abstract
Natural language processing (NLP) tools are increasingly used to quantify semantic anomalies in schizophrenia. Automatic speech recognition (ASR) technology, if robust enough, could significantly speed up the NLP research process. In this study, we assessed the performance of a state-of-the-art ASR tool and its impact on diagnostic classification accuracy based on a NLP model. We compared ASR to human transcripts quantitatively (Word Error Rate (WER)) and qualitatively by analyzing error type and position. Subsequently, we evaluated the impact of ASR on classification accuracy using semantic similarity measures. Two random forest classifiers were trained with similarity measures derived from automatic and manual transcriptions, and their performance was compared. The ASR tool had a mean WER of 30.4%. Pronouns and words in sentence-final position had the highest WERs. The classification accuracy was 76.7% (sensitivity 70%; specificity 86%) using automated transcriptions and 79.8% (sensitivity 75%; specificity 86%) for manual transcriptions. The difference in performance between the models was not significant. These findings demonstrate that using ASR for semantic analysis is associated with only a small decrease in accuracy in classifying schizophrenia, compared to manual transcripts. Thus, combining ASR technology with semantic NLP models qualifies as a robust and efficient method for diagnosing schizophrenia.
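Word Error Rate, the quantitative measure named in this abstract, is conventionally the word-level edit distance between the reference (human) transcript and the ASR hypothesis, divided by the number of reference words. The sketch below assumes whitespace tokenization and case-sensitive matching; the function name is hypothetical, and the study's own text normalization is not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example sentences, not data from the study.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.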
|
16
|
Different in different ways: A network-analysis approach to voice and prosody in Autism Spectrum Disorder. LANGUAGE LEARNING AND DEVELOPMENT : THE OFFICIAL JOURNAL OF THE SOCIETY FOR LANGUAGE DEVELOPMENT 2023; 20:40-57. [PMID: 38486613 PMCID: PMC10936700 DOI: 10.1080/15475441.2023.2196528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2024]
Abstract
The current study investigated whether the difficulty in finding group differences in prosody between speakers with autism spectrum disorder (ASD) and neurotypical (NT) speakers might be explained by identifying different acoustic profiles of speakers which, while still perceived as atypical, might be characterized by different acoustic qualities. We modelled the speech from a selection of speakers (N = 26), with and without ASD, as a network of nodes defined by acoustic features. We used a community-detection algorithm to identify clusters of speakers who were acoustically similar and compared these clusters with atypicality ratings by naïve and expert human raters. Results identified three clusters: one primarily composed of speakers with ASD, one of mostly NT speakers, and one comprised of an even mixture of ASD and NT speakers. The human raters were highly reliable at distinguishing speakers with and without ASD, regardless of which cluster the speaker was in. These results suggest that community-detection methods using a network approach may complement commonly-employed human ratings to improve our understanding of the intonation profiles in ASD.
|
17
|
A Narrative Review of Speech and EEG Features for Schizophrenia Detection: Progress and Challenges. Bioengineering (Basel) 2023; 10:bioengineering10040493. [PMID: 37106680 PMCID: PMC10135748 DOI: 10.3390/bioengineering10040493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 04/06/2023] [Accepted: 04/14/2023] [Indexed: 04/29/2023] Open
Abstract
Schizophrenia is a mental illness that affects an estimated 21 million people worldwide. The literature establishes that electroencephalography (EEG) is a well-implemented means of studying and diagnosing mental disorders. However, it is known that speech and language provide unique and essential information about human thought. Semantic and emotional content, semantic coherence, syntactic structure, and complexity can thus be combined in a machine learning process to detect schizophrenia. Several studies show that early identification is crucial to prevent the onset of illness or mitigate possible complications. Therefore, it is necessary to identify disease-specific biomarkers for an early diagnosis support system. This work contributes to improving our knowledge about schizophrenia and the features that can identify this mental illness via speech and EEG. The emotional state is a specific characteristic of schizophrenia that can be identified with speech emotion analysis. The most used features of speech found in the literature review are fundamental frequency (F0), intensity/loudness (I), frequency formants (F1, F2, and F3), Mel-frequency cepstral coefficients (MFCCs), the duration of pauses and sentences (SD), and the duration of silence between words. Combining at least two feature categories achieved high accuracy in the schizophrenia classification. Prosodic and spectral or temporal features achieved the highest accuracy. The work with the highest accuracy used the prosodic and spectral features QEVA, SDVV, and SSDL, which were derived from the F0 and spectrogram. The emotional state can be identified with most of the features previously mentioned (F0, I, F1, F2, F3, MFCCs, and SD), linear prediction cepstral coefficients (LPCC), linear spectral features (LSF), and the pause rate. Using the event-related potentials (ERP), the most promising features found in the literature are mismatch negativity (MMN), P2, P3, P50, N1, and N2. The EEG features with the highest accuracy in classifying subjects with schizophrenia are the nonlinear features, such as Cx, HFD, and Lya.
|
18
|
Semantic and Acoustic Markers in Schizophrenia-Spectrum Disorders: A Combinatory Machine Learning Approach. Schizophr Bull 2023; 49:S163-S171. [PMID: 36305054 PMCID: PMC10031732 DOI: 10.1093/schbul/sbac142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
BACKGROUND AND HYPOTHESIS Speech is a promising marker to aid diagnosis of schizophrenia-spectrum disorders, as it reflects symptoms like thought disorder and negative symptoms. Previous approaches made use of different domains of speech for diagnostic classification, including features like coherence (semantic) and form (acoustic). However, an examination of the added value of each domain when combined is lacking as of yet. Here, we investigate the acoustic and semantic domains separately and combined. STUDY DESIGN Using semi-structured interviews, speech of 94 subjects with schizophrenia-spectrum disorders (SSD) and 73 healthy controls (HC) was recorded. Acoustic features were extracted using a standardized feature-set, and transcribed interviews were used to calculate semantic word similarity using word2vec. Random forest classifiers were trained for each domain. A third classifier was used to combine features from both domains; 10-fold cross-validation was used for each model. RESULTS The acoustic random forest classifier achieved 81% accuracy classifying SSD and HC, while the semantic domain classifier reached an accuracy of 80%. Joining features from the two domains, the combined classifier reached 85% accuracy, significantly improving on separate domain classifiers. For the combined classifier, top features were fragmented speech from the acoustic domain and variance of similarity from the semantic domain. CONCLUSIONS Both semantic and acoustic analyses of speech achieved ~80% accuracy in classifying SSD from HC. We replicate earlier findings per domain, additionally showing that combining these features significantly improves classification performance. Feature importance and accuracy in combined classification indicate that the domains measure different, complementing aspects of speech.
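Semantic word similarity from word embeddings is commonly computed as the cosine of the angle between two word vectors, and a feature such as "variance of similarity" can then be derived from the similarities of consecutive words. The sketch below illustrates that general idea under stated assumptions: the vectors are tiny made-up examples rather than word2vec output, the function names are hypothetical, and the windowing used in the study is not described in the abstract.

```python
import math
import statistics


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def consecutive_similarities(vectors):
    """Similarity between each word vector and the one that follows it."""
    return [cosine_similarity(a, b) for a, b in zip(vectors, vectors[1:])]


if __name__ == "__main__":
    # Hypothetical 2-D "embeddings" for three consecutive words.
    toy_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    sims = consecutive_similarities(toy_vectors)
    print("mean similarity:", statistics.mean(sims))
    print("variance of similarity:", statistics.pvariance(sims))
```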
|
19
|
Latent Factors of Language Disturbance and Relationships to Quantitative Speech Features. Schizophr Bull 2023; 49:S93-S103. [PMID: 36946530 PMCID: PMC10031730 DOI: 10.1093/schbul/sbac145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2023]
Abstract
BACKGROUND AND HYPOTHESIS Quantitative acoustic and textual measures derived from speech ("speech features") may provide valuable biomarkers for psychiatric disorders, particularly schizophrenia spectrum disorders (SSD). We sought to identify cross-diagnostic latent factors for speech disturbance with relevance for SSD and computational modeling. STUDY DESIGN Clinical ratings for speech disturbance were generated across 14 items for a cross-diagnostic sample (N = 334), including SSD (n = 90). Speech features were quantified using an automated pipeline for brief recorded samples of free speech. Factor models for the clinical ratings were generated using exploratory factor analysis, then tested with confirmatory factor analysis in the cross-diagnostic and SSD groups. The relationships between factor scores and computational speech features were examined for 202 of the participants. STUDY RESULTS We found a 3-factor model with a good fit in the cross-diagnostic group and an acceptable fit for the SSD subsample. The model identifies an impaired expressivity factor and 2 interrelated disorganized factors for inefficient and incoherent speech. Incoherent speech was specific to psychosis groups, while inefficient speech and impaired expressivity showed intermediate effects in people with nonpsychotic disorders. Each of the 3 factors had significant and distinct relationships with speech features, which differed for the cross-diagnostic vs SSD groups. CONCLUSIONS We report a cross-diagnostic 3-factor model for speech disturbance which is supported by good statistical measures, intuitive, applicable to SSD, and relatable to linguistic theories. It provides a valuable framework for understanding speech disturbance and appropriate targets for modeling with quantitative speech features.
|
20
|
Investigating temporal and prosodic markers in clinical high-risk for psychosis participants using automated acoustic analysis. Early Interv Psychiatry 2023; 17:327-330. [PMID: 36205386 PMCID: PMC10946925 DOI: 10.1111/eip.13357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Revised: 05/14/2022] [Accepted: 09/18/2022] [Indexed: 11/27/2022]
Abstract
AIM Language disturbances are a candidate biomarker for the early detection of psychosis. Temporal and prosodic abnormalities have been observed in schizophrenia patients, while there is conflicting evidence whether such deficits are present in participants meeting clinical high-risk for psychosis (CHR-P) criteria. METHODS Clinical interviews from CHR-P participants (n = 50) were examined for temporal and prosodic metrics and compared against a group of healthy controls (n = 17) and participants with affective disorders and substance abuse (n = 23). RESULTS There were no deficits in acoustic variables in the CHR-P group, while participants with affective disorders/substance abuse were characterized by slower speech rate, longer pauses and higher unvoiced frames percentage. CONCLUSION Our finding suggests that temporal and prosodic aspects of speech are not impaired in early-stage psychosis. Further studies are required to clarify whether such abnormalities are present in sub-groups of CHR-P participants with elevated psychosis-risk.
|
21
|
Acoustic speech markers for schizophrenia-spectrum disorders: a diagnostic and symptom-recognition tool. Psychol Med 2023; 53:1302-1312. [PMID: 34344490 PMCID: PMC10009369 DOI: 10.1017/s0033291721002804] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/04/2020] [Revised: 06/10/2021] [Accepted: 06/21/2021] [Indexed: 11/05/2022]
Abstract
BACKGROUND Clinicians routinely use impressions of speech as an element of mental status examination. In schizophrenia-spectrum disorders, descriptions of speech are used to assess the severity of psychotic symptoms. In the current study, we assessed the diagnostic value of acoustic speech parameters in schizophrenia-spectrum disorders, as well as their value in recognizing positive and negative symptoms. METHODS Speech was obtained from 142 patients with a schizophrenia-spectrum disorder and 142 matched controls during a semi-structured interview on neutral topics. Patients were categorized as having predominantly positive or negative symptoms using the Positive and Negative Syndrome Scale (PANSS). Acoustic parameters were extracted with OpenSMILE, employing the extended Geneva Acoustic Minimalistic Parameter Set, which includes standardized analyses of pitch (F0), speech quality and pauses. Speech parameters were fed into a random forest algorithm with leave-ten-out cross-validation to assess their value for a schizophrenia-spectrum diagnosis, and PANSS subtype recognition. RESULTS The machine-learning speech classifier attained an accuracy of 86.2% in classifying patients with a schizophrenia-spectrum disorder and controls on speech parameters alone. Patients with predominantly positive v. negative symptoms could be classified with an accuracy of 74.2%. CONCLUSIONS Our results show that automatically extracted speech parameters can be used to accurately classify patients with a schizophrenia-spectrum disorder and healthy controls, as well as differentiate between patients with predominantly positive v. negative symptoms. Thus, the field of speech technology has provided a standardized, powerful tool that has high potential for clinical applications in diagnosis and differentiation, given its ease of comparison and replication across samples.
|
22
|
The future of psychopharmacology: a critical appraisal of ongoing phase 2/3 trials, and of some current trends aiming to de-risk trial programmes of novel agents. World Psychiatry 2023; 22:48-74. [PMID: 36640403 PMCID: PMC9840514 DOI: 10.1002/wps.21056] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Accepted: 10/14/2022] [Indexed: 01/15/2023] Open
Abstract
Despite considerable progress in pharmacotherapy over the past seven decades, many mental disorders remain insufficiently treated. This situation is in part due to the limited knowledge of the pathophysiology of these disorders and the lack of biological markers to stratify and individualize patient selection, but also to a still restricted number of mechanisms of action being targeted in monotherapy or combination/augmentation treatment, as well as to a variety of challenges threatening the successful development and testing of new drugs. In this paper, we first provide an overview of the most promising drugs with innovative mechanisms of action that are undergoing phase 2 or 3 testing for schizophrenia, bipolar disorder, major depressive disorder, anxiety and trauma-related disorders, substance use disorders, and dementia. Promising repurposing of established medications for new psychiatric indications, as well as variations in the modulation of dopamine, noradrenaline and serotonin receptor functioning, are also considered. We then critically discuss the clinical trial parameters that need to be considered in depth when developing and testing new pharmacological agents for the treatment of mental disorders. Hurdles and perils threatening success of new drug development and testing include inadequacy and imprecision of inclusion/exclusion criteria and ratings, sub-optimally suited clinical trial participants, multiple factors contributing to a large/increasing placebo effect, and problems with statistical analyses. This information should be considered in order to de-risk trial programmes of novel agents or known agents for novel psychiatric indications, increasing their chances of success.
|
23
|
Applications of Speech Analysis in Psychiatry. Harv Rev Psychiatry 2023; 31:1-13. [PMID: 36608078 DOI: 10.1097/hrp.0000000000000356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
Abstract
ABSTRACT The need for objective measurement in psychiatry has stimulated interest in alternative indicators of the presence and severity of illness. Speech may offer a source of information that bridges the subjective and objective in the assessment of mental disorders. We systematically reviewed the literature for articles exploring speech analysis for psychiatric applications. The utility of speech analysis depends on how accurately speech features represent clinical symptoms within and across disorders. We identified four domains of the application of speech analysis in the literature: diagnostic classification, assessment of illness severity, prediction of onset of illness, and prognosis and treatment outcomes. We discuss the findings in each of these domains, with a focus on how types of speech features characterize different aspects of psychopathology. Models that bring together multiple speech features can distinguish speakers with psychiatric disorders from healthy controls with high accuracy. Differentiating between types of mental disorders and symptom dimensions are more complex problems that expose the transdiagnostic nature of speech features. Convergent progress in speech research and computer sciences opens avenues for implementing speech analysis to enhance objectivity of assessment in clinical practice. Application of speech analysis will need to address issues of ethics and equity, including the potential to perpetuate discriminatory bias through models that learn from clinical assessment data. Methods that mitigate bias are available and should play a key role in the implementation of speech analysis.
|
24
|
Brain mechanism of unfamiliar and familiar voice processing: an activation likelihood estimation meta-analysis. PeerJ 2023; 11:e14976. [PMID: 36935917 PMCID: PMC10019337 DOI: 10.7717/peerj.14976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Accepted: 02/08/2023] [Indexed: 03/14/2023] Open
Abstract
Interpersonal communication through vocal information is very important for human society. During verbal interactions, our vocal cord vibrations convey important information regarding voice identity, which allows us to decide how to respond to speakers (e.g., neither greeting a stranger too warmly nor speaking too coldly to a friend). Numerous neural studies have shown that identifying familiar and unfamiliar voices may rely on different neural bases. However, the mechanism underlying voice identification of individuals of varying familiarity has not been determined due to vague definitions, confusion of terms, and differences in task design. To address this issue, the present study first categorized three kinds of voice identity processing (perception, recognition and identification) from speakers with different degrees of familiarity. We defined voice identity perception as passively listening to a voice or determining if the voice was human, voice identity recognition as determining if the sound heard was acoustically familiar, and voice identity identification as ascertaining whether a voice is associated with a name or face. Of these, voice identity perception involves processing unfamiliar voices, and voice identity recognition and identification involve processing familiar voices. According to these three definitions, we performed activation likelihood estimation (ALE) on 32 studies and revealed different brain mechanisms underlying processing of unfamiliar and familiar voice identities. The results were as follows: (1) familiar voice recognition/identification was supported by a network involving most regions in the temporal lobe, some regions in the frontal lobe, subcortical structures and regions around the marginal lobes; (2) the bilateral superior temporal gyrus was recruited for voice identity perception of an unfamiliar voice; (3) voice identity recognition/identification of familiar voices was more likely to activate the right frontal lobe than voice identity perception of unfamiliar voices, while voice identity perception of an unfamiliar voice was more likely to activate the bilateral temporal lobe and left frontal lobe; and (4) the bilateral superior temporal gyrus served as a shared neural basis of unfamiliar voice identity perception and familiar voice identity recognition/identification. In general, the results of the current study address gaps in the literature, provide clear definitions of concepts, and indicate brain mechanisms for subsequent investigations.
|
25
|
A systematic review and Bayesian meta-analysis of the acoustic features of infant-directed speech. Nat Hum Behav 2023; 7:114-133. [PMID: 36192492 DOI: 10.1038/s41562-022-01452-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2022] [Accepted: 08/23/2022] [Indexed: 02/03/2023]
Abstract
When speaking to infants, adults often produce speech that differs systematically from that directed to other adults. To quantify the acoustic properties of this speech style across a wide variety of languages and cultures, we extracted results from empirical studies on the acoustic features of infant-directed speech. We analysed data from 88 unique studies (734 effect sizes) on the following five acoustic parameters that have been systematically examined in the literature: fundamental frequency (f0), f0 variability, vowel space area, articulation rate and vowel duration. Moderator analyses were conducted in hierarchical Bayesian robust regression models to examine how these features change with infant age and differ across languages, experimental tasks and recording environments. The moderator analyses indicated that f0, articulation rate and vowel duration became more similar to adult-directed speech over time, whereas f0 variability and vowel space area exhibited stability throughout development. These results point the way for future research to disentangle different accounts of the functions and learnability of infant-directed speech by conducting theory-driven comparisons among different languages and using computational models to formulate testable predictions.
|
26
|
Deconstructing heterogeneity in schizophrenia through language: a semi-automated linguistic analysis and data-driven clustering approach. SCHIZOPHRENIA (HEIDELBERG, GERMANY) 2022; 8:102. [PMID: 36446789 PMCID: PMC9708845 DOI: 10.1038/s41537-022-00306-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 10/24/2022] [Indexed: 06/16/2023]
Abstract
Previous works highlighted the relevance of automated language analysis for predicting diagnosis in schizophrenia, but a deeper language-based data-driven investigation of the clinical heterogeneity through the illness course has been generally neglected. Here we used a semiautomated multidimensional linguistic analysis innovatively combined with a machine-driven clustering technique to characterize the speech of 67 individuals with schizophrenia. Clusters were then compared for psychopathological, cognitive, and functional characteristics. We identified two subgroups with distinctive linguistic profiles: one with higher fluency, lower lexical variety but greater use of psychological lexicon; the other with reduced fluency, greater lexical variety but reduced psychological lexicon. The former cluster was associated with lower symptoms and better quality of life, pointing to the existence of specific language profiles, which also show clinically meaningful differences. These findings highlight the importance of considering language disturbances in schizophrenia as multifaceted and approaching them in automated and data-driven ways.
|
27
|
Identifying psychiatric manifestations in schizophrenia and depression from audio-visual behavioural indicators through a machine-learning approach. SCHIZOPHRENIA 2022; 8:92. [PMID: 36344515 PMCID: PMC9640655 DOI: 10.1038/s41537-022-00287-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2022] [Accepted: 09/08/2022] [Indexed: 11/09/2022]
Abstract
Schizophrenia (SCZ) and depression (MDD) are two chronic mental disorders that seriously affect the quality of life of millions of people worldwide. We aim to develop machine-learning methods with objective linguistic, speech, facial, and motor behavioral cues to reliably predict the severity of psychopathology or cognitive function, and distinguish diagnosis groups. We collected and analyzed the speech, facial expressions, and body movement recordings of 228 participants (103 SCZ, 50 MDD, and 75 healthy controls) from two separate studies. We created an ensemble machine-learning pipeline and achieved a balanced accuracy of 75.3% for classifying the total score of negative symptoms, 75.6% for the composite score of cognitive deficits, and 73.6% for the total score of general psychiatric symptoms in the mixed sample containing all three diagnostic groups. The proposed system is also able to differentiate between MDD and SCZ with a balanced accuracy of 84.7% and differentiate patients with SCZ or MDD from healthy controls with a balanced accuracy of 82.3%. These results suggest that machine-learning models leveraging audio-visual characteristics can help diagnose, assess, and monitor patients with schizophrenia and depression.
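Balanced accuracy, the metric reported in this abstract, is the mean of sensitivity and specificity, which makes it less sensitive than plain accuracy to unequal group sizes such as the 103/50/75 split described here. The sketch below shows the arithmetic for the two-class case; the function names and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of actual positives that were classified as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of actual negatives that were classified as negative."""
    return tn / (tn + fp)


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity for a two-class problem."""
    return (sensitivity(tp, fn) + specificity(tn, fp)) / 2


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Proportion of all cases classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


if __name__ == "__main__":
    # Made-up counts for an imbalanced sample: 100 positives, 50 negatives.
    counts = dict(tp=80, fn=20, tn=45, fp=5)
    print("accuracy:", accuracy(**counts))
    print("balanced accuracy:", balanced_accuracy(**counts))
```

With these made-up counts the two metrics differ because the larger class contributes more cases to plain accuracy, whereas balanced accuracy weights both classes equally.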
|
28
|
A Tutorial Review on Clinical Acoustic Markers in Speech Science. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2022; 65:3239-3263. [PMID: 36044888 DOI: 10.1044/2022_jslhr-21-00647] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/15/2023]
Abstract
PURPOSE The human voice changes with the progression of neurological disease and the onset of diseases that affect articulators, often decreasing the effectiveness of communication. These changes can be objectively measured using signal processing techniques that extract acoustic features. When measuring acoustic features, there are often several steps and assumptions that might be known to experts in acoustics and phonetics, but are less transparent for other disciplines (e.g., clinical medicine, speech pathology, engineering, and data science). This tutorial describes these signal processing techniques, explicitly outlines the underlying steps for accurate measurement, and discusses the implications of clinical acoustic markers. CONCLUSIONS We establish a vocabulary using straightforward terms, provide visualizations to achieve common ground, and guide understanding for those outside the domains of acoustics and auditory signal processing. Where possible, we highlight the best practices for measuring clinical acoustic markers and suggest resources for obtaining and further understanding these measures.
|
29
|
Using automated syllable counting to detect missing information in speech transcripts from clinical settings. Psychiatry Res 2022; 315:114712. [PMID: 35839638 PMCID: PMC9378537 DOI: 10.1016/j.psychres.2022.114712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 07/01/2022] [Accepted: 07/02/2022] [Indexed: 11/19/2022]
Abstract
Speech rate and quantity reflect clinical state; thus automated transcription holds potential clinical applications. We describe two datasets where recording quality and speaker characteristics affected transcription accuracy. Transcripts of low-quality recordings omitted significant portions of speech. An automated syllable counter estimated actual speech output and quantified the amount of missing information. The efficacy of this method differed by audio quality: the correlation between missing syllables and word error rate was only significant when quality was low. Automatically counting syllables could be useful to measure and flag transcription omissions in clinical contexts where speaker characteristics and recording quality are problematic.
|
30
|
Voice Analysis for Neurological Disorder Recognition–A Systematic Review and Perspective on Emerging Trends. Front Digit Health 2022; 4:842301. [PMID: 35899034 PMCID: PMC9309252 DOI: 10.3389/fdgth.2022.842301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Accepted: 05/25/2022] [Indexed: 11/25/2022] Open
Abstract
Quantifying neurological disorders from voice is a rapidly growing field of research and holds promise for unobtrusive and large-scale disorder monitoring. The data recording setup and data analysis pipelines are both crucial aspects to effectively obtain relevant information from participants. Therefore, we performed a systematic review to provide a high-level overview of practices across various neurological disorders and highlight emerging trends. PRISMA-based literature searches were conducted through PubMed, Web of Science, and IEEE Xplore to identify publications in which original (i.e., newly recorded) datasets were collected. Disorders of interest were psychiatric as well as neurodegenerative disorders, such as bipolar disorder, depression, and stress, as well as amyotrophic lateral sclerosis, Alzheimer's, and Parkinson's disease, and speech impairments (aphasia, dysarthria, and dysphonia). Of the 43 retrieved studies, Parkinson's disease is represented most prominently with 19 discovered datasets. Free speech and read speech tasks are most commonly used across disorders. Besides popular feature extraction toolkits, many studies utilise custom-built feature sets. Correlations of acoustic features with psychiatric and neurodegenerative disorders are presented. In terms of analysis, statistical analysis for significance of individual features is commonly used, as well as predictive modeling approaches, especially with support vector machines and a small number of artificial neural networks. An emerging trend and recommendation for future studies is to collect data in everyday life to facilitate longitudinal data collection and to capture the behavior of participants more naturally. Another emerging trend is to record additional modalities to voice, which can potentially increase analytical performance.
|
31
|
MDMA for the Treatment of Negative Symptoms in Schizophrenia. J Clin Med 2022; 11:jcm11123255. [PMID: 35743326 PMCID: PMC9225098 DOI: 10.3390/jcm11123255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 05/31/2022] [Accepted: 06/02/2022] [Indexed: 02/05/2023] Open
Abstract
The profound economic burden of schizophrenia is due, in part, to the negative symptoms of the disease, which can severely limit daily functioning. There is much debate in the field regarding their measurement and classification, and there are no FDA-approved treatments for negative symptoms despite an abundance of research. 3,4-Methylenedioxymethamphetamine (MDMA) is a Schedule I substance that has emerged as a novel therapeutic given its ability to enhance social interactions, generate empathy, and induce a state of metaplasticity in the brain. This review provides a rationale for the use of MDMA in the treatment of negative symptoms by reviewing the literature on negative symptoms, their treatment, MDMA, and MDMA-assisted therapy. It reviews recent evidence that supports the safe and potentially effective use of MDMA to treat negative symptoms and concludes with considerations regarding safety and possible mechanisms of action.
|
32
|
Vocal markers of autism: Assessing the generalizability of machine learning models. Autism Res 2022; 15:1018-1030. [PMID: 35385224 DOI: 10.1002/aur.2721] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/01/2021] [Revised: 02/24/2022] [Accepted: 03/22/2022] [Indexed: 01/09/2023]
Abstract
Machine learning (ML) approaches show increasing promise in their ability to identify vocal markers of autism. Nonetheless, it is unclear to what extent such markers generalize to new speech samples collected, for example, using a different speech task or in a different language. In this paper, we systematically assess the generalizability of ML findings across a variety of contexts. We train promising published ML models of vocal markers of autism on novel cross-linguistic datasets following a rigorous pipeline to minimize overfitting, including cross-validated training and ensemble models. We test the generalizability of the models by testing them on (i) different participants from the same study, performing the same task; (ii) the same participants, performing a different (but similar) task; (iii) a different study with participants speaking a different language, performing the same type of task. While model performance is similar to previously published findings when trained and tested on data from the same study (out-of-sample performance), there is considerable variance between studies. Crucially, the models do not generalize well to different, though similar, tasks and not at all to new languages. The ML pipeline is openly shared. Generalizability of ML models of vocal markers of autism is an issue. We outline three recommendations for strategies researchers could take to be more explicit about generalizability and improve it in future studies. LAY SUMMARY: Machine learning approaches promise to be able to identify autism from voice only. These models underestimate how diverse the contexts in which we speak are, how diverse the languages used are and how diverse autistic voices are. Machine learning approaches need to be more careful in defining their limits and generalizability.
|
33
|
Abstract
Negative schizotypal traits potentially can be digitally phenotyped using objective vocal analysis. Prior attempts have shown mixed success in this regard, potentially because acoustic analysis has relied on small, constrained feature sets. We employed machine learning to (a) optimize and cross-validate predictive models of self-reported negative schizotypy using a large acoustic feature set, (b) evaluate model performance as a function of sex and speaking task, (c) understand potential mechanisms underlying negative schizotypal traits by evaluating the key acoustic features within these models, and (d) examine model performance in its convergence with clinical symptoms and cognitive functioning. Accuracy was good (> 80%) and was improved by considering speaking task and sex. However, the features identified as most predictive of negative schizotypal traits were generally not considered critical to their conceptual definitions. Implications for validating and implementing digital phenotyping to understand and quantify negative schizotypy are discussed.
|
34
|
A generalizable speech emotion recognition model reveals depression and remission. Acta Psychiatr Scand 2022; 145:186-199. [PMID: 34850386 DOI: 10.1111/acps.13388] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/01/2021] [Revised: 11/24/2021] [Accepted: 11/25/2021] [Indexed: 12/12/2022]
Abstract
OBJECTIVE Affective disorders are associated with atypical voice patterns; however, automated voice analyses suffer from small sample sizes and untested generalizability on external data. We investigated a generalizable approach to aid clinical evaluation of depression and remission from voice using transfer learning: We train machine learning models on easily accessible non-clinical datasets and test them on novel clinical data in a different language. METHODS A Mixture of Experts machine learning model was trained to infer happy/sad emotional state using three publicly available emotional speech corpora in German and US English. We examined the model's predictive ability to classify the presence of depression on Danish speaking healthy controls (N = 42), patients with first-episode major depressive disorder (MDD) (N = 40), and the subset of the same patients who entered remission (N = 25) based on recorded clinical interviews. The model was evaluated on raw, de-noised, and speaker-diarized data. RESULTS The model showed separation between healthy controls and depressed patients at the first visit, obtaining an AUC of 0.71. Further, speech from patients in remission was indistinguishable from that of the control group. Model predictions were stable throughout the interview, suggesting that 20-30 s of speech might be enough to accurately screen a patient. Background noise (but not speaker diarization) heavily impacted predictions. CONCLUSION A generalizable speech emotion recognition model can effectively reveal changes in speaker depressive states before and after remission in patients with MDD. Data collection settings and data cleaning are crucial when considering automated voice analysis for clinical purposes.
|
35
|
Facial and Vocal Markers of Schizophrenia Measured Using Remote Smartphone Assessments: Observational Study. JMIR Form Res 2022; 6:e26276. [PMID: 35060906 PMCID: PMC8817208 DOI: 10.2196/26276] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/04/2020] [Revised: 03/02/2021] [Accepted: 11/22/2021] [Indexed: 12/24/2022] Open
Abstract
Background Machine learning–based facial and vocal measurements have demonstrated relationships with schizophrenia diagnosis and severity. Demonstrating utility and validity of remote and automated assessments conducted outside of controlled experimental or clinical settings can facilitate scaling such measurement tools to aid in risk assessment and tracking of treatment response in populations that are difficult to engage. Objective This study aimed to determine the accuracy of machine learning–based facial and vocal measurements acquired through automated assessments conducted remotely through smartphones. Methods Measurements of facial and vocal characteristics including facial expressivity, vocal acoustics, and speech prevalence were assessed in 20 patients with schizophrenia over the course of 2 weeks in response to two classes of prompts previously utilized in experimental laboratory assessments: evoked prompts, where subjects are guided to produce specific facial expressions and speech; and spontaneous prompts, where subjects are presented stimuli in the form of emotionally evocative imagery and asked to freely respond. Facial and vocal measurements were assessed in relation to schizophrenia symptom severity using the Positive and Negative Syndrome Scale. Results Vocal markers including speech prevalence, vocal jitter, fundamental frequency, and vocal intensity demonstrated specificity as markers of negative symptom severity, while measurement of facial expressivity demonstrated itself as a robust marker of overall schizophrenia symptom severity. Conclusions Established facial and vocal measurements, collected remotely in schizophrenia patients via smartphones in response to automated task prompts, demonstrated accuracy as markers of schizophrenia symptom severity. Clinical implications are discussed.
|
36
|
Toward a cumulative science of vocal markers of autism: A cross-linguistic meta-analysis-based investigation of acoustic markers in American and Danish autistic children. Autism Res 2021; 15:653-664. [PMID: 34957701 DOI: 10.1002/aur.2661] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/16/2021] [Revised: 11/11/2021] [Accepted: 12/13/2021] [Indexed: 12/21/2022]
Abstract
Acoustic atypicalities in speech production are argued to be potential markers of clinical features in autism spectrum disorder (ASD). A recent meta-analysis highlighted shortcomings in the field, in particular small sample sizes and study heterogeneity. We showcase a cumulative (i.e., explicitly building on previous studies both conceptually and statistically) yet self-correcting (i.e., critically assessing the impact of cumulative statistical techniques) approach to prosody in ASD to overcome these issues. We relied on the recommendations contained in the meta-analysis to build and analyze a cross-linguistic corpus of multiple speech productions in 77 autistic and 72 neurotypical children and adolescents (>1000 recordings in Danish and US English). We used meta-analytically informed and skeptical priors, with informed priors leading to more generalizable inference. We replicated findings of a minimal cross-linguistically reliable distinctive acoustic profile for ASD (higher pitch and longer pauses) with moderate effect sizes. We identified novel reliable differences between the two groups for normalized amplitude quotient, maxima dispersion quotient, and creakiness. However, the differences were small, and there is likely no one acoustic profile characterizing all autistic individuals. We identified reliable relations of acoustic features with individual differences (age, gender), and clinical features (speech rate and ADOS sub-scores). Besides cumulatively building our understanding of acoustic atypicalities in ASD, the study shows how to use systematic reviews and meta-analyses to guide the design and analysis of follow-up studies. We indicate future directions: larger and more diverse cross-linguistic datasets, focus on heterogeneity, self-critical cumulative approaches, and open science. LAY SUMMARY: Autistic individuals are reported to speak in distinctive ways. Distinctive vocal production can affect social interactions and social development and could represent a noninvasive way to support the assessment of autism spectrum disorder (ASD). We systematically checked whether acoustic atypicalities highlighted in previous articles could be actually found across multiple recordings and two languages. We find a minimal acoustic profile of ASD: higher pitch, longer pauses, increased hoarseness and creakiness of the voice. However, there is much individual variability (by age, sex, language, and clinical characteristics). This suggests that the search for one common "autistic voice" might be naive and more fine-grained approaches are needed.
|
37
|
Prosodic deficits and interpersonal difficulties in patients with schizophrenia. Psychiatry Res 2021; 306:114244. [PMID: 34673310 DOI: 10.1016/j.psychres.2021.114244] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/24/2021] [Revised: 10/05/2021] [Accepted: 10/10/2021] [Indexed: 10/20/2022]
Abstract
The present study examines the use of receptive emotional and linguistic prosody in patients with schizophrenia; in particular, its aim was to evaluate the type and number of errors made when comprehending the emotions and modes implied by meaningless utterances. Seventy-eight participants were enrolled in the study, in two groups (patients with schizophrenia and healthy controls) of 39 subjects each. The severity of illness was evaluated with the Positive and Negative Syndrome Scale; comprehension of emotional and linguistic prosody was assessed by the subtests of the Polish Version of the Right Hemisphere Language Battery. Neither emotional nor linguistic prosody comprehension correlated with schizophrenia symptoms. The study group experienced more difficulties in distinguishing between happiness and anger, and were more likely to misunderstand imperative utterances, confusing them with interrogative or affirmative ones. Such impairments are significant as they may affect the ability to form and sustain relationships with other people, achieve success in the work environment, and integrate in the community. They may also be a trait marker of the illness independent of psychotic symptoms. Further research is needed to translate this knowledge into meaningful and therapeutic interventions to improve quality of life, both for affected individuals and for their communication partners.
|
38
|
Distinctive prosodic features of people with autism spectrum disorder: a systematic review and meta-analysis study. Sci Rep 2021; 11:23093. [PMID: 34845298 PMCID: PMC8630064 DOI: 10.1038/s41598-021-02487-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/05/2021] [Accepted: 11/16/2021] [Indexed: 12/26/2022] Open
Abstract
In this systematic review, we analyzed and evaluated the findings of studies on prosodic features of vocal productions of people with autism spectrum disorder (ASD) in order to recognize the statistically significant, most confirmed and reliable prosodic differences distinguishing people with ASD from typically developing individuals. Using suitable keywords, three major databases (Web of Science, PubMed, and Scopus) were searched. The results for prosodic features such as mean pitch, pitch range and variability, speech rate, intensity and voice duration were extracted from eligible studies. The pooled standardized mean difference (SMD) between ASD and control groups was extracted or calculated. Between-study heterogeneity was evaluated using the I² statistic and Cochran's Q test. Furthermore, publication bias was assessed using a funnel plot, and its significance was evaluated using Egger's and Begg's tests. Thirty-nine eligible studies were retrieved (including 910 and 850 participants for ASD and control groups, respectively). This systematic review and meta-analysis showed that ASD group members had a significantly larger mean pitch (SMD = −0.4, 95% CI [−0.70, −0.10]), larger pitch range (SMD = −0.78, 95% CI [−1.34, −0.21]), longer voice duration (SMD = −0.43, 95% CI [−0.72, −0.15]), and larger pitch variability (SMD = −0.46, 95% CI [−0.84, −0.08]), compared with the typically developing control group. However, no significant differences in pitch standard deviation, voice intensity and speech rate were found between groups. Chronological age of participants and voice elicitation tasks were two sources of between-study heterogeneity. Furthermore, no publication bias was observed during analyses (p > 0.05). Mean pitch, pitch range, pitch variability and voice duration were recognized as the prosodic features reliably distinguishing people with ASD from TD individuals.
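The two heterogeneity statistics named in this abstract can be computed from per-study effect sizes and their sampling variances. The sketch below uses the standard inverse-variance definitions; the function name and the example numbers are hypothetical and are not taken from the review.

```python
def heterogeneity(effects, variances):
    """Return (Q, I2 in percent) for a fixed-effect inverse-variance pooling."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical standardized mean differences from two studies.
q, i2 = heterogeneity([0.0, 2.0], [0.5, 0.5])
print(q, i2)
```

Under the null hypothesis of homogeneity, Q is referred to a chi-squared distribution with k − 1 degrees of freedom, where k is the number of studies.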
|
39
|
Multimodal assessment of communicative-pragmatic features in schizophrenia: a machine learning approach. NPJ SCHIZOPHRENIA 2021; 7:28. [PMID: 34031425 PMCID: PMC8144364 DOI: 10.1038/s41537-021-00153-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/29/2020] [Accepted: 03/18/2021] [Indexed: 02/04/2023]
Abstract
An impairment in pragmatic communication is a core feature of schizophrenia, often associated with difficulties in social interactions. The pragmatic deficits regard various pragmatic phenomena, e.g., direct and indirect communicative acts, deceit, irony, and include not only the use of language but also other expressive means such as non-verbal/extralinguistic modalities, e.g., gestures and body movements, and paralinguistic cues, e.g., prosody and tone of voice. The present paper focuses on the identification of those pragmatic features, i.e., communicative phenomena and expressive modalities, that more reliably discriminate between individuals with schizophrenia and healthy controls. We performed a multimodal assessment of communicative-pragmatic ability, and applied a machine learning approach, specifically a Decision Tree model, with the aim of identifying the pragmatic features that best separate the data into the two groups, i.e., individuals with schizophrenia and healthy controls, and represent their configuration. The results indicated good overall performance of the Decision Tree model, with mean Accuracy of 82%, Sensitivity of 76%, and Precision of 91%. Linguistic irony emerged as the most relevant pragmatic phenomenon in distinguishing between the two groups, followed by violation of the Gricean maxims, and then extralinguistic deceitful and sincere communicative acts. The results are discussed in light of the pragmatic theoretical literature, and their clinical relevance in terms of content and design of both assessment and rehabilitative training.
|
40
|
Natural language processing methods are sensitive to sub-clinical linguistic differences in schizophrenia spectrum disorders. NPJ SCHIZOPHRENIA 2021; 7:25. [PMID: 33990615 PMCID: PMC8121795 DOI: 10.1038/s41537-021-00154-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 09/10/2020] [Accepted: 03/26/2021] [Indexed: 01/11/2023]
Abstract
Computerized natural language processing (NLP) allows for objective and sensitive detection of speech disturbance, a hallmark of schizophrenia spectrum disorders (SSD). We explored several methods for characterizing speech changes in SSD (n = 20) compared to healthy control (HC) participants (n = 11) and approached linguistic phenotyping on three levels: individual words, parts-of-speech (POS), and sentence-level coherence. NLP features were compared with a clinical gold standard, the Scale for the Assessment of Thought, Language and Communication (TLC). We utilized Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art embedding algorithm incorporating bidirectional context. Through the POS approach, we found that SSD used more pronouns but fewer adverbs, adjectives, and determiners (e.g., “the,” “a,”). Analysis of individual word usage was notable for more frequent use of first-person singular pronouns among individuals with SSD and first-person plural pronouns among HC. There was a striking increase in incomplete words among SSD. Sentence-level analysis using BERT reflected increased tangentiality among SSD with greater sentence embedding distances. The SSD sample had low speech disturbance on average and there was no difference in group means for TLC scores. However, NLP measures of language disturbance appear to be sensitive to these subclinical differences and showed greater ability to discriminate between HC and SSD than a model based on clinical ratings alone. These intriguing exploratory results from a small sample prompt further inquiry into NLP methods for characterizing language disturbance in SSD and suggest that NLP measures may yield clinically relevant and informative biomarkers.
|
41
|
Understanding communicative intentions in schizophrenia using an error analysis approach. NPJ SCHIZOPHRENIA 2021; 7:12. [PMID: 33637736 PMCID: PMC7910544 DOI: 10.1038/s41537-021-00142-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/20/2020] [Accepted: 01/12/2021] [Indexed: 01/31/2023]
Abstract
Patients with schizophrenia (SCZ) have a core impairment in the communicative-pragmatic domain, characterized by severe difficulties in correctly inferring the speaker's communicative intentions. While several studies have investigated pragmatic performance of patients with SCZ, little research has analyzed the errors committed in the comprehension of different communicative acts. The present research investigated error patterns in 24 patients with SCZ and 24 healthy controls (HC) during a task assessing the comprehension of different communicative acts, i.e., sincere, deceitful and ironic, and their relationship with the clinical features of SCZ. We used signal detection analysis to quantify participants' ability to correctly detect the speakers' communicative intention, i.e., sensitivity, and their tendency to wrongly perceive a communicative intention when not present, i.e., response bias. Further, we investigated the relationship between sensitivity and response bias, and the clinical features of the disorder, namely symptom severity, pharmacotherapy, and personal and social functioning. The results showed that the ability to infer the speaker's communicative intention is impaired in SCZ, as patients exhibited lower sensitivity, compared to HC, for all the pragmatic phenomena evaluated, i.e., sincere, deceitful, and ironic communicative acts. Further, we found that the sensitivity measure for irony was related to disorganized/concrete symptoms. Moreover, patients with SCZ showed a stronger response bias for deceitful communicative acts compared to HC: when committing errors, they tended to misattribute deceitful intentions more often than sincere and ironic ones. This tendency to misattribute deceitful communicative intentions may be related to the attributional bias characterizing the disorder.
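Sensitivity and response bias in signal detection analysis are commonly summarized as d′ and the criterion c, both derived from hit and false-alarm rates. The sketch below shows those textbook formulas; the function name, the log-linear correction for extreme rates, and the example counts are assumptions for illustration and may differ from the measures the authors actually used.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) from raw response counts."""
    # Assumed log-linear correction: keeps rates away from 0 and 1,
    # where the inverse normal CDF is undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion


# Hypothetical counts: many hits but also many false alarms.
d_prime, criterion = sdt_measures(18, 2, 12, 8)
print(round(d_prime, 2), round(criterion, 2))
```

With this sign convention, a negative criterion indicates a liberal bias, that is, a tendency to report an intention even when it is absent.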
|
42
|
Computer Vision-Based Assessment of Motor Functioning in Schizophrenia: Use of Smartphones for Remote Measurement of Schizophrenia Symptomatology. Digit Biomark 2021; 5:29-36. [PMID: 33615120 DOI: 10.1159/000512383] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/03/2020] [Accepted: 10/14/2020] [Indexed: 11/19/2022] Open
Abstract
Introduction Motor abnormalities have been shown to be a distinct component of schizophrenia symptomatology. However, objective and scalable methods for assessment of motor functioning in schizophrenia are lacking. Advancements in machine learning-based digital tools have allowed for automated and remote "digital phenotyping" of disease symptomatology. Here, we assess the performance of a computer vision-based assessment of motor functioning as a characteristic of schizophrenia using video data collected remotely through smartphones. Methods Eighteen patients with schizophrenia and 9 healthy controls were asked to remotely participate in smartphone-based assessments daily for 14 days. Video recorded from the smartphone front-facing camera during these assessments was used to quantify the Euclidean distance of head movement between frames through a pretrained computer vision model. The ability of head movement measurements to distinguish between patients and healthy controls as well as their relationship to schizophrenia symptom severity as measured through traditional clinical scores was assessed. Results The rate of head movement differed significantly between participants with schizophrenia (1.48 mm/frame) and those without (2.50 mm/frame; p = 0.01), and a logistic regression demonstrated that head movement was a significant predictor of schizophrenia diagnosis (p = 0.02). Linear regression between head movement and clinical scores of schizophrenia showed that head movement has a negative relationship with schizophrenia symptom severity (p = 0.04), primarily with negative symptoms of schizophrenia. Conclusions Remote, smartphone-based assessments were able to capture meaningful visual behavior for computer vision-based objective measurement of head movement. The measurements of head movement acquired were able to accurately classify schizophrenia diagnosis and quantify symptom severity in patients with schizophrenia.
|
43
|
Understanding Language Abnormalities and Associated Clinical Markers in Psychosis: The Promise of Computational Methods. Schizophr Bull 2020; 47:344-362. [PMID: 33205155 PMCID: PMC8480175 DOI: 10.1093/schbul/sbaa141] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Indexed: 12/20/2022]
Abstract
The language and speech of individuals with psychosis reflect their impairments in cognition and motor processes. These language disturbances can be used to identify individuals with and at high risk for psychosis, as well as help track and predict symptom progression, allowing for early intervention and improved outcomes. However, current methods of language assessment (manual annotations and/or clinical rating scales) are time intensive, expensive, subject to bias, and difficult to administer on a wide scale, limiting this area from reaching its full potential. Computational methods that can automatically perform linguistic analysis have started to be applied to this problem and could drastically improve our ability to use linguistic information clinically. In this article, we first review how these automated, computational methods work and how they have been applied to the field of psychosis. We show that across domains, these methods have captured differences between individuals with psychosis and healthy controls and can classify individuals with high accuracies, demonstrating the promise of these methods. We then consider the obstacles that need to be overcome before these methods can play a significant role in the clinical process and provide suggestions for how the field should address them. In particular, while much of the work thus far has focused on demonstrating the successes of these methods, we argue that a better understanding of when and why these models fail will be crucial toward ensuring these methods reach their potential in the field of psychosis.
|
44
|
Using machine learning of computerized vocal expression to measure blunted vocal affect and alogia. NPJ SCHIZOPHRENIA 2020; 6:26. [PMID: 32978400 PMCID: PMC7519104 DOI: 10.1038/s41537-020-00115-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/06/2020] [Accepted: 08/06/2020] [Indexed: 11/16/2022]
Abstract
Negative symptoms are a transdiagnostic feature of serious mental illness (SMI) that can be potentially “digitally phenotyped” using objective vocal analysis. In prior studies, vocal measures show low convergence with clinical ratings, potentially because analysis has used small, constrained acoustic feature sets. We sought to evaluate (1) whether clinically rated blunted vocal affect (BvA)/alogia could be accurately modelled using machine learning (ML) with a large feature set from two separate tasks (i.e., a 20-s “picture” and a 60-s “free-recall” task), (2) whether “Predicted” BvA/alogia (computed from the ML model) are associated with demographics, diagnosis, psychiatric symptoms, and cognitive/social functioning, and (3) which key vocal features are central to BvA/Alogia ratings. Accuracy was high (>90%) and was improved when computed separately by speaking task. ML scores were associated with poor cognitive performance and social functioning and were higher in patients with schizophrenia versus depression or mania diagnoses. However, the features identified as most predictive of BvA/Alogia were generally not considered critical to their operational definitions. Implications for validating and implementing digital phenotyping to reduce SMI burden are discussed.
|
45
|
Language disturbances in schizophrenia: the relation with antipsychotic medication. NPJ SCHIZOPHRENIA 2020; 6:24. [PMID: 32895389 PMCID: PMC7477551 DOI: 10.1038/s41537-020-00114-3] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 02/24/2020] [Accepted: 06/23/2020] [Indexed: 12/12/2022]
Abstract
Language disturbances are key aberrations in schizophrenia. Little is known about the influence of antipsychotic medication on these symptoms. Using computational language methods, this study evaluated the impact of high versus low dopamine D2 receptor (D2R) occupancy antipsychotics on language disturbances in 41 patients with schizophrenia, relative to 40 healthy controls. Patients with high versus low D2R occupancy antipsychotics differed by total number of words and type-token ratio, suggesting medication effects. Both patient groups differed from the healthy controls on percentage of time speaking and clauses per utterance, suggesting illness effects. Overall, more severe negative language disturbances (i.e. slower articulation rate, increased pausing, and shorter utterances) were seen in the patients that used high D2R occupancy antipsychotics, while less prominent disturbances were seen in low D2R occupancy patients. Language analyses successfully predicted drug type (sensitivity = 80.0%, specificity = 76.5%). Several language disturbances were more related to drug type and dose, than to other psychotic symptoms, suggesting that language disturbances may be aggravated by high D2R antipsychotics. This negative impact of high D2R occupancy drugs may have clinical implications, as impaired language production predicts functional outcome and degrades the quality of life.
|
46
|
Pragmatics, Theory of Mind and executive functions in schizophrenia: Disentangling the puzzle using machine learning. PLoS One 2020; 15:e0229603. [PMID: 32126068 PMCID: PMC7053733 DOI: 10.1371/journal.pone.0229603] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 09/04/2019] [Accepted: 02/10/2020] [Indexed: 12/13/2022]
Abstract
OBJECTIVE Schizophrenia is associated with a severe impairment in the communicative-pragmatic domain. Recent research has tried to disentangle the relationship between communicative impairment and other domains usually impaired in schizophrenia, i.e. Theory of Mind (ToM) and cognitive functions. However, the results are inconclusive and this relationship is still unclear. Machine learning (ML) provides novel opportunities for studying complex relationships among phenomena and representing causality among multiple variables. The present research explored the potential of applying ML, specifically Bayesian network (BNs) analysis, to characterize the relationship between cognitive, ToM and pragmatic abilities in individuals with schizophrenia and healthy controls, and to identify the cognitive and pragmatic abilities that are most informative in discriminating between schizophrenia and controls. METHODS We provided a comprehensive assessment of different aspects of pragmatic performance, i.e. linguistic, extralinguistic, paralinguistic, contextual and conversational, ToM and cognitive functions, i.e. Executive Functions (EF; selective attention, planning, inhibition, cognitive flexibility, working memory and speed processing) and general intelligence, in a sample of 32 individuals with schizophrenia and 35 controls. RESULTS The results showed that the BNs classifier discriminated well between patients with schizophrenia and healthy controls. The network structure revealed that only pragmatic Linguistic ability directly influenced the classification of patients and controls, while diagnosis determined performance on ToM, Extralinguistic, Paralinguistic, Selective Attention, Planning, Inhibition and Cognitive Flexibility tasks. The model identified pragmatic, ToM and cognitive abilities as three distinct domains independent of one another. CONCLUSION Taken together, our results confirmed the importance of considering pragmatic linguistic impairment as a core dysfunction in schizophrenia, and demonstrated the potential of applying BNs in investigating the relationship between pragmatic ability and cognition.
|
47
|
Automated assessment of psychiatric disorders using speech: A systematic review. Laryngoscope Investig Otolaryngol 2020; 5:96-116. [PMID: 32128436 PMCID: PMC7042657 DOI: 10.1002/lio2.354] [Citation(s) in RCA: 124] [Impact Index Per Article: 31.0] [Received: 09/05/2019] [Revised: 12/31/2019] [Accepted: 01/17/2020] [Indexed: 12/31/2022]
Abstract
OBJECTIVE There are many barriers to accessing mental health assessments, including cost and stigma. Even when individuals receive professional care, assessments are intermittent and may be limited partly due to the episodic nature of psychiatric symptoms. Therefore, machine-learning technology using speech samples obtained in the clinic or remotely could one day be a biomarker to improve diagnosis and treatment. To date, reviews have only focused on using acoustic features from speech to detect depression and schizophrenia. Here, we present the first systematic review of studies using speech for automated assessments across a broader range of psychiatric disorders. METHODS We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines. We included studies from the last 10 years using speech to identify the presence or severity of disorders within the Diagnostic and Statistical Manual of Mental Disorders (DSM-5). For each study, we describe sample size, clinical evaluation method, speech-eliciting tasks, machine learning methodology, performance, and other relevant findings. RESULTS 1395 studies were screened, of which 127 met the inclusion criteria. The majority of studies were on depression, schizophrenia, and bipolar disorder, and the remaining on post-traumatic stress disorder, anxiety disorders, and eating disorders. 63% of studies built machine learning predictive models, and the remaining 37% performed null-hypothesis testing only. We provide an online database with our search results and synthesize how acoustic features appear in each disorder. CONCLUSION Speech processing technology could aid mental health assessments, but there are many obstacles to overcome, especially the need for comprehensive transdiagnostic and longitudinal studies. Given the diverse types of data sets, feature extraction, computational methodologies, and evaluation criteria, we provide guidelines for both acquiring data and building machine learning models with a focus on testing hypotheses, open science, reproducibility, and generalizability. LEVEL OF EVIDENCE 3a.
|