1
Abstract
BACKGROUND Smartphones can help patients complete surveys and collect sensor data, offering insight into their mental health conditions. However, the utility of sensor data is still being explored, and prior studies have reported a wide range of correlations between passive data and survey scores. AIMS To explore correlations in a large dataset collected with the mindLAMP app, and to explore whether passive data features can be used in models to predict survey results. METHOD Participants were asked to complete daily and weekly mental health surveys. After screening for data quality, our sample included 147 college student participants and 270 weeks of data. We examined correlations between six weekly surveys and 13 metrics derived from passive data features. Finally, we trained logistic regression models to predict survey scores from passive data with and without daily surveys. RESULTS As in other large studies, our correlations were lower than prior reports from smaller studies. The most useful features came from GPS, call, and sleep duration data. Logistic regression models performed poorly with passive data alone, but performance increased greatly when daily survey scores were included. CONCLUSIONS Although passive data alone may not provide enough information to predict survey scores, augmenting these data with short daily surveys can improve performance. Passive data may therefore be best used to refine survey score predictions, and clinical utility may be derived from the combination of active and passive data.
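The modelling setup this abstract describes can be sketched as follows. This is a hypothetical illustration on synthetic data (the feature names, effect sizes, and threshold are invented, not the paper's), showing how adding a daily-survey feature to passive features can lift logistic regression performance:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400

# Hypothetical weekly passive features: GPS entropy, call count, sleep hours (standardized).
passive = rng.normal(size=(n, 3))

# Latent weekly symptom level, weakly driven by passive behaviour plus noise.
latent = passive @ np.array([0.2, 0.1, 0.3]) + rng.normal(size=n)
daily = latent + rng.normal(scale=0.5, size=n)   # noisy daily-survey summary
y = (latent > 0).astype(int)                     # above-threshold weekly survey score

accs = {}
for name, X in [("passive only", passive),
                ("passive + daily survey", np.column_stack([passive, daily]))]:
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    accs[name] = LogisticRegression().fit(Xtr, ytr).score(Xte, yte)
    print(f"{name}: accuracy = {accs[name]:.2f}")
```

On this toy data the combined model scores noticeably higher, mirroring the paper's qualitative finding that daily surveys carry most of the predictive signal.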
Affiliation(s)
- Danielle Currey
- Division of Digital Psychiatry, Beth Israel Deaconess Medical Center, Harvard Medical School, Massachusetts, USA
- John Torous
- Division of Digital Psychiatry, Beth Israel Deaconess Medical Center, Harvard Medical School, Massachusetts, USA
2
Singkul S, Woraratpanya K. Vector learning representation for generalized speech emotion recognition. Heliyon 2022; 8:e09196. PMID: 35846479; PMCID: PMC9280549; DOI: 10.1016/j.heliyon.2022.e09196.
Abstract
A verify-to-classify framework was designed to achieve both generalization and strong overall performance. The implemented framework works well in both verification (in-domain) and recognition (out-of-domain). Our softmax with Lo5 works well with emotion vectors and helps improve classification performance.
Speech emotion recognition (SER) plays an important role in global business today by improving service efficiency. In the SER literature, many techniques use deep learning to extract and learn features. Recently, we proposed end-to-end learning for a deep residual local feature learning block (DeepResLFLB). The advantages of end-to-end learning are low engineering effort and less hyperparameter tuning; nevertheless, this learning method easily falls into overfitting. Therefore, this paper describes a "verify-to-classify" framework applied to vectors learned from feature spaces of emotional information. The framework consists of two parts: speech emotion learning and speech emotion recognition. Speech emotion learning consists of two steps, enrolled training for speech emotion verification and prediction; residual learning (ResNet) with a squeeze-excitation (SE) block is the core component of both steps, used to extract emotional state vectors and build an emotion model during the enrolled training. The in-domain pre-trained weights of the trained emotion model are then transferred to the prediction step. As a result of speech emotion learning, the accepted model, validated by EER, is transferred to speech emotion recognition as out-of-domain pre-trained weights, ready for classification with a classical ML method. A suitable loss function is important for working with emotion vectors; here, two loss functions were proposed: angular prototypical loss and softmax with angular prototypical loss. Experiments were based on two publicly available datasets, Emo-DB and RAVDESS, covering both high- and low-quality recording environments. The results show that the proposed method can significantly improve generalized performance and give explainable emotion results when evaluated by standard metrics: EER, accuracy, precision, recall, and F1-score.
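Since model acceptance in this framework is validated by EER, a minimal numpy sketch of computing an equal error rate from verification scores may be useful (the scores and labels below are invented toy values, not the paper's):

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER: the operating point where the false-acceptance rate
    equals the false-rejection rate (labels: 1 = target, 0 = impostor)."""
    order = np.argsort(scores)[::-1]          # sweep thresholds high -> low
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                    # targets accepted so far
    fp = np.cumsum(1 - labels)                # impostors accepted so far
    fnr = 1 - tp / labels.sum()               # false-rejection rate
    fpr = fp / (1 - labels).sum()             # false-acceptance rate
    i = np.argmin(np.abs(fnr - fpr))
    return (fnr[i] + fpr[i]) / 2

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
labels = np.array([1, 1, 0, 1, 0, 1, 0, 0])
print(equal_error_rate(scores, labels))   # 0.25 for this toy example
```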
3
Yamada Y, Shinkawa K, Nemoto M, Arai T. Automatic Assessment of Loneliness in Older Adults Using Speech Analysis on Responses to Daily Life Questions. Front Psychiatry 2021; 12:712251. PMID: 34966297; PMCID: PMC8710612; DOI: 10.3389/fpsyt.2021.712251.
Abstract
Loneliness is a perceived state of social and emotional isolation that has been associated with a wide range of adverse health effects in older adults. Automatically assessing loneliness by passively monitoring daily behaviors could contribute to early detection and intervention for mitigating loneliness. Speech data have been used successfully to infer changes in emotional states and mental health conditions, but their association with loneliness in older adults remains unexplored. In this study, we developed a tablet-based application and collected speech responses of 57 older adults to daily life questions regarding, for example, one's feelings and future travel plans. From audio data of these speech responses, we automatically extracted speech features characterizing acoustic, prosodic, and linguistic aspects, and investigated their associations with self-rated scores on the UCLA Loneliness Scale. We found that with increasing loneliness scores, speech responses tended to have fewer inflections, longer pauses, reduced second formant frequencies, reduced variance of the speech spectrum, more filler words, and fewer positive words. Cross-validation showed that regression and binary-classification models using speech features could estimate loneliness scores with an R² of 0.57 and detect individuals with high loneliness scores with 95.6% accuracy, respectively. Our study provides the first empirical results suggesting the possibility of using speech data collected in everyday life for the automatic assessment of loneliness in older adults, which could help develop monitoring technologies for early detection and intervention.
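One of the markers reported here, longer pauses, can be approximated from raw audio with a simple frame-energy threshold. The sketch below is a crude stand-in, not the study's pipeline; the frame sizes and threshold are arbitrary choices:

```python
import numpy as np

def pause_fraction(x, sr, frame_ms=25, hop_ms=10, thresh_db=-35.0):
    """Fraction of frames whose RMS energy falls below a threshold
    relative to the loudest frame: a crude pause/silence measure."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + (len(x) - frame) // hop
    rms = np.array([np.sqrt(np.mean(x[i * hop:i * hop + frame] ** 2))
                    for i in range(n_frames)])
    db = 20 * np.log10(np.maximum(rms, 1e-10) / (rms.max() + 1e-10))
    return float(np.mean(db < thresh_db))

sr = 8000
speech = np.sin(2 * np.pi * 150 * np.arange(sr) / sr)  # 1 s of "voice"
silence = np.zeros(sr)                                  # 1 s pause
frac = pause_fraction(np.concatenate([speech, silence]), sr)
print(f"pause fraction: {frac:.2f}")   # roughly half the frames are silent
```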
Affiliation(s)
- Miyuki Nemoto
- Dementia Medical Center, University of Tsukuba Hospital, Tsukuba, Japan
- Tetsuaki Arai
- Division of Clinical Medicine, Department of Psychiatry, Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
4
Farrús M, Codina-Filbà J, Escudero J. Acoustic and prosodic information for home monitoring of bipolar disorder. Health Informatics J 2021; 27:1460458220972755. PMID: 33438502; DOI: 10.1177/1460458220972755.
Abstract
Epidemiological studies suggest that bipolar disorder has a prevalence of about 1% in European countries; it is one of the most disabling illnesses in working-age adults and is often long-term and persistent, with complex management and treatment. The ability to monitor patients with this disorder at home is therefore crucial for their quality of life. The current paper introduces speech-based information as an easy-to-record, ubiquitous and non-intrusive health sensor suitable for home monitoring, and its application in the framework of the NYMPHA-MD project. Preliminary results also show the potential of acoustic and prosodic features to detect and classify bipolar disorder by predicting Hamilton Depression Rating Scale (HDRS) and Young Mania Rating Scale (YMRS) values from speech.
5
Weiner L, Guidi A, Doignon-Camus N, Giersch A, Bertschy G, Vanello N. Vocal features obtained through automated methods in verbal fluency tasks can aid the identification of mixed episodes in bipolar disorder. Transl Psychiatry 2021; 11:415. PMID: 34341338; PMCID: PMC8329226; DOI: 10.1038/s41398-021-01535-z.
Abstract
There is a lack of consensus on the diagnostic thresholds that could improve the detection accuracy of bipolar mixed episodes in clinical settings. Some studies have shown that voice features can be reliable biomarkers of manic and depressive episodes compared with euthymic states, but none thus far has investigated whether they could aid the distinction between mixed and non-mixed acute bipolar episodes. Here we investigated whether vocal features acquired via verbal fluency tasks could accurately classify mixed states in bipolar disorder using machine learning methods. Fifty-six patients with bipolar disorder were recruited during an acute episode (19 hypomanic, 8 mixed hypomanic, 17 with mixed depression, 12 with depression). Nine different trials belonging to four verbal fluency conditions (letter, semantic, free word generation, and associational fluency) were administered. Spectral and prosodic features in three conditions were selected for the classification algorithm. Using a leave-one-subject-out (LOSO) strategy to train the classifier, we calculated the accuracy, the F1 score, and the Matthews correlation coefficient (MCC). For depression versus mixed depression, accuracy and F1 scores were high, 0.83 and 0.86 respectively, and the MCC was 0.64. For hypomania versus mixed hypomania, accuracy and F1 scores were also high, 0.86 and 0.75 respectively, and the MCC was 0.57. Given the high rates of correctly classified subjects, vocal features quickly acquired via verbal fluency tasks seem to be reliable biomarkers that could easily be implemented in clinical settings to improve diagnostic accuracy.
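For reference, the three reported metrics (accuracy, F1, MCC) can be computed from a binary confusion matrix as follows; the counts below are invented toy values, not the study's data:

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Accuracy, F1 score, and Matthews correlation coefficient
    from the four cells of a binary confusion matrix."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return acc, f1, mcc

# Toy confusion counts (not the paper's data).
acc, f1, mcc = binary_metrics(tp=14, fp=3, fn=2, tn=10)
print(f"accuracy={acc:.2f} F1={f1:.2f} MCC={mcc:.2f}")
# prints: accuracy=0.83 F1=0.85 MCC=0.65
```

MCC is the most conservative of the three on imbalanced classes, which is why the abstract reports it alongside accuracy and F1.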
Affiliation(s)
- Luisa Weiner
- INSERM 1114, Strasbourg, France; University Hospital of Strasbourg, Strasbourg, France; Laboratoire de Psychologie des Cognitions, Université de Strasbourg, Strasbourg, France
- Andrea Guidi
- Dipartimento di Ingegneria dell’Informazione, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy; Research Center “E. Piaggio”, University of Pisa, Largo L. Lazzarino 1, 56122 Pisa, Italy
- Anne Giersch
- INSERM 1114, Strasbourg, France
- Gilles Bertschy
- INSERM 1114, Strasbourg, France; University Hospital of Strasbourg, Strasbourg, France; Fédération de Médecine Translationnelle de Strasbourg, Université de Strasbourg, Strasbourg, France
- Nicola Vanello
- Dipartimento di Ingegneria dell’Informazione, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy; Research Center “E. Piaggio”, University of Pisa, Largo L. Lazzarino 1, 56122 Pisa, Italy
6
Towards a model of arousal change after affective word pronunciation based on electrodermal activity and speech analysis. Biomed Signal Process Control 2021. DOI: 10.1016/j.bspc.2021.102517.
7
Di Matteo D, Fotinos K, Lokuge S, Yu J, Sternat T, Katzman MA, Rose J. The Relationship Between Smartphone-Recorded Environmental Audio and Symptomatology of Anxiety and Depression: Exploratory Study. JMIR Form Res 2020; 4:e18751. PMID: 32788153; PMCID: PMC7453326; DOI: 10.2196/18751.
Abstract
BACKGROUND Objective and continuous severity measures of anxiety and depression would be highly valuable, with many applications in psychiatry and psychology. The sensors in a person's smartphone are a collective source of data for such objective measures, and a particularly rich one is the microphone, which can be used to sample the audio environment. This may give broad insight into activity, sleep, and social interaction, which may be associated with quality of life and severity of anxiety and depression. OBJECTIVE This study aimed to explore the properties of passively recorded environmental audio from a subject's smartphone to find potential correlates of symptom severity of social anxiety disorder, generalized anxiety disorder, depression, and general impairment. METHODS An Android app was designed, together with a centralized server system, to collect periodic measurements of the volume of sounds in the environment and to detect the presence or absence of English-speaking voices. Subjects were recruited into a 2-week observational study during which the app ran on their personal smartphone to collect audio data. Subjects also completed self-report severity measures of social anxiety disorder, generalized anxiety disorder, depression, and functional impairment. Participants were 112 Canadian adults from a nonclinical population. High-level features were extracted from the environmental audio of the 84 participants with sufficient data, and correlations were measured between the 4 audio features and the 4 self-report measures. RESULTS The regularity in daily patterns of activity and inactivity inferred from environmental audio volume was correlated with the severity of depression (r=-0.37; P<.001), as was a measure of sleep disturbance inferred from audio volume (r=0.23; P=.03). A proxy measure of social interaction based on the detection of speaking voices was correlated with depression (r=-0.37; P<.001) and functional impairment (r=-0.29; P=.01). None of the 4 environmental audio-based features showed significant correlations with the measures of generalized anxiety or social anxiety. CONCLUSIONS In this study group, the environmental audio contained signals associated with the severity of depression and functional impairment. Associations with the severity of social anxiety disorder and generalized anxiety disorder were much weaker and not statistically significant at the 5% level. This work also confirms previous findings that the presence of voices is associated with depression, and suggests that sparsely sampled audio volume can provide relevant insight into subjects' mental health.
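The associations reported above are Pearson correlations. A minimal sketch of the computation on invented data (not the study's) is shown below; the p-values in the abstract additionally require a significance test on r (e.g. `scipy.stats.pearsonr`), which is omitted here:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical data: a daily activity-regularity feature vs. a depression score.
regularity = [0.9, 0.8, 0.7, 0.6, 0.5, 0.3]
depression = [3, 5, 6, 9, 12, 15]
r = pearson_r(regularity, depression)
print(f"r = {r:.2f}")   # strongly negative, matching the abstract's direction
```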
Affiliation(s)
- Daniel Di Matteo
- The Centre for Automation of Medicine, The Edward S Rogers Sr Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
- Kathryn Fotinos
- Stress Trauma Anxiety Rehabilitation Treatment Clinic for Mood and Anxiety Disorders, Toronto, ON, Canada
- Sachinthya Lokuge
- Stress Trauma Anxiety Rehabilitation Treatment Clinic for Mood and Anxiety Disorders, Toronto, ON, Canada
- Julia Yu
- Stress Trauma Anxiety Rehabilitation Treatment Clinic for Mood and Anxiety Disorders, Toronto, ON, Canada
- Tia Sternat
- Stress Trauma Anxiety Rehabilitation Treatment Clinic for Mood and Anxiety Disorders, Toronto, ON, Canada
- Department of Psychology, Adler Graduate Professional School, Toronto, ON, Canada
- Martin A Katzman
- Stress Trauma Anxiety Rehabilitation Treatment Clinic for Mood and Anxiety Disorders, Toronto, ON, Canada
- Department of Psychology, Adler Graduate Professional School, Toronto, ON, Canada
- Department of Psychology, Lakehead University, Thunder Bay, ON, Canada
- The Northern Ontario School of Medicine, Thunder Bay, ON, Canada
- Jonathan Rose
- The Centre for Automation of Medicine, The Edward S Rogers Sr Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
8
Voleti R, Liss JM, Berisha V. A Review of Automated Speech and Language Features for Assessment of Cognitive and Thought Disorders. IEEE J Sel Top Signal Process 2020; 14:282-298. PMID: 33907590; PMCID: PMC8074691; DOI: 10.1109/jstsp.2019.2952087.
Abstract
It is widely accepted that information derived from analyzing speech (the acoustic signal) and language production (words and sentences) serves as a useful window into the health of an individual's cognitive ability. In fact, most neuropsychological testing batteries have a speech and language component in which clinicians elicit speech from patients for subjective evaluation across a broad set of dimensions. With advances in speech signal processing and natural language processing, there has been recent interest in developing tools to detect more subtle changes in cognitive-linguistic function. This work relies on extracting a set of features from recorded and transcribed speech for objective assessment of speech and language, early diagnosis of neurological disease, and tracking of disease after diagnosis. With an emphasis on cognitive and thought disorders, this paper reviews existing speech and language features used in this domain, discusses their clinical application, and highlights their advantages and disadvantages. Broadly, the review is split into two categories: language features based on natural language processing and speech features based on speech signal processing. Within each category, we consider features that aim to measure complementary dimensions of cognitive-linguistic function, including language diversity, syntactic complexity, semantic coherence, and timing. We conclude with a proposal of new research directions to further advance the field.
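As a concrete instance of one feature family named in this review, language (lexical) diversity is commonly measured with the type-token ratio; a minimal sketch with a deliberately simplified tokenizer:

```python
import re

def type_token_ratio(text):
    """Lexical diversity: number of unique word types / number of word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

sample = "the cat sat on the mat and the cat slept"
print(type_token_ratio(sample))   # 7 types / 10 tokens = 0.7
```

In practice, length-normalized variants (e.g. moving-average TTR) are preferred because raw TTR shrinks as transcripts get longer.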
Affiliation(s)
- Rohit Voleti
- School of Electrical, Computer, & Energy Engineering, Arizona State University, Tempe, AZ, 85281 USA
9
Greco A, Marzi C, Lanata A, Scilingo EP, Vanello N. Combining Electrodermal Activity and Speech Analysis towards a more Accurate Emotion Recognition System. Annu Int Conf IEEE Eng Med Biol Soc 2019: 229-232. PMID: 31945884; DOI: 10.1109/embc.2019.8857745.
Abstract
Current research in emotion recognition is exploring the possibility of merging information from physiological signals, behavioural data, and speech. Electrodermal activity (EDA) is among the main psychophysiological indicators of arousal. Nonetheless, it is difficult to analyze in ecological scenarios, for instance when the subject is speaking. On the other hand, speech carries relevant information about the subject's emotional state, and its potential in affective computing is still to be fully exploited. In this work, we explore the possibility of merging information from EDA and speech to improve the recognition of human arousal level during the pronunciation of single affective words. Unlike the majority of studies in the literature, we focus on the speaker's arousal rather than the emotion conveyed by the spoken word. Specifically, a support vector machine with a recursive feature elimination strategy (SVM-RFE) is trained and tested on three datasets, using the two channels (speech and EDA) first separately and then jointly. The results show that merging EDA and speech information significantly improves on the marginal classifiers (+11.64%). The six features selected by the RFE procedure will be used to develop a future multivariate model of emotions.
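A minimal sketch of the SVM-RFE procedure named in this abstract, using scikit-learn on synthetic data; the feature layout and labels are invented for illustration and do not reproduce the paper's datasets:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(1)
n = 200

# Hypothetical per-utterance feature matrix: columns 0-3 "EDA", 4-7 "speech".
X = rng.normal(size=(n, 8))

# By construction, the arousal label depends only on features 0, 2, and 5.
y = (X[:, 0] + X[:, 2] - X[:, 5] + rng.normal(scale=0.3, size=n) > 0).astype(int)

# Recursively eliminate the features with the smallest linear-SVM weights.
selector = RFE(SVC(kernel="linear"), n_features_to_select=3).fit(X, y)
selected = sorted(np.flatnonzero(selector.support_).tolist())
print("selected features:", selected)
```

On this toy problem the selected indices should largely coincide with the informative features {0, 2, 5}, illustrating how RFE can pick complementary channels.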
10
Guidi A, Gentili C, Scilingo E, Vanello N. Analysis of speech features and personality traits. Biomed Signal Process Control 2019. DOI: 10.1016/j.bspc.2019.01.027.
11
Zhao J, Mao X, Chen L. Speech emotion recognition using deep 1D & 2D CNN LSTM networks. Biomed Signal Process Control 2019. DOI: 10.1016/j.bspc.2018.08.035.
12
Rohani DA, Faurholt-Jepsen M, Kessing LV, Bardram JE. Correlations Between Objective Behavioral Features Collected From Mobile and Wearable Devices and Depressive Mood Symptoms in Patients With Affective Disorders: Systematic Review. JMIR Mhealth Uhealth 2018; 6:e165. PMID: 30104184; PMCID: PMC6111148; DOI: 10.2196/mhealth.9691.
Abstract
Background Several studies have recently reported on the correlation between objective behavioral features collected via mobile and wearable devices and depressive mood symptoms in patients with affective disorders (unipolar and bipolar disorders). However, individual studies have reported different and sometimes contradictory results, and no quantitative systematic review of this correlation has been published. Objective The objectives of this systematic review were to (1) provide an overview of the correlations between objective behavioral features and depressive mood symptoms reported in the literature and (2) investigate the strength and statistical significance of these correlations across studies. The answers to these questions could help identify which objective features have shown the most promising results across studies. Methods We conducted a systematic review of the scientific literature, reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. IEEE Xplore, ACM Digital Library, Web of Science, PsycINFO, PubMed, DBLP computer science bibliography, HTA, DARE, Scopus, and ScienceDirect were searched and supplemented by hand examination of reference lists. The search ended on April 27, 2017, and was limited to studies published between 2007 and 2017. Results A total of 46 studies were eligible for the review. These studies identified and investigated 85 unique objective behavioral features, covering 17 different sensor data inputs; the features were divided into 7 categories. Several features were found to have statistically significant and consistent correlation directionality with mood assessment (eg, the amount of home stay, sleep duration, and vigorous activity), while others showed directionality discrepancies across studies (eg, the number of text messages [short message service] sent, time spent between locations, and frequency of mobile phone screen activity). Conclusions Several studies showed consistent and statistically significant correlations between objective behavioral features collected via mobile and wearable devices and depressive mood symptoms. Hence, continuous, everyday monitoring of behavioral aspects in affective disorders could be a promising supplementary objective measure for estimating depressive mood symptoms. However, the evidence is limited by methodological issues in individual studies and by a lack of standardization of (1) the collected objective features, (2) the mood assessment methodology, and (3) the statistical methods applied. Therefore, consistency in data collection and analysis in future studies is needed, making replication studies as well as meta-analyses possible.
Affiliation(s)
- Darius A Rohani
- Embedded Systems Engineering, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark; Copenhagen Center for Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
- Maria Faurholt-Jepsen
- Copenhagen Affective Disorder Research Centre, Psychiatric Centre Copenhagen, Rigshospitalet, Copenhagen, Denmark
- Lars Vedel Kessing
- Copenhagen Affective Disorder Research Centre, Psychiatric Centre Copenhagen, Rigshospitalet, Copenhagen, Denmark; Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Jakob E Bardram
- Embedded Systems Engineering, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark; Copenhagen Center for Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
13
Zhang J, Pan Z, Gui C, Xue T, Lin Y, Zhu J, Cui D. Analysis on speech signal features of manic patients. J Psychiatr Res 2018; 98:59-63. PMID: 29291581; DOI: 10.1016/j.jpsychires.2017.12.012.
Abstract
Given the lack of effective biological markers for early diagnosis of bipolar mania, and the tendency of the voice to fluctuate during transitions between mood states, this study aimed to investigate the speech features of manic patients to identify a potential set of biomarkers for the diagnosis of bipolar mania. Thirty manic patients and 30 healthy controls were recruited, and their speech features were collected during natural dialogue using the Automatic Voice Collecting System. The Bech-Rafaelsen Mania Rating Scale (BRMS) and the Clinical Global Impression scale (CGI) were used to assess illness. The speech features were compared between two groupings: mood group (mania vs remission) and bipolar group (manic patients vs healthy individuals). We found that characteristic speech signals differed between the mood and bipolar groups. The fourth formant (F4) and the linear prediction coefficients (LPC) differed significantly (P < .05) when patients transitioned from the manic to the remission state. The first formant (F1), the second formant (F2), and LPC (P < .05) also played key roles in distinguishing patients from healthy individuals. In addition, there was a significant correlation between LPC and BRMS, indicating that LPC may play an important role in the diagnosis of bipolar mania. In this study we traced the speech features of bipolar mania during natural dialogue (conversation), which is an accessible approach in clinical practice. Such indicators may serve as promising biomarkers for the diagnosis and clinical therapeutic evaluation of bipolar mania.
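The LPC features discussed here are linear prediction coefficients. A sketch of estimating them with the autocorrelation (Yule-Walker) method on a synthetic damped sinusoid, not the study's speech data, is shown below; formant frequencies are then conventionally read off the roots of the resulting prediction polynomial:

```python
import numpy as np

def lpc(signal, order):
    """Linear prediction coefficients a_k such that
    x[n] ~ sum_k a_k * x[n-k], via the Yule-Walker equations."""
    x = np.asarray(signal, dtype=float)
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]  # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

# A damped sinusoid obeys an exact order-2 recursion:
# x[n] = 2*r*cos(w)*x[n-1] - r^2*x[n-2], with r = e^-0.01 and w = 0.2*pi.
n = np.arange(400)
x = np.exp(-0.01 * n) * np.sin(0.2 * np.pi * n)
a = lpc(x, order=2)
print(np.round(a, 2))   # close to [1.60, -0.98]
```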
Affiliation(s)
- Jing Zhang
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai Jiading Mental Health Center, Shanghai, China
- Zhongde Pan
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Chao Gui
- Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
- Ting Xue
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yezhe Lin
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jie Zhu
- Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Donghong Cui
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Brain Science and Technology Research Center, Shanghai Jiao Tong University, China.
14
Guidi A, Schoentgen J, Bertschy G, Gentili C, Scilingo E, Vanello N. Features of vocal frequency contour and speech rhythm in bipolar disorder. Biomed Signal Process Control 2017. DOI: 10.1016/j.bspc.2017.01.017.
15
Kandsberger J, Rogers SN, Zhou Y, Humphris G. Using fundamental frequency of cancer survivors' speech to investigate emotional distress in out-patient visits. Patient Educ Couns 2016; 99:1971-1977. PMID: 27506580; DOI: 10.1016/j.pec.2016.08.003.
Abstract
OBJECTIVE Emotions are in part conveyed by varying levels of the fundamental frequency of voice pitch (f0). This study tests the hypothesis that patients display heightened emotional arousal (higher f0) during Verona Coding Definitions of Emotional Sequences (VR-CoDES) cues and concerns compared with neutral statements. METHODS The audio recordings of sixteen head and neck cancer survivors' follow-up consultations were coded for patients' emotional distress. Pitch (f0) was extracted for coded cues and concerns and for neutral statements. These were compared using a hierarchical linear model, nested by patient and pitch range, controlling for statement speech length. Utterance content was also explored. RESULTS Clustering by patient explained 30% of the variance in utterance f0. Cues and concerns were on average 13.07 Hz higher than neutral statements (p=0.02). Cues and concerns in these consultations contained a high proportion of recurrence fears. CONCLUSION The present study highlights the benefits and challenges of adding f0, and potentially other prosodic features, to the toolkit for coding emotional distress in health communication settings. PRACTICE IMPLICATIONS Assessing f0 during clinical conversations can provide additional information for research into emotional expression.
Affiliation(s)
- Simon N Rogers
- Merseyside Regional Head & Neck Cancer Centre, Aintree Hospital, Liverpool, L9 7AL, UK
| | - Yuefang Zhou
- Medical School, University of St. Andrews, KY16 9TF, UK
| | - Gerry Humphris
- Medical School, University of St. Andrews, KY16 9TF, UK; Edinburgh Cancer Centre, Western General Hospital, EH4 2XU, UK.
| |
16
A Wearable System for the Evaluation of the Human-Horse Interaction: A Preliminary Study. ELECTRONICS 2016. [DOI: 10.3390/electronics5040063] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
17
Guidi A, Salvi S, Ottaviano M, Gentili C, Bertschy G, de Rossi D, Scilingo EP, Vanello N. Smartphone Application for the Analysis of Prosodic Features in Running Speech with a Focus on Bipolar Disorders: System Performance Evaluation and Case Study. SENSORS 2015; 15:28070-87. [PMID: 26561811 PMCID: PMC4701269 DOI: 10.3390/s151128070] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/31/2015] [Revised: 09/26/2015] [Accepted: 10/26/2015] [Indexed: 11/16/2022]
Abstract
Bipolar disorder is one of the most common mood disorders, characterized by large and disabling mood swings. Several projects focus on developing decision support systems that monitor and advise both patients and clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed for analyzing running speech on a smartphone device. The application can record audio samples and estimate the speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, which offers advantages over remote processing approaches in terms of privacy protection and reduced upload costs. The raw features can then be sent to a central server for further processing. The quality of the audio recordings, the reliability of the algorithm, and the performance of the overall system were evaluated in terms of voiced segment detection and feature estimation. The results demonstrate that the mean F0 of each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. By contrast, features related to F0 variability within each voiced segment performed poorly. A case study on a patient with bipolar disorder is presented.
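The core step of the abstract above, estimating F0 locally from recorded audio, can be illustrated with a generic autocorrelation pitch estimator. This is a textbook technique sketched under assumptions, not the app's actual algorithm; the function name, search range, and frame parameters are hypothetical.

```python
import math

def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the fundamental frequency of one audio frame by
    autocorrelation: the lag with the strongest self-similarity within
    the plausible pitch range is taken as the pitch period."""
    n = len(frame)
    lag_min = int(sample_rate / f0_max)          # shortest period considered
    lag_max = min(int(sample_rate / f0_min), n - 1)  # longest period considered
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0

# 40 ms frame of a synthetic 150 Hz tone sampled at 8 kHz.
sr = 8000
frame = [math.sin(2 * math.pi * 150 * t / sr) for t in range(320)]
print(estimate_f0(frame, sr))
```

On this synthetic tone the estimator recovers roughly 150 Hz. In a pipeline like the one described, such per-frame estimates over voiced segments would be aggregated (e.g. mean F0 per segment) before any upload, which is what keeps the raw audio on the device.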
Affiliation(s)
- Andrea Guidi
- Dipartimento di Ingegneria dell'Informazione, University of Pisa, Via G. Caruso 16, Pisa 56122, Italy.
- Research Center "E. Piaggio", University of Pisa, Largo L. Lazzarino 1, Pisa 56122, Italy.
- Sergio Salvi
- Life Supporting Technologies, Universidad Politécnica de Madrid, Avd. Complutense 30, Madrid 28040, Spain.
- Manuel Ottaviano
- Life Supporting Technologies, Universidad Politécnica de Madrid, Avd. Complutense 30, Madrid 28040, Spain.
- Claudio Gentili
- Department of Surgical, Medical, Molecular Pathology and Critical Care, University of Pisa, Via Savi 10, Pisa 56126, Italy.
- Department of General Psychology, University of Padua, Via Venezia 8, Padua 35131, Italy.
- Gilles Bertschy
- Department of Psychiatry and Mental Health, Strasbourg University Hospital, INSERM U1114, Translational Medicine Federation, University of Strasbourg, Strasbourg 67000, France.
- Danilo de Rossi
- Dipartimento di Ingegneria dell'Informazione, University of Pisa, Via G. Caruso 16, Pisa 56122, Italy.
- Research Center "E. Piaggio", University of Pisa, Largo L. Lazzarino 1, Pisa 56122, Italy.
- Enzo Pasquale Scilingo
- Dipartimento di Ingegneria dell'Informazione, University of Pisa, Via G. Caruso 16, Pisa 56122, Italy.
- Research Center "E. Piaggio", University of Pisa, Largo L. Lazzarino 1, Pisa 56122, Italy.
- Nicola Vanello
- Dipartimento di Ingegneria dell'Informazione, University of Pisa, Via G. Caruso 16, Pisa 56122, Italy.
- Research Center "E. Piaggio", University of Pisa, Largo L. Lazzarino 1, Pisa 56122, Italy.