1
Neves L, Martins M, Correia AI, Castro SL, Schellenberg EG, Lima CF. Does music training improve emotion recognition and cognitive abilities? Longitudinal and correlational evidence from children. Cognition 2025; 259:106102. PMID: 40064075. DOI: 10.1016/j.cognition.2025.106102.
Abstract
Music training is widely claimed to enhance nonmusical abilities, yet causal evidence remains inconclusive. Moreover, research tends to focus on cognitive over socioemotional outcomes. In two studies, we investigated whether music training improves emotion recognition in voices and faces among school-aged children. We also examined music-training effects on musical abilities, motor skills (fine and gross), broader socioemotional functioning, and cognitive abilities including nonverbal reasoning, executive functions, and auditory memory (short-term and working memory). Study 1 (N = 110) was a 2-year longitudinal intervention conducted in a naturalistic school setting, comparing music training to basketball training (active control) and no training (passive control). Music training improved fine-motor skills and auditory memory relative to controls, but it had no effect on emotion recognition or other cognitive and socioemotional abilities. Both music and basketball training improved gross-motor skills. Study 2 (N = 192) compared children without music training to peers attending a music school. Although music training correlated with better emotion recognition in speech prosody (tone of voice), this association disappeared after controlling for socioeconomic status, musical abilities, or short-term memory. In contrast, musical abilities correlated with emotion recognition in both prosody and faces, independently of training or other confounding variables. These findings suggest that music training enhances fine-motor skills and auditory memory, but it does not causally improve emotion recognition, other cognitive abilities, or socioemotional functioning. Observed advantages in emotion recognition likely stem from preexisting musical abilities and other confounding factors such as socioeconomic status.
Affiliation(s)
- Leonor Neves: Centro de Investigação e Intervenção Social (CIS-IUL), Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal
- Marta Martins: Centro de Investigação e Intervenção Social (CIS-IUL), Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal
- Ana Isabel Correia: Centro de Investigação e Intervenção Social (CIS-IUL), Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal
- São Luís Castro: Centro de Psicologia da Universidade do Porto (CPUP), Faculdade de Psicologia e de Ciências da Educação da Universidade do Porto (FPCEUP), Porto, Portugal
- E Glenn Schellenberg: Centro de Investigação e Intervenção Social (CIS-IUL), Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal; Department of Psychology, University of Toronto Mississauga, Mississauga, Canada
- César F Lima: Centro de Investigação e Intervenção Social (CIS-IUL), Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal
2
Daunay V, Reby D, Bryant GA, Pisanski K. Production and perception of volitional laughter across social contexts. The Journal of the Acoustical Society of America 2025; 157:2774-2789. PMID: 40227885. DOI: 10.1121/10.0036388.
Abstract
Human nonverbal vocalizations such as laughter communicate emotion, motivation, and intent during social interactions. While differences between spontaneous and volitional laughs have been described, little is known about the communicative functions of volitional (voluntary) laughter, a complex signal used across diverse social contexts. Here, we examined whether the acoustic structure of volitional laughter encodes social contextual information recognizable by humans and computers. We asked men and women to produce volitional laughs in eight distinct social contexts ranging in valence from positive (e.g., watching a comedy) to negative (e.g., embarrassment). Human listeners and machine classification algorithms identified most laughter contexts above chance. However, confusion often arose within valence categories and could be largely explained by shared acoustics. Some acoustic features varied across social contexts, including fundamental frequency (perceived as voice pitch) and energy parameters (entropy variance, loudness, spectral centroid, and cepstral peak prominence), and these features also predicted listeners' recognition of laughter contexts; even so, laughs evoked in different social contexts often overlapped in acoustic and perceptual space. Thus, we show that volitional laughter can convey some reliable information about social context, but much of this is tied to valence, suggesting that volitional laughter is a graded rather than discrete vocal signal.
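The machine classification reported above can be illustrated with a small, hedged sketch in Python: a random-forest classifier trained on summary acoustic features of each laugh, with cross-validated accuracy compared against a chance baseline. The feature columns, context labels, and simulated values below are assumptions for illustration, not the authors' data or pipeline.

```python
# Minimal sketch: classify laughter context from summary acoustic features and
# compare against a chance baseline (illustrative; simulated data, assumed features).
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n = 160  # hypothetical number of laugh recordings

df = pd.DataFrame({
    "f0_mean_hz": rng.uniform(100, 400, n),            # fundamental frequency (pitch)
    "loudness": rng.uniform(0, 1, n),
    "spectral_centroid_hz": rng.uniform(500, 4000, n),
    "cpp_db": rng.uniform(5, 25, n),                    # cepstral peak prominence
    "entropy_variance": rng.uniform(0, 1, n),
    "context": rng.choice(["comedy", "tickling", "embarrassment", "nervousness"], n),
})
X, y = df.drop(columns="context"), df["context"]

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model_acc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv).mean()
chance_acc = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=cv).mean()

# With real acoustic measurements, model_acc exceeding chance_acc would mirror the
# above-chance context recognition described in the abstract; random data will not.
print(f"classifier accuracy: {model_acc:.2f} vs chance baseline: {chance_acc:.2f}")
```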
Affiliation(s)
- Virgile Daunay: ENES Bioacoustics Research Laboratory, CRNL Center for Research in Neuroscience in Lyon, University of Saint-Étienne, 42023 Saint-Étienne, France; DDL Dynamics of Language Lab, CNRS French National Centre for Scientific Research, University of Lyon 2, 69363 Lyon, France
- David Reby: ENES Bioacoustics Research Laboratory, CRNL Center for Research in Neuroscience in Lyon, University of Saint-Étienne, 42023 Saint-Étienne, France
- Gregory A Bryant: Department of Communication, Center for Behavior, Evolution, and Culture, University of California, Los Angeles, California 90095, USA
- Katarzyna Pisanski: ENES Bioacoustics Research Laboratory, CRNL Center for Research in Neuroscience in Lyon, University of Saint-Étienne, 42023 Saint-Étienne, France; DDL Dynamics of Language Lab, CNRS French National Centre for Scientific Research, University of Lyon 2, 69363 Lyon, France
3
Lavan N, Ahmed A, Tyrene Oteng C, Aden M, Nasciemento-Krüger L, Raffiq Z, Mareschal I. Similarities in emotion perception from faces and voices: evidence from emotion sorting tasks. Cogn Emot 2025:1-17. PMID: 40088052. DOI: 10.1080/02699931.2025.2478478.
Abstract
Emotions are expressed via many features, including facial displays, vocal intonation, and touch, and perceivers can often interpret emotional displays across the different modalities with high accuracy. Here, we examine how emotion perception from faces and voices relate to one another, probing individual differences in emotion recognition abilities across visual and auditory modalities. We developed a novel emotion sorting task, in which participants freely grouped different stimuli into perceived emotional categories, without requiring pre-defined emotion labels. Participants completed two emotion sorting tasks, one using silent videos of facial expressions, the other using audio recordings of vocal expressions. We furthermore manipulated emotional intensity, contrasting more subtle, lower-intensity portrayals with higher-intensity emotion portrayals. We find that participants' performance on the emotion sorting task was similar for face and voice stimuli. As expected, performance was lower when stimuli were of low emotional intensity. Consistent with previous reports, we find that task performance was positively correlated across the two modalities. Our findings show that emotion perception in the visual and auditory modalities may be underpinned by similar and/or shared processes, highlighting that emotion sorting tasks are powerful paradigms for investigating emotion recognition from faces and voices, as well as cross-modal and multimodal emotion recognition.
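One way to quantify performance in such a label-free sorting task is to compare each participant's partition of the stimuli with the intended emotion categories, for example with the adjusted Rand index. The sketch below is an illustrative assumption about scoring, not necessarily the metric used by the authors.

```python
# Minimal sketch: score a free emotion-sorting response against intended categories.
from sklearn.metrics import adjusted_rand_score

# Intended emotion of 12 hypothetical stimuli and one participant's free grouping.
# Group labels are arbitrary integers; only the partition structure matters.
intended = ["anger", "anger", "anger", "fear", "fear", "fear",
            "joy", "joy", "joy", "sadness", "sadness", "sadness"]
participant_sort = [1, 1, 1, 2, 2, 4, 3, 3, 3, 4, 4, 4]  # one fear item mis-grouped

# 1.0 = perfect agreement with intended categories; values near 0 = chance-level sorting.
print(adjusted_rand_score(intended, participant_sort))
```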
Affiliation(s)
- Nadine Lavan, Aleena Ahmed, Chantelle Tyrene Oteng, Munira Aden, Luisa Nasciemento-Krüger, Zahra Raffiq, and Isabelle Mareschal: Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Centre for Brain and Behaviour, Queen Mary University of London, London, UK
4
Li M, Li N, Zhou A, Yan H, Li Q, Ma C, Wu C. The Mandarin Chinese auditory emotions stimulus database: A validated corpus of monosyllabic Chinese characters. Behav Res Methods 2025; 57:89. PMID: 39900840. DOI: 10.3758/s13428-025-02607-4.
Abstract
Auditory emotional prosody can be conveyed by simple syllables. This study aimed to establish and validate an auditory speech dataset containing Mandarin Chinese auditory emotional monosyllables (MCAE-Monosyllable), a resource that has not been previously available. A total of 422 Chinese monosyllables were recorded by six professional Mandarin actors, each expressing seven emotions: neutral, happy, angry, sad, fearful, disgusted, and surprised. Additionally, each neutral voice was recorded in four Chinese tones. After standardization and energy balance, the recordings were evaluated by 720 Chinese college students for emotional category (forced choice among the seven emotions) and emotional intensity (rated on a scale of 1-9). The final dataset consists of 18,089 valid Chinese monosyllabic pronunciations (neutrality: 9425, sadness: 2453, anger: 2024, surprise: 1699, disgust: 1624, happiness: 590, fear: 274). On average, neutrality had the highest accuracy rate (79%), followed by anger (75%), sadness (75%), surprise (74%), happiness (73%), disgust (72%), and finally fear (67%). We provide detailed validation results, acoustic information, and perceptual intensity rating values for each sound. The MCAE-Monosyllable database serves as a valuable resource for neural decoding of Chinese emotional speech, cross-cultural language research, and behavioral or clinical studies related to language and emotional disorders. The database can be obtained from the Open Science Framework ( https://osf.io/h3uem/?view_only=047dfd08dbb64ad0882410da340aa271 ).
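The validation statistics reported above (per-emotion recognition accuracy from forced-choice ratings) can be computed in a few lines; the sketch below is illustrative, with assumed column names and toy ratings rather than the authors' data.

```python
# Minimal sketch: per-emotion hit rate and confusion matrix from forced-choice ratings.
import pandas as pd

# One row per rating trial: the emotion the actor intended and the emotion the rater chose.
ratings = pd.DataFrame({
    "intended": ["happy", "happy", "angry", "angry", "sad", "sad", "fearful", "fearful"],
    "chosen":   ["happy", "surprised", "angry", "angry", "sad", "neutral", "fearful", "disgusted"],
})

accuracy = (ratings["chosen"] == ratings["intended"]).groupby(ratings["intended"]).mean()
confusion = pd.crosstab(ratings["intended"], ratings["chosen"], normalize="index")
print(accuracy)
print(confusion.round(2))
```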
Affiliation(s)
- Mengyuan Li: School of Nursing, Peking University Health Science Center, Room 510, 38 Xueyuan Road, Haidian District, Beijing, 100191, China
- Na Li: Theatre Pedagogy Department, Central Academy of Drama, Beijing, 100710, China
- Anqi Zhou: School of Nursing, Peking University Health Science Center, Room 510, 38 Xueyuan Road, Haidian District, Beijing, 100191, China
- Huiru Yan: School of Nursing, Peking University Health Science Center, Room 510, 38 Xueyuan Road, Haidian District, Beijing, 100191, China
- Qiuhong Li: School of Nursing, Peking University Health Science Center, Room 510, 38 Xueyuan Road, Haidian District, Beijing, 100191, China
- Chifen Ma: School of Nursing, Peking University Health Science Center, Room 510, 38 Xueyuan Road, Haidian District, Beijing, 100191, China
- Chao Wu: School of Nursing, Peking University Health Science Center, Room 510, 38 Xueyuan Road, Haidian District, Beijing, 100191, China
5
Temudo S, Pinheiro AP. What Is Faster than Where in Vocal Emotional Perception. J Cogn Neurosci 2025; 37:239-265. PMID: 39348115. DOI: 10.1162/jocn_a_02251.
Abstract
Voices carry a vast amount of information about speakers (e.g., emotional state, spatial location). Neuroimaging studies postulate that spatial ("where") and emotional ("what") cues are processed by partially independent processing streams. Although behavioral evidence reveals interactions between emotion and space, the temporal dynamics of these processes in the brain and their modulation by attention remain unknown. We investigated whether and how spatial and emotional features interact during voice processing as a function of attention focus. Spatialized nonverbal vocalizations differing in valence (neutral, amusement, anger) were presented at different locations around the head, while listeners discriminated either the spatial location or the emotional quality of the voice. Neural activity was measured with EEG, and event-related potentials (ERPs) were analysed. Affective ratings were collected at the end of the EEG session. Emotional vocalizations elicited decreased N1 but increased P2 and late positive potential amplitudes. Interactions of space and emotion occurred at the salience detection stage: neutral vocalizations presented at right (vs. left) locations elicited increased P2 amplitudes, but no such differences were observed for emotional vocalizations. When task instructions involved emotion categorization, the P2 was increased for vocalizations presented at front (vs. back) locations. Behaviorally, only valence and arousal ratings showed emotion-space interactions. These findings suggest that emotional representations are activated earlier than spatial representations in voice processing. The perceptual prioritization of emotional cues occurred irrespective of task instructions but was not paralleled by an augmented stimulus representation in space. These findings support differential responses to emotional information across auditory processing pathways.
6
Jiang Z, Long Y, Zhang X, Liu Y, Bai X. CNEV: A corpus of Chinese nonverbal emotional vocalizations with a database of emotion category, valence, arousal, and gender. Behav Res Methods 2025; 57:62. PMID: 39838181. DOI: 10.3758/s13428-024-02595-x.
Abstract
Nonverbal emotional vocalizations play a crucial role in conveying emotions during human interactions. Validated corpora of these vocalizations have facilitated emotion-related research and found wide-ranging applications. However, existing corpora have lacked representation from diverse cultural backgrounds, which may limit the generalizability of the resulting theories. The present paper introduces the Chinese Nonverbal Emotional Vocalization (CNEV) corpus, the first nonverbal emotional vocalization corpus recorded and validated entirely by Mandarin speakers from China. The CNEV corpus contains 2415 vocalizations across five emotion categories: happiness, sadness, fear, anger, and neutrality. It also includes a database containing subjective evaluation data on emotion category, valence, arousal, and speaker gender, as well as the acoustic features of the vocalizations. Key conclusions drawn from statistical analyses of perceptual evaluations and acoustic analysis include the following: (1) the CNEV corpus exhibits adequate reliability and high validity; (2) perceptual evaluations reveal a tendency for individuals to associate anger with male voices and fear with female voices; (3) acoustic analysis indicates that males are more effective at expressing anger, while females excel in expressing fear; and (4) the observed perceptual patterns align with the acoustic analysis results, suggesting that the perceptual differences may stem not only from the subjective factors of perceivers but also from objective expressive differences in the vocalizations themselves. For academic research purposes, the CNEV corpus and database are freely available for download at https://osf.io/6gy4v/ .
Affiliation(s)
- Zhongqing Jiang: College of Psychology, Liaoning Normal University, No. 850 Huanghe Road, Dalian, 116029, Liaoning, China
- Yanling Long: College of Psychology, Liaoning Normal University, No. 850 Huanghe Road, Dalian, 116029, Liaoning, China
- Xi'e Zhang: Xianyang Senior High School of Shaanxi Province, Xianyang, China
- Yangtao Liu: College of Psychology, Liaoning Normal University, No. 850 Huanghe Road, Dalian, 116029, Liaoning, China
- Xue Bai: College of Psychology, Liaoning Normal University, No. 850 Huanghe Road, Dalian, 116029, Liaoning, China
7
Prada M, Guedes D, Garrido MV, Saraiva M. Normative ratings for the Kitchen and Food Sounds (KFS) database. Behav Res Methods 2024; 56:6967-6980. PMID: 38548995. PMCID: PMC11362198. DOI: 10.3758/s13428-024-02402-7.
Abstract
Sounds are important sensory cues for food perception and acceptance. We developed and validated a large-scale database of kitchen and food sounds (180 stimuli) capturing different stages of preparing, cooking, serving, and/or consuming foods and beverages, as well as sounds of packaging, kitchen utensils, and appliances. Each sound was evaluated across nine subjective evaluative dimensions (presented in random order), including stimulus-related properties (e.g., valence, arousal) and food-related items (e.g., healthfulness, appetizingness), by a subsample of 51 to 64 participants (Mdn = 54; N = 332; 69.6% women, Mage = 27.46 years, SD = 10.20). Participants also identified each sound and rated how confident they were in such identification. Results show that, overall, participants could correctly identify the sound or at least recognize the general sound categories. The stimuli of the KFS database varied across different levels (low, moderate, high) of the evaluative dimensions under analysis, indicating good adequacy for a broad range of research purposes. The correlation analysis showed a high degree of association between evaluative dimensions. The sociodemographic characteristics of the sample had a limited influence on stimulus evaluation. Still, some aspects related to food and cooking were associated with how the sounds were evaluated, suggesting that participants' proficiency in the kitchen should be considered when planning studies with food sounds. Given its broad range of stimulus categories and evaluative dimensions, the KFS database (freely available on the OSF) is suitable for different research domains, from fundamental (e.g., cognitive psychology, basic sensory science) to more applied research (e.g., marketing, consumer science).
Affiliation(s)
- Marília Prada: Iscte - Instituto Universitário de Lisboa, Av. das Forças Armadas, 1649-026, Lisboa, Portugal
- David Guedes: Iscte - Instituto Universitário de Lisboa, Av. das Forças Armadas, 1649-026, Lisboa, Portugal
- Margarida Vaz Garrido: Iscte - Instituto Universitário de Lisboa, Av. das Forças Armadas, 1649-026, Lisboa, Portugal
- Magda Saraiva: William James Center for Research, Ispa-Instituto Universitário, Lisboa, Portugal
8
Diemerling H, Stresemann L, Braun T, von Oertzen T. Implementing machine learning techniques for continuous emotion prediction from uniformly segmented voice recordings. Front Psychol 2024; 15:1300996. PMID: 38572198. PMCID: PMC10987695. DOI: 10.3389/fpsyg.2024.1300996.
Abstract
Introduction: Emotion recognition from audio recordings is a rapidly advancing field, with significant implications for artificial intelligence and human-computer interaction. This study introduces a novel method for detecting emotions from short, 1.5 s audio samples, aiming to improve accuracy and efficiency in emotion recognition technologies.
Methods: We utilized 1,510 unique audio samples from two databases in German and English to train our models. We extracted various features for emotion prediction, employing Deep Neural Networks (DNN) for general feature analysis, Convolutional Neural Networks (CNN) for spectrogram analysis, and a hybrid model combining both approaches (C-DNN). The study addressed challenges associated with dataset heterogeneity, language differences, and the complexities of audio sample trimming.
Results: Our models demonstrated accuracy significantly surpassing random guessing, aligning closely with human evaluative benchmarks. This indicates the effectiveness of our approach in recognizing emotional states from brief audio clips.
Discussion: Despite the challenges of integrating diverse datasets and managing short audio samples, our findings suggest considerable potential for this methodology in real-time emotion detection from continuous speech. This could contribute to improving the emotional intelligence of AI and its applications in various areas.
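To make the hybrid-model idea concrete, here is a minimal PyTorch sketch of a C-DNN-style classifier that combines a CNN branch over mel-spectrograms of 1.5 s clips with a dense branch over summary acoustic features. The architecture, input sizes, and class count are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a hybrid spectrogram-CNN + feature-DNN emotion classifier.
import torch
import torch.nn as nn

class HybridEmotionNet(nn.Module):
    def __init__(self, n_summary_feats=40, n_emotions=7):
        super().__init__()
        # CNN branch: input is a 1 x 64 x 47 mel-spectrogram
        # (64 mel bands, ~47 frames for a 1.5 s clip at 16 kHz with a 512-sample hop).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),   # -> 32 * 4 * 4 = 512
        )
        # DNN branch: summary features such as MFCC means, pitch and energy statistics.
        self.dnn = nn.Sequential(
            nn.Linear(n_summary_feats, 64), nn.ReLU(), nn.Dropout(0.3),
        )
        self.head = nn.Linear(512 + 64, n_emotions)

    def forward(self, spectrogram, summary_feats):
        z = torch.cat([self.cnn(spectrogram), self.dnn(summary_feats)], dim=1)
        return self.head(z)  # logits over emotion classes

# Shape check with dummy data for a batch of four 1.5 s clips.
model = HybridEmotionNet()
logits = model(torch.randn(4, 1, 64, 47), torch.randn(4, 40))
print(logits.shape)  # torch.Size([4, 7])
```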
Affiliation(s)
- Hannes Diemerling: Center for Lifespan Psychology, Max Planck Institute for Human Development, Berlin, Germany; Thomas Bayes Institute, Berlin, Germany; Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany; Department of Psychology, University of the Bundeswehr München, Neubiberg, Germany
- Leonie Stresemann: Department of Psychology, University of the Bundeswehr München, Neubiberg, Germany
- Tina Braun: Department of Psychology, University of the Bundeswehr München, Neubiberg, Germany; Department of Psychology, Charlotte-Fresenius University, Wiesbaden, Germany
- Timo von Oertzen: Center for Lifespan Psychology, Max Planck Institute for Human Development, Berlin, Germany; Thomas Bayes Institute, Berlin, Germany
9
Lingelbach K, Vukelić M, Rieger JW. GAUDIE: Development, validation, and exploration of a naturalistic German AUDItory Emotional database. Behav Res Methods 2024; 56:2049-2063. PMID: 37221343. PMCID: PMC10991051. DOI: 10.3758/s13428-023-02135-z.
Abstract
Since thoroughly validated naturalistic affective German speech stimulus databases are rare, we present here a novel validated database of speech sequences assembled with the purpose of emotion induction. The database comprises 37 audio speech sequences with a total duration of 92 minutes for the induction of positive, neutral, and negative emotion: comedian shows intending to elicit humorous and amusing feelings, weather forecasts, and arguments between couples and relatives from movies or television series. Multiple continuous and discrete ratings are used to validate the database to capture the time course and variabilities of valence and arousal. We analyse and quantify how well the audio sequences fulfil quality criteria of differentiation, salience/strength, and generalizability across participants. Hence, we provide a validated speech database of naturalistic scenarios suitable to investigate emotion processing and its time course with German-speaking participants. Information on using the stimulus database for research purposes can be found at the OSF project repository GAUDIE: https://osf.io/xyr6j/ .
Affiliation(s)
- Katharina Lingelbach: Fraunhofer Institute for Industrial Engineering IAO, Nobelstraße 12, 70569, Stuttgart, Germany; Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Mathias Vukelić: Fraunhofer Institute for Industrial Engineering IAO, Nobelstraße 12, 70569, Stuttgart, Germany
- Jochem W Rieger: Department of Psychology, University of Oldenburg, Oldenburg, Germany
10
Larrouy-Maestri P, Poeppel D, Pell MD. The Sound of Emotional Prosody: Nearly 3 Decades of Research and Future Directions. Perspectives on Psychological Science 2024:17456916231217722. PMID: 38232303. DOI: 10.1177/17456916231217722.
Abstract
Emotional voices attract considerable attention. A search on any browser using "emotional prosody" as a key phrase leads to more than a million entries. Such interest is evident in the scientific literature as well; readers are reminded in the introductory paragraphs of countless articles of the great importance of prosody and that listeners easily infer the emotional state of speakers through acoustic information. However, despite decades of research on this topic and important achievements, the mapping between acoustics and emotional states is still unclear. In this article, we chart the rich literature on emotional prosody for both newcomers to the field and researchers seeking updates. We also summarize problems revealed by a sample of the literature of the last decades and propose concrete research directions for addressing them, ultimately to satisfy the need for more mechanistic knowledge of emotional prosody.
Affiliation(s)
- Pauline Larrouy-Maestri: Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany; School of Communication Sciences and Disorders, McGill University; Max Planck-NYU Center for Language, Music, and Emotion, New York, New York
- David Poeppel: Max Planck-NYU Center for Language, Music, and Emotion, New York, New York; Department of Psychology and Center for Neural Science, New York University; Ernst Strüngmann Institute for Neuroscience, Frankfurt, Germany
- Marc D Pell: School of Communication Sciences and Disorders, McGill University; Centre for Research on Brain, Language, and Music, Montreal, Quebec, Canada
11
Anikin A, Canessa-Pollard V, Pisanski K, Massenet M, Reby D. Beyond speech: Exploring diversity in the human voice. iScience 2023; 26:108204. PMID: 37908309. PMCID: PMC10613903. DOI: 10.1016/j.isci.2023.108204.
Abstract
Humans have evolved voluntary control over vocal production for speaking and singing, while preserving the phylogenetically older system of spontaneous nonverbal vocalizations such as laughs and screams. To test for systematic acoustic differences between these vocal domains, we analyzed a broad, cross-cultural corpus representing over 2 h of speech, singing, and nonverbal vocalizations. We show that, while speech is relatively low-pitched and tonal with mostly regular phonation, singing and especially nonverbal vocalizations vary enormously in pitch and often display harsh-sounding, irregular phonation owing to nonlinear phenomena. The evolution of complex supralaryngeal articulatory spectro-temporal modulation has been critical for speech, yet has not significantly constrained laryngeal source modulation. In contrast, articulation is very limited in nonverbal vocalizations, which predominantly contain minimally articulated open vowels and rapid temporal modulation in the roughness range. We infer that vocal source modulation works best for conveying affect, while vocal filter modulation mainly facilitates semantic communication.
Affiliation(s)
- Andrey Anikin: Division of Cognitive Science, Lund University, Lund, Sweden; ENES Bioacoustics Research Lab, CRNL, University of Saint-Etienne, CNRS, Inserm, 23 rue Michelon, 42023 Saint-Etienne, France
- Valentina Canessa-Pollard: ENES Bioacoustics Research Lab, CRNL, University of Saint-Etienne, CNRS, Inserm, 23 rue Michelon, 42023 Saint-Etienne, France; Psychology, Institute of Psychology, Business and Human Sciences, University of Chichester, Chichester, West Sussex PO19 6PE, UK
- Katarzyna Pisanski: ENES Bioacoustics Research Lab, CRNL, University of Saint-Etienne, CNRS, Inserm, 23 rue Michelon, 42023 Saint-Etienne, France; CNRS French National Centre for Scientific Research, DDL Dynamics of Language Lab, University of Lyon 2, 69007 Lyon, France; Institute of Psychology, University of Wrocław, Dawida 1, 50-527 Wrocław, Poland
- Mathilde Massenet: ENES Bioacoustics Research Lab, CRNL, University of Saint-Etienne, CNRS, Inserm, 23 rue Michelon, 42023 Saint-Etienne, France
- David Reby: ENES Bioacoustics Research Lab, CRNL, University of Saint-Etienne, CNRS, Inserm, 23 rue Michelon, 42023 Saint-Etienne, France
12
Johnson KT, Narain J, Quatieri T, Maes P, Picard RW. ReCANVo: A database of real-world communicative and affective nonverbal vocalizations. Sci Data 2023; 10:523. PMID: 37543663. PMCID: PMC10404278. DOI: 10.1038/s41597-023-02405-7.
Abstract
Nonverbal vocalizations, such as sighs, grunts, and yells, are informative expressions within typical verbal speech. Likewise, individuals who produce 0-10 spoken words or word approximations ("minimally speaking" individuals) convey rich affective and communicative information through nonverbal vocalizations even without verbal speech. Yet, despite their rich content, little to no data exists on the vocal expressions of this population. Here, we present ReCANVo: Real-World Communicative and Affective Nonverbal Vocalizations - a novel dataset of non-speech vocalizations labeled by function from minimally speaking individuals. The ReCANVo database contains over 7000 vocalizations spanning communicative and affective functions from eight minimally speaking individuals, along with communication profiles for each participant. Vocalizations were recorded in real-world settings and labeled in real-time by a close family member who knew the communicator well and had access to contextual information while labeling. ReCANVo is a novel database of nonverbal vocalizations from minimally speaking individuals, the largest available dataset of nonverbal vocalizations, and one of the only affective speech datasets collected amidst daily life across contexts.
Affiliation(s)
- Kristina T Johnson: Massachusetts Institute of Technology, MIT Media Lab, Cambridge, MA, USA
- Jaya Narain: Massachusetts Institute of Technology, MIT Media Lab, Cambridge, MA, USA
- Thomas Quatieri: Massachusetts Institute of Technology, Lincoln Laboratory, Lexington, MA, USA
- Pattie Maes: Massachusetts Institute of Technology, MIT Media Lab, Cambridge, MA, USA
- Rosalind W Picard: Massachusetts Institute of Technology, MIT Media Lab, Cambridge, MA, USA
13
Soma CS, Wampold B, Flemotomos N, Peri R, Narayanan S, Atkins DC, Imel ZE. The Silent Treatment?: Changes in patient emotional expression after silence. Counselling & Psychotherapy Research 2023; 23:378-388. PMID: 37457038. PMCID: PMC10348709. DOI: 10.1002/capr.12537.
Abstract
Psychotherapy can be an emotionally laden conversation in which both verbal and non-verbal interventions may affect the therapeutic process. Prior research has yielded mixed results on how clients react emotionally following a silence after the therapist finishes talking, potentially because it studied a limited range of silences with primarily qualitative and self-report methodologies. A quantitative exploration may yield new findings. Using research and automatic data processing from the field of linguistics, we analysed the full range of silence lengths (0.2 to 24.01 seconds) and measures of emotional expression (vocally encoded arousal and emotional valence from the words spoken) in 84 audio-recorded Motivational Interviewing sessions. We hypothesized that both the level and the variance of client emotional expression would change as a function of silence length; however, given the mixed results in the literature, the direction of emotional change was unclear. We conducted a multilevel linear regression to examine how the level of client emotional expression changed across silence lengths, and an ANOVA to examine the variability of client emotional expression across silence lengths. Both analyses indicated that as silence length increased, emotional expression largely remained the same. Broadly, we found only a weak connection between silence length and emotional expression, providing no persuasive evidence that silence leads to client emotional processing and expression.
14
Gong B, Li N, Li Q, Yan X, Chen J, Li L, Wu X, Wu C. The Mandarin Chinese auditory emotions stimulus database: A validated set of Chinese pseudo-sentences. Behav Res Methods 2023; 55:1441-1459. PMID: 35641682. DOI: 10.3758/s13428-022-01868-7.
Abstract
Emotional prosody is fully embedded in language and can be influenced by the linguistic properties of a specific language. Considering the limitations of existing Chinese auditory stimulus database studies, we developed and validated an emotional auditory stimuli database composed of Chinese pseudo-sentences, recorded by six professional actors in Mandarin Chinese. Emotional expressions included happiness, sadness, anger, fear, disgust, pleasant surprise, and neutrality. All emotional categories were vocalized into two types of sentence patterns, declarative and interrogative. In addition, all emotional pseudo-sentences, except for neutral, were vocalized at two levels of emotional intensity: normal and strong. Each recording was validated with 40 native Chinese listeners in terms of the recognition accuracy of the intended emotion portrayal; finally, 4361 pseudo-sentence stimuli were included in the database. Validation of the database using a forced-choice recognition paradigm revealed high rates of emotional recognition accuracy. The detailed acoustic attributes of vocalization were provided and connected to the emotion recognition rates. This corpus could be a valuable resource for researchers and clinicians to explore the behavioral and neural mechanisms underlying emotion processing of the general population and emotional disturbances in neurological, psychiatric, and developmental disorders. The Mandarin Chinese auditory emotion stimulus database is available at the Open Science Framework ( https://osf.io/sfbm6/?view_only=e22a521e2a7d44c6b3343e11b88f39e3 ).
Affiliation(s)
- Bingyan Gong: School of Nursing, Peking University Health Science Center, Room 510, 38 Xueyuan Road, Haidian District, Beijing, 100191, China
- Na Li: Theatre Pedagogy Department, Central Academy of Drama, Beijing, 100710, China
- Qiuhong Li: School of Nursing, Peking University Health Science Center, Room 510, 38 Xueyuan Road, Haidian District, Beijing, 100191, China
- Xinyuan Yan: School of Computing, University of Utah, Salt Lake City, UT, USA
- Jing Chen: Department of Machine Intelligence, Peking University, 5 Yiheyuan Road, Haidian District, Beijing, 100871, China; Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
- Liang Li: School of Psychological and Cognitive Sciences, Peking University, Beijing, 100871, China
- Xihong Wu: Department of Machine Intelligence, Peking University, 5 Yiheyuan Road, Haidian District, Beijing, 100871, China; Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
- Chao Wu: School of Nursing, Peking University Health Science Center, Room 510, 38 Xueyuan Road, Haidian District, Beijing, 100191, China
15
Soma CS, Knox D, Greer T, Gunnerson K, Young A, Narayanan S. It's not what you said, it's how you said it: An analysis of therapist vocal features during psychotherapy. Counselling & Psychotherapy Research 2023; 23:258-269. PMID: 36873916. PMCID: PMC9979575. DOI: 10.1002/capr.12489.
Abstract
Psychotherapy is a conversation, and at its foundation many interventions are delivered through the therapist's talking. Research suggests that the voice can convey a variety of emotional and social information, and that individuals may change their voice based on the context and content of the conversation (e.g., talking to a baby or delivering difficult news to patients with cancer). As such, therapists may adjust aspects of their voice throughout a therapy session depending on whether they are beginning the session and checking in with a client, conducting more therapeutic "work," or ending the session. In this study, we modeled three vocal features (pitch, energy, and rate) with linear and quadratic multilevel models to understand how therapists' vocal features change throughout a therapy session. We hypothesized that all three vocal features would be best fit by a quadratic function: starting high and more congruent with a conversational voice, decreasing during the middle portions of therapy where more therapeutic interventions were being administered, and increasing again at the end of the session. Results indicated that a quadratic model fit the data better than a linear model for all three vocal features, suggesting that therapists begin and end therapy using a different style of voice than in the middle of a session.
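A minimal sketch of the kind of model comparison described above, using statsmodels: session time enters as linear and quadratic fixed effects with a random intercept per therapist. The simulated data and column names are assumptions, not the study's data or exact specification.

```python
# Minimal sketch: linear vs quadratic multilevel model of a vocal feature over session time.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_therapists, n_turns = 20, 50
df = pd.DataFrame({
    "therapist_id": np.repeat(np.arange(n_therapists), n_turns),
    "time": np.tile(np.linspace(0, 1, n_turns), n_therapists),  # 0 = session start, 1 = end
})
# Simulated pitch with a U-shape over the session plus therapist-level offsets.
offsets = rng.normal(0, 5, n_therapists)[df["therapist_id"]]
df["pitch"] = 180 + 30 * (df["time"] - 0.5) ** 2 + offsets + rng.normal(0, 3, len(df))

linear = smf.mixedlm("pitch ~ time", df, groups=df["therapist_id"]).fit()
quadratic = smf.mixedlm("pitch ~ time + I(time**2)", df, groups=df["therapist_id"]).fit()

# A better-fitting quadratic model with a positive I(time ** 2) coefficient reflects the
# high-low-high pattern described in the abstract.
print("log-likelihoods:", round(linear.llf, 1), round(quadratic.llf, 1))
print(quadratic.params)
```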
Affiliation(s)
- Christina S Soma: Department of Educational Psychology, University of Utah, Salt Lake City, UT, USA
- Dillon Knox: Viterbi Department of Computer Science, University of Southern California, Los Angeles, CA, USA; Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA
- Timothy Greer: Viterbi Department of Computer Science, University of Southern California, Los Angeles, CA, USA
- Keith Gunnerson: Department of Educational Psychology, University of Utah, Salt Lake City, UT, USA
- Alexander Young: Viterbi Department of Computer Science, University of Southern California, Los Angeles, CA, USA
- Shrikanth Narayanan: Viterbi Department of Computer Science, University of Southern California, Los Angeles, CA, USA
16
Grollero D, Petrolini V, Viola M, Morese R, Lettieri G, Cecchetti L. The structure underlying core affect and perceived affective qualities of human vocal bursts. Cogn Emot 2022; 37:1-17. PMID: 36300588. DOI: 10.1080/02699931.2022.2139661.
Abstract
Vocal bursts are non-linguistic, affectively laden sounds with a crucial function in human communication, yet their affective structure is still debated. Studies have shown that ratings of valence and arousal follow a V-shaped relationship for several kinds of stimuli: high arousal ratings are more likely to accompany very negative or very positive valence. Across two studies, we asked participants to listen to 1,008 vocal bursts and judge both how they felt when listening to the sound (i.e., core affect condition) and how the speaker felt when producing it (i.e., perception of affective quality condition). We show that a V-shaped fit outperforms a linear model in explaining the valence-arousal relationship across conditions and studies, even after equating the number of exemplars across emotion categories. Also, although subjective experience can be significantly predicted from affective quality ratings, core affect scores are significantly lower in arousal, less extreme in valence, more variable between individuals, and less reproducible between studies. Nonetheless, the proportion of stimuli rated with opposite valence across the two conditions ranges from 11% (study 1) to 17% (study 2). Lastly, we demonstrate that ambiguity in valence (i.e., high between-participants variability) explains violations of the V-shape and relates to higher arousal.
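The V-shaped versus linear comparison can be illustrated with a short sketch: fit both functional forms to valence-arousal pairs and compare a simple error-based criterion. The data below are simulated under an assumed V-shape; with real ratings, valence and arousal would be the stimulus-level mean scores.

```python
# Minimal sketch: compare a linear and a V-shaped fit of arousal on valence.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
valence = rng.uniform(-1, 1, 500)                            # -1 = negative, +1 = positive
arousal = 0.8 * np.abs(valence) + rng.normal(0, 0.15, 500)   # simulated V-shaped relation

def linear(v, a, b):
    return a * v + b

def v_shape(v, a, b, c):
    return a * np.abs(v - b) + c   # vertex (lowest arousal) at valence = b

for name, f, p0 in [("linear", linear, (0.0, 0.0)), ("V-shape", v_shape, (1.0, 0.0, 0.0))]:
    params, _ = curve_fit(f, valence, arousal, p0=p0)
    rss = float(np.sum((arousal - f(valence, *params)) ** 2))
    aic = len(valence) * np.log(rss / len(valence)) + 2 * len(params)  # least-squares AIC
    print(f"{name}: RSS = {rss:.2f}, AIC = {aic:.1f}")
```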
Affiliation(s)
- Demetrio Grollero: Social and Affective Neuroscience (SANe) Group, MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- Valentina Petrolini: Lindy Lab - Language in Neurodiversity, Department of Linguistics and Basque Studies, University of the Basque Country (UPV/EHU), Vitoria-Gasteiz, Spain
- Marco Viola: Department of Philosophy and Education, University of Turin, Turin, Italy
- Rosalba Morese: Faculty of Communication, Culture and Society, Università della Svizzera Italiana, Lugano, Switzerland; Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland
- Giada Lettieri: Social and Affective Neuroscience (SANe) Group, MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy; Crossmodal Perception and Plasticity Laboratory, IPSY, University of Louvain, Louvain-la-Neuve, Belgium
- Luca Cecchetti: Social and Affective Neuroscience (SANe) Group, MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
17
Cosme G, Tavares V, Nobre G, Lima C, Sá R, Rosa P, Prata D. Cultural differences in vocal emotion recognition: a behavioural and skin conductance study in Portugal and Guinea-Bissau. Psychological Research 2022; 86:597-616. PMID: 33718984. PMCID: PMC8885546. DOI: 10.1007/s00426-021-01498-2.
Abstract
Cross-cultural studies of emotion recognition in nonverbal vocalizations support not only the universality hypothesis for its innate features, but also an in-group advantage for culture-dependent features. Nevertheless, such studies have not always accounted for differences in socio-economic and educational status, idiomatic translation of emotional concepts has been a limitation, and the underlying psychophysiological mechanisms remain unexplored. We set out to investigate whether native residents of Guinea-Bissau (West African culture) and Portugal (Western European culture), matched for socio-economic and educational status, sex and language, varied in behavioural and autonomic system responses during emotion recognition of nonverbal vocalizations from Portuguese individuals. Overall, Guinea-Bissauans (as the out-group) responded significantly less accurately (corrected p < .05) and more slowly, and showed a trend for higher concomitant skin conductance, compared to the Portuguese (as the in-group), findings which may indicate higher cognitive effort stemming from greater difficulty in discerning emotions from another culture. Specifically, accuracy differences were found particularly for pleasure, amusement, and anger, rather than for sadness, relief or fear. Nevertheless, both cultures recognized all emotions above chance level. Perceived authenticity, measured for the first time in nonverbal cross-cultural research in the same vocalizations, showed no difference between cultures in accuracy, but still a slower response from the out-group. Lastly, we provide, to our knowledge, a first account of how skin conductance response varies between nonverbally vocalized emotions, with significant differences (p < .05). In sum, we provide behavioural and psychophysiological data, demographically and language-matched, that support cultural and emotion effects on vocal emotion recognition and perceived authenticity, as well as the universality hypothesis.
Affiliation(s)
- Gonçalo Cosme: Instituto de Biofísica e Engenharia Biomédica, Faculdade de Ciências da Universidade de Lisboa, Campo Grande 016, 1749-016, Lisboa, Portugal
- Vânia Tavares: Instituto de Biofísica e Engenharia Biomédica, Faculdade de Ciências da Universidade de Lisboa, Campo Grande 016, 1749-016, Lisboa, Portugal; Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
- Guilherme Nobre: Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
- César Lima: Centro de Investigação e Intervenção Social, Instituto Universitário de Lisboa (ISCTE-IUL), CIS-IUL, Lisboa, Portugal
- Rui Sá: CAPP-Centre for Public Administration & Public Policies, ISCSP, Universidade de Lisboa, Lisboa, Portugal; Environmental Sciences Department, Universidade Lusófona da Guiné, Bissau, Guinea-Bissau
- Pedro Rosa: Centro de Investigação e Intervenção Social, Instituto Universitário de Lisboa (ISCTE-IUL), CIS-IUL, Lisboa, Portugal; HEI-LAB: Human-Environment Interaction Lab/Universidade Lusófona de Humanidades e Tecnologias, Lisboa, Portugal
- Diana Prata: Instituto de Biofísica e Engenharia Biomédica, Faculdade de Ciências da Universidade de Lisboa, Campo Grande 016, 1749-016, Lisboa, Portugal; Centro de Investigação e Intervenção Social, Instituto Universitário de Lisboa (ISCTE-IUL), CIS-IUL, Lisboa, Portugal; Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
18
Abstract
OBJECTIVE: The ability to recognize others' emotions is a central aspect of socioemotional functioning. Emotion recognition impairments are well documented in Alzheimer's disease and other dementias, but it is less understood whether they are also present in mild cognitive impairment (MCI). Results on facial emotion recognition are mixed, and crucially, it remains unclear whether the potential impairments are specific to faces or extend across sensory modalities.
METHOD: In the current study, 32 MCI patients and 33 cognitively intact controls completed a comprehensive neuropsychological assessment and two forced-choice emotion recognition tasks, including visual and auditory stimuli. The emotion recognition tasks required participants to categorize emotions in facial expressions and in nonverbal vocalizations (e.g., laughter, crying) expressing neutrality, anger, disgust, fear, happiness, pleasure, surprise, or sadness.
RESULTS: MCI patients performed worse than controls for both facial expressions and vocalizations. The effect was large, similar across tasks and individual emotions, and it was not explained by sensory losses or affective symptomatology. Emotion recognition impairments were more pronounced among patients with lower global cognitive performance, but they did not correlate with the ability to perform activities of daily living.
CONCLUSIONS: These findings indicate that MCI is associated with emotion recognition difficulties and that such difficulties extend beyond vision, plausibly reflecting a failure at supramodal levels of emotional processing. This highlights the importance of considering emotion recognition abilities as part of standard neuropsychological testing in MCI, and as a target of interventions aimed at improving social cognition in these patients.
19
Pinheiro AP, Anikin A, Conde T, Sarzedas J, Chen S, Scott SK, Lima CF. Emotional authenticity modulates affective and social trait inferences from voices. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200402. PMID: 34719249. PMCID: PMC8558771. DOI: 10.1098/rstb.2020.0402.
Abstract
The human voice is a primary tool for verbal and nonverbal communication. Studies on laughter emphasize a distinction between spontaneous laughter, which reflects a genuinely felt emotion, and volitional laughter, associated with more intentional communicative acts. Listeners can reliably differentiate the two. It remains unclear, however, if they can detect authenticity in other vocalizations, and whether authenticity determines the affective and social impressions that we form about others. Here, 137 participants listened to laughs and cries that could be spontaneous or volitional and rated them on authenticity, valence, arousal, trustworthiness and dominance. Bayesian mixed models indicated that listeners detect authenticity similarly well in laughter and crying. Speakers were also perceived to be more trustworthy, and in a higher arousal state, when their laughs and cries were spontaneous. Moreover, spontaneous laughs were evaluated as more positive than volitional ones, and we found that the same acoustic features predicted perceived authenticity and trustworthiness in laughter: high pitch, spectral variability and less voicing. For crying, associations between acoustic features and ratings were less reliable. These findings indicate that emotional authenticity shapes affective and social trait inferences from voices, and that the ability to detect authenticity in vocalizations is not limited to laughter. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part I)'.
Affiliation(s)
- Ana P. Pinheiro: CICPSI, Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013 Lisboa, Portugal
- Andrey Anikin: Equipe de Neuro-Ethologie Sensorielle (ENES)/Centre de Recherche em Neurosciences de Lyon (CRNL), University of Lyon/Saint-Etienne, CNRS UMR5292, INSERM UMR_S 1028, 42023 Saint-Etienne, France; Division of Cognitive Science, Lund University, 221 00 Lund, Sweden
- Tatiana Conde: CICPSI, Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013 Lisboa, Portugal
- João Sarzedas: CICPSI, Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013 Lisboa, Portugal
- Sinead Chen: National Taiwan University, Taipei City, 10617 Taiwan
- Sophie K. Scott: Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK
- César F. Lima: Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK; Instituto Universitário de Lisboa (ISCTE-IUL), Avenida das Forças Armadas, 1649-026 Lisboa, Portugal
20
The neural basis of authenticity recognition in laughter and crying. Sci Rep 2021; 11:23750. PMID: 34887461. PMCID: PMC8660868. DOI: 10.1038/s41598-021-03131-z.
Abstract
Deciding whether others' emotions are genuine is essential for successful communication and social relationships. While previous fMRI studies suggested that differentiation between authentic and acted emotional expressions involves higher-order brain areas, the time course of authenticity discrimination is still unknown. To address this gap, we tested the impact of authenticity discrimination on event-related potentials (ERPs) related to emotion, motivational salience, and higher-order cognitive processing (the N100, the P200, and the late positive complex, LPC), using vocalised non-verbal expressions of sadness (crying) and happiness (laughter) in a 32-participant, within-subject study. Using a repeated-measures two-factor (authenticity, emotion) ANOVA, we show that the N100 amplitude was larger in response to authentic than to acted vocalisations, particularly for cries, whereas the P200 amplitude was larger in response to acted vocalisations, particularly for laughs. We suggest these results point to two different mechanisms: (1) a larger N100 in response to authentic vocalisations is consistent with its link to emotional content and arousal (putatively larger amplitude for genuine emotional expressions); (2) a larger P200 in response to acted ones is in line with evidence relating it to motivational salience (putatively larger for ambiguous emotional expressions). Complementarily, a significant main effect of emotion was found on P200 and LPC amplitudes, with both being larger for laughs than for cries, regardless of authenticity. Overall, we provide the first electroencephalographic examination of authenticity discrimination and propose that authenticity processing of others' vocalisations is initiated early, alongside the processing of their emotional content or category, attesting to its evolutionary relevance for trust and bond formation.
21
Superior Communication of Positive Emotions Through Nonverbal Vocalisations Compared to Speech Prosody. Journal of Nonverbal Behavior 2021; 45:419-454. PMID: 34744232. PMCID: PMC8553689. DOI: 10.1007/s10919-021-00375-1.
Abstract
The human voice communicates emotion through two different types of vocalizations: nonverbal vocalizations (brief non-linguistic sounds like laughs) and speech prosody (tone of voice). Research examining recognizability of emotions from the voice has mostly focused on either nonverbal vocalizations or speech prosody, and included few categories of positive emotions. In two preregistered experiments, we compare human listeners’ (total n = 400) recognition performance for 22 positive emotions from nonverbal vocalizations (n = 880) to that from speech prosody (n = 880). The results show that listeners were more accurate in recognizing most positive emotions from nonverbal vocalizations compared to prosodic expressions. Furthermore, acoustic classification experiments with machine learning models demonstrated that positive emotions are expressed with more distinctive acoustic patterns for nonverbal vocalizations as compared to speech prosody. Overall, the results suggest that vocal expressions of positive emotions are communicated more successfully when expressed as nonverbal vocalizations compared to speech prosody.
22
Neves L, Martins M, Correia AI, Castro SL, Lima CF. Associations between vocal emotion recognition and socio-emotional adjustment in children. Royal Society Open Science 2021; 8:211412. PMID: 34804582. PMCID: PMC8595998. DOI: 10.1098/rsos.211412.
Abstract
The human voice is a primary channel for emotional communication. It is often presumed that being able to recognize vocal emotions is important for everyday socio-emotional functioning, but evidence for this assumption remains scarce. Here, we examined relationships between vocal emotion recognition and socio-emotional adjustment in children. The sample included 141 6- to 8-year-old children, and the emotion tasks required them to categorize five emotions (anger, disgust, fear, happiness, sadness, plus neutrality), as conveyed by two types of vocal emotional cues: speech prosody and non-verbal vocalizations such as laughter. Socio-emotional adjustment was evaluated by the children's teachers using a multidimensional questionnaire of self-regulation and social behaviour. Based on frequentist and Bayesian analyses, we found that, for speech prosody, higher emotion recognition related to better general socio-emotional adjustment. This association remained significant even when the children's cognitive ability, age, sex and parental education were held constant. Follow-up analyses indicated that higher emotional prosody recognition was more robustly related to the socio-emotional dimensions of prosocial behaviour and cognitive and behavioural self-regulation. For emotion recognition in non-verbal vocalizations, no associations with socio-emotional adjustment were found. A similar null result was obtained for an additional task focused on facial emotion recognition. Overall, these results support the close link between children's emotional prosody recognition skills and their everyday social behaviour.
Collapse
Affiliation(s)
- Leonor Neves
- Centro de Investigação e Intervenção Social (CIS-IUL), Instituto Universitário de Lisboa (ISCTE-IUL), Av. das Forças Armadas, 1649-026 Lisboa, Portugal
| | - Marta Martins
- Centro de Investigação e Intervenção Social (CIS-IUL), Instituto Universitário de Lisboa (ISCTE-IUL), Av. das Forças Armadas, 1649-026 Lisboa, Portugal
| | - Ana Isabel Correia
- Centro de Investigação e Intervenção Social (CIS-IUL), Instituto Universitário de Lisboa (ISCTE-IUL), Av. das Forças Armadas, 1649-026 Lisboa, Portugal
| | - São Luís Castro
- Centro de Psicologia da Universidade do Porto (CPUP), Faculdade de Psicologia e de Ciências da Educação da Universidade do Porto (FPCEUP), Porto, Portugal
| | - César F. Lima
- Centro de Investigação e Intervenção Social (CIS-IUL), Instituto Universitário de Lisboa (ISCTE-IUL), Av. das Forças Armadas, 1649-026 Lisboa, Portugal
- Institute of Cognitive Neuroscience, University College London, London, UK
| |
Collapse
|
23
|
Pakarinen S, Lohilahti J, Sokka L, Korpela J, Huotilainen M, Müller K. Auditory deviance detection and involuntary attention allocation in occupational burnout-A follow-up study. Eur J Neurosci 2021; 55:2592-2611. [PMID: 34415092 DOI: 10.1111/ejn.15429] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2020] [Revised: 07/25/2021] [Accepted: 08/06/2021] [Indexed: 11/30/2022]
Abstract
Here, we investigated the central auditory processing and attentional control associated with both recovery and prolongation of occupational burnout. We recorded the event-related brain potentials N1, P2, mismatch negativity (MMN) and P3a to nine changes in speech sounds and to three rarely presented emotional (happy, angry and sad) utterances from individuals with burnout (N = 16) and their matched controls (N = 12). At the 5-year follow-up, one control participant had developed burnout, half (N = 8) of the burnout group had recovered, and the other half (prolonged burnout) still had burnout. The processing of acoustic changes in speech sounds was largely intact. Prolongation of burnout was associated with a decrease in MMN amplitude and an increase in P3a amplitude for the happy stimulus. The results suggest that, in the absence of interventions, burnout is a persistent condition associated with alterations of attentional control, which may be amplified as the condition is prolonged.
Collapse
Affiliation(s)
- Satu Pakarinen
- Finnish Institute of Occupational Health, Helsinki, Finland
| | | | - Laura Sokka
- Finnish Institute of Occupational Health, Helsinki, Finland
| | - Jussi Korpela
- Finnish Institute of Occupational Health, Helsinki, Finland
| | - Minna Huotilainen
- CICERO Learning, Faculty of Education, and Cognitive Brain Research Unit, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Kiti Müller
- Faculty of Medicine, Department of Neurosciences, University of Helsinki, Helsinki, Finland
| |
Collapse
|
24
|
Lima CF, Arriaga P, Anikin A, Pires AR, Frade S, Neves L, Scott SK. Authentic and posed emotional vocalizations trigger distinct facial responses. Cortex 2021; 141:280-292. [PMID: 34102411 DOI: 10.1016/j.cortex.2021.04.015] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2021] [Revised: 04/21/2021] [Accepted: 04/27/2021] [Indexed: 11/28/2022]
Abstract
The ability to recognize the emotions of others is a crucial skill. In the visual modality, sensorimotor mechanisms provide an important route for emotion recognition. Perceiving facial expressions often evokes activity in facial muscles and in motor and somatosensory systems, and this activity relates to performance in emotion tasks. It remains unclear whether and how similar mechanisms extend to audition. Here we examined facial electromyographic and electrodermal responses to nonverbal vocalizations that varied in emotional authenticity. Participants (N = 100) passively listened to laughs and cries that could reflect an authentic or a posed emotion. Bayesian mixed models indicated that listening to laughter evoked stronger facial responses than listening to crying. These responses were sensitive to emotional authenticity. Authentic laughs evoked more activity than posed laughs in the zygomaticus and orbicularis, muscles typically associated with positive affect. We also found that activity in the orbicularis and corrugator related to subjective evaluations in a subsequent authenticity perception task. Stronger responses in the orbicularis predicted higher perceived laughter authenticity. Stronger responses in the corrugator, a muscle associated with negative affect, predicted lower perceived laughter authenticity. Moreover, authentic laughs elicited stronger skin conductance responses than posed laughs. This arousal effect did not predict task performance, however. For crying, physiological responses were not associated with authenticity judgments. Altogether, these findings indicate that emotional authenticity affects peripheral nervous system responses to vocalizations. They also point to a role of sensorimotor mechanisms in the evaluation of authenticity in the auditory modality.
Collapse
Affiliation(s)
- César F Lima
- Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal; Institute of Cognitive Neuroscience, University College London, London, UK.
| | | | - Andrey Anikin
- Equipe de Neuro-Ethologie Sensorielle (ENES)/Centre de Recherche en Neurosciences de Lyon (CRNL), University of Lyon/Saint-Etienne, CNRS UMR5292, INSERM UMR_S 1028, Saint-Etienne, France; Division of Cognitive Science, Lund University, Lund, Sweden
| | - Ana Rita Pires
- Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal
| | - Sofia Frade
- Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal
| | - Leonor Neves
- Instituto Universitário de Lisboa (ISCTE-IUL), Lisboa, Portugal
| | - Sophie K Scott
- Institute of Cognitive Neuroscience, University College London, London, UK
| |
Collapse
|
25
|
Holz N, Larrouy-Maestri P, Poeppel D. The paradoxical role of emotional intensity in the perception of vocal affect. Sci Rep 2021; 11:9663. [PMID: 33958630 PMCID: PMC8102532 DOI: 10.1038/s41598-021-88431-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2021] [Accepted: 04/09/2021] [Indexed: 11/08/2022] Open
Abstract
Vocalizations including laughter, cries, moans, or screams constitute a potent source of information about the affective states of others. It is typically conjectured that the higher the intensity of the expressed emotion, the better the classification of affective information. However, attempts to map the relation between affective intensity and inferred meaning are controversial. Based on a newly developed stimulus database of carefully validated non-speech expressions ranging across the entire intensity spectrum from low to peak, we show that the intuition is false. Based on three experiments (N = 90), we demonstrate that intensity in fact has a paradoxical role. Participants were asked to rate and classify the authenticity, intensity and emotion, as well as valence and arousal of the wide range of vocalizations. Listeners are clearly able to infer expressed intensity and arousal; in contrast, and surprisingly, emotion category and valence have a perceptual sweet spot: moderate and strong emotions are clearly categorized, but peak emotions are maximally ambiguous. This finding, which converges with related observations from visual experiments, raises interesting theoretical challenges for the emotion communication literature.
Collapse
Affiliation(s)
- N Holz
- Department of Neuroscience, Max Planck Institute for Empirical Aesthetics, Frankfurt/M, Germany.
| | - P Larrouy-Maestri
- Department of Neuroscience, Max Planck Institute for Empirical Aesthetics, Frankfurt/M, Germany
- Max Planck NYU Center for Language, Music, and Emotion, Frankfurt/M, Germany
| | - D Poeppel
- Department of Neuroscience, Max Planck Institute for Empirical Aesthetics, Frankfurt/M, Germany
- Max Planck NYU Center for Language, Music, and Emotion, Frankfurt/M, Germany
- Department of Psychology, New York University, New York, NY, USA
| |
Collapse
|
26
|
Engelberg JWM, Schwartz JW, Gouzoules H. The emotional canvas of human screams: patterns and acoustic cues in the perceptual categorization of a basic call type. PeerJ 2021; 9:e10990. [PMID: 33854835 PMCID: PMC7953872 DOI: 10.7717/peerj.10990] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2020] [Accepted: 02/01/2021] [Indexed: 11/20/2022] Open
Abstract
Screams occur across taxonomically widespread species, typically in antipredator situations, and are strikingly similar acoustically, but in nonhuman primates, they have taken on acoustically varied forms in association with more contextually complex functions related to agonistic recruitment. Humans scream in an even broader range of contexts, but the extent to which acoustic variation allows listeners to perceive different emotional meanings remains unknown. We investigated how listeners responded to 30 contextually diverse human screams on six different emotion prompts as well as how selected acoustic cues predicted these responses. We found that acoustic variation in screams was associated with the perception of different emotions from these calls. Emotion ratings generally fell along two dimensions: one contrasting perceived anger, frustration, and pain with surprise and happiness, roughly associated with call duration and roughness, and one related to perceived fear, associated with call fundamental frequency. Listeners were more likely to rate screams highly in emotion prompts matching the source context, suggesting that some screams conveyed information about emotional context, but it is noteworthy that the analysis of screams from happiness contexts (n = 11 screams) revealed that they more often yielded higher ratings of fear. We discuss the implications of these findings for the role and evolution of nonlinguistic vocalizations in human communication, including consideration of how the expanded diversity in calls such as human screams might represent a derived function of language.
Collapse
Affiliation(s)
| | - Jay W. Schwartz
- Department of Psychology, Emory University, Atlanta, GA, USA
- Psychological Sciences Department, Western Oregon University, Monmouth, OR, USA
| | | |
Collapse
|
27
|
Cosme G, Rosa PJ, Lima CF, Tavares V, Scott S, Chen S, Wilcockson TDW, Crawford TJ, Prata D. Pupil dilation reflects the authenticity of received nonverbal vocalizations. Sci Rep 2021; 11:3733. [PMID: 33580104 PMCID: PMC7880996 DOI: 10.1038/s41598-021-83070-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2020] [Accepted: 01/27/2021] [Indexed: 11/10/2022] Open
Abstract
The ability to infer the authenticity of others' emotional expressions is a social cognitive process taking place in all human interactions. Although the neurocognitive correlates of authenticity recognition have been probed, its potential recruitment of the peripheral autonomic nervous system is not known. In this work, we asked participants to rate the authenticity of authentic and acted laughs and cries while simultaneously recording their pupil size, taken as a proxy for cognitive effort and arousal. We report, for the first time, that acted laughs elicited greater pupil dilation than authentic ones and, conversely, that authentic cries elicited greater pupil dilation than acted ones. We tentatively suggest that the lack of authenticity in others' laughs increases pupil dilation by demanding greater cognitive effort, and that, conversely, authenticity in cries increases pupil dilation by eliciting higher emotional arousal. We also show that authentic vocalizations and laughs (i.e., main effects of authenticity and emotion) were perceived as more authentic, arousing, and contagious than acted vocalizations and cries, respectively. In conclusion, we provide new evidence that the recognition of emotional authenticity can be manifested at the level of the autonomic nervous system in humans. Given the novelty of these findings, however, further independent research is warranted to ascertain their psychological meaning.
Collapse
Affiliation(s)
- Gonçalo Cosme
- Instituto de Biofísica e Engenharia Biomédica, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016, Lisboa, Portugal
| | - Pedro J Rosa
- HEI-LAB: Human-Environment Interaction Lab, Universidade Lusófona de Humanidades e Tecnologias, Campo Grande, 376, 1749-024, Lisboa, Portugal; Instituto Universitário de Lisboa (Iscte-IUL), CIS-Iscte, Avenida das Forças Armadas, 1649-026, Lisboa, Portugal
| | - César F Lima
- Instituto Universitário de Lisboa (Iscte-IUL), CIS-Iscte, Avenida das Forças Armadas, 1649-026, Lisboa, Portugal
| | - Vânia Tavares
- Instituto de Biofísica e Engenharia Biomédica, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016, Lisboa, Portugal; Faculdade de Medicina da Universidade de Lisboa, Avenida Professor Egas Moniz MB, 1649-028, Lisboa, Portugal
| | - Sophie Scott
- Institute of Cognitive Neuroscience, University College London, Alexandra House, 17-19 Queen Square, Bloomsbury, London, UK
| | - Sinead Chen
- Risk Society and Policy Research Center, National Taiwan University, Roosevelt Rd., Daan Dist., Taipei, 10617, Taiwan
| | - Thomas D W Wilcockson
- Department of Psychology, Lancaster University, Lancaster, LA1 4YF, UK; School of Sport, Exercise, and Health Sciences, Loughborough University, Clyde Williams Building, Epinal Way, Loughborough, LE11 3GE, UK
| | - Trevor J Crawford
- Department of Psychology, Lancaster University, Lancaster, LA1 4YF, UK
| | - Diana Prata
- Instituto de Biofísica e Engenharia Biomédica, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016, Lisboa, Portugal; Instituto Universitário de Lisboa (Iscte-IUL), CIS-Iscte, Avenida das Forças Armadas, 1649-026, Lisboa, Portugal; Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience, King's College London, 16 De Crespigny Park, Camberwell, London, SE5 8AF, UK.
| |
Collapse
|
28
|
Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis. PROCEEDINGS OF THE CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING. CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING 2020; 2020:1823-1833. [PMID: 33969363 DOI: 10.18653/v1/2020.emnlp-main.143] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Human language can be expressed through multiple sources of information known as modalities, including tone of voice, facial gestures, and spoken words. Recent multimodal learning models achieve strong performance on human-centric tasks such as sentiment analysis and emotion recognition, but they are often black boxes with very limited interpretability. In this paper we propose Multimodal Routing, which dynamically adjusts the weights between input modalities and output representations for each input sample. Multimodal Routing can identify the relative importance of both individual modalities and cross-modality features. Moreover, the weight assignment by routing allows us to interpret modality-prediction relationships not only globally (i.e., general trends over the whole dataset) but also locally for each single input sample, while keeping performance competitive with state-of-the-art methods.
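As a rough, simplified illustration of per-sample modality weighting (not the routing mechanism proposed in the paper), the sketch below assigns each sample its own softmax weights over modality feature vectors; the weights can then be read locally per sample or averaged over a dataset for a global view. All names and the dummy features are assumptions for illustration.

```python
# Simplified, hypothetical illustration of per-sample modality weighting
# (NOT the paper's routing mechanism): each input sample gets its own softmax
# weights over modality features, inspectable locally or averaged globally.
import numpy as np

def soft_weights(scores):
    """Softmax over modality relevance scores for one sample."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def fuse(features, scores):
    """Weighted sum of modality feature vectors for one sample."""
    w = soft_weights(scores)  # e.g., [text, audio, vision]
    return w, sum(wi * fi for wi, fi in zip(w, features))

text, audio, vision = np.ones(4), 2 * np.ones(4), 3 * np.ones(4)  # dummy features
weights, fused = fuse([text, audio, vision], np.array([0.5, 1.5, 0.2]))
print(weights)  # local interpretability: per-sample modality importance
```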
Collapse
|
29
|
Abstract
Researchers examining nonverbal communication of emotions are becoming increasingly interested in differentiations between different positive emotional states like interest, relief, and pride. But despite the importance of the voice in communicating emotion in general and positive emotion in particular, there is to date no systematic review of what characterizes vocal expressions of different positive emotions. Furthermore, integration and synthesis of current findings are lacking. In this review, we comprehensively review studies (N = 108) investigating acoustic features relating to specific positive emotions in speech prosody and nonverbal vocalizations. We find that happy voices are generally loud with considerable variability in loudness, have high and variable pitch, and are high in the first two formant frequencies. When specific positive emotions are directly compared with each other, pitch mean, loudness mean, and speech rate differ across positive emotions, with patterns mapping onto clusters of emotions, so-called emotion families. For instance, pitch is higher for epistemological emotions (amusement, interest, relief), moderate for savouring emotions (contentment and pleasure), and lower for a prosocial emotion (admiration). Some, but not all, of the differences in acoustic patterns also map on to differences in arousal levels. We end by pointing to limitations in extant work and making concrete proposals for future research on positive emotions in the voice.
Collapse
|
30
|
Whiting CM, Kotz SA, Gross J, Giordano BL, Belin P. The perception of caricatured emotion in voice. Cognition 2020; 200:104249. [PMID: 32413547 PMCID: PMC7315128 DOI: 10.1016/j.cognition.2020.104249] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2017] [Revised: 02/10/2020] [Accepted: 02/27/2020] [Indexed: 11/29/2022]
Abstract
Affective vocalisations such as screams and laughs can convey strong emotional content without verbal information. Previous research using morphed vocalisations (e.g. 25% fear/75% anger) has revealed categorical perception of emotion in voices, showing sudden shifts at emotion category boundaries. However, it is currently unknown how further modulation of vocalisations beyond the veridical emotion (e.g. 125% fear) affects perception. Caricatured facial expressions produce emotions that are perceived as more intense and distinctive, with faster recognition relative to the original and anti-caricatured (e.g. 75% fear) emotions, but a similar effect using vocal caricatures has not been previously examined. Furthermore, caricatures can play a key role in assessing how distinctiveness is identified, in particular by evaluating accounts of emotion perception with reference to prototypes (distance from the central stimulus) and exemplars (density of the stimulus space). Stimuli consisted of four emotions (anger, disgust, fear, and pleasure) morphed at 25% intervals between a neutral expression and each emotion from 25% to 125%, and between each pair of emotions. Emotion perception was assessed using emotion intensity ratings, valence and arousal ratings, speeded categorisation and paired similarity ratings. We report two key findings: 1) across tasks, there was a strongly linear effect of caricaturing, with caricatured emotions (125%) perceived as higher in emotion intensity and arousal, and recognised faster than the original emotion (100%) and anti-caricatures (25%-75%); 2) our results reveal evidence for a unique contribution of a prototype-based account to emotion recognition. We show for the first time that vocal caricature effects are comparable to those found previously with facial caricatures. The set of caricatured vocalisations provided here opens a promising line of research for investigating vocal affect perception and emotion processing deficits in clinical populations.
Collapse
Affiliation(s)
- Caroline M Whiting
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK.
| | - Sonja A Kotz
- Faculty of Psychology and Neuroscience, Department of Neuropsychology and Psychopharmacology, Maastricht University, Maastricht, the Netherlands; Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK; Institute for Biomagnetism and Biosignalanalysis, University of Münster, Germany
| | - Bruno L Giordano
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK; Institut de Neurosciences de la Timone, CNRS UMR 7289, Aix-Marseille Université, Marseille, France.
| | - Pascal Belin
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK; Institut de Neurosciences de la Timone, CNRS UMR 7289, Aix-Marseille Université, Marseille, France
| |
Collapse
|
31
|
Soma CS, Baucom BRW, Xiao B, Butner JE, Hilpert P, Narayanan S, Atkins DC, Imel ZE. Coregulation of therapist and client emotion during psychotherapy. Psychother Res 2020; 30:591-603. [PMID: 32400306 DOI: 10.1080/10503307.2019.1661541] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 10/26/2022] Open
Abstract
OBJECTIVE Close interpersonal relationships are fundamental to emotion regulation. Clinical theory suggests that one role of therapists in psychotherapy is to help clients regulate emotions, however, if and how clients and therapists serve to regulate each other's emotions has not been empirically tested. Emotion coregulation - the bidirectional emotional linkage of two people that promotes emotional stability - is a specific, temporal process that provides a framework for testing the way in which therapists' and clients' emotions may be related on a moment to moment basis in clinically relevant ways. METHOD Utilizing 227 audio recordings from a relationally oriented treatment (Motivational Interviewing), we estimated continuous values of vocally encoded emotional arousal via mean fundamental frequency. We used dynamic systems models to examine emotional coregulation, and tested the hypothesis that each individual's emotional arousal would be significantly associated with fluctuations in the other's emotional state over the course of a psychotherapy session. RESULTS Results indicated that when clients became more emotionally labile over the course of the session, therapists became less so. When changes in therapist arousal increased, the client's tendency to become more aroused during session slowed. Alternatively, when changes in client arousal increased, the therapist's tendency to become less aroused slowed.
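A minimal sketch of how vocally encoded arousal might be quantified as mean fundamental frequency (f0) for one speaker turn is shown below. It uses librosa's pYIN pitch tracker with a placeholder filename and assumed pitch bounds, and is only an illustration of the general approach, not the authors' processing pipeline.

```python
# Hypothetical sketch: mean f0 of one speaker turn as a proxy for vocal arousal.
# "turn.wav" is a placeholder filename; fmin/fmax are assumed speech pitch bounds.
import librosa
import numpy as np

def mean_f0(path, fmin=75.0, fmax=400.0):
    y, sr = librosa.load(path, sr=None)                     # keep native sampling rate
    f0, voiced_flag, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    return float(np.nanmean(f0))                            # ignore unvoiced frames (NaN)

# arousal_proxy = mean_f0("turn.wav")
```

Tracking such values turn by turn for client and therapist would produce the paired time series that dynamic systems models of coregulation are fitted to.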
Collapse
Affiliation(s)
- Christina S Soma
- Department of Educational Psychology, University of Utah, Salt Lake City, UT, USA
| | - Brian R W Baucom
- Department of Psychology, University of Utah, Salt Lake City, UT, USA
| | - Bo Xiao
- Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
| | - Jonathan E Butner
- Department of Psychology, University of Utah, Salt Lake City, UT, USA
| | - Peter Hilpert
- School of Psychology, University of Surrey, Guilford, UK
| | - Shrikanth Narayanan
- Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
| | - David C Atkins
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
| | - Zac E Imel
- Department of Educational Psychology, University of Utah, Salt Lake City, UT, USA
| |
Collapse
|
32
|
Zarotti N, Fletcher I, Simpson J. New Perspectives on Emotional Processing in People with Symptomatic Huntington's Disease: Impaired Emotion Regulation and Recognition of Emotional Body Language†. Arch Clin Neuropsychol 2020; 34:610-624. [PMID: 30395151 DOI: 10.1093/arclin/acy085] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2018] [Revised: 10/05/2018] [Accepted: 10/09/2018] [Indexed: 12/14/2022] Open
Abstract
OBJECTIVE Emotion regulation and emotional body language (EBL) recognition represent two fundamental components of emotional processing that have recently seen a considerable surge in research interest, in part due to the role they play in optimizing mental health. This appears to be particularly true for clinical conditions that can profoundly affect emotional functioning. Among these is Huntington's disease (HD), a neurodegenerative disorder that is associated with several psychological difficulties and cognitive impairments, including well-established deficits in facial emotion recognition. However, although the theoretical case for impairments is strong, the current evidence in HD on other components such as emotion regulation and EBL recognition is sparse. METHOD In this study, it was hypothesized that emotion regulation and recognition of EBL are impaired in people with symptomatic HD, and that these impairments significantly and positively correlate with each other. A between-subjects design was adopted to compare 13 people with symptomatic HD with 12 non-affected controls matched for age and education. RESULTS The results showed that emotion regulation and EBL recognition were significantly impaired in individuals with HD. Moreover, a significant positive correlation was observed between facial and EBL recognition impairments, whereas EBL performance was negatively related to the disease stage. However, emotion regulation and recognition performances were not significantly correlated. CONCLUSIONS This investigation represents the first evidence of a deficit of emotion regulation and EBL recognition in individuals with HD. The clinical implications of these findings are explored, and indications for future research are proposed.
Collapse
Affiliation(s)
- Nicolò Zarotti
- Division of Health Research, Faculty of Health and Medicine, Lancaster University, Lancaster, UK
| | - Ian Fletcher
- Division of Health Research, Faculty of Health and Medicine, Lancaster University, Lancaster, UK
| | - Jane Simpson
- Division of Health Research, Faculty of Health and Medicine, Lancaster University, Lancaster, UK
| |
Collapse
|
33
|
Abstract
To ensure that listeners pay attention and do not habituate, emotionally intense vocalizations may be under evolutionary pressure to exploit processing biases in the auditory system by maximising their bottom-up salience. This "salience code" hypothesis was tested using 128 human nonverbal vocalizations representing eight emotions: amusement, anger, disgust, effort, fear, pain, pleasure, and sadness. As expected, within each emotion category salience ratings derived from pairwise comparisons strongly correlated with perceived emotion intensity. For example, while laughs as a class were less salient than screams of fear, salience scores almost perfectly explained the perceived intensity of both amusement and fear considered separately. Validating self-rated salience evaluations, high- vs. low-salience sounds caused 25% more recall errors in a short-term memory task, whereas emotion intensity had no independent effect on recall errors. Furthermore, the acoustic characteristics of salient vocalizations were similar to those previously described for non-emotional sounds (greater duration and intensity, high pitch, bright timbre, rapid modulations, and variable spectral characteristics), confirming that vocalizations were not salient merely because of their emotional content. The acoustic code in nonverbal communication is thus aligned with sensory biases, offering a general explanation for some non-arbitrary properties of human and animal high-arousal vocalizations.
Collapse
Affiliation(s)
- Andrey Anikin
- Division of Cognitive Science, Lund University, Lund, Sweden
| |
Collapse
|
34
|
Anikin A. A Moan of Pleasure Should Be Breathy: The Effect of Voice Quality on the Meaning of Human Nonverbal Vocalizations. PHONETICA 2020; 77:327-349. [PMID: 31962309 PMCID: PMC7592904 DOI: 10.1159/000504855] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/17/2019] [Accepted: 11/15/2019] [Indexed: 05/19/2023]
Abstract
Prosodic features, such as intonation and voice intensity, have a well-documented role in communicating emotion, but less is known about the role of laryngeal voice quality in speech and particularly in nonverbal vocalizations such as laughs and moans. Potentially, however, variations in voice quality between tense and breathy may convey rich information about the speaker's physiological and affective state. In this study breathiness was manipulated in synthetic human nonverbal vocalizations by adjusting the relative strength of upper harmonics and aspiration noise. In experiment 1 (28 prototypes × 3 manipulations = 84 sounds), otherwise identical vocalizations with tense versus breathy voice quality were associated with higher arousal (general alertness), higher dominance, and lower valence (unpleasant states). Ratings on discrete emotions in experiment 2 (56 × 3 = 168 sounds) confirmed that breathiness was reliably associated with positive emotions, particularly in ambiguous vocalizations (gasps and moans). The spectral centroid did not fully account for the effect of manipulation, confirming that the perceived change in voice quality was more specific than a general shift in timbral brightness. Breathiness is thus involved in communicating emotion with nonverbal vocalizations, possibly due to changes in low-level auditory salience and perceived vocal effort.
Collapse
|
35
|
Trösch M, Cuzol F, Parias C, Calandreau L, Nowak R, Lansade L. Horses Categorize Human Emotions Cross-Modally Based on Facial Expression and Non-Verbal Vocalizations. Animals (Basel) 2019; 9:ani9110862. [PMID: 31653088 PMCID: PMC6912773 DOI: 10.3390/ani9110862] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2019] [Revised: 10/13/2019] [Accepted: 10/22/2019] [Indexed: 12/16/2022] Open
Abstract
Simple Summary: Recently, an increasing number of studies have investigated the expression and perception of emotions by non-human animals. In particular, it is of interest to determine whether animals can link emotion stimuli of different modalities (e.g., visual and oral) based on the emotions that are expressed (i.e., to recognize emotions cross-modally). For domestic species that share a close relationship with humans, we might even wonder whether this ability extends to human emotions. Here, we investigated whether domestic horses recognize human emotions cross-modally. We simultaneously presented two animated pictures of human facial expressions, one typical of joy and the other of anger, while a speaker played a human non-verbal vocalization expressing joy or anger. Horses looked more at the picture that did not match the emotion of the vocalization (probably because they were intrigued by the paradoxical combination). Moreover, their behavior and heart rate differed depending on the vocalization: they reacted more negatively to the anger vocalization and more positively to the joy vocalization. These results suggest that horses can match visual and vocal cues for the same emotion and can perceive the emotional valence of human non-verbal vocalizations.
Abstract: Over the last few years, an increasing number of studies have aimed to gain more insight into the field of animal emotions. In particular, it is of interest to determine whether animals can cross-modally categorize the emotions of others. For domestic animals that share a close relationship with humans, we might wonder whether this cross-modal recognition of emotions extends to humans, as well. In this study, we tested whether horses could recognize human emotions and attribute the emotional valence of visual (facial expression) and vocal (non-verbal vocalization) stimuli to the same perceptual category. Two animated pictures of different facial expressions (anger and joy) were simultaneously presented to the horses, while a speaker played an emotional human non-verbal vocalization matching one of the two facial expressions. Horses looked at the picture that was incongruent with the vocalization more, probably because they were intrigued by the paradoxical combination. Moreover, horses reacted in accordance with the valence of the vocalization, both behaviorally and physiologically (heart rate). These results show that horses can cross-modally recognize human emotions and react emotionally to the emotional states of humans, assessed by non-verbal vocalizations.
Collapse
Affiliation(s)
- Miléna Trösch
- INRA, PRC, CNRS, IFCE, Université de Tours, 37380 Nouzilly, France.
| | - Florent Cuzol
- INRA, PRC, CNRS, IFCE, Université de Tours, 37380 Nouzilly, France.
| | - Céline Parias
- INRA, PRC, CNRS, IFCE, Université de Tours, 37380 Nouzilly, France.
| | | | - Raymond Nowak
- INRA, PRC, CNRS, IFCE, Université de Tours, 37380 Nouzilly, France.
| | - Léa Lansade
- INRA, PRC, CNRS, IFCE, Université de Tours, 37380 Nouzilly, France.
| |
Collapse
|
36
|
The Hoosier Vocal Emotions Corpus: A validated set of North American English pseudo-words for evaluating emotion processing. Behav Res Methods 2019; 52:901-917. [PMID: 31485866 DOI: 10.3758/s13428-019-01288-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
This article presents the development of the "Hoosier Vocal Emotions Corpus," a stimulus set of recorded pseudo-words based on the pronunciation rules of English. The corpus contains 73 controlled audio pseudo-words uttered by two actresses in five different emotions (i.e., happiness, sadness, fear, anger, and disgust) and in a neutral tone, yielding 1,763 audio files. In this article, we describe the corpus as well as a validation study of the pseudo-words. A total of 96 native English speakers completed a forced choice emotion identification task. All emotions were recognized better than chance overall, with substantial variability among the different tokens. All of the recordings, including the ambiguous stimuli, are made freely available, and the recognition rates and the full confusion matrices for each stimulus are provided in order to assist researchers and clinicians in the selection of stimuli. The corpus has unique characteristics that can be useful for experimental paradigms that require controlled stimuli (e.g., electroencephalographic or fMRI studies). Stimuli from this corpus could be used by researchers and clinicians to answer a variety of questions, including investigations of emotion processing in individuals with certain temperamental or behavioral characteristics associated with difficulties in emotion recognition (e.g., individuals with psychopathic traits); in bilingual individuals or nonnative English speakers; in patients with aphasia, schizophrenia, or other mental health disorders (e.g., depression); or in training automatic emotion recognition algorithms. The Hoosier Vocal Emotions Corpus is available at https://psycholinguistics.indiana.edu/hoosiervocalemotions.htm.
Collapse
|
37
|
Mitrenga KJ, Alderson-Day B, May L, Moffatt J, Moseley P, Fernyhough C. Reading characters in voices: Ratings of personality characteristics from voices predict proneness to auditory verbal hallucinations. PLoS One 2019; 14:e0221127. [PMID: 31404114 PMCID: PMC6690516 DOI: 10.1371/journal.pone.0221127] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2019] [Accepted: 07/30/2019] [Indexed: 11/23/2022] Open
Abstract
People rapidly make first impressions of others, often based on very little information–minimal exposure to faces or voices is sufficient for humans to make up their mind about personality of others. While there has been considerable research on voice personality perception, much less is known about its relevance to hallucination-proneness, despite auditory hallucinations being frequently perceived as personified social agents. The present paper reports two studies investigating the relation between voice personality perception and hallucination-proneness in non-clinical samples. A voice personality perception task was created, in which participants rated short voice recordings on four personality characteristics, relating to dimensions of the voice’s perceived Valence and Dominance. Hierarchical regression was used to assess contributions of Valence and Dominance voice personality ratings to hallucination-proneness scores, controlling for paranoia-proneness and vividness of mental imagery. Results from Study 1 suggested that high ratings of voices as dominant might be related to high hallucination-proneness; however, this relation seemed to be dependent on reported levels of paranoid thinking. In Study 2, we show that hallucination-proneness was associated with high ratings of voice dominance, and this was independent of paranoia and imagery abilities scores, both of which were found to be significant predictors of hallucination-proneness. Results from Study 2 suggest an interaction between gender of participants and the gender of the voice actor, where only ratings of own gender voices on Dominance characteristics are related to hallucination-proneness scores. These results are important for understanding the perception of characterful features of voices and its significance for psychopathology.
Collapse
Affiliation(s)
- Kaja Julia Mitrenga
- Department of Psychology, Durham University, Durham, England, United Kingdom
- * E-mail:
| | - Ben Alderson-Day
- Department of Psychology, Durham University, Durham, England, United Kingdom
| | - Lucy May
- School of Psychology and Clinical Language Science, University of Reading, Reading, England, United Kingdom
| | - Jamie Moffatt
- Department of Psychology, Durham University, Durham, England, United Kingdom
- School of Psychology, University of Sussex, Falmer, England, United Kingdom
| | - Peter Moseley
- Department of Psychology, Durham University, Durham, England, United Kingdom
- Department of Psychology, University of Central Lancashire, Preston, England, United Kingdom
| | - Charles Fernyhough
- Department of Psychology, Durham University, Durham, England, United Kingdom
| |
Collapse
|
38
|
Castiajo P, Pinheiro AP. Decoding emotions from nonverbal vocalizations: How much voice signal is enough? MOTIVATION AND EMOTION 2019. [DOI: 10.1007/s11031-019-09783-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
39
|
Saffarian A, Shavaki YA, Shahidi GA, Jafari Z. Effect of Parkinson Disease on Emotion Perception Using the Persian Affective Voices Test. J Voice 2019; 33:580.e1-580.e9. [DOI: 10.1016/j.jvoice.2018.01.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2017] [Accepted: 01/16/2018] [Indexed: 12/01/2022]
|
40
|
Abstract
Voice synthesis is a useful method for investigating the communicative role of different acoustic features. Although many text-to-speech systems are available, researchers of human nonverbal vocalizations and bioacousticians may profit from a dedicated simple tool for synthesizing and manipulating natural-sounding vocalizations. Soundgen ( https://CRAN.R-project.org/package=soundgen ) is an open-source R package that synthesizes nonverbal vocalizations based on meaningful acoustic parameters, which can be specified from the command line or in an interactive app. This tool was validated by comparing the perceived emotion, valence, arousal, and authenticity of 60 recorded human nonverbal vocalizations (screams, moans, laughs, and so on) and their approximate synthetic reproductions. Each synthetic sound was created by manually specifying only a small number of high-level control parameters, such as syllable length and a few anchors for the intonation contour. Nevertheless, the valence and arousal ratings of synthetic sounds were similar to those of the original recordings, and the authenticity ratings were comparable, maintaining parity with the originals for less complex vocalizations. Manipulating the precise acoustic characteristics of synthetic sounds may shed light on the salient predictors of emotion in the human voice. More generally, soundgen may prove useful for any studies that require precise control over the acoustic features of nonspeech sounds, including research on animal vocalizations and auditory perception.
Collapse
Affiliation(s)
- Andrey Anikin
- Division of Cognitive Science, Department of Philosophy, Lund University, Box 192, SE-221 00, Lund, Sweden.
| |
Collapse
|
41
|
Pinheiro AP, Lima D, Albuquerque PB, Anikin A, Lima CF. Spatial location and emotion modulate voice perception. Cogn Emot 2019; 33:1577-1586. [PMID: 30870109 DOI: 10.1080/02699931.2019.1586647] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/29/2023]
Abstract
How do we perceive voices coming from different spatial locations, and how is this affected by emotion? The current study probed the interplay between space and emotion during voice perception. Thirty participants listened to nonverbal vocalizations coming from different locations around the head (left vs. right; front vs. back), and differing in valence (neutral, positive [amusement] or negative [anger]). They were instructed to identify the location of the vocalizations (Experiment 1) and to evaluate their emotional qualities (Experiment 2). Emotion-space interactions were observed, but only in Experiment 1: emotional vocalizations were better localised than neutral ones when they were presented from the back and the right side. In Experiment 2, emotion recognition accuracy was increased for positive vs. negative and neutral vocalizations, and perceived arousal was increased for emotional vs. neutral vocalizations, but this was independent of spatial location. These findings indicate that emotional salience affects how we perceive the spatial location of voices. They additionally suggest that the interaction between spatial ("where") and emotional ("what") properties of the voice differs as a function of task.
Collapse
Affiliation(s)
- Ana P Pinheiro
- Faculdade de Psicologia, Universidade de Lisboa , Lisboa , Portugal
| | - Diogo Lima
- School of Psychology, University of Minho , Braga , Portugal
| | | | - Andrey Anikin
- Division of Cognitive Science, Department of Philosophy, Lund University , Lund , Sweden
| | - César F Lima
- Instituto Universitário de Lisboa (ISCTE-IUL) , Lisboa , Portugal
| |
Collapse
|
42
|
Anikin A. The perceptual effects of manipulating nonlinear phenomena in synthetic nonverbal vocalizations. BIOACOUSTICS 2019. [DOI: 10.1080/09524622.2019.1581839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
Affiliation(s)
- Andrey Anikin
- Division of Cognitive Science, Department of Philosophy, Lund University, Lund, Sweden
| |
Collapse
|
43
|
Courbalay A, Deroche T, Pradon D, Oliveira AM, Amorim MA. Clinical experience changes the combination and the weighting of audio-visual sources of information. Acta Psychol (Amst) 2018; 191:219-227. [PMID: 30336350 DOI: 10.1016/j.actpsy.2018.09.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2018] [Revised: 09/19/2018] [Accepted: 09/26/2018] [Indexed: 11/30/2022] Open
Abstract
OBJECTIVE Although audio and visual information constitute relevant channels to communicate pain, it remains unclear to what extent observers combine and weight these sources of information when estimating others' pain. The present study aimed to examine this issue through the theoretical framework of the Information Integration Theory. The combination and weighting processes were addressed in view of familiarity with others' pain. METHOD Twenty-six participants familiar with pain (novice podiatry clinicians) and thirty non-specialists were asked to estimate the level of pain associated with different displayed locomotor behaviors. Audio and visual information (i.e., sound and gait kinematics) were combined across different intensities and implemented in animated human stick figures performing a walking task (from normal to pathological gaits). RESULTS The novice clinicians and non-specialists relied significantly on gaits and sounds to estimate others' pain intensity. The combination of the two types of information obeyed an averaging rule for the majority of the novice clinicians and an additive rule for the non-specialists. The novice clinicians leaned more on gaits in the absence of limping, whereas they depended more on sounds in the presence of limping. The non-specialists relied more on gaits than on sounds. Overall, the novice clinicians attributed greater pain levels than the non-specialists did. CONCLUSION Depending on a person's clinical experience, the combination of audio and visual pain-related behavior can qualitatively change the processes related to the assessment of others' pain. Non-verbal pain-related behaviors as well as the clinical implications are discussed in view of the assessment of others' pain.
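For reference, the additive and averaging integration rules contrasted above are commonly formalised in Information Integration Theory roughly as follows (a standard textbook formulation, not taken from the paper), where $s_a$ and $s_v$ are the psychological scale values of the audio and visual cues, $w_a$ and $w_v$ their weights, and $w_0 s_0$ an initial-impression term:

$$
R_{\text{additive}} = w_a s_a + w_v s_v,
\qquad
R_{\text{averaging}} = \frac{w_0 s_0 + w_a s_a + w_v s_v}{w_0 + w_a + w_v}.
$$

Under the averaging rule, adding a low-valued cue can lower the overall judgment, whereas under the additive rule any cue can only add to it, which is what allows the two rules to be distinguished empirically.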
Collapse
Affiliation(s)
- Anne Courbalay
- CIAMS, Univ. Paris-Sud, Université Paris-Saclay, 91405 Orsay Cedex, France; CIAMS, Université d'Orléans, 45067 Orléans, France; APCoSS - Institute of Physical Education and Sports Sciences (IFEPSA), UCO, Angers, France.
| | - Thomas Deroche
- CIAMS, Univ. Paris-Sud, Université Paris-Saclay, 91405 Orsay Cedex, France; CIAMS, Université d'Orléans, 45067 Orléans, France.
| | - Didier Pradon
- UMR 1179 END-ICAP (INSERM-UVSQ), Hôpital Universitaire Raymond Poincaré, APHP, Garches, France.
| | - Armando M Oliveira
- Institute of Cognitive Psychology, Faculty of Psychology and Educational Sciences, University of Coimbra, Coimbra, Portugal.
| | - Michel-Ange Amorim
- CIAMS, Univ. Paris-Sud, Université Paris-Saclay, 91405 Orsay Cedex, France; CIAMS, Université d'Orléans, 45067 Orléans, France.
| |
Collapse
|
44
|
DAVID: An open-source platform for real-time transformation of infra-segmental emotional cues in running speech. Behav Res Methods 2018; 50:323-343. [PMID: 28374144 PMCID: PMC5809549 DOI: 10.3758/s13428-017-0873-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]
Abstract
We present an open-source software platform that transforms emotional cues expressed by speech signals using audio effects like pitch shifting, inflection, vibrato, and filtering. The emotional transformations can be applied to any audio file, but can also run in real time, using live input from a microphone, with less than 20-ms latency. We anticipate that this tool will be useful for the study of emotions in psychology and neuroscience, because it enables a high level of control over the acoustical and emotional content of experimental stimuli in a variety of laboratory situations, including real-time social situations. We present here the results of a series of validation experiments aiming to position the tool against several methodological requirements: that transformed emotions are recognized at above-chance levels, that the transformations are valid in several languages (French, English, Swedish, and Japanese), and that their naturalness is comparable to that of natural speech.
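To give a concrete sense of the kind of infra-segmental manipulation involved, here is a minimal offline sketch of a pitch-shift transformation using librosa and soundfile. It is only an illustration under assumed filenames ("speech.wav", "speech_shifted.wav") and an assumed shift size; it is not the DAVID implementation, which applies such effects in real time.

```python
# Offline, hypothetical illustration of one infra-segmental cue manipulation
# (pitch shifting); not the DAVID platform itself.
import librosa
import soundfile as sf

y, sr = librosa.load("speech.wav", sr=None)           # placeholder input file
# Shift pitch up by half a semitone, a subtle cue often associated with
# more positive-sounding speech (assumed value for illustration).
y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=0.5)
sf.write("speech_shifted.wav", y_shifted, sr)         # placeholder output file
```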
Collapse
|
45
|
Affiliation(s)
- Jordan Raine
- Mammal Vocal Communication and Cognition Research Group, School of Psychology, University of Sussex, Brighton, UK
| | - Katarzyna Pisanski
- Mammal Vocal Communication and Cognition Research Group, School of Psychology, University of Sussex, Brighton, UK
| | - Julia Simner
- MULTISENSE Research Lab, School of Psychology, University of Sussex, Brighton, UK
| | - David Reby
- Mammal Vocal Communication and Cognition Research Group, School of Psychology, University of Sussex, Brighton, UK
| |
Collapse
|
46
|
Lazzeri N, Mazzei D, Ben Moussa M, Magnenat-Thalmann N, De Rossi D. The influence of dynamics and speech on understanding humanoid facial expressions. INT J ADV ROBOT SYST 2018. [DOI: 10.1177/1729881418783158] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Human communication relies mostly on nonverbal signals expressed through body language. Facial expressions, in particular, convey emotional information that allows people involved in social interactions to judge each other's emotional states and to adjust their behavior appropriately. The first studies investigating the recognition of facial expressions were based on static stimuli. However, facial expressions are rarely static, especially in everyday social interactions. It has therefore been hypothesized that the dynamics inherent in a facial expression could be fundamental to understanding its meaning. In addition, it has been demonstrated that nonlinguistic and linguistic information can help reinforce the meaning of a facial expression, making it easier to recognize. Nevertheless, few studies have been performed on realistic humanoid robots. This experimental work aimed to demonstrate the human-like expressive capability of a humanoid robot by examining whether motion and vocal content influenced the perception of its facial expressions. The first part of the experiment studied the recognition of two kinds of stimuli related to the six basic expressions (i.e., anger, disgust, fear, happiness, sadness, and surprise): static stimuli (photographs) and dynamic stimuli (video recordings). The second and third parts compared the same six basic expressions performed by a virtual avatar and by a physical robot under three conditions: (1) muted facial expressions, (2) facial expressions with nonlinguistic vocalizations, and (3) facial expressions with an emotionally neutral verbal sentence. The results show that static stimuli performed by a human being and by the robot were more ambiguous than the corresponding dynamic stimuli, in which motion and vocalization were combined. The same pattern was also observed with a 3-dimensional virtual replica of the physical robot, demonstrating that, even for a virtual avatar, dynamics and vocalization improve the capacity to convey emotion.
Collapse
Affiliation(s)
- Nicole Lazzeri
- Research Center E. Piaggio, Faculty of Engineering, University of Pisa, Pisa, Italy
| | - Daniele Mazzei
- Computer Science Department, University of Pisa, Pisa, Italy
| | - Maher Ben Moussa
- Computer Science Centre, University of Geneva, Geneva, Switzerland
| | | | - Danilo De Rossi
- Research Center E. Piaggio, Faculty of Engineering, University of Pisa, Pisa, Italy
| |
Collapse
|
47
|
Lima CF, Anikin A, Monteiro AC, Scott SK, Castro SL. Automaticity in the recognition of nonverbal emotional vocalizations. ACTA ACUST UNITED AC 2018; 19:219-233. [PMID: 29792444 DOI: 10.1037/emo0000429] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The ability to perceive the emotions of others is crucial for everyday social interactions. Important aspects of visual socioemotional processing, such as the recognition of facial expressions, are known to depend on largely automatic mechanisms. However, whether and how properties of automaticity extend to the auditory domain remains poorly understood. Here we ask if nonverbal auditory emotion recognition is a controlled deliberate or an automatic efficient process, using vocalizations such as laughter, crying, and screams. In a between-subjects design (N = 112), and covering eight emotions (four positive), we determined whether emotion recognition accuracy (a) is improved when participants actively deliberate about their responses (compared with when they respond as fast as possible) and (b) is impaired when they respond under low and high levels of cognitive load (concurrent task involving memorizing sequences of six or eight digits, respectively). Response latencies were also measured. Mixed-effects models revealed that recognition accuracy was high across emotions, and only minimally affected by deliberation and cognitive load; the benefits of deliberation and costs of cognitive load were significant mostly for positive emotions, notably amusement/laughter, and smaller or absent for negative ones; response latencies did not suffer under low or high cognitive load; and high recognition accuracy (approximately 90%) could be reached within 500 ms after the stimulus onset, with performance exceeding chance-level already between 300 and 360 ms. These findings indicate that key features of automaticity, namely fast and efficient/effortless processing, might be a modality-independent component of emotion recognition.
Collapse
Affiliation(s)
- César F Lima
- Faculty of Psychology and Education Sciences, University of Porto
| | - Andrey Anikin
- Division of Cognitive Science, Department of Philosophy, Lund University
| | | | - Sophie K Scott
- Institute of Cognitive Neuroscience, University College London
| | - São Luís Castro
- Faculty of Psychology and Education Sciences, University of Porto
| |
Collapse
|
48
|
Livingstone SR, Russo FA. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS One 2018; 13:e0196391. [PMID: 29768426 PMCID: PMC5955500 DOI: 10.1371/journal.pone.0196391] [Citation(s) in RCA: 196] [Impact Index Per Article: 28.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2017] [Accepted: 04/12/2018] [Indexed: 11/19/2022] Open
Abstract
The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced, consisting of 24 professional actors vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. Each of the 7356 recordings was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intrarater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are made freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976.
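An illustrative sketch of how stimuli might be selected from a local RAVDESS download. It assumes the database's hyphen-separated numeric filename convention (modality-channel-emotion-intensity-statement-repetition-actor) and the code-to-label mapping commonly documented with the Zenodo release; confirm field order and codes against the official documentation before relying on this mapping. The directory name is also an assumption.
```python
from pathlib import Path

EMOTION_CODES = {  # assumed emotion codes for the speech set
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess_name(path: Path) -> dict:
    """Split a RAVDESS-style filename (e.g. 03-01-03-02-01-01-12.wav)
    into its labelled fields; raises ValueError for non-conforming names."""
    modality, channel, emotion, intensity, statement, repetition, actor = (
        path.stem.split("-")
    )
    return {
        "modality": modality,        # 01 full AV, 02 video-only, 03 audio-only
        "vocal_channel": channel,    # 01 speech, 02 song
        "emotion": EMOTION_CODES.get(emotion, emotion),
        "intensity": intensity,      # 01 normal, 02 strong
        "statement": statement,
        "repetition": repetition,
        "actor": int(actor),         # 1-24
    }

# Example: collect all audio-only, strong-intensity "happy" recordings.
root = Path("RAVDESS")               # assumed local download directory
happy_strong = []
for wav in root.rglob("*.wav"):
    meta = parse_ravdess_name(wav)
    if (meta["emotion"] == "happy" and meta["intensity"] == "02"
            and meta["modality"] == "03"):
        happy_strong.append(wav)
```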
Collapse
Affiliation(s)
- Steven R. Livingstone
- Department of Psychology, Ryerson University, Toronto, Canada
- Department of Computer Science and Information Systems, University of Wisconsin-River Falls, River Falls, WI, United States of America
| | - Frank A. Russo
- Department of Psychology, Ryerson University, Toronto, Canada
| |
Collapse
|
49
|
Buimer HP, Bittner M, Kostelijk T, van der Geest TM, Nemri A, van Wezel RJA, Zhao Y. Conveying facial expressions to blind and visually impaired persons through a wearable vibrotactile device. PLoS One 2018; 13:e0194737. [PMID: 29584738 PMCID: PMC5870993 DOI: 10.1371/journal.pone.0194737] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2017] [Accepted: 03/08/2018] [Indexed: 11/18/2022] Open
Abstract
In face-to-face social interactions, blind and visually impaired persons (VIPs) lack access to nonverbal cues like facial expressions, body posture, and gestures, which may lead to impaired interpersonal communication. In this study, a wearable sensory substitution device (SSD) consisting of a head-mounted camera and a haptic belt was evaluated to determine whether vibrotactile cues around the waist could be used to convey facial expressions to users and whether such a device is desired by VIPs for use in daily living situations. Ten VIPs (mean age: 38.8, SD: 14.4) and 10 sighted persons (SPs) (mean age: 44.5, SD: 19.6) participated in the study, in which validated sets of pictures, silent videos, and videos with audio of facial expressions were presented to each participant. A control measurement was first performed to determine how accurately participants could identify facial expressions while relying on their functional senses. After a short training, participants were asked to determine facial expressions while wearing the emotion feedback system. VIPs using the device showed significant improvements in their ability to determine which facial expressions were shown. A significant increase in accuracy of 44.4 percentage points was found across all types of stimuli when comparing the scores of the control (mean±SEM: 35.0±2.5%) and supported (mean±SEM: 79.4±2.1%) phases. The greatest improvements achieved with the support of the SSD were found for silent stimuli (68.3 percentage points for pictures and 50.8 for silent videos). SPs also showed consistent, though not statistically significant, improvements while supported. Overall, our study shows that vibrotactile cues are well suited to convey facial expressions to VIPs in real time. Participants became skilled with the device after a short training session. Further testing and development of the SSD are required to improve its accuracy and aesthetics for potential daily use.
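A minimal sketch of the kind of mapping such a sensory-substitution pipeline needs: a classified facial expression is translated into a vibrotactile pattern on a waist-worn belt. The motor layout, pulse parameters, and expression labels are hypothetical illustrations; the paper does not specify this particular assignment.
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VibrationPattern:
    motor_index: int   # which belt motor to drive (hypothetical layout)
    pulses: int        # number of pulses
    pulse_ms: int      # pulse duration in milliseconds

# One motor position per basic expression (assumed assignment).
EXPRESSION_TO_PATTERN = {
    "happy":     VibrationPattern(motor_index=0, pulses=2, pulse_ms=150),
    "sad":       VibrationPattern(motor_index=1, pulses=1, pulse_ms=400),
    "angry":     VibrationPattern(motor_index=2, pulses=3, pulse_ms=100),
    "fearful":   VibrationPattern(motor_index=3, pulses=4, pulse_ms=80),
    "surprised": VibrationPattern(motor_index=4, pulses=2, pulse_ms=80),
    "disgusted": VibrationPattern(motor_index=5, pulses=1, pulse_ms=250),
    "neutral":   VibrationPattern(motor_index=6, pulses=1, pulse_ms=100),
}

def pattern_for(expression: str) -> Optional[VibrationPattern]:
    """Return the vibrotactile pattern for a classified expression,
    or None if the expression is unknown (motors stay silent)."""
    return EXPRESSION_TO_PATTERN.get(expression.lower())
```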
Collapse
Affiliation(s)
- Hendrik P. Buimer
- Department of Biomedical Signals and Systems, MIRA Institute, University of Twente, Enschede, The Netherlands
- Department of Biophysics, Donders Institute, Radboud University, Nijmegen, The Netherlands
| | - Marian Bittner
- Department of Biomedical Signals and Systems, MIRA Institute, University of Twente, Enschede, The Netherlands
| | | | - Thea M. van der Geest
- Department of Media, Communication, & Organization, University of Twente, Enschede, The Netherlands
- Department of Media and Design, HAN University of Applied Sciences, Arnhem, The Netherlands
| | - Abdellatif Nemri
- Department of Biomedical Signals and Systems, MIRA Institute, University of Twente, Enschede, The Netherlands
| | - Richard J. A. van Wezel
- Department of Biomedical Signals and Systems, MIRA Institute, University of Twente, Enschede, The Netherlands
- Department of Biophysics, Donders Institute, Radboud University, Nijmegen, The Netherlands
| | - Yan Zhao
- Department of Biomedical Signals and Systems, MIRA Institute, University of Twente, Enschede, The Netherlands
| |
Collapse
|
50
|
Amorim M, Pinheiro AP. Is the sunny side up and the dark side down? Effects of stimulus type and valence on a spatial detection task. Cogn Emot 2018; 33:346-360. [PMID: 29564964 DOI: 10.1080/02699931.2018.1452718] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/17/2022]
Abstract
In verbal communication, affective information is commonly conveyed to others through spatial terms (e.g. in "I am feeling down", negative affect is associated with a lower spatial location). This study used a target location discrimination task with neutral, positive and negative stimuli (words, facial expressions, and vocalizations) to test the automaticity of the emotion-space association, in both the vertical and horizontal spatial axes. The effects of stimulus type on emotion-space representations were also probed. A congruency effect (reflected in reaction times) was observed in the vertical axis: detection of upper targets preceded by positive stimuli was faster. This effect occurred for all stimulus types, indicating that the emotion-space association does not depend on sensory modality or on the verbal content of affective stimuli.
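A toy illustration of how a vertical congruency effect like the one reported could be quantified from trial-level reaction times: mean RT on incongruent prime-target pairings minus mean RT on congruent ones, per participant. The column names and data format are assumptions, not the authors' materials.
```python
import pandas as pd

def vertical_congruency_effect(trials: pd.DataFrame) -> pd.Series:
    """Return each participant's congruency effect in ms; positive values
    mean faster responses on congruent pairings (positive prime with upper
    target, negative prime with lower target)."""
    congruent = (
        ((trials["valence"] == "positive") & (trials["target_pos"] == "upper"))
        | ((trials["valence"] == "negative") & (trials["target_pos"] == "lower"))
    )
    labelled = trials.assign(congruent=congruent)
    means = labelled.groupby(["participant", "congruent"])["rt_ms"].mean().unstack()
    return means[False] - means[True]
```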
Collapse
Affiliation(s)
- Maria Amorim
- Faculdade de Psicologia, Universidade de Lisboa, Lisboa, Portugal
- School of Psychology, University of Minho, Braga, Portugal
| | - Ana P Pinheiro
- Faculdade de Psicologia, Universidade de Lisboa, Lisboa, Portugal
- School of Psychology, University of Minho, Braga, Portugal
| |
Collapse
|