1
Smith HMJ, Ritchie KL, Baguley TS, Lavan N. Face and voice identity matching accuracy is not improved by multimodal identity information. Br J Psychol 2025; 116:367-385. PMID: 39690725; PMCID: PMC11984343; DOI: 10.1111/bjop.12757
Abstract
Identity verification from both faces and voices can be error-prone. Previous research has shown that faces and voices signal concordant information and cross-modal unfamiliar face-to-voice matching is possible, albeit often with low accuracy. In the current study, we ask whether performance on a face or voice identity matching task can be improved by using multimodal stimuli which add a second modality (voice or face). We find that overall accuracy is higher for face matching than for voice matching. However, contrary to predictions, presenting one unimodal and one multimodal stimulus within a matching task did not improve face or voice matching compared to presenting two unimodal stimuli. Additionally, we find that presenting two multimodal stimuli does not improve accuracy compared to presenting two unimodal face stimuli. Thus, multimodal information does not improve accuracy. However, intriguingly, we find that cross-modal face-voice matching accuracy predicts voice matching accuracy but not face matching accuracy. This suggests cross-modal information can nonetheless play a role in identity matching, and face and voice information combine to inform matching decisions. We discuss our findings in light of current models of person perception, and consider the implications for identity verification in security and forensic settings.
Affiliation(s)
- Nadine Lavan
- Department of Biological and Experimental Psychology, Queen Mary University of London, London, UK
2
Liu C, Ma Y, Liang X, Xiang M, Wu H, Ning X. Decoding the Spatiotemporal Dynamics of Neural Response Similarity in Auditory Processing: A Multivariate Analysis Based on OPM-MEG. Hum Brain Mapp 2025; 46:e70175. PMID: 40016919; PMCID: PMC11868016; DOI: 10.1002/hbm.70175
Abstract
The brain represents information through the encoding of neural populations, where the activity patterns of these neural groups constitute the content of this information. Understanding these activity patterns and their dynamic changes is of significant importance to cognitive neuroscience and related research areas. Current studies focus more on brain regions that show differential responses to stimuli but cannot capture the representational or process-level dynamics within these regions. In this study, we recorded neural data from 10 healthy participants during auditory experiments using optically pumped magnetometer magnetoencephalography (OPM-MEG) and electroencephalography (EEG). We constructed representational similarity matrices (RSMs) to investigate the similarity of neural response patterns during auditory decoding. The results indicate that representational similarity analysis (RSA) can reveal the dynamic changes in pattern similarity during different stages of auditory processing through the neural activity patterns reflected by OPM-MEG. Comparisons with EEG results showed that both techniques captured the same processes during the early stages of auditory decoding. However, differences in sensitivity at later stages highlighted both common and distinct aspects of neural representation between the two modalities. Further analysis indicated that this process involved widespread neural network activation, including Heschl's gyrus, the superior temporal gyrus, middle temporal gyrus, inferior temporal gyrus, parahippocampal gyrus, and orbitofrontal gyrus. This study demonstrates that the combination of OPM-MEG and RSA is sufficiently sensitive to detect changes in pattern similarity during neural representation processes and to identify their anatomical origins, offering new insights and references for the future application of RSA and other multivariate pattern analysis methods in the MEG field.
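(Methods note: a minimal sketch of the RSA step described above, not the authors' code. Given one evoked activity pattern per condition, the RSM is simply the matrix of pairwise pattern correlations; condition and sensor counts below are illustrative assumptions.)

```python
import numpy as np

rng = np.random.default_rng(0)
patterns = rng.standard_normal((8, 64))  # 8 auditory conditions x 64 OPM channels (illustrative)

# Pairwise Pearson correlations between condition patterns form the RSM.
rsm = np.corrcoef(patterns)              # shape (8, 8), ones on the diagonal

# Repeating this at every sample of the epoch yields the time-resolved
# pattern-similarity trajectories analysed in studies of this kind.
print(rsm.shape, round(float(rsm[0, 1]), 3))
```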
Affiliation(s)
- Changzeng Liu
- Key Laboratory of Ultra-Weak Magnetic Field Measurement Technology, Ministry of Education, School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing, China
- Hangzhou Institute of National Extremely-Weak Magnetic Field Infrastructure, Hangzhou, Zhejiang, China
- Yuyu Ma
- Key Laboratory of Ultra-Weak Magnetic Field Measurement Technology, Ministry of Education, School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing, China
- Hangzhou Institute of National Extremely-Weak Magnetic Field Infrastructure, Hangzhou, Zhejiang, China
- Xiaoyu Liang
- Key Laboratory of Ultra-Weak Magnetic Field Measurement Technology, Ministry of Education, School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing, China
- Hangzhou Institute of National Extremely-Weak Magnetic Field Infrastructure, Hangzhou, Zhejiang, China
- Min Xiang
- Key Laboratory of Ultra-Weak Magnetic Field Measurement Technology, Ministry of Education, School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing, China
- Hangzhou Institute of National Extremely-Weak Magnetic Field Infrastructure, Hangzhou, Zhejiang, China
- Hefei National Laboratory, Hefei, Anhui, China
- Key Laboratory of Traditional Chinese Medicine Syndrome, National Institute of Extremely-Weak Magnetic Field Infrastructure, Hangzhou, Zhejiang, China
- Huanqi Wu
- Key Laboratory of Ultra-Weak Magnetic Field Measurement Technology, Ministry of Education, School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing, China
- Hangzhou Institute of National Extremely-Weak Magnetic Field Infrastructure, Hangzhou, Zhejiang, China
- Xiaoling Ning
- Key Laboratory of Ultra-Weak Magnetic Field Measurement Technology, Ministry of Education, School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing, China
- Hangzhou Institute of National Extremely-Weak Magnetic Field Infrastructure, Hangzhou, Zhejiang, China
- Hefei National Laboratory, Hefei, Anhui, China
- Key Laboratory of Traditional Chinese Medicine Syndrome, National Institute of Extremely-Weak Magnetic Field Infrastructure, Hangzhou, Zhejiang, China
- Shandong Key Laboratory for Magnetic Field-Free Medicine and Functional Imaging, Institute of Magnetic Field-Free Medicine and Functional Imaging, Shandong University, Jinan, Shandong, China
3
Xu T, Jiang X, Zhang P, Wang A. Introducing the Sisu Voice Matching Test (SVMT): A novel tool for assessing voice discrimination in Chinese. Behav Res Methods 2025; 57:86. PMID: 39900852; DOI: 10.3758/s13428-025-02608-3
Abstract
Existing standardized tests for voice discrimination are based mainly on Indo-European languages, particularly English. However, voice identity perception is influenced by language familiarity, with listeners generally performing better in their native language than in a foreign one. To provide a more accurate and comprehensive assessment of voice discrimination, it is crucial to develop tests tailored to the native language of the test takers. In response, we developed the Sisu Voice Matching Test (SVMT), a pioneering tool designed specifically for Mandarin Chinese speakers. The SVMT was designed to model real-world communication: it includes both pseudo-word and pseudo-sentence stimuli and covers both the ability to categorize identical voices as the same and the ability to categorize distinct voices as different. Built on a neurally validated voice-space model and item response theory, the SVMT ensures high reliability, validity, appropriate difficulty, and strong discriminative power, while maintaining a concise test duration of approximately 10 min. By taking into account the effects of language nativeness, the SVMT therefore complements existing voice tests based on other languages' phonologies to provide a more accurate assessment of voice discrimination ability for Mandarin Chinese speakers. Future research can use the SVMT to deepen our understanding of the mechanisms underlying human voice identity perception, especially in special populations, and to examine the relationship between voice identity recognition and other cognitive processes.
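(Methods note: the item response theory foundation mentioned above is commonly the two-parameter logistic (2PL) model; the sketch below shows its response function with made-up parameters, since the SVMT's calibrated item parameters are not given here.)

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL model: probability that a listener with ability `theta` passes an
    item with discrimination `a` and difficulty `b` (values here are made up)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

abilities = np.linspace(-3, 3, 7)          # latent voice-discrimination ability
print(p_correct(abilities, a=1.5, b=0.0))  # an item of average difficulty
```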
Affiliation(s)
- Tianze Xu
- Institute of Linguistics, Shanghai International Studies University, Shanghai, 201620, China
- Xiaoming Jiang
- Institute of Linguistics, Shanghai International Studies University, Shanghai, 201620, China
- Key Laboratory of Language Science and Multilingual Artificial Intelligence, Shanghai International Studies University, Shanghai, 201620, China
- Peng Zhang
- Institute of Linguistics, Shanghai International Studies University, Shanghai, 201620, China
- Anni Wang
- Institute of Linguistics, Shanghai International Studies University, Shanghai, 201620, China
4
Müller UWD, Gerdes ABM, Alpers GW. Setting the tone: crossmodal emotional face-voice combinations in continuous flash suppression. Front Psychol 2025; 15:1472489. PMID: 39886372; PMCID: PMC11780550; DOI: 10.3389/fpsyg.2024.1472489
Abstract
Emotional stimuli are preferentially processed in the visual system, in particular, fearful faces. Evidence comes from unimodal studies with emotional faces, although real-life emotional encounters typically involve input from multiple sensory channels, such as a face paired with a voice. Therefore, in this study, we investigated how emotional voices influence preferential processing of co-occurring emotional faces. To investigate early visual processing, we used the breaking continuous flash suppression paradigm (b-CFS): We presented fearful, happy, or neutral faces to one eye, which were initially inaccessible to conscious awareness due to the predominant perception of a dynamic mask presented to the other eye. Faces were presented either unimodally or paired with non-linguistic vocalizations (fearful, happy, neutral). Thirty-six healthy participants were asked to respond as soon as the faces reached conscious awareness. We replicated earlier findings that fearful faces broke suppression faster overall, supporting a threat bias. Moreover, all faces broke suppression faster when paired with voices. Interestingly, faces paired with neutral and happy voices broke suppression the fastest, followed by faces with fearful voices. Thus, in addition to supporting a threat bias in unimodally presented fearful faces, we found evidence for crossmodal facilitation.
Affiliation(s)
- Georg W. Alpers
- Department of Psychology, School of Social Sciences, University of Mannheim, Mannheim, Germany
5
Pickron CB, Kutlu E. Toward characterization of perceptual specialization for faces in Multiracial contexts. Front Psychol 2024; 15:1392042. PMID: 39691664; PMCID: PMC11649437; DOI: 10.3389/fpsyg.2024.1392042
Abstract
This conceptual analysis focuses on opportunities to advance research and current hypotheses of perceptual development by examining what is presently known and unknown about perceptual specialization in a Multiracial context during the first year of life. The impact of being raised in a Multiracial family or community is discussed to further characterize the development of perceptual expertise for faces and languages. Historical and present-day challenges faced by researchers in defining what race is, identifying Multiracial individuals or contexts, and how to study perceptual and cognitive processes in this population are discussed. We propose to leverage current data from developmental Multilingual populations as a guide for future research questions and hypotheses characterizing perceptual specialization based on face race for Multiracial/Multiethnic individuals and contexts. Variability of input and the pattern of specialization are two factors identified from the developmental Multilingual literature that are likely useful for studying Multiracial contexts and development. Several methodological considerations are proposed in hopes of facilitating research questions and practices that are reflective of and informed by the diversity of experiences and social complexities within Multiracial populations.
Affiliation(s)
- Charisse B. Pickron
- Institute of Child Development, University of Minnesota Twin Cities, Minneapolis, MN, United States
- Ethan Kutlu
- Department of Linguistics, University of Iowa, Iowa City, IA, United States
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, United States
6
Pickron CB, Brown AJ, Hudac CM, Scott LS. Diverse Face Images (DFI): Validated for racial representation and eye gaze. Behav Res Methods 2024; 56:8801-8819. PMID: 39285143; DOI: 10.3758/s13428-024-02504-2
Abstract
Face processing is a central component of human communication and social engagement. The present investigation introduces a set of racially and ethnically inclusive faces created for researchers interested in perceptual and socio-cognitive processes linked to human faces. The Diverse Face Images (DFI) stimulus set includes high-quality still images of female faces that are racially and ethnically representative, include multiple images of direct and indirect gaze for each model and control for low-level perceptual variance between images. The DFI stimuli will support researchers interested in studying face processing throughout the lifespan as well as other questions that require a diversity of faces or gazes. This report includes a detailed description of stimuli development and norming data for each model. Adults completed a questionnaire rating each image in the DFI stimuli set on three major qualities relevant to face processing: (1) strength of race/ethnicity group associations, (2) strength of eye gaze orientation, and (3) strength of emotion expression. These validation data highlight the presence of rater variability within and between individual model images as well as within and between race and ethnicity groups.
Affiliation(s)
- Charisse B Pickron
- Institute of Child Development, University of Minnesota, Minneapolis, MN, USA
- Alexia J Brown
- Department of Psychology, University of Florida, Gainesville, FL, USA
- Caitlin M Hudac
- Department of Psychology and Center for Autism and Neurodevelopment Research Center, University of South Carolina, Columbia, SC, USA
- Lisa S Scott
- Department of Psychology, University of Florida, Gainesville, FL, USA
7
Gao C, Oh S, Yang X, Stanley JM, Shinkareva SV. Neural Representations of Emotions in Visual, Auditory, and Modality-Independent Regions Reflect Idiosyncratic Conceptual Knowledge. Hum Brain Mapp 2024; 45:e70040. PMID: 39394899; PMCID: PMC11470372; DOI: 10.1002/hbm.70040
Abstract
Growing evidence suggests that conceptual knowledge influences emotion perception, yet the neural mechanisms underlying this effect are not fully understood. Recent studies have shown that brain representations of facial emotion categories in visual-perceptual areas are predicted by conceptual knowledge, but it remains to be seen if auditory regions are similarly affected. Moreover, it is not fully clear whether these conceptual influences operate at a modality-independent level. To address these questions, we conducted a functional magnetic resonance imaging study presenting participants with both facial and vocal emotional stimuli. This dual-modality approach allowed us to investigate effects on both modality-specific and modality-independent brain regions. Using univariate and representational similarity analyses, we found that brain representations in both visual (middle and lateral occipital cortices) and auditory (superior temporal gyrus) regions were predicted by conceptual understanding of emotions for faces and voices, respectively. Additionally, we discovered that conceptual knowledge also influenced supra-modal representations in the superior temporal sulcus. Dynamic causal modeling revealed a brain network showing both bottom-up and top-down flows, suggesting a complex interplay of modality-specific and modality-independent regions in emotional processing. These findings collectively indicate that the neural representations of emotions in both sensory-perceptual and modality-independent regions are likely shaped by each individual's conceptual knowledge.
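(Methods note: the claim that conceptual knowledge predicts neural representations is typically tested by correlating a conceptual dissimilarity matrix with a neural RDM; the placeholder sketch below illustrates that comparison, not the authors' exact pipeline.)

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_emotions = 5
conceptual = rng.random((n_emotions, n_emotions))
conceptual = (conceptual + conceptual.T) / 2               # symmetric placeholder model RDM
neural_patterns = rng.standard_normal((n_emotions, 200))   # emotions x voxels (placeholder)
neural_rdm = pdist(neural_patterns, metric="correlation")  # condensed neural RDM

# pdist's condensed order matches the upper triangle in row-major order.
iu = np.triu_indices(n_emotions, k=1)
rho, p = spearmanr(conceptual[iu], neural_rdm)
print(rho, p)
```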
Affiliation(s)
- Chuanji Gao
- School of Psychology, Nanjing Normal University, Nanjing, China
- Sewon Oh
- Department of Psychology, Institute for Mind and Brain, University of South Carolina, Columbia, South Carolina, USA
- Xuan Yang
- Department of Psychology, Institute for Mind and Brain, University of South Carolina, Columbia, South Carolina, USA
- Jacob M. Stanley
- Department of Psychology, Institute for Mind and Brain, University of South Carolina, Columbia, South Carolina, USA
- Svetlana V. Shinkareva
- Department of Psychology, Institute for Mind and Brain, University of South Carolina, Columbia, South Carolina, USA
8
Lavan N, Sutherland CAM. Idiosyncratic and shared contributions shape impressions from voices and faces. Cognition 2024; 251:105881. PMID: 39029363; DOI: 10.1016/j.cognition.2024.105881
Abstract
Voices elicit rich first impressions of what the person we are hearing might be like. Research stresses that these impressions from voices are shared across different listeners, such that people on average agree which voices sound trustworthy or old and which do not. However, can impressions from voices also be shaped by the 'ear of the beholder'? We investigated whether - and how - listeners' idiosyncratic, personal preferences contribute to first impressions from voices. In two studies (993 participants, 156 voices), we find evidence for substantial idiosyncratic contributions to voice impressions using a variance partitioning approach. Overall, idiosyncratic contributions were as important as shared contributions to impressions from voices for inferred person characteristics (e.g., trustworthiness, friendliness). Shared contributions were only more influential for impressions of more directly apparent person characteristics (e.g., gender, age). Both idiosyncratic and shared contributions were reduced when stimuli were limited in their (perceived) variability, suggesting that natural variation in voices is key to understanding this impression formation. When comparing voice impressions to face impressions, we found that idiosyncratic and shared contributions shaped impressions similarly across modalities when stimulus properties were closely matched - although voice impressions were overall less consistent than face impressions. We thus reconceptualise impressions from voices as being formed not only based on shared but also idiosyncratic contributions. We use this new framing to suggest future directions of research, including understanding the idiosyncratic mechanisms, development, and malleability of voice impression formation.
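(Methods note: a minimal, simplified illustration of the variance-partitioning idea, not the authors' exact model. A raters x voices ratings matrix is decomposed into the variance carried by stimulus means, which all raters share, and the residual rater-specific variance; sizes and the signal-to-noise split are arbitrary assumptions.)

```python
import numpy as np

rng = np.random.default_rng(2)
n_raters, n_voices = 50, 30
shared_signal = rng.standard_normal(n_voices)                 # what raters agree on
ratings = shared_signal + rng.standard_normal((n_raters, n_voices))

voice_means = ratings.mean(axis=0)           # shared impression of each voice
shared_var = voice_means.var()
idio_var = (ratings - voice_means).var()     # rater-specific deviations
print(shared_var / (shared_var + idio_var))  # proportion of variance shared
```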
Affiliation(s)
- Nadine Lavan
- Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, United Kingdom
- Clare A M Sutherland
- School of Psychology, King's College, University of Aberdeen, United Kingdom; School of Psychological Science, University of Western Australia, Australia
9
Hashimoto RI, Okada R, Aoki R, Nakamura M, Ohta H, Itahashi T. Functional alterations of lateral temporal cortex for processing voice prosody in adults with autism spectrum disorder. Cereb Cortex 2024; 34:bhae363. PMID: 39270675; DOI: 10.1093/cercor/bhae363
Abstract
The human auditory system includes discrete cortical patches and selective regions for processing voice information, including emotional prosody. Although behavioral evidence indicates that individuals with autism spectrum disorder (ASD) have difficulties in recognizing emotional prosody, it remains understudied whether and how localized voice patches (VPs) and other voice-sensitive regions are functionally altered in processing prosody. This fMRI study investigated neural responses to prosodic voices in 25 adult males with ASD and 33 controls using voices of anger, sadness, and happiness with varying degrees of emotion. We used a functional region-of-interest analysis with an independent voice localizer to identify multiple VPs from combined ASD and control data. We observed a general response reduction to prosodic voices in specific VPs: the left posterior temporal voice patch (TVP) and the right middle TVP. Reduced cortical responses in the right middle TVP were consistently correlated with the severity of autistic symptoms for all examined emotional prosodies. Moreover, representational similarity analysis revealed a reduced effect of emotional intensity on multivoxel activation patterns in the left anterior superior temporal cortex, only for sad prosody. These results indicate reduced response magnitudes to voice prosodies in specific TVPs and altered emotion intensity-dependent multivoxel activation patterns in adults with ASD, potentially underlying their socio-communicative difficulties.
Affiliation(s)
- Ryu-Ichiro Hashimoto
- Medical Institute of Developmental Disabilities Research, Showa University, 6-11-11 Kita-Karasuyama, Setagaya-ku, Tokyo 157-8577, Japan
- Department of Language Sciences, Graduate School of Humanities, Tokyo Metropolitan University, 1-1 Minami-Osawa, Hachioji-shi, Tokyo 192-0397, Japan
- Rieko Okada
- Faculty of Intercultural Japanese Studies, Otemae University, 6-42 Ochayasho-cho, Nishinomiya-shi, Hyogo 662-8552, Japan
- Ryuta Aoki
- Department of Language Sciences, Graduate School of Humanities, Tokyo Metropolitan University, 1-1 Minami-Osawa, Hachioji-shi, Tokyo 192-0397, Japan
- Human Brain Research Center, Graduate School of Medicine, Kyoto University, 54 Shogoin-Kawahara-cho, Sakyo-ku, Kyoto 606-8507, Japan
- Motoaki Nakamura
- Medical Institute of Developmental Disabilities Research, Showa University, 6-11-11 Kita-Karasuyama, Setagaya-ku, Tokyo 157-8577, Japan
- Haruhisa Ohta
- Medical Institute of Developmental Disabilities Research, Showa University, 6-11-11 Kita-Karasuyama, Setagaya-ku, Tokyo 157-8577, Japan
- Takashi Itahashi
- Medical Institute of Developmental Disabilities Research, Showa University, 6-11-11 Kita-Karasuyama, Setagaya-ku, Tokyo 157-8577, Japan
10
Volfart A, Rossion B. The neuropsychological evaluation of face identity recognition. Neuropsychologia 2024; 198:108865. PMID: 38522782; DOI: 10.1016/j.neuropsychologia.2024.108865
Abstract
Facial identity recognition (FIR) is arguably the ultimate form of recognition for the adult human brain. Even if the term prosopagnosia is reserved for exceptionally rare brain-damaged cases with a category-specific abrupt loss of FIR at adulthood, subjective and objective impairments or difficulties of FIR are common in the neuropsychological population. Here we provide a critical overview of the evaluation of FIR both for clinicians and researchers in neuropsychology. FIR impairments occur following many causes that should be identified objectively by both general and specific, behavioral and neural examinations. We refute the commonly used dissociation between perceptual and memory deficits/tests for FIR, since even a task involving the discrimination of unfamiliar face images presented side-by-side relies on cortical memories of faces in the right-lateralized ventral occipito-temporal cortex. Another frequently encountered confusion is between specific deficits of the FIR function and a more general impairment of semantic memory (of people), the latter being most often encountered following anterior temporal lobe damage. Many computerized tests aimed at evaluating FIR have appeared over the last two decades, as reviewed here. However, despite undeniable strengths, they often suffer from ecological limitations, difficulties of instruction, as well as a lack of consideration for processing speed and qualitative information. Taking into account these issues, a recently developed behavioral test with natural images manipulating face familiarity, stimulus inversion, and correct response times as a key variable appears promising. The measurement of electroencephalographic (EEG) activity in the frequency domain from fast periodic visual stimulation also appears as a particularly promising tool to complete and enhance the neuropsychological assessment of FIR.
Affiliation(s)
- Angélique Volfart
- School of Psychology and Counselling, Faculty of Health, Queensland University of Technology, Australia
- Bruno Rossion
- Centre for Biomedical Technologies, Queensland University of Technology, Australia; Université de Lorraine, CNRS, IMoPA, F-54000 Nancy, France
11
Gainotti G. Human Recognition: The Utilization of Face, Voice, Name and Interactions-An Extended Editorial. Brain Sci 2024; 14:345. PMID: 38671996; PMCID: PMC11048321; DOI: 10.3390/brainsci14040345
Abstract
The many stimulating contributions to this Special Issue of Brain Science focused on some basic issues of particular interest in current research, with emphasis on human recognition using faces, voices, and names [...].
Affiliation(s)
- Guido Gainotti
- Institute of Neurology, Università Cattolica del Sacro Cuore, Fondazione Policlinico A. Gemelli, Istituto di Ricovero e Cura a Carattere Scientifico, 00168 Rome, Italy
12
Neuenswander KL, Gillespie GSR, Lick DJ, Bryant GA, Johnson KL. Social evaluative implications of sensory adaptation to human voices. R Soc Open Sci 2024; 11:231348. PMID: 38544561; PMCID: PMC10966390; DOI: 10.1098/rsos.231348
Abstract
People form social evaluations of others following brief exposure to their voices, and these impressions are calibrated based on recent perceptual experience. Participants adapted to voices with fundamental frequency (f0, the acoustic correlate of perceptual pitch) manipulated to be gender-typical (i.e. masculine men and feminine women) or gender-atypical (i.e. feminine men and masculine women) before evaluating unaltered test voices within the same sex. Adaptation resulted in contrastive aftereffects. Listening to gender-atypical voices caused female voices to sound more feminine and attractive (Study 1) and male voices to sound more masculine and attractive (Study 2). Studies 3a and 3b tested whether adaptation occurred on a conceptual or perceptual level, respectively. In Study 3a, perceivers adapted to gender-typical or gender-atypical voices for both men and women (i.e. adaptor pitch manipulated in opposite directions for men and women) before evaluating unaltered test voices. Findings showed weak evidence that evaluations differed between conditions. In Study 3b, perceivers adapted to masculinized or feminized voices for both men and women (i.e. adaptor pitch manipulated in the same direction for men and women) before evaluating unaltered test voices. In the feminized condition, participants rated male targets as more masculine and attractive. Conversely, in the masculinized condition, participants rated female targets as more feminine and attractive. Voices appear to be evaluated according to gender norms that are updated based on perceptual experience as well as conceptual knowledge.
Affiliation(s)
- Gregory A. Bryant
- Department of Communication, University of California, Los Angeles, CA 90095, USA
- Kerri L. Johnson
- Department of Communication, University of California, Los Angeles, CA 90095, USA
- Department of Psychology, University of California, Los Angeles, CA, USA
13
Webster MA, Parthasarathy MK, Zuley ML, Bandos AI, Whitehead L, Abbey CK. Designing for sensory adaptation: what you see depends on what you've been looking at - Recommendations, guidelines and standards should reflect this. Policy Insights Behav Brain Sci 2024; 11:43-50. PMID: 38933347; PMCID: PMC11198979; DOI: 10.1177/23727322231220494
Abstract
Sensory systems continuously recalibrate their responses according to the current stimulus environment. As a result, perception is strongly affected by the current and recent context. These adaptive changes affect both sensitivity (e.g., habituating to noise, seeing better in the dark) and appearance (e.g., how things look, what catches attention), and adjust to many perceptual properties, from light level to the characteristics of someone's face. They therefore have a profound effect on most perceptual experiences, and on how, and how well, the senses work in different settings. Characterizing the properties of adaptation, how it manifests, and when it influences perception in modern environments can provide insights into the diversity of human experience. Adaptation could also be leveraged both to optimize perceptual abilities (e.g., in visual inspection tasks like radiology) and to mitigate unwanted consequences (e.g., exposure to potentially unhealthy stimulus environments).
Affiliation(s)
- Michael A Webster
- Department of Psychology and Integrative Neuroscience Program, University of Nevada, Reno
- Margarita L Zuley
- Department of Radiology, University of Pittsburgh, School of Medicine
- Andriy I Bandos
- Department of Radiology, University of Pittsburgh, School of Medicine
- Department of Biostatistics, University of Pittsburgh
- Lorne Whitehead
- Department of Physics and Astronomy, University of British Columbia
- Craig K Abbey
- Department of Psychological and Brain Sciences, University of California, Santa Barbara
14
Stevenage SV, Edey R, Keay R, Morrison R, Robertson DJ. Familiarity Is Key: Exploring the Effect of Familiarity on the Face-Voice Correlation. Brain Sci 2024; 14:112. PMID: 38391687; PMCID: PMC10887171; DOI: 10.3390/brainsci14020112
Abstract
Recent research has examined the extent to which face and voice processing are associated by virtue of the fact that both tap into a common person perception system. However, existing findings do not yet fully clarify the role of familiarity in this association. Given this, two experiments are presented that examine face-voice correlations for unfamiliar stimuli (Experiment 1) and for familiar stimuli (Experiment 2). With care being taken to use tasks that avoid floor and ceiling effects and that use realistic speech-based voice clips, the results suggested a significant positive but small-sized correlation between face and voice processing when recognizing unfamiliar individuals. In contrast, the correlation when matching familiar individuals was significant and positive, but much larger. The results supported the existing literature suggesting that face and voice processing are aligned as constituents of an overarching person perception system. However, the difference in magnitude of their association here reinforced the view that familiar and unfamiliar stimuli are processed in different ways. This likely reflects the importance of a pre-existing mental representation and cross-talk within the neural architectures when processing familiar faces and voices, and yet the reliance on more superficial stimulus-based and modality-specific analysis when processing unfamiliar faces and voices.
Affiliation(s)
- Sarah V Stevenage
- School of Psychology, University of Southampton, Southampton SO17 1BJ, UK
- Rebecca Edey
- School of Psychology, University of Southampton, Southampton SO17 1BJ, UK
- Rebecca Keay
- School of Psychology, University of Southampton, Southampton SO17 1BJ, UK
- Rebecca Morrison
- School of Psychology, University of Southampton, Southampton SO17 1BJ, UK
- David J Robertson
- Department of Psychological Sciences and Health, University of Strathclyde, Glasgow G1 1QE, UK
15
Har-Shai Yahav P, Sharaabi A, Zion Golumbic E. The effect of voice familiarity on attention to speech in a cocktail party scenario. Cereb Cortex 2024; 34:bhad475. PMID: 38142293; DOI: 10.1093/cercor/bhad475
Abstract
Selective attention to one speaker in multi-talker environments can be affected by the acoustic and semantic properties of speech. One highly ecological feature of speech that has the potential to assist in selective attention is voice familiarity. Here, we tested how voice familiarity interacts with selective attention by measuring the neural speech-tracking response to both target and non-target speech in a dichotic listening "cocktail party" paradigm. We measured magnetoencephalography from n = 33 participants, presented with concurrent narratives in two different voices, and instructed to pay attention to one ear ("target") and ignore the other ("non-target"). Participants were familiarized with one of the voices during the week prior to the experiment. Using multivariate speech-tracking analysis we estimated the neural responses to both stimuli and replicated their well-established modulation by selective attention. Importantly, speech-tracking was also affected by voice familiarity, showing enhanced responses for target speech and reduced responses for non-target speech in the contralateral hemisphere when these were in a familiar vs. an unfamiliar voice. These findings offer valuable insight into how voice familiarity, and by extension auditory semantics, interact with goal-driven attention, and facilitate perceptual organization and speech processing in noisy environments.
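(Methods note: speech-tracking analyses of this kind typically estimate a temporal response function (TRF) by regressing the neural signal on time-lagged copies of the speech envelope; the synthetic sketch below shows the ridge-regression core of such an estimate, not the authors' multivariate pipeline.)

```python
import numpy as np

rng = np.random.default_rng(3)
fs, n = 100, 2000                          # 100 Hz, 20 s of synthetic data
envelope = rng.random(n)                   # speech envelope
true_trf = np.array([0.0, 0.5, 1.0, 0.5, 0.1])
meg = np.convolve(envelope, true_trf)[:n] + 0.1 * rng.standard_normal(n)

n_lags = len(true_trf)
X = np.column_stack([np.roll(envelope, k) for k in range(n_lags)])
X[:n_lags] = 0                             # discard samples wrapped around by roll
lam = 1.0                                  # ridge penalty
trf = np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ meg)
print(np.round(trf, 2))                    # recovers approximately true_trf
```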
Affiliation(s)
- Paz Har-Shai Yahav
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Aviya Sharaabi
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Elana Zion Golumbic
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
16
Liu M, Sommer W, Yue S, Li W. Dominance of face over voice in human attractiveness judgments: ERP evidence. Psychophysiology 2023; 60:e14358. PMID: 37271749; DOI: 10.1111/psyp.14358
Abstract
The attractiveness of a person, a complex and socially relevant type of information, is transmitted in many ways, not least through face and voice. However, it is unclear how the stimulus domains carrying attractiveness information interact. The present study explored the audiovisual perception of attractiveness in a Stroop-like paradigm using event-related potentials (ERPs). Participants were presented with face-voice pairs carrying congruent or incongruent attractiveness information and, in turn, judged the attractiveness level of each domain while ignoring the other. Voice attractiveness judgments were influenced by unattended face attractiveness in terms of both early perceptual encoding (N170, P200) and later evaluative stages (N400, LPC). In contrast, effects of unattended voice attractiveness on face attractiveness judgments were confined to early perceptual encoding (N170). These results demonstrate not only the interaction of multiple domains in human attractiveness perception at different processing stages but also a relative dominance of face over voice attractiveness.
Affiliation(s)
- Meng Liu
- Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian, China
- Key Laboratory of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian, China
- Werner Sommer
- Institut für Psychologie, Humboldt-Universität zu Berlin, Berlin, Germany
- Department of Psychology, Zhejiang Normal University, Jinhua, China
- Institute for Creativity, Hong Kong Baptist University, Hong Kong, China
- Siqi Yue
- Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian, China
- Key Laboratory of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian, China
- Weijun Li
- Research Center of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian, China
- Key Laboratory of Brain and Cognitive Neuroscience, Liaoning Normal University, Dalian, China
17
Gainotti G, Quaranta D, Luzzi S. Apperceptive and Associative Forms of Phonagnosia. Curr Neurol Neurosci Rep 2023; 23:327-333. PMID: 37133717; PMCID: PMC10257619; DOI: 10.1007/s11910-023-01271-5
Abstract
PURPOSE OF REVIEW: Phonagnosia is a rare acquired or developmental pathological condition that consists of a selective difficulty in recognizing familiar people by their voices. It can be divided into two categories: apperceptive phonagnosia, which denotes a purely perceptual form of voice recognition disorder; and associative phonagnosia, in which patients have no perceptual defects but cannot evaluate whether the voice of a known person is familiar or not. The neural substrate of these two forms of voice recognition disorder is still controversial, but it could concern different components of the core temporal voice areas and of extratemporal voice processing areas. This article reviews recent research on the neuropsychological and anatomo-clinical aspects of this condition. RECENT FINDINGS: Data obtained in group studies or single case reports of phonagnosic patients suggest that apperceptive phonagnosia might be due to disruption of the core temporal voice areas, bilaterally located in the posterior parts of the superior temporal gyrus, whereas associative phonagnosia might result from impaired access to structures where voice representations are stored, due to a disconnection of these areas from structures of the voice extended system. Although these results must be confirmed by further investigations, they represent an important step toward understanding the nature and neural substrate of apperceptive and associative forms of phonagnosia.
Affiliation(s)
- Guido Gainotti
- Institute of Neurology, Catholic University of the Sacred Heart, Largo A. Gemelli 8, 00168 Rome, Italy
- Davide Quaranta
- Neurology Unit, Department of Science of Elderly, Neuroscience, Head and Neck and Orthopaedics, Fondazione Policlinico A. Gemelli, IRCCS, Rome, Italy
- Simona Luzzi
- Department of Experimental and Clinical Medicine, Polytechnic University of Marche, Ancona, Italy
18
Pang W, Zhou W, Ruan Y, Zhang L, Shu H, Zhang Y, Zhang Y. Visual Deprivation Alters Functional Connectivity of Neural Networks for Voice Recognition: A Resting-State fMRI Study. Brain Sci 2023; 13:636. PMID: 37190601; DOI: 10.3390/brainsci13040636
Abstract
Humans recognize one another by identifying their voices and faces. For sighted people, the integration of voice and face signals in corresponding brain networks plays an important role in facilitating the process. However, individuals with vision loss primarily resort to voice cues to recognize a person's identity. It remains unclear how the neural systems for voice recognition reorganize in the blind. In the present study, we collected behavioral and resting-state fMRI data from 20 early blind (5 females; mean age = 22.6 years) and 22 sighted control (7 females; mean age = 23.7 years) individuals. We aimed to investigate the alterations in the resting-state functional connectivity (FC) among the voice- and face-sensitive areas in blind subjects in comparison with controls. We found that the intranetwork connections among voice-sensitive areas, including amygdala-posterior "temporal voice areas" (TVAp), amygdala-anterior "temporal voice areas" (TVAa), and amygdala-inferior frontal gyrus (IFG) were enhanced in the early blind. The blind group also showed increased FCs of "fusiform face area" (FFA)-IFG and "occipital face area" (OFA)-IFG but decreased FCs between the face-sensitive areas (i.e., FFA and OFA) and TVAa. Moreover, the voice-recognition accuracy was positively related to the strength of TVAp-FFA in the sighted, and the strength of amygdala-FFA in the blind. These findings indicate that visual deprivation shapes functional connectivity by increasing the intranetwork connections among voice-sensitive areas while decreasing the internetwork connections between the voice- and face-sensitive areas. Moreover, the face-sensitive areas are still involved in the voice-recognition process in blind individuals through pathways such as the subcortical-occipital or occipitofrontal connections, which may benefit the visually impaired greatly during voice processing.
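(Methods note: the FC measure described above is standardly a Pearson correlation between ROI time courses, Fisher-z transformed and compared across groups; the sketch below uses random placeholder time courses, with group sizes taken from the abstract.)

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(4)

def fc(ts_a, ts_b):
    """Fisher-z transformed Pearson correlation between two ROI time courses."""
    return np.arctanh(np.corrcoef(ts_a, ts_b)[0, 1])

# Placeholder data: 20 early blind vs 22 sighted subjects, as in the abstract.
blind = [fc(rng.standard_normal(200), rng.standard_normal(200)) for _ in range(20)]
sighted = [fc(rng.standard_normal(200), rng.standard_normal(200)) for _ in range(22)]
print(ttest_ind(blind, sighted))
```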
Affiliation(s)
- Wenbin Pang
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
- China National Clinical Research Center for Neurological Diseases, Beijing 100070, China
- Wei Zhou
- Beijing Key Lab of Learning and Cognition, School of Psychology, Capital Normal University, Beijing 100048, China
- Yufang Ruan
- School of Communication Sciences and Disorders, Faculty of Medicine and Health Sciences, McGill University, Montréal, QC H3A 1G1, Canada
- Centre for Research on Brain, Language and Music, Montréal, QC H3A 1G1, Canada
- Linjun Zhang
- School of Chinese as a Second Language, Peking University, Beijing 100871, China
- Hua Shu
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China
- Yang Zhang
- Department of Speech-Language-Hearing Sciences and Center for Neurobehavioral Development, The University of Minnesota, Minneapolis, MN 55455, USA
- Yumei Zhang
- China National Clinical Research Center for Neurological Diseases, Beijing 100070, China
- Department of Rehabilitation, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
19
Wen F, Gao J, Ke W, Zuo B, Dai Y, Ju Y, Long J. The Effect of Face-Voice Gender Consistency on Impression Evaluation. Arch Sex Behav 2023; 52:1123-1139. PMID: 36719490; DOI: 10.1007/s10508-022-02524-z
Abstract
Face and voice are important information cues in interpersonal interaction. Most previous studies have investigated the cross-modal perception of face and voice from the perspective of cognitive psychology, but few empirical studies have focused on the effect of face-voice gender consistency on the impression evaluation of the target from the perspective of social cognition. Based on the two-stage model of stereotype activation and the stereotype content model, this research examined the effects of face-voice gender consistency on impression evaluation (gender categorization and warmth/competence evaluation) using a cross-modal priming paradigm (Study 1: 20 males and 23 females, mean age = 21.00 years, SD = 2.59), a sequential presentation task (Study 2a: 57 males and 70 females, mean age = 18.54, SD = 1.54; Study 2b: 52 males and 51 females, mean age = 18.54, SD = 1.36), and a simultaneous presentation task (Study 3: 51 males and 55 females, mean age = 23.58, SD = 3.20). The results showed that: (1) there was a face-voice gender consistency preference in gender categorization, with faster responses in the face-voice consistent condition than in the inconsistent condition; (2) compared with face-voice gender-inconsistent individuals, participants gave higher and more stable evaluations of the warmth and competence of gender-consistent individuals, indicating a matching preference for face-voice gender consistency in impression evaluation; (3) people paid more attention to the gender information of faces in impression evaluation, and a female face could improve evaluations of the target's warmth and competence; (4) males were more intolerant of face-voice gender inconsistency when stimuli were presented sequentially, whereas the "voice needs to match face" effect was stronger for females when stimuli were presented simultaneously. These findings enrich and extend previous theories and research on cross-modal processing of face and voice from the perspective of social-cognitive impression evaluation, and they have important practical implications for impression management and decision-making in social interaction.
Affiliation(s)
- Fangfang Wen
- School of Psychology, Center for Studies of Social Psychology, Central China Normal University, Wuhan, 430079, China
- Jia Gao
- School of Psychology, Center for Studies of Social Psychology, Central China Normal University, Wuhan, 430079, China
- Wenlin Ke
- School of Psychology, Center for Studies of Social Psychology, Central China Normal University, Wuhan, 430079, China
- Bin Zuo
- School of Psychology, Center for Studies of Social Psychology, Central China Normal University, Wuhan, 430079, China
- Yu Dai
- School of Psychology, Center for Studies of Social Psychology, Central China Normal University, Wuhan, 430079, China
- Yiyan Ju
- School of Psychology, Center for Studies of Social Psychology, Central China Normal University, Wuhan, 430079, China
- Jiahui Long
- School of Psychology, Center for Studies of Social Psychology, Central China Normal University, Wuhan, 430079, China
20
Lima MF, Sarudiansky M, Oddo S, Giagante B, Kochen S, D'Alessio L. Comorbid psychosis in temporal lobe epilepsy is associated with auditory emotion recognition impairments. Schizophr Res 2023; 254:8-10. PMID: 36736101; DOI: 10.1016/j.schres.2023.01.021
Affiliation(s)
- Mónica Fernández Lima
- Studies of Neurosciences and Complex Systems (ENyS), National Scientific and Technical Research Council (CONICET), Florencio Varela, Argentina; Buenos Aires University, Institute of Cellular Biology and Neurosciences E de Robertis (IBCN-CONICET), Buenos Aires, Argentina; Buenos Aires University, Epilepsy Center, Ramos Mejía Hospital, Buenos Aires, Argentina
- Silvia Oddo
- Studies of Neurosciences and Complex Systems (ENyS), National Scientific and Technical Research Council (CONICET), Florencio Varela, Argentina; Buenos Aires University, Epilepsy Center, Ramos Mejía Hospital, Buenos Aires, Argentina
- Brenda Giagante
- Studies of Neurosciences and Complex Systems (ENyS), National Scientific and Technical Research Council (CONICET), Florencio Varela, Argentina
- Silvia Kochen
- Studies of Neurosciences and Complex Systems (ENyS), National Scientific and Technical Research Council (CONICET), Florencio Varela, Argentina; Buenos Aires University, Epilepsy Center, Ramos Mejía Hospital, Buenos Aires, Argentina
- Luciana D'Alessio
- Buenos Aires University, Institute of Cellular Biology and Neurosciences E de Robertis (IBCN-CONICET), Buenos Aires, Argentina; Buenos Aires University, Epilepsy Center, Ramos Mejía Hospital, Buenos Aires, Argentina
21
Human voices escape the auditory attentional blink: Evidence from detections and pupil responses. Brain Cogn 2023; 165:105928. DOI: 10.1016/j.bandc.2022.105928
Abstract
Attentional selection of the second of two targets embedded in a rapid stimulus stream tends to be briefly impaired when the targets appear in close temporal proximity, an effect known as the attentional blink (AB). Two target sounds (T1 and T2) were embedded in a rapid serial auditory presentation of environmental sounds with a short (Lag 3) or long (Lag 9) lag. Participants were to first identify T1 (bell or sine tone) and then to detect T2 (present or absent). Individual stimuli had durations of either 30 or 90 ms and were presented in streams of 20 sounds. The T2 varied in category: human voice, cello, or dog sound. Previous research has introduced pupillometry as a useful marker of the intensity of cognitive processing and attentional allocation in the visual AB paradigm. Results suggest that the interplay of stimulus factors is critical for target detection accuracy and provide support for the hypothesis that the human voice is the least likely to show an auditory AB (in the 90 ms condition). For the other stimuli, accuracy for T2 was significantly worse at Lag 3 than at Lag 9 in the 90 ms condition, suggesting the presence of an auditory AB. When the AB occurred (at Lag 3), we observed smaller pupil dilations, time-locked to the onset of T2, compared to Lag 9, reflecting lower attentional processing when "blinking" during target detection. Taken together, these findings support the conclusion that human voices escape the AB and that the pupillary changes are consistent with the so-called T2 attentional deficit. In addition, we found some indication that salient stimuli like human voices may require a less intense allocation of attention, or noradrenergic potentiation, compared to other auditory stimuli.
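(Methods note: event-locked pupil analyses like the one above typically epoch the pupil trace around T2 onset, baseline-correct, and average; the sketch below illustrates that logic with an invented sampling rate, window, and onset times.)

```python
import numpy as np

rng = np.random.default_rng(5)
fs = 60                                      # assumed eye-tracker rate (Hz)
pupil = rng.standard_normal(60 * fs)         # 60 s of pupil-diameter samples
onsets = np.array([5, 15, 25, 35, 45]) * fs  # hypothetical T2 onsets (samples)

pre, post = int(0.2 * fs), int(2.0 * fs)     # 200 ms baseline, 2 s response window
epochs = np.stack([pupil[o - pre:o + post] for o in onsets])
baseline = epochs[:, :pre].mean(axis=1, keepdims=True)
evoked = (epochs - baseline).mean(axis=0)    # mean baseline-corrected dilation
print(evoked.shape)
```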
22
Lavan N. How do we describe other people from voices and faces? Cognition 2023; 230:105253. PMID: 36215763; DOI: 10.1016/j.cognition.2022.105253
Abstract
When seeing someone's face or hearing their voice, perceivers routinely infer information about a person's age, sex and social traits. While many experiments have explored how individual person characteristics are perceived in isolation, less is known about which person characteristics are described spontaneously from voices and faces and how descriptions may differ across modalities. In Experiment 1, participants provided free descriptions for voices and faces. These free descriptions followed similar patterns for voices and faces - and for individual identities: Participants spontaneously referred to a wide range of descriptors. Psychological descriptors, such as character traits, were used most frequently; physical characteristics, such as age and sex, were notable as they were mentioned earlier than other types of descriptors. After finding primarily similarities between modalities when analysing person descriptions across identities, Experiment 2 asked whether free descriptions encode how individual identities differ. For this purpose, the measures derived from the free descriptions were linked to voice/face discrimination judgements that are known to describe differences in perceptual properties between identity pairs. Significant relationships emerged within and across modalities, showing that free descriptions indeed encode differences between identities - information that is shared with discrimination judgements. This suggests that the two tasks tap into similar, high-level person representations. These findings show that free description data can offer valuable insights into person perception and underline that person perception is a multivariate process during which perceivers rapidly and spontaneously infer many different person characteristics to form a holistic impression of a person.
Affiliation(s)
- Nadine Lavan
- Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, United Kingdom
23
Akça M, Vuoskoski JK, Laeng B, Bishop L. Recognition of brief sounds in rapid serial auditory presentation. PLoS One 2023; 18:e0284396. PMID: 37053212; PMCID: PMC10101377; DOI: 10.1371/journal.pone.0284396
Abstract
Two experiments were conducted to test the role of participant factors (i.e., musical sophistication, working memory capacity) and stimulus factors (i.e., sound duration, timbre) on auditory recognition using a rapid serial auditory presentation paradigm. Participants listened to a rapid stream of very brief sounds ranging from 30 to 150 milliseconds in duration and were tested on their ability to distinguish the presence from the absence of a target sound selected from various sound sources placed amongst the distracters. Experiment 1a established that brief exposure to stimuli (60 to 150 milliseconds) does not necessarily correspond to impaired recognition. In Experiment 1b we found evidence that 30 milliseconds of exposure to the stimuli significantly impairs recognition of single auditory targets, but recognition of voice and sine tone targets was impaired the least, suggesting that the lower limit required for successful recognition could be below 30 milliseconds for these targets. Critically, the effect of sound duration on recognition disappeared completely when differences in musical sophistication were controlled for. Participants' working memory capacities did not predict their recognition performance. Our behavioral results extend studies of the processing of brief timbres under temporal constraint by suggesting that musical sophistication may play a larger role than previously thought. These results also provide a working hypothesis for future research, namely, that the underlying neural mechanisms for processing various sound sources may operate under different temporal constraints.
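The covariate logic described above (a duration effect on accuracy disappearing once musical sophistication is controlled for) can be sketched as two logistic regressions, with and without the covariate. This is an illustration on simulated data, not the authors' analysis; variable names such as gold_msi are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "duration_ms": rng.choice([30, 60, 90, 120, 150], size=n),
    "gold_msi": rng.normal(80, 15, size=n),  # musical sophistication score
})
# Simulate accuracy driven by sophistication rather than duration
p = 1 / (1 + np.exp(-0.05 * (df["gold_msi"] - 80)))
df["correct"] = rng.binomial(1, p)

# Does the duration effect survive once sophistication is in the model?
m1 = smf.logit("correct ~ duration_ms", data=df).fit(disp=0)
m2 = smf.logit("correct ~ duration_ms + gold_msi", data=df).fit(disp=0)
print(m1.params, m2.params, sep="\n")
```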
Collapse
Affiliation(s)
- Merve Akça
- RITMO Center for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
- Department of Musicology, University of Oslo, Oslo, Norway
| | - Jonna Katariina Vuoskoski
- RITMO Center for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
- Department of Musicology, University of Oslo, Oslo, Norway
- Department of Psychology, University of Oslo, Oslo, Norway
| | - Bruno Laeng
- RITMO Center for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
- Department of Psychology, University of Oslo, Oslo, Norway
| | - Laura Bishop
- RITMO Center for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
- Department of Musicology, University of Oslo, Oslo, Norway
| |
Collapse
|
24
|
Fransson S, Corrow S, Yeung S, Schaefer H, Barton JJS. Effects of Faces and Voices on the Encoding of Biographic Information. Brain Sci 2022; 12:brainsci12121716. [PMID: 36552175 PMCID: PMC9775626 DOI: 10.3390/brainsci12121716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2022] [Revised: 12/10/2022] [Accepted: 12/12/2022] [Indexed: 12/23/2022] Open
Abstract
There are multiple forms of knowledge about people. Whether diverse person-related data interact is of interest regarding the more general issue of integration of multi-source information about the world. Our goal was to examine whether perception of a person's face or voice enhanced the encoding of their biographic data. We performed three experiments. In the first experiment, subjects learned the biographic data of a character with or without a video clip of their face. In the second experiment, they learned the character's data with an audio clip of either a generic narrator's voice or the character's voice relating the same biographic information. In the third experiment, an audiovisual clip of both the face and voice of either a generic narrator or the character accompanied the learning of biographic data. After learning, a test phase presented biographic data alone, and subjects were tested first for familiarity and second for matching of biographic data to the name. The results showed equivalent learning of biographic data across all three experiments, and none showed evidence that a character's face or voice enhanced the learning of biographic information. We conclude that the simultaneous processing of perceptual representations of people may not modulate the encoding of biographic data.
Collapse
Affiliation(s)
- Sarah Fransson
- Faculty of Medicine, Linköping University, 581 83 Linköping, Sweden
| | - Sherryse Corrow
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, BC V5Z 3N9, Canada
- Department of Psychology, Bethel University, St. Paul, MN 55112, USA
| | - Shanna Yeung
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, BC V5Z 3N9, Canada
| | - Heidi Schaefer
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, BC V5Z 3N9, Canada
| | - Jason J. S. Barton
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, BC V5Z 3N9, Canada
- Correspondence: Tel.: +1-604-875-4339; Fax: +1-604-875-4302
| |
Collapse
|
25
|
Zhang Y, Zhou W, Huang J, Hong B, Wang X. Neural correlates of perceived emotions in human insula and amygdala for auditory emotion recognition. Neuroimage 2022; 260:119502. [PMID: 35878727 DOI: 10.1016/j.neuroimage.2022.119502] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2022] [Revised: 06/21/2022] [Accepted: 07/21/2022] [Indexed: 11/28/2022] Open
Abstract
The emotional status of a speaker is an important non-linguistic cue carried by the human voice and can be perceived by a listener in vocal communication. Understanding the neural circuits involved in processing emotions carried by the human voice is crucial for understanding the neural basis of social interaction. Previous studies have shown that the human insula and amygdala respond more selectively to emotional sounds than to non-emotional sounds. However, it is not clear whether the neural selectivity to emotional sounds in these brain structures is determined by the emotion presented by a speaker, which is associated with the acoustic properties of the sounds, or by the emotion perceived by a listener. In this study, we recorded intracranial electroencephalography (iEEG) responses to emotional human voices while subjects performed emotion recognition tasks. We found that the iEEG responses of Heschl's gyrus (HG) and posterior insula were determined by the presented emotion, whereas the iEEG responses of anterior insula and amygdala were driven by the perceived emotion. These results suggest that the anterior insula and amygdala play a crucial role in the conscious perception of emotions carried by the human voice.
Collapse
Affiliation(s)
- Yang Zhang
- Tsinghua Laboratory of Brain and Intelligence (THBI) and Department of Biomedical Engineering, Tsinghua University, Beijing 100084, PR China; Department of Biomedical Engineering, the Johns Hopkins University, Baltimore, MD 21205, United States.
| | - Wenjing Zhou
- Department of Epilepsy Center, Tsinghua University Yuquan Hospital, Beijing 100040, PR China
| | - Juan Huang
- Department of Biomedical Engineering, the Johns Hopkins University, Baltimore, MD 21205, United States
| | - Bo Hong
- Tsinghua Laboratory of Brain and Intelligence (THBI) and Department of Biomedical Engineering, Tsinghua University, Beijing 100084, PR China.
| | - Xiaoqin Wang
- Tsinghua Laboratory of Brain and Intelligence (THBI) and Department of Biomedical Engineering, Tsinghua University, Beijing 100084, PR China; Department of Biomedical Engineering, the Johns Hopkins University, Baltimore, MD 21205, United States.
| |
Collapse
|
26
|
Look Who is Talking. Identities and Expressions in the Prefrontal Cortex. Neuroscience 2022; 496:241-242. [PMID: 35750112 DOI: 10.1016/j.neuroscience.2022.06.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2022] [Accepted: 06/15/2022] [Indexed: 11/21/2022]
|
27
|
Di Dona G, Scaltritti M, Sulpizio S. Formant-invariant voice and pitch representations are pre-attentively formed from constantly varying speech and non-speech stimuli. Eur J Neurosci 2022; 56:4086-4106. [PMID: 35673798 PMCID: PMC9545905 DOI: 10.1111/ejn.15730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2022] [Revised: 05/23/2022] [Accepted: 05/24/2022] [Indexed: 11/30/2022]
Abstract
The present study investigated whether listeners can form abstract voice representations while ignoring constantly changing phonological information and whether they can use the resulting information to facilitate voice change detection. Further, the study aimed at understanding whether the use of abstraction is restricted to the speech domain or can also be deployed in non‐speech contexts. We ran an electroencephalogram (EEG) experiment including one passive and one active oddball task, each featuring a speech and a rotated speech condition. In the speech condition, participants heard constantly changing vowels uttered by a male speaker (standard stimuli) which were infrequently replaced by vowels uttered by a female speaker with higher pitch (deviant stimuli). In the rotated speech condition, participants heard rotated vowels, in which the natural formant structure of speech was disrupted. In the passive task, the mismatch negativity was elicited after the presentation of the deviant voice in both conditions, indicating that listeners could successfully group together different stimuli into a formant‐invariant voice representation. In the active task, participants showed shorter reaction times (RTs), higher accuracy and a larger P3b in the speech condition with respect to the rotated speech condition. Results showed that whereas at a pre‐attentive level the cognitive system can track pitch regularities while presumably ignoring constantly changing formant information both in speech and in rotated speech, at an attentive level the use of such information is facilitated for speech. This facilitation was also evidenced by stronger synchronisation in the theta band (4–7 Hz), potentially pointing towards differences in encoding/retrieval processes.
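The mismatch negativity reported here is, at its core, a deviant-minus-standard difference wave. A minimal sketch, assuming single-channel epoched EEG arrays of shape (trials, samples):

```python
import numpy as np

def difference_wave(epochs_standard, epochs_deviant):
    """Estimate the mismatch negativity (MMN) as the deviant-minus-standard
    difference wave from (trials x samples) arrays for one channel."""
    return epochs_deviant.mean(axis=0) - epochs_standard.mean(axis=0)
```

The MMN would then appear as a negative deflection in this difference wave roughly 100-250 ms after deviance onset.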
Collapse
Affiliation(s)
- Giuseppe Di Dona
- Dipartimento di Psicologia e Scienze Cognitive, Università degli Studi di Trento, Trento, Italy
| | - Michele Scaltritti
- Dipartimento di Psicologia e Scienze Cognitive, Università degli Studi di Trento, Trento, Italy
| | - Simone Sulpizio
- Dipartimento di Psicologia, Università degli Studi di Milano-Bicocca, Milano, Italy.,Milan Center for Neuroscience (NeuroMi), Università degli Studi di Milano-Bicocca, Milano, Italy
| |
Collapse
|
28
|
Rossion B. Twenty years of investigation with the case of prosopagnosia PS to understand human face identity recognition. Part I: Function. Neuropsychologia 2022; 173:108278. [DOI: 10.1016/j.neuropsychologia.2022.108278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2021] [Revised: 03/28/2022] [Accepted: 05/25/2022] [Indexed: 10/18/2022]
|
29
|
Lee Y, Kreiman J. Acoustic voice variation in spontaneous speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 151:3462. [PMID: 35649890 PMCID: PMC9135459 DOI: 10.1121/10.0011471] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/03/2023]
Abstract
This study replicates and extends the recent findings of Lee, Keating, and Kreiman [J. Acoust. Soc. Am. 146(3), 1568-1579 (2019)] on acoustic voice variation in read speech, which showed remarkably similar acoustic voice spaces for groups of female and male talkers and for the individual talkers within these groups. Principal component analysis was applied to acoustic indices of voice quality measured from phone conversations for 99 of the 100 talkers studied previously. The acoustic voice spaces derived from spontaneous speech are highly similar to those based on read speech, except that, unlike in read speech, variability in fundamental frequency accounted for significant acoustic variability. Implications of these findings for prototype models of speaker recognition and discrimination are considered.
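A minimal sketch of the analysis family used here: PCA over standardized acoustic voice-quality measures, inspecting how much variance each component captures. The feature matrix below is random placeholder data, not the study's measurements.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 12))  # frames x acoustic measures (e.g., F0, H1-H2, CPP)

pca = PCA()
scores = pca.fit_transform(StandardScaler().fit_transform(X))
print(np.cumsum(pca.explained_variance_ratio_))  # cumulative variance per PC
```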
Collapse
Affiliation(s)
- Yoonjeong Lee
- Department of Head and Neck Surgery, David Geffen School of Medicine at UCLA, Los Angeles, California 90095-1794, USA
| | - Jody Kreiman
- Department of Head and Neck Surgery, David Geffen School of Medicine at UCLA, Los Angeles, California 90095-1794, USA
| |
Collapse
|
30
|
Diehl MM, Plakke B, Albuquerque E, Romanski LM. Representation of expression and identity by ventral prefrontal neurons. Neuroscience 2022; 496:243-260. [PMID: 35654293 PMCID: PMC10363293 DOI: 10.1016/j.neuroscience.2022.05.033] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2021] [Revised: 05/20/2022] [Accepted: 05/25/2022] [Indexed: 01/26/2023]
Abstract
Evidence has suggested that the ventrolateral prefrontal cortex (VLPFC) processes social stimuli, including faces and vocalizations, which are essential for communication. Features embedded within audiovisual stimuli, including emotional expression and caller identity, provide abundant information about an individual's intention, emotional state, motivation, and social status, all of which are important to encode in a social exchange. However, it is unknown to what extent the VLPFC encodes such features. To investigate the role of the VLPFC during social communication, we recorded single-unit activity while rhesus macaques (Macaca mulatta) performed a nonmatch-to-sample task using species-specific face-vocalization stimuli that differed in emotional expression or caller identity. 75% of recorded cells were task-related, and of these >70% were responsive during the nonmatch period. A larger proportion of nonmatch cells encoded the stimulus rather than the context of the trial type. Responsive neurons were most commonly modulated by the identity of the nonmatch stimulus, less often by its emotional expression, and sometimes by both features within the face-vocalization stimuli presented during the nonmatch period. Identity-encoding neurons were found across a broader region of the VLPFC than expression-related cells, which were confined to the anterolateral portion of the recording chamber. These findings suggest that, within a working memory paradigm, the VLPFC processes features of face and vocal stimuli, such as emotional expression and identity, in addition to task and contextual information. Thus, stimulus and contextual information may be integrated by the VLPFC during social communication.
Collapse
|
31
|
Staib M, Frühholz S. Distinct functional levels of human voice processing in the auditory cortex. Cereb Cortex 2022; 33:1170-1185. [PMID: 35348635 PMCID: PMC9930621 DOI: 10.1093/cercor/bhac128] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2021] [Revised: 02/03/2022] [Accepted: 03/07/2022] [Indexed: 11/12/2022] Open
Abstract
Voice signaling is integral to human communication, and a cortical voice area seemed to support the discrimination of voices from other auditory objects. This large cortical voice area in the auditory cortex (AC) was suggested to process voices selectively, but its functional differentiation remained elusive. We used neuroimaging while humans processed voices and nonvoice sounds, and artificial sounds that mimicked certain voice sound features. First and surprisingly, specific auditory cortical voice processing beyond basic acoustic sound analyses is only supported by a very small portion of the originally described voice area in higher-order AC located centrally in superior Te3. Second, besides this core voice processing area, large parts of the remaining voice area in low- and higher-order AC only accessorily process voices and might primarily pick up nonspecific psychoacoustic differences between voices and nonvoices. Third, a specific subfield of low-order AC seems to specifically decode acoustic sound features that are relevant but not exclusive for voice detection. Taken together, the previously defined voice area might have been overestimated since cortical support for human voice processing seems rather restricted. Cortical voice processing also seems to be functionally more diverse and embedded in broader functional principles of the human auditory system.
Collapse
Affiliation(s)
- Matthias Staib
- Cognitive and Affective Neuroscience Unit, University of Zurich, 8050 Zurich, Switzerland
| | - Sascha Frühholz
- Corresponding author: Department of Psychology, University of Zürich, Binzmuhlestrasse 14/18, 8050 Zürich, Switzerland.
| |
Collapse
|
32
|
Morningstar M, Mattson WI, Nelson EE. Longitudinal Change in Neural Response to Vocal Emotion in Adolescence. Soc Cogn Affect Neurosci 2022; 17:890-903. [PMID: 35323933 PMCID: PMC9527472 DOI: 10.1093/scan/nsac021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2021] [Revised: 02/25/2022] [Accepted: 03/21/2022] [Indexed: 01/09/2023] Open
Abstract
Adolescence is associated with maturation of function within neural networks supporting the processing of social information. Previous longitudinal studies have established developmental influences on youth’s neural response to facial displays of emotion. Given the increasing recognition of the importance of non-facial cues to social communication, we build on existing work by examining longitudinal change in neural response to vocal expressions of emotion in 8- to 19-year-old youth. Participants completed a vocal emotion recognition task at two timepoints (1 year apart) while undergoing functional magnetic resonance imaging. The right inferior frontal gyrus, right dorsal striatum and right precentral gyrus showed decreases in activation to emotional voices across timepoints, which may reflect focalization of response in these areas. Activation in the dorsomedial prefrontal cortex was positively associated with age but was stable across timepoints. In addition, the slope of change across visits varied as a function of participants’ age in the right temporo-parietal junction (TPJ): this pattern of activation across timepoints and age may reflect ongoing specialization of function across childhood and adolescence. Decreased activation in the striatum and TPJ across timepoints was associated with better emotion recognition accuracy. Findings suggest that specialization of function in social cognitive networks may support the growth of vocal emotion recognition skills across adolescence.
Collapse
Affiliation(s)
- Michele Morningstar
- Correspondence should be addressed to Michele Morningstar, Department of Psychology, Queen’s University, 62 Arch Street, Kingston, ON K7L 3L3, Canada. E-mail:
| | - Whitney I Mattson
- Center for Biobehavioral Health, Nationwide Children’s Hospital, Columbus, OH 43205, USA
| | - Eric E Nelson
- Center for Biobehavioral Health, Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Department of Pediatrics, The Ohio State University, Columbus, OH 43205, USA
| |
Collapse
|
33
|
Pell MD, Sethi S, Rigoulot S, Rothermich K, Liu P, Jiang X. Emotional voices modulate perception and predictions about an upcoming face. Cortex 2022; 149:148-164. [DOI: 10.1016/j.cortex.2021.12.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2021] [Revised: 09/15/2021] [Accepted: 01/05/2022] [Indexed: 11/26/2022]
|
34
|
Wu Q, Liu Y, Li D, Leng H, Iqbal Z, Jiang Z. Understanding one's character through the voice: Dimensions of personality perception from Chinese greeting word "Ni Hao". The Journal of Social Psychology 2021; 161:653-663. [PMID: 33413047 DOI: 10.1080/00224545.2020.1856026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
Abstract
Previous Western studies revealed a two-dimensional model (valence and dominance) in voice impressions. To explore the cross-cultural validity of this model, the present study recruited Chinese participants to evaluate other people's personality from recordings of the Chinese vocal greeting "Ni Hao". Principal component analysis (PCA) with Varimax rotation and parallel analysis was used to investigate the dimensions underlying personality judgments. The results also revealed a two-dimensional model: approachability and capability. The approachability dimension was similar to the valence dimension reported in a previous study, indicating that the approachability/valence dimension has cross-cultural commonality. Unlike the dimension of dominance, which was closely related to aggressiveness, the dimension of capability emphasized the social aspects of capability such as intellectuality, social skills, and tenacity. In addition, the acoustic parameters that were used to infer the personality of speakers, as well as the relationship between vocal attractiveness and the personality dimensions of voice, were also partially different from the findings in Western culture.
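The retention step mentioned above, parallel analysis, keeps only components whose eigenvalues exceed those expected from random data of the same shape. A minimal NumPy sketch, assuming a participants x rating-variables matrix; the data below are placeholders.

```python
import numpy as np

def parallel_analysis(X, n_iter=200, seed=0):
    """Horn's parallel analysis: count components whose eigenvalues exceed
    the mean eigenvalues of same-shaped random-normal data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    real = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    rand = np.zeros(p)
    for _ in range(n_iter):
        R = rng.normal(size=(n, p))
        rand += np.linalg.eigvalsh(np.corrcoef(R, rowvar=False))[::-1]
    return int((real > rand / n_iter).sum())

ratings = np.random.default_rng(1).normal(size=(120, 10))  # placeholder data
print(parallel_analysis(ratings))  # number of components to retain
```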
Collapse
Affiliation(s)
- Qi Wu
- Liaoning Normal University
| | | | | | | | | | | |
Collapse
|
35
|
Lowe MX, Mohsenzadeh Y, Lahner B, Charest I, Oliva A, Teng S. Cochlea to categories: The spatiotemporal dynamics of semantic auditory representations. Cogn Neuropsychol 2021; 38:468-489. [PMID: 35729704 PMCID: PMC10589059 DOI: 10.1080/02643294.2022.2085085] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2021] [Revised: 03/31/2022] [Accepted: 05/25/2022] [Indexed: 10/17/2022]
Abstract
How does the auditory system categorize natural sounds? Here we apply multimodal neuroimaging to illustrate the progression from acoustic to semantically dominated representations. Combining magnetoencephalographic (MEG) and functional magnetic resonance imaging (fMRI) scans of observers listening to naturalistic sounds, we found superior temporal responses beginning ∼55 ms post-stimulus onset, spreading to extratemporal cortices by ∼100 ms. Early regions were distinguished less by onset/peak latency than by functional properties and overall temporal response profiles. Early acoustically-dominated representations trended systematically toward category dominance over time (after ∼200 ms) and space (beyond primary cortex). Semantic category representation was spatially specific: Vocalizations were preferentially distinguished in frontotemporal voice-selective regions and the fusiform; scenes and objects were distinguished in parahippocampal and medial place areas. Our results are consistent with real-world events coded via an extended auditory processing hierarchy, in which acoustic representations rapidly enter multiple streams specialized by category, including areas typically considered visual cortex.
Collapse
Affiliation(s)
- Matthew X. Lowe
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- Unlimited Sciences, Colorado Springs, CO
| | - Yalda Mohsenzadeh
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- The Brain and Mind Institute, The University of Western Ontario, London, ON, Canada
- Department of Computer Science, The University of Western Ontario, London, ON, Canada
| | - Benjamin Lahner
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
| | - Ian Charest
- Département de Psychologie, Université de Montréal, Montréal, Québec, Canada
- Center for Human Brain Health, University of Birmingham, UK
| | - Aude Oliva
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
| | - Santani Teng
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- Smith-Kettlewell Eye Research Institute (SKERI), San Francisco, CA
| |
Collapse
|
36
|
Lavan N, Collins MRN, Miah JFM. Audiovisual identity perception from naturally-varying stimuli is driven by visual information. Br J Psychol 2021; 113:248-263. [PMID: 34490897 DOI: 10.1111/bjop.12531] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2021] [Revised: 07/19/2021] [Indexed: 11/30/2022]
Abstract
Identity perception often takes place in multimodal settings, where perceivers have access to both visual (face) and auditory (voice) information. Despite this, identity perception is usually studied in unimodal contexts, where face and voice identity perception are modelled independently from one another. In this study, we asked whether and how much auditory and visual information contribute to audiovisual identity perception from naturally-varying stimuli. In a between-subjects design, participants completed an identity sorting task with either dynamic video-only, audio-only or dynamic audiovisual stimuli. In this task, participants were asked to sort multiple, naturally-varying stimuli from three different people by perceived identity. We found that identity perception was more accurate for video-only and audiovisual stimuli compared with audio-only stimuli. Interestingly, there was no difference in accuracy between video-only and audiovisual stimuli. Auditory information nonetheless played a role alongside visual information, as audiovisual identity judgements per stimulus could be predicted from both auditory and visual identity judgements. While the relationship was stronger between visual and audiovisual judgements, auditory information still uniquely explained a significant portion of the variance in audiovisual identity judgements. Our findings thus align with previous theoretical and empirical work that proposes that, compared with faces, voices are an important but relatively less salient and weaker cue to identity perception. We expand on this work to show that, at least in the context of this study, having access to voices in addition to faces does not result in better identity perception accuracy.
Collapse
Affiliation(s)
- Nadine Lavan
- Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK
| | - Madeleine Rose Niamh Collins
- Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK
| | - Jannatul Firdaus Monisha Miah
- Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK
| |
Collapse
|
37
|
Unimodal and cross-modal identity judgements using an audio-visual sorting task: Evidence for independent processing of faces and voices. Mem Cognit 2021; 50:216-231. [PMID: 34254274 PMCID: PMC8763756 DOI: 10.3758/s13421-021-01198-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/02/2021] [Indexed: 11/18/2022]
Abstract
Unimodal and cross-modal information provided by faces and voices contributes to identity percepts. To examine how these sources of information interact, we devised a novel audio-visual sorting task in which participants were required to group video-only and audio-only clips into two identities. In a series of three experiments, we show that unimodal face and voice sorting were more accurate than cross-modal sorting: while face sorting was consistently most accurate, followed by voice sorting, cross-modal sorting was at chance level or below. In Experiment 1, we compared performance in our novel audio-visual sorting task to a traditional identity matching task, showing that unimodal and cross-modal identity perception in the sorting task was overall moderately more accurate than in the traditional identity matching task. In Experiment 2, separating unimodal from cross-modal sorting led to small improvements in accuracy for unimodal sorting but no change in cross-modal sorting performance. In Experiment 3, we explored the effect of minimal audio-visual training: participants were shown a clip of the two identities in conversation prior to completing the sorting task. This led to small, nonsignificant improvements in accuracy for unimodal and cross-modal sorting. Our results indicate that unfamiliar face and voice perception operate relatively independently, with no evidence of mutual benefit, suggesting that extracting reliable cross-modal identity information is challenging.
Collapse
|
38
|
Papagno C, Pisoni A, Gainotti G. False alarms during recognition of famous people from faces and voices in patients with unilateral temporal lobe resection and normal participants tested after anodal tDCS over the left or right ATL. Neuropsychologia 2021; 159:107926. [PMID: 34216595 DOI: 10.1016/j.neuropsychologia.2021.107926] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2020] [Revised: 06/04/2021] [Accepted: 06/23/2021] [Indexed: 10/21/2022]
Abstract
Data gathered in the field of experimental social psychology have shown that it is more difficult to recognize a person through his/her voice than through his/her face and that false alarms (FA) are produced more often in voice than in face recognition. Furthermore, some neuropsychological investigations have suggested that in patients with damage to the right anterior temporal lobe (ATL) the number of FA could be higher for voice than for face recognition. In the present study we assessed FA during recognition of famous people from faces and voices in patients with unilateral ATL tumours and in normal participants tested after anodal transcranial direct current stimulation (tDCS) over the left or right ATL. The number of FA was significantly higher in patients with right than with left temporal tumours on both face and voice familiarity. Furthermore, lesion side did not differentially affect patients' sensitivity or response criterion when recognizing famous faces, but influenced both of these measures on a voice recognition task. In fact, in this condition patients with right temporal tumours showed a lower sensitivity index and a lower response criterion than those with left-sided lesions. In normal subjects, the greater right-sided involvement in voice than in face processing was confirmed by the observation that right ATL anodal stimulation significantly increased voice sensitivity but only marginally influenced face sensitivity. This asymmetry between face and voice processing in the right hemisphere could be due to the greater complexity of voice processing and to the difficulty of forming stable and well-structured voice representations that allow one to evaluate whether a presented voice matches an already known voice.
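The sensitivity index and response criterion referred to here are the standard signal detection measures d' and c, computed from hit and false-alarm rates. A minimal sketch with a log-linear correction for extreme rates; the counts in the example are hypothetical.

```python
from scipy.stats import norm

def sdt_measures(hits, misses, fas, crs):
    """Sensitivity (d') and response criterion (c) for a yes/no familiarity
    task, using a log-linear correction to avoid rates of exactly 0 or 1."""
    h = (hits + 0.5) / (hits + misses + 1.0)   # corrected hit rate
    f = (fas + 0.5) / (fas + crs + 1.0)        # corrected false-alarm rate
    d_prime = norm.ppf(h) - norm.ppf(f)
    criterion = -0.5 * (norm.ppf(h) + norm.ppf(f))
    return d_prime, criterion

print(sdt_measures(hits=32, misses=8, fas=14, crs=26))
```

A lower criterion, as reported for voice recognition after right-sided lesions, corresponds to a more liberal tendency to respond 'familiar'.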
Collapse
Affiliation(s)
- C Papagno
- CIMeC, Center for Mind/Brain Sciences, University of Trento, Rovereto, Italy; Department of Psychology, University of Milano-Bicocca, Milano, Italy.
| | - A Pisoni
- Department of Psychology, University of Milano-Bicocca, Milano, Italy
| | - G Gainotti
- Catholic University, Policlinico Gemelli, Roma, Italy
| |
Collapse
|
39
|
Auditory cortical micro-networks show differential connectivity during voice and speech processing in humans. Commun Biol 2021; 4:801. [PMID: 34172824 PMCID: PMC8233416 DOI: 10.1038/s42003-021-02328-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2021] [Accepted: 06/09/2021] [Indexed: 02/05/2023] Open
Abstract
The temporal voice areas (TVAs) in bilateral auditory cortex (AC) appear specialized for voice processing. Previous research assumed a uniform functional profile for the TVAs which are broadly spread along the bilateral AC. Alternatively, the TVAs might comprise separate AC nodes controlling differential neural functions for voice and speech decoding, organized as local micro-circuits. To investigate micro-circuits, we modeled the directional connectivity between TVA nodes during voice processing in humans while acquiring brain activity using neuroimaging. Results show several bilateral AC nodes for general voice decoding (speech and non-speech voices) and for speech decoding in particular. Furthermore, non-hierarchical and differential bilateral AC networks manifest distinct excitatory and inhibitory pathways for voice and speech processing. Finally, while voice and speech processing seem to have distinctive but integrated neural circuits in the left AC, the right AC reveals disintegrated neural circuits for both sounds. Altogether, we demonstrate a functional heterogeneity in the TVAs for voice decoding based on local micro-circuits.
Collapse
|
40
|
Abbatecola C, Gerardin P, Beneyton K, Kennedy H, Knoblauch K. The Role of Unimodal Feedback Pathways in Gender Perception During Activation of Voice and Face Areas. Front Syst Neurosci 2021; 15:669256. [PMID: 34122023 PMCID: PMC8194406 DOI: 10.3389/fnsys.2021.669256] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2021] [Accepted: 04/22/2021] [Indexed: 11/18/2022] Open
Abstract
Cross-modal effects provide a model framework for investigating hierarchical inter-areal processing, particularly under conditions where unimodal cortical areas receive contextual feedback from other modalities. Here, using complementary behavioral and brain imaging techniques, we investigated the functional networks participating in face and voice processing during gender perception, a high-level feature of voice and face perception. Within the framework of a signal detection decision model, maximum likelihood conjoint measurement (MLCM) was used to estimate the contributions of the face and voice to gender comparisons between pairs of audio-visual stimuli in which the face and voice were independently modulated. Top–down contributions were varied by instructing participants to make judgments based on the gender of either the face, the voice or both modalities (N = 12 for each task). Estimated face and voice contributions to the judgments of the stimulus pairs were not independent; both contributed to all tasks, but their respective weights varied over a 40-fold range due to top–down influences. Models that best described the modal contributions required the inclusion of two different top–down interactions: (i) an interaction that depended on gender congruence across modalities (i.e., the difference between face and voice modalities for each stimulus); (ii) an interaction that depended on the gender magnitude within each modality. The significance of these interactions was task dependent: the gender-congruence interaction was significant for the face and voice tasks, while the gender-magnitude interaction was significant for the face and stimulus tasks. Subsequently, we used the same stimuli and related tasks in a functional magnetic resonance imaging (fMRI) paradigm (N = 12) to explore the neural correlates of these perceptual processes, analyzed with Dynamic Causal Modeling (DCM) and Bayesian Model Selection. Results revealed changes in effective connectivity between the unimodal Fusiform Face Area (FFA) and Temporal Voice Area (TVA) in a fashion that paralleled the face and voice behavioral interactions observed in the psychophysical data. These findings explore the role in perception of multiple unimodal parallel feedback pathways.
Collapse
Affiliation(s)
- Clement Abbatecola
- Univ Lyon, Université Claude Bernard Lyon 1, INSERM, Stem Cell and Brain Research Institute U1208, Bron, France.,Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Peggy Gerardin
- Univ Lyon, Université Claude Bernard Lyon 1, INSERM, Stem Cell and Brain Research Institute U1208, Bron, France
| | - Kim Beneyton
- Univ Lyon, Université Claude Bernard Lyon 1, INSERM, Stem Cell and Brain Research Institute U1208, Bron, France
| | - Henry Kennedy
- Univ Lyon, Université Claude Bernard Lyon 1, INSERM, Stem Cell and Brain Research Institute U1208, Bron, France.,Institute of Neuroscience, State Key Laboratory of Neuroscience, Chinese Academy of Sciences Key Laboratory of Primate Neurobiology, Shanghai, China
| | - Kenneth Knoblauch
- Univ Lyon, Université Claude Bernard Lyon 1, INSERM, Stem Cell and Brain Research Institute U1208, Bron, France.,National Centre for Optics, Vision and Eye Care, Faculty of Health and Social Sciences, University of South-Eastern Norway, Kongsberg, Norway
| |
Collapse
|
41
|
Jenkins RE, Tsermentseli S, Monks CP, Robertson DJ, Stevenage SV, Symons AE, Davis JP. Are super‐face‐recognisers also super‐voice‐recognisers? Evidence from cross‐modal identification tasks. APPLIED COGNITIVE PSYCHOLOGY 2021. [DOI: 10.1002/acp.3813] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Affiliation(s)
- Ryan E. Jenkins
- School of Human Sciences, Institute for Lifecourse Development University of Greenwich London UK
| | - Stella Tsermentseli
- School of Human Sciences, Institute for Lifecourse Development University of Greenwich London UK
| | - Claire P. Monks
- School of Human Sciences, Institute for Lifecourse Development University of Greenwich London UK
| | - David J. Robertson
- School of Psychological Sciences and Health University of Strathclyde Glasgow UK
| | | | - Ashley E. Symons
- Department of Psychology University of Southampton Southampton UK
| | - Josh P. Davis
- School of Human Sciences, Institute for Lifecourse Development University of Greenwich London UK
| |
Collapse
|
42
|
Cortical voice processing is grounded in elementary sound analyses for vocalization relevant sound patterns. Prog Neurobiol 2020; 200:101982. [PMID: 33338555 DOI: 10.1016/j.pneurobio.2020.101982] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2020] [Revised: 12/05/2020] [Accepted: 12/11/2020] [Indexed: 01/31/2023]
Abstract
A subregion of the auditory cortex (AC) was proposed to selectively process voices. However, the selectivity of this temporal voice area (TVA) and its role in processing non-voice sounds have remained elusive. For a better functional description of the TVA, we investigated its neural responses to voice and non-voice sounds, and critically also to textural sound patterns (TSPs) that share basic features with natural sounds but are perceptually very distant from voices. First, listening to these TSPs elicited activity in large subregions of the TVA, which was mainly driven by perceptual ratings of the TSPs along a voice similarity scale. This similar TVA activity in response to TSPs might partially explain activation patterns typically observed during voice processing. Second, we reconstructed the TVA activity that is usually observed during voice processing with a linear combination of activation patterns from TSPs. An analysis of the reconstruction model weights demonstrated that the TVA processes both natural voice and non-voice sounds as well as TSPs along similar acoustic and perceptual features. The predominant factor in reconstructing the TVA pattern from TSPs was the perceptual voice similarity rating. Third, a multi-voxel pattern analysis confirmed that the TSPs contain sufficient sound information to explain TVA activity during voice processing. Altogether, rather than being restricted to higher-order voice processing only, the human "voice area" uses mechanisms to evaluate the perceptual and acoustic quality of non-voice sounds, and responds to the latter with a "voice-like" processing pattern when it detects some rudimentary perceptual similarity with voices.
Collapse
|
43
|
Tsantani M, Cook R. Normal recognition of famous voices in developmental prosopagnosia. Sci Rep 2020; 10:19757. [PMID: 33184411 PMCID: PMC7661722 DOI: 10.1038/s41598-020-76819-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2020] [Accepted: 11/03/2020] [Indexed: 02/06/2023] Open
Abstract
Developmental prosopagnosia (DP) is a condition characterised by lifelong face recognition difficulties. Recent neuroimaging findings suggest that DP may be associated with aberrant structure and function in multimodal regions of cortex implicated in the processing of both facial and vocal identity. These findings suggest that both facial and vocal recognition may be impaired in DP. To test this possibility, we compared the performance of 22 DPs and a group of typical controls, on closely matched tasks that assessed famous face and famous voice recognition ability. As expected, the DPs showed severe impairment on the face recognition task, relative to typical controls. In contrast, however, the DPs and controls identified a similar number of voices. Despite evidence of interactions between facial and vocal processing, these findings suggest some degree of dissociation between the two processing pathways, whereby one can be impaired while the other develops typically. A possible explanation for this dissociation in DP could be that the deficit originates in the early perceptual encoding of face structure, rather than at later, post-perceptual stages of face identity processing, which may be more likely to involve interactions with other modalities.
Collapse
Affiliation(s)
- Maria Tsantani
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK
| | - Richard Cook
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK.
| |
Collapse
|
44
|
Progressive phonagnosia in a telephone operator carrying a C9orf72 expansion. Cortex 2020; 132:92-98. [DOI: 10.1016/j.cortex.2020.05.022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2019] [Revised: 02/21/2020] [Accepted: 05/13/2020] [Indexed: 12/13/2022]
|
45
|
Morningstar M, Mattson WI, Singer S, Venticinque JS, Nelson EE. Children and adolescents' neural response to emotional faces and voices: Age-related changes in common regions of activation. Soc Neurosci 2020; 15:613-629. [PMID: 33017278 DOI: 10.1080/17470919.2020.1832572] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Abstract
The perception of facial and vocal emotional expressions engages overlapping regions of the brain. However, at a behavioral level, the ability to recognize the intended emotion in both types of nonverbal cues follows a divergent developmental trajectory throughout childhood and adolescence. The current study a) identified regions of common neural activation to facial and vocal stimuli in 8- to 19-year-old typically-developing adolescents, and b) examined age-related changes in blood-oxygen-level dependent (BOLD) response within these areas. Both modalities elicited activation in an overlapping network of subcortical regions (insula, thalamus, dorsal striatum), visual-motor association areas, prefrontal regions (inferior frontal cortex, dorsomedial prefrontal cortex), and the right superior temporal gyrus. Within these regions, increased age was associated with greater frontal activation to voices, but not faces. Results suggest that processing facial and vocal stimuli elicits activation in common areas of the brain in adolescents, but that age-related changes in response within these regions may vary by modality.
Collapse
Affiliation(s)
- M Morningstar
- Center for Biobehavioral Health, Nationwide Children's Hospital , Columbus, OH, USA.,Department of Pediatrics, The Ohio State University , Columbus, OH, USA.,Department of Psychology, Queen's University , Kingston, ON, Canada
| | - W I Mattson
- Center for Biobehavioral Health, Nationwide Children's Hospital , Columbus, OH, USA
| | - S Singer
- Center for Biobehavioral Health, Nationwide Children's Hospital , Columbus, OH, USA
| | - J S Venticinque
- Center for Biobehavioral Health, Nationwide Children's Hospital , Columbus, OH, USA
| | - E E Nelson
- Center for Biobehavioral Health, Nationwide Children's Hospital , Columbus, OH, USA.,Department of Pediatrics, The Ohio State University , Columbus, OH, USA
| |
Collapse
|
46
|
Johnson J, McGettigan C, Lavan N. Comparing unfamiliar voice and face identity perception using identity sorting tasks. Q J Exp Psychol (Hove) 2020; 73:1537-1545. [PMID: 32530364 PMCID: PMC7534197 DOI: 10.1177/1747021820938659] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2019] [Revised: 02/11/2020] [Accepted: 03/03/2020] [Indexed: 11/16/2022]
Abstract
Identity sorting tasks, in which participants sort multiple naturally varying stimuli of usually two identities into perceived identities, have recently gained popularity in voice and face processing research. In both modalities, participants who are unfamiliar with the identities tend to perceive multiple stimuli of the same identity as different people and thus fail to "tell people together." These similarities across modalities suggest that modality-general mechanisms may underpin sorting behaviour. In this study, participants completed a voice sorting and a face sorting task. Taking an individual differences approach, we asked whether participants' performance on voice and face sorting of unfamiliar identities is correlated. Participants additionally completed a voice discrimination (Bangor Voice Matching Test) and a face discrimination task (Glasgow Face Matching Test). Using these tasks, we tested whether performance on sorting related to explicit identity discrimination. Performance on voice sorting and face sorting tasks was correlated, suggesting that common modality-general processes underpin these tasks. However, no significant correlations were found between sorting and discrimination performance, with the exception of significant relationships for performance on "same identity" trials with "telling people together" for voices and faces. Overall, any reported relationships were however relatively weak, suggesting the presence of additional modality-specific and task-specific processes.
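The individual-differences logic of this study reduces to correlating per-participant accuracy across tasks. A minimal sketch on simulated scores (not the study's data):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
voice_sorting = rng.normal(size=60)                        # per-participant scores
face_sorting = 0.4 * voice_sorting + rng.normal(size=60)   # shared variance built in

r, p = pearsonr(voice_sorting, face_sorting)
print(f"r = {r:.2f}, p = {p:.3f}")
```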
Collapse
Affiliation(s)
- Justine Johnson
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
| | - Carolyn McGettigan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
| | - Nadine Lavan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
| |
Collapse
|
47
|
Gandolfo M, Downing PE. Asymmetric visual representation of sex from human body shape. Cognition 2020; 205:104436. [PMID: 32919115 DOI: 10.1016/j.cognition.2020.104436] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2020] [Revised: 08/05/2020] [Accepted: 08/07/2020] [Indexed: 01/21/2023]
Abstract
We efficiently infer others' states and traits from their appearance, and these inferences powerfully shape our social behaviour. One key trait is sex, which is strongly cued by the appearance of the body. What are the visual representations that link body shape to sex? Previous studies of visual sex judgment tasks find observers have a bias to report "male", particularly for ambiguous stimuli. This finding implies a representational asymmetry - that for the processes that generate a sex percept, the default output is "male", and "female" is determined by the presence of additional perceptual evidence. That is, female body shapes are positively coded by reference to a male default shape. This perspective makes a novel prediction in line with Treisman's studies of visual search asymmetries: female body targets should be more readily detected amongst male distractors than vice versa. Across 10 experiments (N = 32 each) we confirmed this prediction and ruled out alternative low-level explanations. The asymmetry was found with profile and frontal body silhouettes, frontal photographs, and schematised icons. Low-level confounds were controlled by balancing silhouette images for size and homogeneity, and by matching physical properties of photographs. The female advantage was nulled for inverted icons, but intact for inverted photographs, suggesting reliance on distinct cues to sex for different body depictions. Together, these findings demonstrate a principle of the perceptual coding that links bodily appearance with a significant social trait: the female body shape is coded as an extension of a male default. We conclude by offering a visual experience account of how these asymmetric representations arise in the first place.
Collapse
|
48
|
Maylott SE, Paukner A, Ahn YA, Simpson EA. Human and monkey infant attention to dynamic social and nonsocial stimuli. Dev Psychobiol 2020; 62:841-857. [PMID: 32424813 PMCID: PMC7944642 DOI: 10.1002/dev.21979] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2019] [Revised: 03/23/2020] [Accepted: 03/31/2020] [Indexed: 12/14/2022]
Abstract
The present study explored behavioral norms for infant social attention in typically developing human and nonhuman primate infants. We examined the normative development of attention to dynamic social and nonsocial stimuli longitudinally in macaques (Macaca mulatta) at 1, 3, and 5 months of age (N = 75) and humans at 2, 4, 6, 8, and 13 months of age (N = 69) using eye tracking. All infants viewed concurrently played silent videos: one social video and one nonsocial video. Both macaque and human infants were faster to look to the social than the nonsocial stimulus, and both species grew faster to orient to the social stimulus with age. Further, macaque infants' social attention increased linearly from 1 to 5 months. In contrast, human infants displayed a nonlinear pattern of social interest, with initially greater attention to the social stimulus, followed by a period of greater interest in the nonsocial stimulus, and then a rise in social interest from 6 to 13 months. Overall, human infants looked longer than macaque infants, suggesting humans have more sustained attention in the first year of life. These findings highlight potential species similarities and differences, and reflect a first step in establishing baseline patterns of early social attention development.
Collapse
Affiliation(s)
- Sarah E Maylott
- Department of Psychology, University of Miami, Coral Gables, FL, USA
| | - Annika Paukner
- Department of Psychology, Nottingham Trent University, Nottingham, UK
| | - Yeojin A Ahn
- Department of Psychology, University of Miami, Coral Gables, FL, USA
| | | |
Collapse
|
49
|
Where Sounds Occur Matters: Context Effects Influence Processing of Salient Vocalisations. Brain Sci 2020; 10:brainsci10070429. [PMID: 32640750 PMCID: PMC7407900 DOI: 10.3390/brainsci10070429] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2020] [Revised: 06/26/2020] [Accepted: 07/02/2020] [Indexed: 11/23/2022] Open
Abstract
The social context in which a salient human vocalisation is heard shapes the affective information it conveys. However, few studies have investigated how visual contextual cues lead to differential processing of such vocalisations. The prefrontal cortex (PFC) is implicated in the processing of contextual information and in evaluating the saliency of vocalisations. Using functional near-infrared spectroscopy (fNIRS), we investigated PFC responses of young adults (N = 18) to emotive infant and adult vocalisations while they passively viewed scenes from two categories of environmental context: a domestic environment (DE) and an outdoors environment (OE). Compared to a home setting (DE), which is associated with a fixed mental representation (e.g., expecting to see a living room in a typical house), the outdoor setting (OE) is more variable and less predictable, and might therefore demand greater processing effort. In our previous study (Azhari et al., 2018), which employed the same experimental paradigm, the OE context elicited greater physiological arousal than the DE context. Similarly, we hypothesised that greater PFC activation would be observed when salient vocalisations were paired with the OE rather than the DE condition. Our finding supported this hypothesis: the left rostrolateral PFC, an area of the brain that facilitates relational integration, exhibited greater activation in the OE than in the DE condition, which suggests that greater cognitive resources are required to process outdoor situational information together with salient vocalisations. These results deepen our understanding of how contextual information differentially modulates the processing of salient vocalisations.
Collapse
|
50
|
Xu M, Tachibana RO, Okanoya K, Hagiwara H, Hashimoto RI, Homae F. Unconscious and Distinctive Control of Vocal Pitch and Timbre During Altered Auditory Feedback. Front Psychol 2020; 11:1224. [PMID: 32581975 PMCID: PMC7294928 DOI: 10.3389/fpsyg.2020.01224] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2019] [Accepted: 05/11/2020] [Indexed: 01/01/2023] Open
Abstract
Vocal control plays a critical role in smooth social communication. Speakers constantly monitor auditory feedback (AF) and make adjustments when their voices deviate from their intentions. Previous studies have shown that when certain acoustic features of the AF are artificially altered, speakers compensate for this alteration in the opposite direction. However, little is known about how the vocal control system implements compensations for alterations of different acoustic features, and associates them with subjective consciousness. The present study investigated whether compensations for the fundamental frequency (F0), which corresponds to perceived pitch, and formants, which contribute to perceived timbre, can be performed unconsciously and independently. Forty native Japanese speakers received two types of altered AF during vowel production that involved shifts of either only the formant frequencies (formant modification; Fm) or both the pitch and formant frequencies (pitch + formant modification; PFm). For each type, three levels of shift (slight, medium, and severe) in both directions (increase or decrease) were used. After the experiment, participants were tested for whether they had perceived a change in the F0 and/or formants. The results showed that (i) only formants were compensated for in the Fm condition, while both the F0 and formants were compensated for in the PFm condition; (ii) the F0 compensation exhibited greater precision than the formant compensation in PFm; and (iii) compensation occurred even when participants misperceived or could not explicitly perceive the alteration in AF. These findings indicate that non-experts can compensate for both formant and F0 modifications in the AF during vocal production, even when the modifications are not explicitly or correctly perceived, which provides further evidence for a dissociation between conscious perception and action in vocal control. We propose that such unconscious control of voice production may enhance rapid adaptation to changing speech environments and facilitate mutual communication.
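Compensation magnitudes in altered-auditory-feedback studies are conventionally expressed in cents, a logarithmic frequency unit (100 cents = 1 semitone). A minimal sketch; the example frequencies are hypothetical.

```python
import numpy as np

def cents(f, f_ref):
    """Express the ratio of two frequencies (Hz) on the cents scale."""
    return 1200 * np.log2(f / f_ref)

# A produced F0 of 196 Hz against an intended 200 Hz corresponds to
print(cents(196, 200))  # about -35 cents, i.e., opposing an upward shift
```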
Collapse
Affiliation(s)
- Mingdi Xu
- Department of Language Sciences, Graduate School of Humanities, Tokyo Metropolitan University, Tokyo, Japan
| | - Ryosuke O Tachibana
- Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
| | - Kazuo Okanoya
- Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
| | - Hiroko Hagiwara
- Department of Language Sciences, Graduate School of Humanities, Tokyo Metropolitan University, Tokyo, Japan.,Research Center for Language, Brain and Genetics, Tokyo Metropolitan University, Tokyo, Japan
| | - Ryu-Ichiro Hashimoto
- Department of Language Sciences, Graduate School of Humanities, Tokyo Metropolitan University, Tokyo, Japan.,Research Center for Language, Brain and Genetics, Tokyo Metropolitan University, Tokyo, Japan
| | - Fumitaka Homae
- Department of Language Sciences, Graduate School of Humanities, Tokyo Metropolitan University, Tokyo, Japan.,Research Center for Language, Brain and Genetics, Tokyo Metropolitan University, Tokyo, Japan
| |
Collapse
|