1
Cordero G, Paredes-Paredes JR, von Kriegstein K, Díaz B. Perceiving speech from a familiar speaker engages the person identity network. PLoS One 2025; 20:e0322927. PMID: 40367292; PMCID: PMC12077772; DOI: 10.1371/journal.pone.0322927.
Abstract
Numerous studies show that speaker familiarity influences speech perception. Here, we investigated the brain regions and their changes in functional connectivity involved in the use of person-specific information during speech perception. We employed functional magnetic resonance imaging to study changes in functional connectivity and Blood-Oxygenation-Level-Dependent (BOLD) responses associated with speaker familiarity in human adults while they performed a speech perception task. Twenty-seven right-handed participants performed the speech task before and after being familiarized with the voice and numerous autobiographical details of one of the speakers featured in the task. We found that speech perception from a familiar speaker was associated with BOLD activity changes in regions of the person identity network: the right temporal pole, a voice-sensitive region, and the right supramarginal gyrus, a region sensitive to speaker-specific aspects of speech sound productions. A speech-sensitive region located in the left superior temporal gyrus also exhibited sensitivity to speaker familiarity during speech perception. Lastly, speaker familiarity increased connectivity strength between the right temporal pole and the right superior frontal gyrus, a region associated with verbal working memory. Our findings unveil that speaker familiarity engages the person identity network during speech perception, extending the neural basis of speech processing beyond the canonical language network.
Affiliation(s)
- Gaël Cordero
- Department of Psychology, Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya, Barcelona, Spain
- Jazmin R. Paredes-Paredes
- Department of Psychology, Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya, Barcelona, Spain
- Begoña Díaz
- Department of Psychology, Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya, Barcelona, Spain
2
Xu T, Jiang X, Zhang P, Wang A. Introducing the Sisu Voice Matching Test (SVMT): A novel tool for assessing voice discrimination in Chinese. Behav Res Methods 2025; 57:86. PMID: 39900852; DOI: 10.3758/s13428-025-02608-3.
Abstract
Existing standardized tests for voice discrimination are based mainly on Indo-European languages, particularly English. However, voice identity perception is influenced by language familiarity, with listeners generally performing better in their native language than in a foreign one. To provide a more accurate and comprehensive assessment of voice discrimination, it is crucial to develop tests tailored to the native language of the test takers. In response, we developed the Sisu Voice Matching Test (SVMT), a pioneering tool designed specifically for Mandarin Chinese speakers. The SVMT was designed to model real-world communication: it includes both pseudo-word and pseudo-sentence stimuli and covers both the ability to categorize identical voices as the same and the ability to categorize distinct voices as different. Built on a neurally validated voice-space model and item response theory, the SVMT ensures high reliability, validity, appropriate difficulty, and strong discriminative power, while maintaining a concise test duration of approximately 10 min. Therefore, by taking into account the effects of language nativeness, the SVMT complements existing voice tests based on other languages' phonologies to provide a more accurate assessment of voice discrimination ability for Mandarin Chinese speakers. Future research can use the SVMT to deepen our understanding of the mechanisms underlying human voice identity perception, especially in special populations, and to examine the relationship between voice identity recognition and other cognitive processes.
Affiliation(s)
- Tianze Xu
- Institute of Linguistics, Shanghai International Studies University, Shanghai, 201620, China
- Xiaoming Jiang
- Institute of Linguistics, Shanghai International Studies University, Shanghai, 201620, China
- Key Laboratory of Language Science and Multilingual Artificial Intelligence, Shanghai International Studies University, Shanghai, 201620, China
- Peng Zhang
- Institute of Linguistics, Shanghai International Studies University, Shanghai, 201620, China
- Anni Wang
- Institute of Linguistics, Shanghai International Studies University, Shanghai, 201620, China
3
Li J, Fu C, Sun Y. The influence of gender stereotypes on gender judgement and impression evaluation based on face and voice. PeerJ 2025; 13:e18900. PMID: 39902330; PMCID: PMC11789659; DOI: 10.7717/peerj.18900.
Abstract
The present study examined the influence of gender stereotype information on cognitive judgments and impression evaluations of faces and voices. A 2 × 2 × 2 design was employed, with Perceptual Target (Face vs. Voice), Gender Stereotype Information (Consistent vs. Inconsistent), and Gender of Perceptual Targets (Male vs. Female) serving as within-subject factors. The results demonstrated that when gender stereotype information was consistent with the perceptual target's gender, response times for face gender judgments were shorter than for voice gender judgments. Nevertheless, the accuracy of gender judgments was higher for voices than for faces. Furthermore, likability ratings for targets were significantly higher when gender stereotype information was consistent with the target than when it was inconsistent, for both face and voice judgments. These findings indicate that visual and auditory cues are processed differently in the context of gender judgments, thereby highlighting the distinct roles of facial and vocal information in gender perception. The current study contributes to understanding the complex interplay between gender stereotypes and multimodal social information processing.
Affiliation(s)
- Jingyu Li
- Faculty of Psychology, Tianjin Normal University, Tianjin, China
- College of Teacher Education, Weifang University of Science and Technology, Weifang, Shandong, China
- Chunye Fu
- Department of Social Psychology, Nankai University, Tianjin, China
- Yunrui Sun
- Faculty of Education, Tianjin Normal University, Tianjin, China
4
Roswandowitz C, Kathiresan T, Pellegrino E, Dellwo V, Frühholz S. Cortical-striatal brain network distinguishes deepfake from real speaker identity. Commun Biol 2024; 7:711. PMID: 38862808; PMCID: PMC11166919; DOI: 10.1038/s42003-024-06372-6.
Abstract
Deepfakes are viral ingredients of digital environments, and they can trick human cognition into misperceiving the fake as real. Here, we test the neurocognitive sensitivity of 25 participants to accept or reject person identities as recreated in audio deepfakes. We generate high-quality voice identity clones from natural speakers by using advanced deepfake technologies. During an identity matching task, participants show intermediate performance with deepfake voices, indicating levels of both deception and resistance to deepfake identity spoofing. On the brain level, univariate and multivariate analyses consistently reveal a central cortico-striatal network that decodes the vocal acoustic pattern and deepfake level (auditory cortex), as well as natural speaker identities (nucleus accumbens), which are valued for their social relevance. This network is embedded in a broader neural identity and object recognition network. Humans can thus be partly tricked by deepfakes, but the neurocognitive mechanisms identified during deepfake processing open windows for strengthening human resilience to fake information.
Affiliation(s)
- Claudia Roswandowitz
- Cognitive and Affective Neuroscience Unit, Department of Psychology, University of Zurich, Zurich, Switzerland
- Phonetics and Speech Sciences Group, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
- Neuroscience Centre Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Thayabaran Kathiresan
- Centre for Neuroscience of Speech, University of Melbourne, Melbourne, Australia
- Redenlab, Melbourne, Australia
- Elisa Pellegrino
- Phonetics and Speech Sciences Group, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
- Volker Dellwo
- Phonetics and Speech Sciences Group, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
- Sascha Frühholz
- Cognitive and Affective Neuroscience Unit, Department of Psychology, University of Zurich, Zurich, Switzerland
- Neuroscience Centre Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Department of Psychology, University of Oslo, Oslo, Norway
5
Gainotti G. Human Recognition: The Utilization of Face, Voice, Name and Interactions-An Extended Editorial. Brain Sci 2024; 14:345. PMID: 38671996; PMCID: PMC11048321; DOI: 10.3390/brainsci14040345.
Abstract
The many stimulating contributions to this Special Issue of Brain Science focused on some basic issues of particular interest in current research, with emphasis on human recognition using faces, voices, and names [...].
Affiliation(s)
- Guido Gainotti
- Institute of Neurology, Università Cattolica del Sacro Cuore, Fondazione Policlinico A. Gemelli, Istituto di Ricovero e Cura a Carattere Scientifico, 00168 Rome, Italy
6
Har-Shai Yahav P, Sharaabi A, Zion Golumbic E. The effect of voice familiarity on attention to speech in a cocktail party scenario. Cereb Cortex 2024; 34:bhad475. PMID: 38142293; DOI: 10.1093/cercor/bhad475.
Abstract
Selective attention to one speaker in multi-talker environments can be affected by the acoustic and semantic properties of speech. One highly ecological feature of speech that has the potential to assist in selective attention is voice familiarity. Here, we tested how voice familiarity interacts with selective attention by measuring the neural speech-tracking response to both target and non-target speech in a dichotic listening "Cocktail Party" paradigm. We recorded magnetoencephalography from n = 33 participants, presented with concurrent narratives in two different voices, and instructed to pay attention to one ear ("target") and ignore the other ("non-target"). Participants were familiarized with one of the voices during the week prior to the experiment, rendering this voice familiar to them. Using multivariate speech-tracking analysis, we estimated the neural responses to both stimuli and replicated their well-established modulation by selective attention. Importantly, speech-tracking was also affected by voice familiarity, showing an enhanced response for target speech and a reduced response for non-target speech in the contralateral hemisphere, when these were in a familiar vs. an unfamiliar voice. These findings offer valuable insight into how voice familiarity, and by extension auditory semantics, interacts with goal-driven attention, and facilitates perceptual organization and speech processing in noisy environments.
Affiliation(s)
- Paz Har-Shai Yahav
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Aviya Sharaabi
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Elana Zion Golumbic
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
7
Gainotti G, Quaranta D, Luzzi S. Apperceptive and Associative Forms of Phonagnosia. Curr Neurol Neurosci Rep 2023; 23:327-333. PMID: 37133717; PMCID: PMC10257619; DOI: 10.1007/s11910-023-01271-5.
Abstract
PURPOSE OF REVIEW Phonagnosia is a rare acquired or developmental pathological condition that consists of a selective difficulty in recognizing familiar people by their voices. It can be distinguished into two different categories: apperceptive phonagnosia, which denotes a purely perceptual form of voice recognition disorder; and associative phonagnosia, in which patients have no perceptual defects but cannot evaluate whether the voice of a known person is or is not familiar. The neural substrate of these two forms of voice recognition disorder is still controversial, but it could concern different components of the core temporal voice areas and of extratemporal voice processing areas. This article reviews recent research on the neuropsychological and anatomo-clinical aspects of this condition. RECENT FINDINGS Data obtained in group studies or single case reports of phonagnosic patients suggest that apperceptive phonagnosia might be due to disruption of the core temporal voice areas, bilaterally located in the posterior parts of the superior temporal gyrus, whereas associative phonagnosia might result from impaired access to structures where voice representations are stored, due to a disconnection of these areas from structures of the voice extended system. Although these results must be confirmed by further investigations, they represent an important step toward understanding the nature and neural substrate of apperceptive and associative forms of phonagnosia.
Affiliation(s)
- Guido Gainotti
- Institute of Neurology, Catholic University of the Sacred Heart, Largo A. Gemelli 8, 00168, Rome, Italy
- Davide Quaranta
- Neurology Unit, Department of Science of Elderly, Neuroscience, Head and Neck and Orthopaedics, Fondazione Policlinico A. Gemelli, IRCCS, Rome, Italy
- Simona Luzzi
- Department of Experimental and Clinical Medicine, Polytechnic University of Marche, Ancona, Italy