1. Meng Y, Liang C, Chen W, Liu Z, Yang C, Hu J, Gao Z, Gao S. Neural basis of language familiarity effects on voice recognition: An fNIRS study. Cortex 2024;176:1-10. PMID: 38723449. DOI: 10.1016/j.cortex.2024.04.007.
Abstract
Recognizing talkers' identity via speech is an important social skill in interpersonal interaction. Behavioral evidence has shown that listeners identify voices speaking their native language better than voices speaking a non-native language, a phenomenon known as the language familiarity effect (LFE). However, its underlying neural mechanisms remain unclear. This study therefore investigated how the LFE arises at the neural level using functional near-infrared spectroscopy (fNIRS). Late unbalanced bilinguals first learned to associate strangers' voices with their identities and were then tested on recognizing the talkers' identities from their voices speaking a language that was highly familiar (the native language, Chinese), moderately familiar (the second language, English), or completely unfamiliar (Ewe) to participants. Participants identified talkers most accurately in Chinese and least accurately in Ewe. Talker identification was quicker in Chinese than in English and Ewe, but reaction time did not differ between the two non-native languages. At the neural level, recognizing voices speaking Chinese relative to English/Ewe produced less activity in the inferior frontal gyrus, precentral/postcentral gyrus, supramarginal gyrus, and superior temporal sulcus/gyrus, while no difference was found between English and Ewe, indicating that automatic phonological encoding in the native language facilitates voice identification. These findings shed new light on the interrelations between language ability and voice recognition, revealing that the brain activation pattern of the LFE depends on the automaticity of language processing.
Affiliation(s)
- Yuan Meng: School of Foreign Languages, University of Electronic Science and Technology of China, Chengdu, China
- Chunyan Liang: School of Foreign Languages, University of Electronic Science and Technology of China, Chengdu, China; Zhuojin Branch of Yandaojie Primary School, Chengdu, China
- Wenjing Chen: School of Foreign Languages, University of Electronic Science and Technology of China, Chengdu, China
- Zhaoning Liu: School of Foreign Languages, University of Electronic Science and Technology of China, Chengdu, China
- Chaoqing Yang: School of Foreign Languages, University of Electronic Science and Technology of China, Chengdu, China
- Jiehui Hu: School of Foreign Languages, University of Electronic Science and Technology of China, Chengdu, China; The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for NeuroInformation, University of Electronic Science and Technology of China, Chengdu, China
- Zhao Gao: School of Foreign Languages, University of Electronic Science and Technology of China, Chengdu, China; The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for NeuroInformation, University of Electronic Science and Technology of China, Chengdu, China
- Shan Gao: School of Foreign Languages, University of Electronic Science and Technology of China, Chengdu, China; The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for NeuroInformation, University of Electronic Science and Technology of China, Chengdu, China
2. Cutler A, Burchfield LA, Antoniou M. The Language-Specificity of Phonetic Adaptation to Talkers. Lang Speech 2024;67:373-400. PMID: 38054422. PMCID: PMC11141103. DOI: 10.1177/00238309231214244.
Abstract
Listeners adapt efficiently to new talkers by using lexical knowledge to resolve perceptual uncertainty. This adaptation has been widely observed, both in first (L1) and in second languages (L2). Here, adaptation was tested in both the L1 and L2 of speakers of Mandarin and English, two very dissimilar languages. A sound midway between /f/ and /s/ replacing either /f/ or /s/ in Mandarin words presented for lexical decision (e.g., bu4fa3 "illegal"; kuan1song1 "loose") prompted the expected adaptation; it induced an expanded /f/ category in phoneme categorization when it had replaced /f/, but an expanded /s/ category when it had replaced /s/. Both L1 listeners and English-native listeners with L2 Mandarin showed this effect. In English, however (with e.g., traffic; insane), we observed adaptation in L1 but not in L2; Mandarin-native listeners, despite scoring highly in the English lexical decision training, did not adapt their category boundaries for /f/ and /s/. Whether the ambiguous sound appeared syllable-initially (as in Mandarin phonology) versus word-finally (providing more word identity information) made no difference. Perceptual learning for talker adaptation is language-specific in that successful lexically guided adaptation in one language does not guarantee adaptation in other known languages; the enabling conditions for adaptation may be multiple and diverse.
Affiliation(s)
- Anne Cutler: The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Australia
- L Ann Burchfield: The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Australia
- Mark Antoniou: The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Australia
3. Ming L, Geng L, Zhao X, Wang Y, Hu N, Yang Y, Hu X. The mechanism of phonetic information in voice identity discrimination: a comparative study based on sighted and blind people. Front Psychol 2024;15:1352692. PMID: 38845764. PMCID: PMC11153856. DOI: 10.3389/fpsyg.2024.1352692.
Abstract
Purpose: This study examined whether, and how, phonetic information contributes to voice identity processing in blind people. Method: To address the first question, 25 sighted participants and 30 blind participants discriminated voice identity while listening to forward and backward speech in their native language and in an unfamiliar language. To address the second question, using an articulatory suppression paradigm, 26 sighted participants and 26 blind participants discriminated voice identity while listening to forward speech in their native language and in an unfamiliar language. Results: In Experiment 1, both the sighted and blind groups showed a native-language advantage not only in the voice identity discrimination task with forward speech but also in the task with backward speech. This finding supports the view that backward speech still retains some phonetic information and indicates that phonetic information can affect voice identity processing in both sighted and blind people. In addition, only the native-language advantage of sighted people was modulated by the manner of speech, which is related to articulatory rehearsal. In Experiment 2, only the native-language advantage of sighted people was modulated by articulatory suppression. This indicates that phonetic information may act in different ways on voice identity processing in sighted and blind people. Conclusion: The heightened dependence on voice source information in blind people appears not to undermine the function of phonetic information, but it does appear to change the functional mechanism of phonetic information. These findings suggest that the current phonetic familiarity model needs to be refined with respect to the mechanism of phonetic information.
Affiliation(s)
- Lili Ming: School of Linguistic Sciences and Arts, Jiangsu Normal University, Xuzhou, China; Key Laboratory of Language and Cognitive Neuroscience of Jiangsu Province, Collaborative Innovation Center for Language Ability, Xuzhou, China
- Libo Geng: School of Linguistic Sciences and Arts, Jiangsu Normal University, Xuzhou, China; Key Laboratory of Language and Cognitive Neuroscience of Jiangsu Province, Collaborative Innovation Center for Language Ability, Xuzhou, China
- Xinyu Zhao: School of Linguistic Sciences and Arts, Jiangsu Normal University, Xuzhou, China; Key Laboratory of Language and Cognitive Neuroscience of Jiangsu Province, Collaborative Innovation Center for Language Ability, Xuzhou, China
- Yichan Wang: School of Linguistic Sciences and Arts, Jiangsu Normal University, Xuzhou, China; Key Laboratory of Language and Cognitive Neuroscience of Jiangsu Province, Collaborative Innovation Center for Language Ability, Xuzhou, China
- Na Hu: School of Preschool and Special Education, Kunming University, Yunnan, China
- Yiming Yang: School of Linguistic Sciences and Arts, Jiangsu Normal University, Xuzhou, China; Key Laboratory of Language and Cognitive Neuroscience of Jiangsu Province, Collaborative Innovation Center for Language Ability, Xuzhou, China
- Xueping Hu: College of Education, Huaibei Normal University, Huaibei, China; Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), Huaibei, China
4. Harford EE, Holt LL, Abel TJ. Unveiling the development of human voice perception: Neurobiological mechanisms and pathophysiology. Curr Res Neurobiol 2024;6:100127. PMID: 38511174. PMCID: PMC10950757. DOI: 10.1016/j.crneur.2024.100127.
Abstract
The human voice is a critical stimulus for the auditory system that promotes social connection, informs the listener about identity and emotion, and acts as the carrier for spoken language. Research on voice processing in adults has informed our understanding of the unique status of the human voice in the mature auditory cortex and provided potential explanations for mechanisms that underlie voice selectivity and identity processing. There is evidence that voice perception undergoes developmental change starting in infancy and extending through early adolescence. While even young infants recognize the voice of their mother, there is an apparent protracted course of development to reach adult-like selectivity for human voice over other sound categories and recognition of other talkers by voice. Gaps in the literature do not allow for an exact mapping of this trajectory or an adequate description of how voice processing abilities and their neural underpinnings evolve. This review provides a comprehensive account of developmental voice processing research published to date and discusses how this evidence fits with and contributes to current theoretical models proposed in the adult literature. We discuss how factors such as cognitive development, neural plasticity, perceptual narrowing, and language acquisition may contribute to the development of voice processing and its investigation in children. We also review evidence of voice processing abilities in premature birth, autism spectrum disorder, and phonagnosia to examine where and how deviations from the typical trajectory of development may manifest.
Affiliation(s)
- Emily E. Harford: Department of Neurological Surgery, University of Pittsburgh, USA
- Lori L. Holt: Department of Psychology, The University of Texas at Austin, USA
- Taylor J. Abel: Department of Neurological Surgery, University of Pittsburgh, USA; Department of Bioengineering, University of Pittsburgh, USA
5. Yu K, Zhou Y, Zhang L, Li L, Li P, Wang R. How Different Types of Linguistic Information Impact Voice Perception: Evidence From the Language-Familiarity Effect. Lang Speech 2023;66:1007-1029. PMID: 36680473. DOI: 10.1177/00238309221143062.
Abstract
Previous studies have suggested the effect of linguistic information on voice perception (e.g., the language-familiarity effect [LFE]). However, it remains unclear which type of specific information in speech contributes to voice perception, including acoustic, phonological, lexical, and semantic information. It is also underexamined whether the roles of these different types of information are modulated by the experimental paradigm (speaker discrimination vs. speaker identification). In this study, we conducted two experiments to investigate these issues regarding LFEs. Experiment 1 examined the roles of acoustic and phonological information in speaker discrimination and identification with forward and time-reversed Mandarin and Indonesian sentences. Experiment 2 further identified the roles of phonological, lexical, and semantic information with forward, word-scrambled, and reconstructed (consisting of pseudo-Mandarin words) Mandarin sentences and forward Indonesian sentences. For Mandarin-only participants in Experiment 1, speaker discrimination was more accurate for forward than reversed sentences, but there was no LFE for either sentence type. Speaker identification was also more accurate for forward than reversed sentences, whereas there was an LFE for forward sentences. In Experiment 2, speaker discrimination was better for word-scrambled than reconstructed Mandarin sentences. Speaker identification was more accurate for forward and word-scrambled Mandarin sentences but less accurate for reconstructed Mandarin and forward Indonesian sentences. In general, the pattern of results for Indonesian learners was the same as that for Mandarin-only speakers. These results suggest that different kinds of information support speaker discrimination and identification in native and unfamiliar languages. The LFE in speaker identification depends on both phonological and lexical information.
Affiliation(s)
- Keke Yu: Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents, Ministry of Education, & Center for Studies of Psychological Application, School of Psychology, South China Normal University, China
- Yacong Zhou: Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents, Ministry of Education, & Center for Studies of Psychological Application, School of Psychology, South China Normal University, China; Huanghe Science and Technology University, China
- Li Li: The Key Laboratory of Chinese Learning and International Promotion, and College of International Culture, South China Normal University, China
- Ping Li: The Pennsylvania State University, USA
- Ruiming Wang: Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents, Ministry of Education, & Center for Studies of Psychological Application, School of Psychology, South China Normal University, China
6. Torre I, White L, Goslin J, Knight S. The irrepressible influence of vocal stereotypes on trust. Q J Exp Psychol (Hove) 2023. PMID: 37872679. DOI: 10.1177/17470218231211549.
Abstract
There is a reciprocal relationship between trust and vocal communication in human interactions. On one hand, a predisposition towards trust is necessary for communication to be meaningful and effective. On the other hand, we use vocal cues to signal our own trustworthiness and to infer it from the speech of others. Research on trustworthiness attributions to vocal characteristics is scarce and contradictory, however, being typically based on explicit judgements which may not predict actual trust-oriented behaviour. We use a game theory paradigm to examine the influence of speaker accent and prosody on trusting behaviour towards a simulated game partner, who responds either trustworthily or untrustworthily in an investment game. We found that speaking in a non-regional standard accent increases trust, as does relatively slow articulation rate. The effect of accent persists over time, despite the accumulation of clear evidence regarding the speaker's level of trustworthiness in a negotiated interaction. Accents perceived as positive for trust can maintain this benefit even in the face of behavioural evidence of untrustworthiness.
Affiliation(s)
- Ilaria Torre: Division of Interaction Design and Software Engineering, Chalmers University of Technology, Gothenburg, Sweden; KTH Royal Institute of Technology, Stockholm, Sweden
7. Zaltz Y. The effect of stimulus type and testing method on talker discrimination of school-age children. J Acoust Soc Am 2023;153:2611. PMID: 37129674. DOI: 10.1121/10.0017999.
Abstract
Efficient talker discrimination (TD) improves speech understanding under multi-talker conditions. So far, TD of children has been assessed using various testing parameters, making it difficult to draw comparative conclusions. This study explored the effects of the stimulus type and variability on children's TD. Thirty-two children (7-10 years old) underwent eight TD assessments with fundamental frequency + formant changes using an adaptive procedure. Stimuli included consonant-vowel-consonant words or three-word sentences and were either fixed by run or by trial (changing throughout the run). Cognitive skills were also assessed. Thirty-one adults (18-35 years old) served as controls. The results showed (1) poorer TD for the fixed-by-trial than the fixed-by-run method, with both stimulus types for the adults but only with the words for the children; (2) poorer TD for the words than the sentences with the fixed-by-trial method only for the children; and (3) significant correlations between the children's age and TD. These results support a developmental trajectory in the use of perceptual anchoring for TD and in its reliance on comprehensive acoustic and linguistic information. The finding that the testing parameters may influence the top-down and bottom-up processing for TD should be considered when comparing data across studies or when planning new TD experiments.
Affiliation(s)
- Yael Zaltz: Department of Communication Disorders, The Steyer School of Health Professions, Sackler Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
8. Talker and accent familiarity yield advantages for voice identity perception: A voice sorting study. Mem Cognit 2023;51:175-187. PMID: 35274221. PMCID: PMC9943951. DOI: 10.3758/s13421-022-01296-0.
Abstract
In the current study, we examine and compare the effects of talker and accent familiarity in the context of a voice identity sorting task, using naturally varying voice recording samples from the TV show Derry Girls. Voice samples were thus all spoken with a regional accent of UK/Irish English (from [London]derry). We tested four listener groups: Listeners were either familiar or unfamiliar with the TV show (and therefore the talker identities) and were either highly familiar or relatively less familiar with Northern Irish accents. Both talker and accent familiarity significantly improved accuracy of voice identity sorting performance. However, the talker familiarity benefits were overall larger, and more consistent. We discuss the results in light of a possible hierarchy of familiarity effects and argue that our findings may provide additional evidence for interactions of speech and identity processing pathways in voice identity perception. We also identify some key limitations in the current work and provide suggestions for future studies to address these.
9. Jebens A, Başkent D, Rachman L. Phonological effects on the perceptual weighting of voice cues for voice gender categorization. JASA Express Lett 2022;2:125202. PMID: 36586964. DOI: 10.1121/10.0016601.
Abstract
Voice perception and speaker identification interact with linguistic processing. This study investigated whether lexicality and/or phonological effects alter the perceptual weighting of voice pitch (F0) and vocal-tract length (VTL) cues for perceived voice gender categorization. F0 and VTL of forward words and nonwords (for lexicality effect), and time-reversed nonwords (for phonological effect through phonetic alterations) were manipulated. Participants provided binary "man"/"woman" judgements of the different voice conditions. Cue weights for time-reversed nonwords were significantly lower than cue weights for both forward words and nonwords, but there was no significant difference between forward words and nonwords. Hence, voice cue utilization for voice gender judgements seems to be affected by phonological, rather than lexicality effects.
Affiliation(s)
- Almut Jebens: Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Deniz Başkent: Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Laura Rachman: Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
10. Drown L, Philip B, Francis AL, Theodore RM. Revisiting the left ear advantage for phonetic cues to talker identification. J Acoust Soc Am 2022;152:3107. PMID: 36456295. PMCID: PMC9715276. DOI: 10.1121/10.0015093.
Abstract
Previous research suggests that learning to use a phonetic property [e.g., voice onset time (VOT)] for talker identity supports a left ear processing advantage. Specifically, listeners trained to identify two "talkers" who only differed in characteristic VOTs showed faster talker identification for stimuli presented to the left ear compared to stimuli presented to the right ear, which is interpreted as evidence of hemispheric lateralization consistent with task demands. Experiment 1 (n = 97) aimed to replicate this finding and identify predictors of performance; experiment 2 (n = 79) aimed to replicate this finding under conditions that better facilitate observation of laterality effects. Listeners completed a talker identification task during pretest, training, and posttest phases. Inhibition, category identification, and auditory acuity were also assessed in experiment 1. Listeners learned to use VOT for talker identity, and this learning was positively associated with auditory acuity. Talker identification was not influenced by ear of presentation, and Bayes factors indicated strong support for the null. These results suggest that talker-specific phonetic variation is not sufficient to induce a left ear advantage for talker identification; together with the extant literature, this instead suggests that hemispheric lateralization for talker-specific phonetic variation requires phonetic variation to be conditioned on talker differences in source characteristics.
Affiliation(s)
- Lee Drown: Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06269-1085, USA
- Betsy Philip: Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06269-1085, USA
- Alexander L Francis: Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907-2122, USA
- Rachel M Theodore: Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06269-1085, USA
11. Orena AJ, Mader AS, Werker JF. Learning to Recognize Unfamiliar Voices: An Online Study With 12- and 24-Month-Olds. Front Psychol 2022;13:874411. PMID: 35558718. PMCID: PMC9088808. DOI: 10.3389/fpsyg.2022.874411.
Abstract
Young infants are attuned to the indexical properties of speech: they can recognize highly familiar voices and distinguish them from unfamiliar voices. Less is known about how and when infants start to recognize unfamiliar voices, and to map them to faces. This skill is particularly challenging when portions of the speaker's face are occluded, as is the case with masking. Here, we examined voice-face recognition abilities in infants 12 and 24 months of age. Using the online Lookit platform, children saw and heard four different speakers produce words with sonorous phonemes (high talker information), and words with phonemes that are less sonorous (low talker information). Infants aged 24 months, but not 12 months, were able to learn to link the voices to partially occluded faces of unfamiliar speakers, and only when the words were produced with high talker information. These results reveal that 24-month-old infants can encode and retrieve indexical properties of an unfamiliar speaker's voice, and they can access this information even when visual access to the speaker's mouth is blocked.
Affiliation(s)
- Adriel John Orena: Department of Psychology, University of British Columbia, Vancouver, BC, Canada; Department of Evaluation and Research Services, Fraser Health Authority, Surrey, BC, Canada
- Asia Sotera Mader: Department of Psychology, University of British Columbia, Vancouver, BC, Canada
- Janet F Werker: Department of Psychology, University of British Columbia, Vancouver, BC, Canada
12. Implicit and explicit learning in talker identification. Atten Percept Psychophys 2022;84:2002-2015. PMID: 35534783. PMCID: PMC10081569. DOI: 10.3758/s13414-022-02500-8.
Abstract
In the real world, listeners seem to implicitly learn talkers' vocal identities during interactions that prioritize attending to the content of talkers' speech. In contrast, most laboratory experiments of talker identification employ training paradigms that require listeners to explicitly practice identifying voices. Here, we investigated whether listeners become familiar with talkers' vocal identities during initial exposures that do not involve explicit talker identification. Participants were assigned to one of three exposure tasks, in which they heard identical stimuli but were differentially required to attend to the talkers' vocal identity or to the verbal content of their speech: (1) matching the talker to a concurrent visual cue (talker-matching); (2) discriminating whether the talker was the same as the prior trial (talker 1-back); or (3) discriminating whether speech content matched the previous trial (verbal 1-back). All participants were then tested on their ability to learn to identify talkers from novel speech content. Critically, we manipulated whether the talkers during this post-test differed from those heard during training. Compared to learning to identify novel talkers, listeners were significantly more accurate learning to identify the talkers they had previously been exposed to in the talker-matching and verbal 1-back tasks, but not the talker 1-back task. The correlation between talker identification test performance and exposure task performance was also greater when the talkers were the same in both tasks. These results suggest that listeners learn talkers' vocal identity implicitly during speech perception, even if they are not explicitly attending to the talkers' identity.
13. Kamiloğlu RG, Tanaka A, Scott SK, Sauter DA. Perception of group membership from spontaneous and volitional laughter. Philos Trans R Soc Lond B Biol Sci 2022;377:20200404. PMID: 34775822. PMCID: PMC8591384. DOI: 10.1098/rstb.2020.0404.
Abstract
Laughter is a ubiquitous social signal. Recent work has highlighted distinctions between spontaneous and volitional laughter, which differ in terms of both production mechanisms and perceptual features. Here, we test listeners' ability to infer group identity from volitional and spontaneous laughter, as well as the perceived positivity of these laughs across cultures. Dutch (n = 273) and Japanese (n = 131) participants listened to decontextualized laughter clips and judged (i) whether the laughing person was from their cultural in-group or an out-group; and (ii) whether they thought the laughter was produced spontaneously or volitionally. They also rated the positivity of each laughter clip. Using frequentist and Bayesian analyses, we show that listeners were able to infer group membership from both spontaneous and volitional laughter, and that performance was equivalent for both types of laughter. Spontaneous laughter was rated as more positive than volitional laughter across the two cultures, and in-group laughs were perceived as more positive than out-group laughs by Dutch but not Japanese listeners. Our results demonstrate that both spontaneous and volitional laughter can be used by listeners to infer laughers' cultural group identity. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part II)'.
Affiliation(s)
- Roza G. Kamiloğlu: Department of Psychology, University of Amsterdam, REC G, Nieuwe Achtergracht 129B, 1001 NK, Amsterdam, The Netherlands
- Akihiro Tanaka: Department of Psychology, Tokyo Woman's Christian University, Tokyo, Japan
- Sophie K. Scott: Institute of Cognitive Neuroscience, University College London, London, UK
- Disa A. Sauter: Department of Psychology, University of Amsterdam, REC G, Nieuwe Achtergracht 129B, 1001 NK, Amsterdam, The Netherlands
14
Clerc O, Fort M, Schwarzer G, Krasotkina A, Vilain A, Méary D, Lœvenbruck H, Pascalis O. Can language modulate perceptual narrowing for faces? Other-race face recognition in infants is modulated by language experience. Int J Behav Dev 2021. [DOI: 10.1177/01650254211053054]
Abstract
Between 6 and 9 months, while infants' ability to discriminate faces within their own racial group is maintained, discrimination of faces within other-race groups declines to a point where 9-month-old infants fail to discriminate other-race faces. Such face perception narrowing can be overcome in various ways at 9 or 12 months of age, such as presenting faces with emotional expressions. Can language itself modulate face narrowing? Many adult studies suggest that language has an impact on the recognition of individuals. For example, adults remember faces previously paired with their native language more accurately than faces paired with a non-native language. We have previously found that from 9 months of age, own-race faces associated with the native language can be learned and recognized whereas own-race faces associated with a non-native language cannot. Based on the language familiarity effect, we hypothesized that the native language could restore recognition of other-race faces after perceptual narrowing has happened. We tested 9- and 12-month-old Caucasian infants. During a familiarization phase, infants were shown still photographs of an Asian face while audio was played either in the native or in the non-native language. Immediately after the familiarization, the familiar face and a novel one were displayed side-by-side for the recognition test. We compared the proportional looking time to the new face to the chance level. Both 9- and 12-month-old infants exhibited recognition memory for the other-race face when familiarized with non-native speech, but not with their native speech. Native language did not facilitate recognition of other-race faces after 9 months of age but a non-native language did, suggesting that 9- and 12-month-olds already have expectations about which language an individual should speak (or at least should not speak). Our results confirm the strong links between face and speech processing during infancy.
Affiliation(s)
- Olivier Clerc
- LPNC, Université Grenoble Alpes, Grenoble, France
- LPNC, CNRS, Grenoble, France
- Mathilde Fort
- LPNC, Université Grenoble Alpes, Grenoble, France
- Centre de Recherche en NeuroSciences de Lyon, CRNL UMR 5292, Université Lyon 1, Lyon, France
- Gudrun Schwarzer
- Department of Developmental Psychology, Justus-Liebig-University Giessen, Germany
- Anna Krasotkina
- Department of Developmental Psychology, Justus-Liebig-University Giessen, Germany
- Anne Vilain
- Gipsa-Lab, Département Parole et Cognition, CNRS UMR 5216 & Université Grenoble Alpes, Grenoble, France
- David Méary
- LPNC, Université Grenoble Alpes, Grenoble, France
- LPNC, CNRS, Grenoble, France
- Hélène Lœvenbruck
- LPNC, Université Grenoble Alpes, Grenoble, France
- LPNC, CNRS, Grenoble, France
- Olivier Pascalis
- LPNC, Université Grenoble Alpes, Grenoble, France
- LPNC, CNRS, Grenoble, France
15
Koelewijn T, Gaudrain E, Tamati T, Başkent D. The effects of lexical content, acoustic and linguistic variability, and vocoding on voice cue perception. J Acoust Soc Am 2021; 150:1620. [PMID: 34598602] [DOI: 10.1121/10.0005938]
Abstract
Perceptual differences in voice cues, such as fundamental frequency (F0) and vocal tract length (VTL), can facilitate speech understanding in challenging conditions. Yet, we hypothesized that in the presence of spectrotemporal signal degradations, as imposed by cochlear implants (CIs) and vocoders, acoustic cues that overlap for voice perception and phonemic categorization could be mistaken for one another, leading to a strong interaction between linguistic and indexical (talker-specific) content. Fifteen normal-hearing participants performed an odd-one-out adaptive task measuring just-noticeable differences (JNDs) in F0 and VTL. Items used were words (lexical content) or time-reversed words (no lexical content). The use of lexical content was either promoted (by using variable items across comparison intervals) or not (fixed item). Finally, stimuli were presented without or with vocoding. Results showed that JNDs for both F0 and VTL were significantly smaller (better) for non-vocoded compared with vocoded speech and for fixed compared with variable items. Lexical content (forward vs reversed) affected VTL JNDs in the variable item condition, but F0 JNDs only in the non-vocoded, fixed condition. In conclusion, lexical content had a positive top-down effect on VTL perception when acoustic and linguistic variability was present but not on F0 perception. Lexical advantage persisted in the most degraded conditions and vocoding even enhanced the effect of item variability, suggesting that linguistic content could support compensation for poor voice perception in CI users.
Affiliation(s)
- Thomas Koelewijn
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Etienne Gaudrain
- CNRS Unité Mixte de Recherche 5292, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics, Institut National de la Santé et de la Recherche Médicale, UMRS 1028, Université Claude Bernard Lyon 1, Université de Lyon, Lyon, France
- Terrin Tamati
- Department of Otolaryngology-Head & Neck Surgery, The Ohio State University Wexner Medical Center, The Ohio State University, Columbus, Ohio, USA
- Deniz Başkent
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
16
Abstract
The way we process language is influenced by our experience. We are more likely to attend to features that proved to be useful in the past. Importantly, the size of individuals’ social network can influence their experience, and consequently, how they process language. In the case of voice recognition, having a larger social network might provide more variable input and thus enhance the ability to recognise new voices. On the other hand, learning to recognise voices is more demanding and less beneficial for people with a larger social network as they have more speakers to learn yet spend less time with each. This paper tests whether social network size influences voice recognition, and if so, in which direction. Native Dutch speakers listed their social network and performed a voice recognition task. Results showed that people with larger social networks were poorer at learning to recognise voices. Experiment 2 replicated the results with a British sample and English stimuli. Experiment 3 showed that the effect does not generalise to voice recognition in an unfamiliar language suggesting that social network size influences attention to the linguistic rather than non-linguistic markers that differentiate speakers. The studies thus show that our social network size influences our inclination to learn speaker-specific patterns in our environment, and consequently, the development of skills that rely on such learned patterns, such as voice recognition.
17
Yu ME, Schertz J, Johnson EK. The Other Accent Effect in Talker Recognition: Now You See It, Now You Don't. Cogn Sci 2021; 45:e12986. [PMID: 34170043] [DOI: 10.1111/cogs.12986]
Abstract
The existence of the Language Familiarity Effect (LFE), where talkers of a familiar language are easier to identify than talkers of an unfamiliar language, is well-documented and uncontroversial. However, a closely related phenomenon known as the Other Accent Effect (OAE), where accented talkers are more difficult to recognize, is less well understood. There are several possible explanations for why the OAE exists, but to date, little data exist to adjudicate between them. Here, we begin to address this issue by directly comparing listeners' recognition of talkers who speak in different types of accents, and by examining both the LFE and OAE in the same set of listeners. Specifically, Canadian English listeners were tested on their ability to recognize talkers within four types of voice line-ups: Canadian English talkers, Australian English talkers, Mandarin-accented English talkers, and Mandarin talkers. We predicted that the OAE would be present for talkers of Mandarin-accented English but not for talkers of Australian English, which is precisely what we observed. We also observed a disconnect between listeners' confidence and performance across different types of accents; that is, listeners performed equally poorly with Mandarin and Mandarin-accented talkers, but they were more confident with their performance with the latter group of talkers. The present findings set the stage for further investigation into the nature of the OAE by exploring a range of potential explanations for the effect, and introducing important implications for forensic scientists' evaluation of ear witness testimony.
Affiliation(s)
- Jessamyn Schertz
- Department of Language Studies, University of Toronto Mississauga
18
Zhang L, Li Y, Zhou H, Zhang Y, Shu H. Language-familiarity effect on voice recognition by blind listeners. JASA Express Lett 2021; 1:055201. [PMID: 36154110] [DOI: 10.1121/10.0004848]
Abstract
The current study compared the language-familiarity effect on voice recognition by blind listeners and sighted individuals. Both groups performed better on the recognition of native voices than nonnative voices, but the language-familiarity effect was smaller in the blind group than in the sighted group, with blind individuals performing better than their sighted counterparts only on the recognition of nonnative voices. Furthermore, recognition of native and nonnative voices was significantly correlated only in the blind group. These results indicate that language familiarity affects voice recognition by blind listeners, who differ to some extent from their sighted counterparts in the use of linguistic and nonlinguistic features during voice recognition.
Affiliation(s)
- Linjun Zhang
- Beijing Advanced Innovation Center for Language Resources and College of Advanced Chinese Training, Beijing Language and Culture University, Beijing 100083, China
- Yu Li
- Division of Science and Technology, BNU-HKBU United International College, Zhuhai 519085, Guangdong, China
- Hong Zhou
- International Cultural Exchange School, Shanghai University of Finance and Economics, Shanghai 200433, China
- Yang Zhang
- Department of Speech-Language-Hearing Sciences and Center for Neurobehavioral Development, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Hua Shu
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China
19
Developmental improvements in talker recognition are specific to the native language. J Exp Child Psychol 2021; 202:104991. [DOI: 10.1016/j.jecp.2020.104991]
20
Luthra S. The Role of the Right Hemisphere in Processing Phonetic Variability Between Talkers. Neurobiol Lang (Camb) 2021; 2:138-151. [PMID: 37213418] [PMCID: PMC10174361] [DOI: 10.1162/nol_a_00028]
Abstract
Neurobiological models of speech perception posit that both left and right posterior temporal brain regions are involved in the early auditory analysis of speech sounds. However, frank deficits in speech perception are not readily observed in individuals with right hemisphere damage. Instead, damage to the right hemisphere is often associated with impairments in vocal identity processing. Herein lies an apparent paradox: The mapping between acoustics and speech sound categories can vary substantially across talkers, so why might right hemisphere damage selectively impair vocal identity processing without obvious effects on speech perception? In this review, I attempt to clarify the role of the right hemisphere in speech perception through a careful consideration of its role in processing vocal identity. I review evidence showing that right posterior superior temporal, right anterior superior temporal, and right inferior / middle frontal regions all play distinct roles in vocal identity processing. In considering the implications of these findings for neurobiological accounts of speech perception, I argue that the recruitment of right posterior superior temporal cortex during speech perception may specifically reflect the process of conditioning phonetic identity on talker information. I suggest that the relative lack of involvement of other right hemisphere regions in speech perception may be because speech perception does not necessarily place a high burden on talker processing systems, and I argue that the extant literature hints at potential subclinical impairments in the speech perception abilities of individuals with right hemisphere damage.
21
Sharma NK, Krishnamohan V, Ganapathy S, Gangopadhayay A, Fink L. Acoustic and linguistic features influence talker change detection. J Acoust Soc Am 2020; 148:EL414. [PMID: 33261377] [DOI: 10.1121/10.0002462]
Abstract
A listening test is proposed in which human participants detect talker changes in two natural, multi-talker speech stimulus sets: a familiar language (English) and an unfamiliar language (Chinese). Miss rate, false-alarm rate, and response times (RT) showed a significant dependence on language familiarity. Linear regression modeling of RTs using diverse acoustic features derived from the stimuli showed recruitment of a pool of acoustic features for the talker change detection task. Further, benchmarking the same task against a state-of-the-art machine diarization system showed that the machine system achieves human parity for the familiar language but not for the unfamiliar language.
Affiliation(s)
- Neeraj Kumar Sharma
- Learning and Extraction of Acoustic Patterns Lab, Indian Institute of Science, Bangalore
- Venkat Krishnamohan
- Learning and Extraction of Acoustic Patterns Lab, Indian Institute of Science, Bangalore
- Sriram Ganapathy
- Learning and Extraction of Acoustic Patterns Lab, Indian Institute of Science, Bangalore
- Ahana Gangopadhayay
- Electrical and Systems Engineering, Washington University in St. Louis, Missouri, USA
- Lauren Fink
- Music Department, Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany
22
May I Speak Freely? The Difficulty in Vocal Identity Processing Across Free and Scripted Speech. J Nonverbal Behav 2020. [DOI: 10.1007/s10919-020-00348-w]
Abstract
In the fields of face recognition and voice recognition, a growing literature now suggests that the ability to recognize an individual despite changes from one instance to the next is a considerable challenge. The present paper reports on one experiment in the voice domain designed to determine whether a change in the mere style of speech may result in a measurable difficulty when trying to discriminate between speakers. Participants completed a speaker discrimination task on pairs of speech clips, which represented either free speech or scripted speech segments. The results suggested that speaker discrimination was significantly better when the style of speech did not change compared to when it did change, and was significantly better from scripted than from free speech segments. These results support the emergent body of evidence suggesting that within-identity variability is a challenge, and the forensic implications of such a mild change in speech style are discussed.
23
Cooper A, Fecher N, Johnson EK. Identifying children's voices. J Acoust Soc Am 2020; 148:324. [PMID: 32752764] [DOI: 10.1121/10.0001576]
Abstract
Human adults rely on both acoustic and linguistic information to identify adult talkers. Assuming favorable conditions, adult listeners recognize other adults fairly accurately and quickly. But how well can adult listeners recognize child talkers, whose speech productions often differ dramatically from adult speech productions? Although adult talker recognition has been heavily studied, only one study to date has directly compared the recognition of unfamiliar adult and child talkers [Creel and Jimenez (2012). J. Exp. Child Psychol. 113(4), 487-509]. Therefore, the current study revisits this question with a much larger and younger sample of child talkers (N = 20); performance with adult talkers (N = 20) was also tested to provide a baseline. In Experiment 1, adults successfully distinguished between adult talkers in an AX discrimination task but performed much worse with child talkers. In Experiment 2, adults were slower and less accurate at learning to identify child talkers than adult talkers in a training-identification task. Finally, in Experiment 3, adults failed to improve at identifying child talkers after three days of training with numerous child voices. Taken together, these findings reveal a sizable difference in adults' ability to recognize child versus adult talkers. Possible explanations and implications for understanding human talker recognition are discussed.
Affiliation(s)
- Angela Cooper
- Department of Psychology, University of Toronto Mississauga, 3359 Mississauga Road, Mississauga, Ontario L5L 1C6, Canada
- Natalie Fecher
- Department of Psychology, University of Toronto Mississauga, 3359 Mississauga Road, Mississauga, Ontario L5L 1C6, Canada
- Elizabeth K Johnson
- Department of Psychology, University of Toronto Mississauga, 3359 Mississauga Road, Mississauga, Ontario L5L 1C6, Canada
24
Shao J, Wang L, Zhang C. Talker Processing in Mandarin-Speaking Congenital Amusics. J Speech Lang Hear Res 2020; 63:1361-1375. [PMID: 32343927] [DOI: 10.1044/2020_jslhr-19-00209]
Abstract
Purpose: The ability to recognize individuals from their vocalizations is an important trait of human beings. In the current study, we aimed to examine how congenital amusia, an inborn pitch-processing disorder, affects discrimination and identification of talkers' voices.
Method: Twenty Mandarin-speaking amusics and 20 controls were tested on talker discrimination and identification in four types of contexts that varied in the degree of language familiarity: Mandarin real words, Mandarin pseudowords, Arabic words, and reversed Mandarin speech.
Results: The language familiarity effect was more evident in the talker identification task than in the discrimination task for both participant groups, and talker identification accuracy decreased as native phonological representations were removed from the stimuli. Importantly, amusics demonstrated degraded performance both in the native speech conditions, which contained phonological/linguistic information to facilitate talker identification, and in the nonnative conditions, where talker voice processing relied primarily on phonetic cues, including pitch. Moreover, performance in talker processing was predicted by the participants' musical ability and phonological memory capacity.
Conclusions: The results provide a first set of behavioral evidence that individuals with amusia are impaired in human voice identification. Meanwhile, amusia appears to be not only a pitch disorder but is likely to affect the phonological processing of speech, in terms of using phonological information in native speech to analyze a talker's identity. These findings expand the understanding of the nature and scope of congenital amusia. Supplemental Material: https://doi.org/10.23641/asha.12170379.
Affiliation(s)
- Jing Shao
- School of Humanities, Shanghai Jiao Tong University, China
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Lan Wang
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Caicai Zhang
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, China
- Research Centre for Language, Cognition, and Neuroscience, The Hong Kong Polytechnic University, China
25
Bodin C, Belin P. Exploring the cerebral substrate of voice perception in primate brains. Philos Trans R Soc Lond B Biol Sci 2019; 375:20180386. [PMID: 31735143] [PMCID: PMC6895549] [DOI: 10.1098/rstb.2018.0386]
Abstract
One can consider human language to be the Swiss army knife of the vast domain of animal communication. There is now growing evidence suggesting that this technology may have emerged from already operational material instead of being a sudden innovation. Sharing ideas and thoughts with conspecifics via language constitutes an amazing ability, but what value would it hold if our conspecifics were not first detected and recognized? Conspecific voice (CV) perception is fundamental to communication and widely shared across the animal kingdom. Two questions that arise then are: is this apparently shared ability reflected in common cerebral substrate? And, how has this substrate evolved? The paper addresses these questions by examining studies on the cerebral basis of CV perception in humans' closest relatives, non-human primates. Neuroimaging studies, in particular, suggest the existence of a ‘voice patch system’, a network of interconnected cortical areas that can provide a common template for the cerebral processing of CV in primates. This article is part of the theme issue ‘What can animal communication teach us about human language?’
Affiliation(s)
- Clémentine Bodin
- Institut de Neurosciences de la Timone, UMR 7289 Centre National de la Recherche Scientifique and Aix-Marseille Université, Marseille, France
- Pascal Belin
- Institut de Neurosciences de la Timone, UMR 7289 Centre National de la Recherche Scientifique and Aix-Marseille Université, Marseille, France
- Département de Psychologie, Université de Montréal, Montréal, Canada
26
Perrachione TK, Furbeck KT, Thurston EJ. Acoustic and linguistic factors affecting perceptual dissimilarity judgments of voices. J Acoust Soc Am 2019; 146:3384. [PMID: 31795676] [PMCID: PMC7043842] [DOI: 10.1121/1.5126697]
Abstract
The human voice is a complex acoustic signal that conveys talker identity via individual differences in numerous features, including vocal source acoustics, vocal tract resonances, and dynamic articulations during speech. It remains poorly understood how differences in these features contribute to perceptual dissimilarity of voices and, moreover, whether linguistic differences between listeners and talkers interact during perceptual judgments of voices. Here, native English- and Mandarin-speaking listeners rated the perceptual dissimilarity of voices speaking English or Mandarin from either forward or time-reversed speech. The language spoken by talkers, but not listeners, principally influenced perceptual judgments of voices. Perceptual dissimilarity judgments of voices were always highly correlated between listener groups and forward/time-reversed speech. Representational similarity analyses that explored how acoustic features (fundamental frequency mean and variation, jitter, harmonics-to-noise ratio, speech rate, and formant dispersion) contributed to listeners' perceptual dissimilarity judgments, including how talker- and listener-language affected these relationships, found the largest effects relating to voice pitch. Overall, these data suggest that, while linguistic factors may influence perceptual judgments of voices, the magnitude of such effects tends to be very small. Perceptual judgments of voices by listeners of different native language backgrounds tend to be more alike than different.
Affiliation(s)
- Tyler K Perrachione
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Kristina T Furbeck
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Emily J Thurston
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
27
The Jena Speaker Set (JESS): A database of voice stimuli from unfamiliar young and old adult speakers. Behav Res Methods 2019; 52:990-1007. [PMID: 31637667] [DOI: 10.3758/s13428-019-01296-0]
Abstract
Here we describe the Jena Speaker Set (JESS), a free database for unfamiliar adult voice stimuli, comprising voices from 61 young (18-25 years) and 59 old (60-81 years) female and male speakers uttering various sentences, syllables, read text, semi-spontaneous speech, and vowels. Listeners rated two voice samples (short sentences) per speaker for attractiveness, likeability, two measures of distinctiveness ("deviation"-based [DEV] and "voice in the crowd"-based [VITC]), regional accent, and age. Interrater reliability was high, with Cronbach's α between .82 and .99. Young voices were generally rated as more attractive than old voices, but particularly so when male listeners judged female voices. Moreover, young female voices were rated as more likeable than both young male and old female voices. Young voices were judged to be less distinctive than old voices according to the DEV measure, with no differences in the VITC measure. In age ratings, listeners almost perfectly discriminated young from old voices; additionally, young female voices were perceived as being younger than young male voices. Correlations between the rating dimensions above demonstrated (among other things) that DEV-based distinctiveness was strongly negatively correlated with rated attractiveness and likeability. By contrast, VITC-based distinctiveness was uncorrelated with rated attractiveness and likeability in young voices, although a moderate negative correlation was observed for old voices. Overall, the present results demonstrate systematic effects of vocal age and gender on impressions based on the voice and inform as to the selection of suitable voice stimuli for further research into voice perception, learning, and memory.
28
Fecher N, Johnson EK. By 4.5 Months, Linguistic Experience Already Affects Infants' Talker Processing Abilities. Child Dev 2019; 90:1535-1543. [DOI: 10.1111/cdev.13280]
29
Hierarchical contributions of linguistic knowledge to talker identification: Phonological versus lexical familiarity. Atten Percept Psychophys 2019; 81:1088-1107. [PMID: 31218598] [DOI: 10.3758/s13414-019-01778-5]
Abstract
Listeners identify talkers more accurately when listening to their native language compared to an unfamiliar, foreign language. This language-familiarity effect in talker identification has been shown to arise from familiarity with both the sound patterns (phonetics and phonology) and the linguistic content (words) of one's native language. However, it has been unknown whether these two sources of information contribute independently to talker identification abilities, particularly whether hearing familiar words can facilitate talker identification in the absence of familiar phonetics. To isolate the contribution of lexical familiarity, we conducted three experiments that tested listeners' ability to identify talkers saying familiar words, but with unfamiliar phonetics. In two experiments, listeners identified talkers from recordings of their native language (English), an unfamiliar foreign language (Mandarin Chinese), or "hybrid" speech stimuli (sentences spoken in Mandarin, but which can be convincingly coerced to sound like English when presented with subtitles that prime plausible English-language lexical interpretations based on the Mandarin phonetics). In a third experiment, we explored natural variation in lexical-phonetic congruence as listeners identified talkers with varying degrees of a Mandarin accent. Priming listeners to hear English speech did not improve their ability to identify talkers speaking Mandarin, even after additional training, and talker identification accuracy decreased as talkers' phonetics became increasingly dissimilar to American English. Together, these experiments indicate that unfamiliar sound patterns preclude talker identification benefits otherwise afforded by familiar words. These results suggest that linguistic representations contribute hierarchically to talker identification; the facilitatory effect of familiar words requires the availability of familiar phonological forms.
30
Ganugapati D, Theodore RM. Structured phonetic variation facilitates talker identification. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 145:EL469. [PMID: 31255121 DOI: 10.1121/1.5100166] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/16/2018] [Accepted: 04/12/2019] [Indexed: 06/09/2023]
Abstract
Listeners use talker-specific phonetic structure to facilitate language comprehension. This study tests whether sensitivity to talker-specific phonetic variation also facilitates talker identification. During training, two listener groups learned to associate talkers' voices with cartoon pseudo-faces. For one group, each talker produced characteristically different voice-onset-time values; for the other group, no talker-specific phonetic structure was present. After training, listeners were tested on talker identification for trained and novel words, which was improved for those who heard structured phonetic variation compared to those who did not. These findings suggest an additive benefit of talker-specific phonetic variation for talker identification beyond traditional indexical cues.
Affiliation(s)
- Divya Ganugapati
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, 850 Bolton Road, Unit 1085, Storrs, Connecticut 06269-1085
- Rachel M Theodore
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, 850 Bolton Road, Unit 1085, Storrs, Connecticut 06269-1085
31
Abstract
Human voices are extremely variable: The same person can sound very different depending on whether they are speaking, laughing, shouting or whispering. In order to successfully recognise someone from their voice, a listener needs to be able to generalize across these different vocal signals (‘telling people together’). However, in most studies of voice-identity processing to date, the substantial within-person variability has been eliminated through the use of highly controlled stimuli, thus focussing on how we tell people apart. We argue that this obscures our understanding of voice-identity processing by controlling away an essential feature of vocal stimuli that may include diagnostic information. In this paper, we propose that we need to extend the focus of voice-identity research to account for both “telling people together” as well as “telling people apart.” That is, we must account for whether, and to what extent, listeners can overcome within-person variability to obtain a stable percept of person identity from vocal cues. To do this, our theoretical and methodological frameworks need to be adjusted to explicitly include the study of within-person variability.
32
Tamati TN, Janse E, Başkent D. Perceptual Discrimination of Speaking Style Under Cochlear Implant Simulation. Ear Hear 2019; 40:63-76. [PMID: 29742545 PMCID: PMC6319584 DOI: 10.1097/aud.0000000000000591] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2016] [Accepted: 03/12/2018] [Indexed: 11/26/2022]
Abstract
OBJECTIVES Real-life, adverse listening conditions involve a great deal of speech variability, including variability in speaking style. Depending on the speaking context, talkers may use a more casual, reduced speaking style or a more formal, careful speaking style. Attending to fine-grained acoustic-phonetic details characterizing different speaking styles facilitates the perception of the speaking style used by the talker. These acoustic-phonetic cues are poorly encoded in cochlear implants (CIs), potentially rendering the discrimination of speaking style difficult. As a first step to characterizing CI perception of real-life speech forms, the present study investigated the perception of different speaking styles in normal-hearing (NH) listeners with and without CI simulation. DESIGN The discrimination of three speaking styles (conversational reduced speech, speech from retold stories, and carefully read speech) was assessed using a speaking style discrimination task in two experiments. NH listeners classified sentence-length utterances, produced in one of the three styles, as either formal (careful) or informal (conversational). Utterances were presented with unmodified speaking rates in experiment 1 (31 NH, young adult Dutch speakers) and with modified speaking rates set to the average rate across all utterances in experiment 2 (28 NH, young adult Dutch speakers). In both experiments, acoustic noise-vocoder simulations of CIs were used to produce 12-channel (CI-12) and 4-channel (CI-4) vocoder simulation conditions, in addition to a no-simulation condition without CI simulation. RESULTS In both experiments 1 and 2, NH listeners were able to reliably discriminate the speaking styles without CI simulation. However, this ability was reduced under CI simulation. In experiment 1, participants showed poor discrimination of speaking styles under CI simulation. 
Listeners used speaking rate as a cue to make their judgements, even though it was not a reliable cue to speaking style in the study materials. In experiment 2, without differences in speaking rate among speaking styles, listeners showed better discrimination of speaking styles under CI simulation, using additional cues to complete the task. CONCLUSIONS The findings from the present study demonstrate that perceiving differences in three speaking styles under CI simulation is a difficult task because some important cues to speaking style are not fully available in these conditions. While some cues like speaking rate are available, this information alone may not always be a reliable indicator of a particular speaking style. Some other reliable speaking style cues, such as degraded acoustic-phonetic information and variability in speaking rate within an utterance, may be available but less salient. However, as in experiment 2, listeners' perception of speaking styles may be modified if they are constrained or trained to use these additional cues, which were more reliable in the context of the present study. Taken together, these results suggest that dealing with speech variability in real-life listening conditions may be a challenge for CI users.
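The acoustic noise-vocoder CI simulations used in this study (12-channel and 4-channel conditions) follow the general channel-vocoding idea: band-limited envelopes modulating noise. The sketch below is illustrative only; the band edges, filter orders, envelope cutoff, and the helper name `noise_vocode` are assumptions, not the study's actual processing parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(x, fs, n_channels, f_lo=100.0, f_hi=7000.0):
    """Minimal channel noise-vocoder sketch: split the signal into
    log-spaced bands, extract each band's amplitude envelope, and use
    it to modulate band-limited noise. The summed bands retain coarse
    envelope information but discard temporal fine structure, roughly
    as in CI simulations."""
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
    noise = np.random.default_rng(0).standard_normal(len(x))
    env_sos = butter(2, 50.0, btype="low", fs=fs, output="sos")  # envelope smoother
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band_speech = sosfiltfilt(band_sos, x)
        envelope = sosfiltfilt(env_sos, np.abs(band_speech))  # slow amplitude contour
        out += np.clip(envelope, 0.0, None) * sosfiltfilt(band_sos, noise)
    return out
```

Fewer channels (e.g., 4 versus 12) mean coarser spectral resolution, which is why fine-grained speaking-style cues degrade more in the CI-4 condition.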
Affiliation(s)
- Terrin N. Tamati
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Research School of Behavioral and Cognitive Neurosciences, Graduate School of Medical Sciences, University of Groningen, Groningen, The Netherlands
- Esther Janse
- Centre for Language Studies, Radboud University Nijmegen, Nijmegen, The Netherlands
- Deniz Başkent
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Research School of Behavioral and Cognitive Neurosciences, Graduate School of Medical Sciences, University of Groningen, Groningen, The Netherlands
33
Fecher N, Paquette‐Smith M, Johnson EK. Resolving the (Apparent) Talker Recognition Paradox in Developmental Speech Perception. INFANCY 2019; 24:570-588. [DOI: 10.1111/infa.12290] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2018] [Revised: 03/05/2019] [Accepted: 03/11/2019] [Indexed: 11/30/2022]
Affiliation(s)
- Natalie Fecher
- Department of Psychology, University of Toronto Mississauga
34
Lavan N, Burston LF, Ladwa P, Merriman SE, Knight S, McGettigan C. Breaking voice identity perception: Expressive voices are more confusable for listeners. Q J Exp Psychol (Hove) 2019; 72:2240-2248. [PMID: 30808271 DOI: 10.1177/1747021819836890] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
The human voice is a highly flexible instrument for self-expression, yet voice identity perception is largely studied using controlled speech recordings. Using two voice-sorting tasks with naturally varying stimuli, we compared the performance of listeners who were familiar and unfamiliar with the TV show Breaking Bad. Listeners organised audio clips of speech with (1) low-expressiveness and (2) high-expressiveness into perceived identities. We predicted that increased expressiveness (e.g., shouting, strained voice) would significantly impair performance. Overall, while unfamiliar listeners were less able to generalise identity across exemplars, the two groups performed equivalently well at telling voices apart with low-expressiveness stimuli. However, high vocal expressiveness significantly impaired telling voices apart in both groups, leading to increased misidentifications, where sounds from one character were assigned to the other. These misidentifications were highly consistent for familiar listeners but less consistent for unfamiliar listeners. Our data suggest that vocal flexibility has powerful effects on identity perception, where changes in the acoustic properties of vocal signals introduced by expressiveness lead to effects apparent in familiar and unfamiliar listeners alike. At the same time, expressiveness appears to have affected other aspects of voice identity processing selectively in one listener group but not the other, thus revealing complex interactions of stimulus properties and listener characteristics (i.e., familiarity) in identity processing.
Affiliation(s)
- Nadine Lavan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK; Department of Psychology, Royal Holloway, University of London, London, UK
- Luke Fk Burston
- Department of Psychology, Royal Holloway, University of London, London, UK
- Paayal Ladwa
- Department of Psychology, Royal Holloway, University of London, London, UK
- Siobhan E Merriman
- Department of Psychology, Royal Holloway, University of London, London, UK
- Sarah Knight
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Carolyn McGettigan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK; Department of Psychology, Royal Holloway, University of London, London, UK
35
Zhang D, Chen Y, Hou X, Wu YJ. Near-infrared spectroscopy reveals neural perception of vocal emotions in human neonates. Hum Brain Mapp 2019; 40:2434-2448. [PMID: 30697881 DOI: 10.1002/hbm.24534] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2018] [Revised: 01/19/2019] [Accepted: 01/20/2019] [Indexed: 12/20/2022] Open
Abstract
Processing affective prosody, that is the emotional tone of a speaker, is fundamental to human communication and adaptive behaviors. Previous studies have mainly focused on adults and infants; thus the neural mechanisms underlying the processing of affective prosody in newborns remain unclear. Here, we used near-infrared spectroscopy to examine the ability of 0-to-4-day-old neonates to discriminate emotions conveyed by speech prosody in their maternal language and a foreign language. Happy, fearful, and angry prosodies enhanced neural activation in the right superior temporal gyrus relative to neutral prosody in the maternal but not the foreign language. Happy prosody elicited greater activation than negative prosody in the left superior frontal gyrus and the left angular gyrus, regions that have not been associated with affective prosody processing in infants or adults. These findings suggest that sensitivity to affective prosody is formed through prenatal exposure to vocal stimuli of the maternal language. Furthermore, the sensitive neural correlates appeared more distributed in neonates than infants, indicating a high level of neural specialization between the neonatal stage and early infancy. Finally, neonates showed preferential neural responses to positive over negative prosody, which is contrary to the "negativity bias" phenomenon established in adult and infant studies.
Affiliation(s)
- Dandan Zhang
- College of Psychology and Sociology, Shenzhen University, Shenzhen, China; Shenzhen Key Laboratory of Affective and Social Cognitive Science, Shenzhen University, Shenzhen, China
- Yu Chen
- College of Psychology and Sociology, Shenzhen University, Shenzhen, China
- Xinlin Hou
- Department of Pediatrics, Peking University First Hospital, Beijing, China
- Yan Jing Wu
- Faculty of Foreign Languages, Ningbo University, Ningbo, China
36
Levi SV. Methodological considerations for interpreting the Language Familiarity Effect in talker processing. WILEY INTERDISCIPLINARY REVIEWS. COGNITIVE SCIENCE 2018; 10:e1483. [DOI: 10.1002/wcs.1483] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/18/2018] [Revised: 09/27/2018] [Accepted: 09/28/2018] [Indexed: 11/06/2022]
Affiliation(s)
- Susannah V. Levi
- Department of Communicative Sciences and Disorders, New York University, New York, New York
37
Stevenage SV, Neil GJ, Parsons B, Humphreys A. A sound effect: Exploration of the distinctiveness advantage in voice recognition. APPLIED COGNITIVE PSYCHOLOGY 2018; 32:526-536. [PMID: 30333682 PMCID: PMC6175009 DOI: 10.1002/acp.3424] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2017] [Revised: 05/21/2018] [Accepted: 06/01/2018] [Indexed: 11/30/2022]
Abstract
Two experiments are presented, which explore the presence of a distinctiveness advantage when recognising unfamiliar voices. In Experiment 1, distinctive voices were recognised significantly better, and with greater confidence, in a sequential same/different matching task compared with typical voices. These effects were replicated and extended in Experiment 2, as distinctive voices were recognised better even under challenging listening conditions imposed by nonsense sentences and temporal reversal. Taken together, the results aligned well with similar results when processing faces, and provided a useful point of comparison between voice and face processing.
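Performance in a sequential same/different matching task of the kind described above is commonly summarized with signal-detection sensitivity (d'), which separates discrimination ability from response bias. The sketch below is a generic illustration of that computation, not the paper's own analysis; `d_prime` is a hypothetical helper name.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity (d') from same/different trial
    counts, with a log-linear correction so hit or false-alarm
    proportions of exactly 0 or 1 remain finite."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    return z(hit_rate) - z(fa_rate)
```

On this scale, a higher d' for distinctive than for typical voices would express the reported distinctiveness advantage independently of how liberally listeners respond "same".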
Affiliation(s)
- Greg J. Neil
- School of Sport, Health and Social Sciences, Southampton Solent University, Southampton, UK
- Beth Parsons
- Department of Psychology, University of Winchester, Winchester, UK
- Abi Humphreys
- Department of Psychology, University of Southampton, Southampton, UK
38
Electrophysiological correlates of voice memory for young and old speakers in young and old listeners. Neuropsychologia 2018; 116:215-227. [PMID: 28802769 DOI: 10.1016/j.neuropsychologia.2017.08.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2017] [Revised: 08/04/2017] [Accepted: 08/07/2017] [Indexed: 11/23/2022]
Abstract
Faces of one's own-age group are easier to recognize than other-age faces. Using behavioral measures and EEG, we studied whether an own-age bias (OAB) also exists in voice memory. Young (19-26 years) and old (60-75 years) participants studied young (18-25 years) and old (60-77 years) unfamiliar voices from short sentences. Subsequently, they classified studied and novel voices as "old" (i.e. studied) or "new", from the same sentences. Recognition performance was higher in young compared to old participants, and for old compared to young voices, with no OAB. At the same time, we found evidence for higher distinctiveness of old compared to young voices, both in terms of acoustic measures and subjective ratings (independent of rater age). Analyses of event-related brain potentials (ERPs) indicated more negative-going deflections (400-1000 ms) for old compared to young voices in young participants. In old participants, we observed a reversed OLD/NEW memory effect, with overall more positive amplitudes for novel compared to studied old (but not young) voices (400-1000 ms). Time-frequency analyses revealed less beta power (16-26 Hz) for young compared to old voices at left anterior sites, and also reduced beta power for correctly recognized studied (compared to novel) voices at left posterior sites (300-900 ms). These findings could suggest an engagement of cortical areas during stimulus-specific recollection from about 300 ms, in a task that emphasized the analysis of individual acoustic features.
39
Maguinness C, Roswandowitz C, von Kriegstein K. Understanding the mechanisms of familiar voice-identity recognition in the human brain. Neuropsychologia 2018; 116:179-193. [DOI: 10.1016/j.neuropsychologia.2018.03.039] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2017] [Revised: 03/28/2018] [Accepted: 03/29/2018] [Indexed: 11/26/2022]
40
Levi S. Another bilingual advantage? Perception of talker-voice information. BILINGUALISM (CAMBRIDGE, ENGLAND) 2018; 21:523-536. [PMID: 29755282 PMCID: PMC5945195 DOI: 10.1017/s1366728917000153] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
A bilingual advantage has been found in both cognitive and social tasks. In the current study, we examine whether there is a bilingual advantage in how children process information about who is talking (talker-voice information). Younger and older groups of monolingual and bilingual children completed the following talker-voice tasks with bilingual speakers: a discrimination task in English and German (an unfamiliar language), and a talker-voice learning task in which they learned to identify the voices of three unfamiliar speakers in English. Results revealed effects of age and bilingual status. Across the tasks, older children performed better than younger children and bilingual children performed better than monolingual children. Improved talker-voice processing by the bilingual children suggests that a bilingual advantage exists in a social aspect of speech perception, where the focus is not on processing the linguistic information in the signal, but instead on processing information about who is talking.
41
Xie X, Weatherholtz K, Bainton L, Rowe E, Burchill Z, Liu L, Jaeger TF. Rapid adaptation to foreign-accented speech and its transfer to an unfamiliar talker. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:2013. [PMID: 29716296 PMCID: PMC5895469 DOI: 10.1121/1.5027410] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/08/2017] [Revised: 02/28/2018] [Accepted: 03/01/2018] [Indexed: 05/31/2023]
Abstract
How fast can listeners adapt to unfamiliar foreign accents? Clarke and Garrett [J. Acoust. Soc. Am. 116, 3647-3658 (2004)] (CG04) reported that native-English listeners adapted to foreign-accented English within a minute, demonstrating improved processing of spoken words. In two web-based experiments that closely follow the design of CG04, the effects of rapid accent adaptation are examined and their generalization across talkers is explored. Experiment 1 replicated the core finding of CG04 that initial perceptual difficulty with foreign-accented speech can be attenuated rapidly by a brief period of exposure to an accented talker. Importantly, listeners showed both faster (replicating CG04) and more accurate (extending CG04) comprehension of this talker. Experiment 2 revealed evidence that such adaptation transferred to a different talker with the same accent. These results highlight the rapidity of short-term accent adaptation and raise new questions about the underlying mechanism. It is suggested that the web-based paradigm provides a useful tool for investigations in speech adaptation.
Affiliation(s)
- Xin Xie
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
- Kodi Weatherholtz
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
- Larisa Bainton
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
- Emily Rowe
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
- Zachary Burchill
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
- Linda Liu
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
- T Florian Jaeger
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
42
Fecher N, Johnson EK. Effects of language experience and task demands on talker recognition by children and adults. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:2409. [PMID: 29716261 DOI: 10.1121/1.5032199] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Talker recognition is a language-dependent process, with listeners recognizing talkers better when the talkers speak a familiar versus an unfamiliar language. This language familiarity effect (LFE) is firmly established in adults, but its developmental trajectory in children is not well understood. Some evidence suggests that the effect already exists in infancy, but little is known about how it unfolds in childhood. The present study explored whether the strength of the LFE increases in early childhood. Adults and children were tested in their native language and a foreign language using a "same-different" talker discrimination task and a "voice line-up" talker recognition task. Results showed that adults and 6-year-olds, but not 5-year-olds, exhibit a robust LFE, suggesting that the effect strengthens as children's language competence increases. For both adults and older children, the emergence of an LFE moreover appeared to be task-dependent. This study contributes to a better understanding of how children develop mature talker recognition abilities and when children's processing of indexical and linguistic information in speech approaches adult-like levels. Furthermore, the findings reported here contribute to the debates regarding the origins of the LFE, a hallmark of adult talker recognition.
Affiliation(s)
- Natalie Fecher
- Department of Psychology, University of Toronto, 3359 Mississauga Road, Mississauga, Ontario, L5L 1C6, Canada
- Elizabeth K Johnson
- Department of Psychology, University of Toronto, 3359 Mississauga Road, Mississauga, Ontario, L5L 1C6, Canada
43
Lavan N, Short B, Wilding A, McGettigan C. Impoverished encoding of speaker identity in spontaneous laughter. EVOL HUM BEHAV 2018. [DOI: 10.1016/j.evolhumbehav.2017.11.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
44
Vaughn CR, Bradlow AR. Processing Relationships Between Language-Being-Spoken and Other Speech Dimensions in Monolingual and Bilingual Listeners. LANGUAGE AND SPEECH 2017; 60:530-561. [PMID: 29216813 DOI: 10.1177/0023830916669536] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
While indexical information is implicated in many levels of language processing, little is known about the internal structure of the system of indexical dimensions, particularly in bilinguals. A series of three experiments using the speeded classification paradigm investigated the relationship between various indexical and non-linguistic dimensions of speech in processing. Namely, we compared the relationship between a lesser-studied indexical dimension relevant to bilinguals, which language is being spoken (in these experiments, either Mandarin Chinese or English), with: talker identity (Experiment 1), talker gender (Experiment 2), and amplitude of speech (Experiment 3). Results demonstrate that language-being-spoken is integrated in processing with each of the other dimensions tested, and that these processing dependencies seem to be independent of listeners' bilingual status or experience with the languages tested. Moreover, the data reveal processing interference asymmetries, suggesting a processing hierarchy for indexical, non-linguistic speech features.
45
Drozdova P, van Hout R, Scharenborg O. L2 voice recognition: The role of speaker-, listener-, and stimulus-related factors. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 142:3058. [PMID: 29195438 DOI: 10.1121/1.5010169] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Previous studies examined various factors influencing voice recognition and learning with mixed results. The present study investigates the separate and combined contribution of these various speaker-, stimulus-, and listener-related factors to voice recognition. Dutch listeners, with arguably incomplete phonological and lexical knowledge in the target language, English, learned to recognize the voice of four native English speakers, speaking in English, during a four-day training. Training was successful and listeners' accuracy was shown to be influenced by the acoustic characteristics of speakers and the sound composition of the words used in the training, but not by the lexical frequency of the words, the lexical knowledge of the listeners, or their phonological aptitude. Although not conclusive, listeners with a lower working memory capacity seemed to be slower in learning voices than listeners with a higher working memory capacity. The results reveal that speaker-related, listener-related, and stimulus-related factors accumulate in voice recognition, while lexical information turns out not to play a role in successful voice learning and recognition. This implies that voice recognition operates at the prelexical processing level.
Affiliation(s)
- Polina Drozdova
- Centre for Language Studies, Radboud University Nijmegen, Erasmusplein 1, P.O. Box 9103, 6500 HD Nijmegen, the Netherlands
- Roeland van Hout
- Centre for Language Studies, Radboud University Nijmegen, Erasmusplein 1, P.O. Box 9103, 6500 HD Nijmegen, the Netherlands
- Odette Scharenborg
- Centre for Language Studies, Radboud University Nijmegen, Erasmusplein 1, P.O. Box 9103, 6500 HD Nijmegen, the Netherlands
46
Johnson EK, Bruggeman L, Cutler A. Abstraction and the (Misnamed) Language Familiarity Effect. Cogn Sci 2017; 42:633-645. [DOI: 10.1111/cogs.12520] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2016] [Revised: 02/25/2017] [Accepted: 04/10/2017] [Indexed: 11/27/2022]
Affiliation(s)
- Laurence Bruggeman
- Department of Linguistics, Macquarie University
- ARC Centre of Excellence in Cognition and its Disorders
- Anne Cutler
- ARC Centre of Excellence in Cognition and its Disorders
- The MARCS Institute, Western Sydney University
- ARC Centre of Excellence for the Dynamics of Language
- Max Planck Institute for Psycholinguistics
47
|
Stevenage SV. Drawing a distinction between familiar and unfamiliar voice processing: A review of neuropsychological, clinical and empirical findings. Neuropsychologia 2017; 116:162-178. [PMID: 28694095 DOI: 10.1016/j.neuropsychologia.2017.07.005] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2017] [Revised: 06/04/2017] [Accepted: 07/07/2017] [Indexed: 11/29/2022]
Abstract
Thirty years on from their initial observation that familiar voice recognition is not the same as unfamiliar voice discrimination (van Lancker and Kreiman, 1987), the current paper reviews available evidence in support of a distinction between familiar and unfamiliar voice processing. Here, an extensive review of the literature is provided, drawing on evidence from four domains of interest: the neuropsychological study of healthy individuals, neuropsychological investigation of brain-damaged individuals, the exploration of voice recognition deficits in less commonly studied clinical conditions, and finally empirical data from healthy individuals. All evidence is assessed in terms of its contribution to the question of interest: is familiar voice processing distinct from unfamiliar voice processing? In this regard, the evidence provides compelling support for van Lancker and Kreiman's early observation. Two considerations result: First, the limits of research based on one or other type of voice stimulus are more clearly appreciated. Second, given the demonstration of a distinction between unfamiliar and familiar voice processing, a new wave of research is encouraged which examines the transition involved as a voice is learned.
Affiliation(s)
- Sarah V Stevenage
- Department of Psychology, University of Southampton, Highfield, Southampton, Hampshire SO17 1BJ, UK.
48
Early phonology revealed by international adoptees' birth language retention. Proc Natl Acad Sci U S A 2017; 114:7307-7312. [PMID: 28652342 DOI: 10.1073/pnas.1706405114] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Until at least 6 mo of age, infants show good discrimination for familiar phonetic contrasts (i.e., those heard in the environmental language) and contrasts that are unfamiliar. Adult-like discrimination (significantly worse for nonnative than for native contrasts) appears only later, by 9-10 mo. This has been interpreted as indicating that infants have no knowledge of phonology until vocabulary development begins, after 6 mo of age. Recently, however, word recognition has been observed before age 6 mo, apparently decoupling the vocabulary and phonology acquisition processes. Here we show that phonological acquisition is also in progress before 6 mo of age. The evidence comes from retention of birth-language knowledge in international adoptees. In the largest ever such study, we recruited 29 adult Dutch speakers who had been adopted from Korea when young and had no conscious knowledge of the Korean language at all. Half were adopted at age 3-5 mo (before native-specific discrimination develops) and half at 17 mo or older (after word learning has begun). In a short intensive training program, we observe that adoptees (compared with 29 matched controls) more rapidly learn tripartite Korean consonant distinctions without counterparts in their later-acquired Dutch, suggesting that the adoptees retained phonological knowledge about the Korean distinction. The advantage is equivalent for the younger-adopted and the older-adopted groups, and both groups not only acquire the tripartite distinction for the trained consonants but also generalize it to untrained consonants. Although infants younger than 6 mo can still discriminate unfamiliar phonetic distinctions, this finding indicates that native-language phonological knowledge is nonetheless being acquired at that age.
|
49
|
Clayards M. Individual Talker and Token Covariation in the Production of Multiple Cues to Stop Voicing. Phonetica 2017; 75:1-23. [PMID: 28595176 DOI: 10.1159/000448809] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/29/2015] [Accepted: 07/30/2016] [Indexed: 05/24/2023]
Abstract
BACKGROUND/AIMS Previous research found that individual talkers show consistent differences in the production of segments, which affect how their speech is perceived by others. Speakers also produce multiple acoustic-phonetic cues to phonological contrasts. Less is known about how multiple cues covary within a phonetic category and across talkers. We examined differences in individual talkers across cues and whether token-by-token variability results from intrinsic factors or speaking style by examining within-category correlations. METHODS We examined correlations among 3 cues (voice onset time, VOT; talker-relative onset fundamental frequency, f0; and talker-relative following vowel duration) to word-initial labial stop voicing in English. RESULTS VOT for /b/ and /p/ productions and onset f0 for /b/ productions varied significantly by talker. Token-by-token within-category variation was largely limited to speaking rate effects. VOT and f0 were negatively correlated within category for /b/ productions after controlling for speaking rate and talker mean f0, but in the opposite direction from that expected for an intrinsic effect. Within-category talker means were correlated across VOT and vowel duration for /p/ productions. Some talkers produced more prototypical values than others, indicating systematic talker differences. CONCLUSION Relationships between cues are mediated more by categories and talkers than by intrinsic physiological relationships. Talker differences reflect systematic speaking style differences.
Affiliation(s)
- Meghan Clayards
- Department of Linguistics, School of Communication Sciences and Disorders, McGill University, Montreal, QC, Canada
|
50
|
Potter CE, Wang T, Saffran JR. Second Language Experience Facilitates Statistical Learning of Novel Linguistic Materials. Cogn Sci 2017; 41 Suppl 4:913-927. [PMID: 27988939 PMCID: PMC5407950 DOI: 10.1111/cogs.12473] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2016] [Revised: 10/05/2016] [Accepted: 11/09/2016] [Indexed: 11/28/2022]
Abstract
Recent research has begun to explore individual differences in statistical learning, and how those differences may be related to other cognitive abilities, particularly their effects on language learning. In this research, we explored a different type of relationship between language learning and statistical learning: the possibility that learning a new language may also influence statistical learning by changing the regularities to which learners are sensitive. We tested two groups of participants, Mandarin Learners and Naïve Controls, at two time points, 6 months apart. At each time point, participants performed two different statistical learning tasks: an artificial tonal language statistical learning task and a visual statistical learning task. Only the Mandarin-learning group showed significant improvement on the linguistic task, whereas both groups improved equally on the visual task. These results support the view that there are multiple influences on statistical learning. Domain-relevant experiences may affect the regularities that learners can discover when presented with novel stimuli.
Affiliation(s)
- Christine E. Potter
- Department of Psychology and Waisman Center, University of Wisconsin – Madison
- Tianlin Wang
- Department of Psychology and Waisman Center, University of Wisconsin – Madison
- Jenny R. Saffran
- Department of Psychology and Waisman Center, University of Wisconsin – Madison
|