1
Zhang K, Peng G. The modulation of cognitive load on speech normalization: A neurophysiological perspective. Brain and Language 2025; 266:105579. [PMID: 40239268 DOI: 10.1016/j.bandl.2025.105579] [Received: 12/18/2024] [Revised: 04/07/2025] [Accepted: 04/07/2025] [Indexed: 04/18/2025]
Abstract
Extrinsic normalization, wherein listeners utilize context cues to adapt to speech variability, is essential for maintaining perceptual constancy. In daily communication, distractions are ubiquitous, raising questions about the influence of cognitive load on this process, particularly at the cortical level. This study investigates how cognitive load modulates extrinsic normalization using electroencephalography (EEG). Native Cantonese speakers were asked to perceive Cantonese tones from multiple speakers with context cues in both single- and dual-task conditions. The secondary task did not hinder listeners' normalization process at the behavioral level. However, EEG data revealed significant modulations of extrinsic normalization under cognitive load. Extrinsic normalization elicited P2, N400, and LFN, suggesting that extrinsic normalization encompasses multiple perceptual adjustments at stages of phonological processing, lexical retrieval, and decision-making. Cognitive load influenced extrinsic normalization at all these stages, as evidenced by smaller P2, larger N400, and larger LFN, highlighting the active and controlled nature of this process.
Affiliation(s)
- Kaile Zhang
  - The Research Centre for Language, Cognition, and Neuroscience, The Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong SAR, China
- Gang Peng
  - The Research Centre for Language, Cognition, and Neuroscience, The Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong SAR, China
2
Bujok R, Peeters D, Meyer AS, Bosker HR. Beating stress: Evidence for recalibration of word stress perception. Atten Percept Psychophys 2025:10.3758/s13414-025-03088-5. [PMID: 40394367 DOI: 10.3758/s13414-025-03088-5] [Accepted: 04/27/2025] [Indexed: 05/22/2025]
Abstract
Speech is inherently variable, requiring listeners to apply adaptation mechanisms to deal with the variability. A proposed perceptual adaptation mechanism is recalibration, whereby listeners learn to adjust cognitive representations of speech sounds based on disambiguating contextual information. Most studies on the role of recalibration in speech perception have focused on variability in particular speech segments (e.g., consonants/vowels), and speech has mostly been studied with a focus on talking heads. However, speech is often accompanied by visual bodily signals like hand gestures, and is thus multimodal. Moreover, variability in speech extends beyond segmental aspects alone and also affects prosodic aspects, like lexical stress. We currently do not understand well how listeners adjust their representations of lexical stress patterns to different speakers. In four experiments, we investigated recalibration of lexical stress perception, driven by lexico-orthographical information (Experiment 1) and by manual beat gestures (Experiments 2-4). Across experiments, we observed that these two types of disambiguating information (presented in an audiovisual exposure phase) led listeners to adjust their representations of lexical stress, with lasting consequences for subsequent spoken word recognition (in an audio-only test phase). However, evidence for generalization of this recalibration to new words was only found in the third experiment, suggesting that generalization may be limited. These results highlight that recalibration is a plausible mechanism for suprasegmental speech adaptation in everyday communication and show that even the timing of simple hand gestures can have a lasting effect on auditory speech perception.
Affiliation(s)
- Ronny Bujok
  - Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
  - International Max Planck Research School for Language Sciences, MPI for Psycholinguistics, Max Planck Society, Nijmegen, The Netherlands
- David Peeters
  - Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
  - Department of Communication and Cognition, TiCC, Tilburg University, Tilburg, The Netherlands
- Antje S Meyer
  - Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Hans Rutger Bosker
  - Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
3
Persson A, Barreda S, Jaeger TF. Comparing accounts of formant normalization against US English listeners' vowel perception. The Journal of the Acoustical Society of America 2025; 157:1458-1482. [PMID: 39998127 DOI: 10.1121/10.0035476] [Received: 07/25/2024] [Accepted: 01/07/2025] [Indexed: 02/26/2025]
Abstract
Human speech recognition tends to be robust, despite substantial cross-talker variability. Believed to be critical to this ability are auditory normalization mechanisms whereby listeners adapt to individual differences in vocal tract physiology. This study investigates the computations involved in such normalization. Two 8-way alternative forced-choice experiments assessed L1 listeners' categorizations across the entire US English vowel space, for both unaltered and synthesized stimuli. Listeners' responses in these experiments were compared against the predictions of 20 influential normalization accounts that differ starkly in the inference and memory capacities they imply for speech perception. This includes variants of estimation-free transformations into psycho-acoustic spaces, intrinsic normalizations relative to concurrent acoustic properties, and extrinsic normalizations relative to talker-specific statistics. Listeners' responses were best explained by extrinsic normalization, suggesting that listeners learn and store distributional properties of talkers' speech. Specifically, computationally simple (single-parameter) extrinsic normalization best fit listeners' responses. This simple extrinsic normalization also clearly outperformed Lobanov normalization, a computationally more complex account that remains popular in research on phonetics and phonology, sociolinguistics, typology, and language acquisition.
Affiliation(s)
- Anna Persson
  - Swedish Language and Multilingualism, Stockholm University, Stockholm, SE-106 91, Sweden
- Santiago Barreda
  - Linguistics, University of California, Davis, California 95616, USA
- T Florian Jaeger
  - Brain and Cognitive Sciences, Goergen Institute for Data Science and Artificial Intelligence, University of Rochester, Rochester, New York 14627, USA
4
Kutlu E, Baxelbaum K, Sorensen E, Oleson J, McMurray B. Linguistic diversity shapes flexible speech perception in school age children. Sci Rep 2024; 14:28825. [PMID: 39572753 PMCID: PMC11582665 DOI: 10.1038/s41598-024-80430-1] [Received: 04/25/2024] [Accepted: 11/19/2024] [Indexed: 11/24/2024] Open
Abstract
Every day, listeners encounter a wide range of acoustic signals. Successfully solving this variability problem allows them to interpret these signals accurately. While this mechanism tends to be less effortful for adults, children need to learn stable categories in the face of such variability. It is unknown to what extent general maturation or diversity of the input plays a role in shaping different speech categorization profiles that children can employ. Here, we tested school-aged children's speech categorization with a continuous speech categorization task called the Visual Analogue Scaling (VAS) task. We measured the linguistic diversity in each child's social environment through a social network analysis. We found that increased linguistic diversity led to more flexible and gradient speech categorization. On the other hand, less diverse linguistic input led to more categorical speech categorization. We argue that these findings have implications for speech perception as well as linguistic diversity research.
Affiliation(s)
- Ethan Kutlu
  - Department of Linguistics, University of Iowa, Iowa, USA
  - Department of Psychological and Brain Sciences, University of Iowa, Iowa, USA
- Keith Baxelbaum
  - Department of Psychological and Brain Sciences, University of Iowa, Iowa, USA
- Eldon Sorensen
  - Department of Biostatistics, University of Iowa, Iowa, USA
- Jacob Oleson
  - Department of Biostatistics, University of Iowa, Iowa, USA
- Bob McMurray
  - Department of Linguistics, University of Iowa, Iowa, USA
  - Department of Psychological and Brain Sciences, University of Iowa, Iowa, USA
5
Kapatsinski V, Bramlett AA, Idemaru K. What do you learn from a single cue? Dimensional reweighting and cue reassociation from experience with a newly unreliable phonetic cue. Cognition 2024; 249:105818. [PMID: 38772253 DOI: 10.1016/j.cognition.2024.105818] [Received: 06/21/2023] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 05/23/2024]
Abstract
In language comprehension, we use perceptual cues to infer meanings. Some of these cues reside on perceptual dimensions. For example, the difference between bear and pear is cued by a difference in voice onset time (VOT), which is a continuous perceptual dimension. The present paper asks whether, and when, experience with a single value on a dimension behaving unexpectedly is used by the learner to reweight the whole dimension. We show that learners reweight the whole VOT dimension when exposed to a single VOT value (e.g., 45 ms) and provided with feedback indicating that the speaker intended to produce a /b/ 50% of the time and a /p/ the other 50% of the time. Importantly, dimensional reweighting occurs only if 1) the 50/50 feedback is unexpected for the VOT value, and 2) there is another dimension that is predictive of feedback. When no predictive dimension is available, listeners reassociate the experienced VOT value with the more surprising outcome but do not downweight the entire VOT dimension. These results provide support for perceptual representations of speech sounds that combine cues and dimensions, for viewing perceptual learning in speech as a combination of error-driven cue reassociation and dimensional reweighting, and for considering dimensional reweighting to be reallocation of attention that occurs only when there is evidence that reallocating attention would improve prediction accuracy (Harmon, Z., Idemaru, K., & Kapatsinski, V. 2019. Learning mechanisms in cue reweighting. Cognition, 189, 76-88.).
Affiliation(s)
- Vsevolod Kapatsinski
  - University of Oregon, Department of Linguistics, 161 Straub Hall, University of Oregon, Eugene, OR 97403-1290, United States of America
- Adam A Bramlett
  - Carnegie-Mellon University, Department of Modern Languages, 341 Posner Hall, 5000 Forbes Avenue, Pittsburgh, PA 15213, United States of America
- Kaori Idemaru
  - University of Oregon, Department of East Asian Languages and Literatures, 114 Friendly Hall University of Oregon, Eugene, OR 97403-1248, United States of America
6
Kurumada C, Rivera R, Allen P, Bennetto L. Perception and adaptation of receptive prosody in autistic adolescents. Sci Rep 2024; 14:16409. [PMID: 39013983 PMCID: PMC11252140 DOI: 10.1038/s41598-024-66569-x] [Received: 03/12/2024] [Accepted: 07/01/2024] [Indexed: 07/18/2024] Open
Abstract
A fundamental aspect of language processing is inferring others' minds from subtle variations in speech. The same word or sentence can often convey different meanings depending on its tempo, timing, and intonation (features often referred to as prosody). Although autistic children and adults are known to experience difficulty in making such inferences, the science remains unclear as to why. We hypothesize that detail-oriented perception in autism may interfere with the inference process if it lacks the adaptivity required to cope with the variability ubiquitous in human speech. Using a novel prosodic continuum that shifts the sentence meaning gradiently from a statement (e.g., "It's raining") to a question (e.g., "It's raining?"), we have investigated the perception and adaptation of receptive prosody in autistic adolescents and two groups of non-autistic controls. Autistic adolescents showed attenuated adaptivity in categorizing prosody, whereas they were equivalent to controls in terms of discrimination accuracy. Combined with recent findings in segmental (e.g., phoneme) recognition, the current results provide the basis for an emerging research framework for attenuated flexibility and reduced influence of contextual feedback as a possible source of deficits that hinder linguistic and social communication in autism.
Affiliation(s)
- Chigusa Kurumada
  - Brain and Cognitive Sciences, University of Rochester, Rochester, 14627, USA
- Rachel Rivera
  - Psychology, University of Rochester, Rochester, 14627, USA
- Paul Allen
  - Psychology, University of Rochester, Rochester, 14627, USA
  - Otolaryngology, University of Rochester Medical Center, Rochester, 14642, USA
- Loisa Bennetto
  - Psychology, University of Rochester, Rochester, 14627, USA
7
Xie X, Kurumada C. From first encounters to longitudinal exposure: a repeated exposure-test paradigm for monitoring speech adaptation. Front Psychol 2024; 15:1383904. [PMID: 38873525 PMCID: PMC11169900 DOI: 10.3389/fpsyg.2024.1383904] [Received: 02/08/2024] [Accepted: 05/08/2024] [Indexed: 06/15/2024] Open
Abstract
Perceptual difficulty with an unfamiliar accent can dissipate within short time scales (e.g., within minutes), reflecting rapid adaptation effects. At the same time, long-term familiarity with an accent is also known to yield stable perceptual benefits. However, whether the long-term effects reflect sustained, cumulative progression from shorter-term adaptation remains unknown. To fill this gap, we developed a web-based, repeated exposure-test paradigm. In this paradigm, short test blocks alternate with exposure blocks, and this exposure-test sequence is repeated multiple times. This design allows for the testing of adaptive speech perception both (a) within the first moments of encountering an unfamiliar accent and (b) over longer time scales such as days and weeks. In addition, we used a Bayesian ideal observer approach to select natural speech stimuli that increase the statistical power to detect adaptation. The current report presents results from a first application of this paradigm, investigating changes in the recognition accuracy of Mandarin-accented speech by native English listeners over five sessions spanning 3 weeks. We found that the recognition of an accent feature (a syllable-final /d/, as in feed, sounding /t/-like) improved steadily over the three-week period. Unexpectedly, however, the improvement was seen with or without exposure to the accent. We discuss possible reasons for this result and implications for conducting future longitudinal studies with repeated exposure and testing.
Affiliation(s)
- Xin Xie
  - Department of Language Science, University of California, Irvine, Irvine, CA, United States
- Chigusa Kurumada
  - Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY, United States
8
Steffman J, Sundara M. Short-term exposure alters adult listeners' perception of segmental phonotactics. JASA Express Letters 2023; 3:125202. [PMID: 38085137 DOI: 10.1121/10.0023900] [Received: 07/07/2023] [Accepted: 11/21/2023] [Indexed: 12/18/2023]
Abstract
This study evaluates the malleability of adults' perception of probabilistic phonotactic (biphone) probabilities, building on a body of literature on statistical phonotactic learning. It was first replicated that listeners categorize phonetic continua as sounds that create higher-probability sequences in their native language. Listeners were also exposed to skewed distributions of biphone contexts, which resulted in the enhancement or reversal of these effects. Thus, listeners dynamically update biphone probabilities (BPs) and bring this to bear on perception of ambiguous acoustic information. These effects can override long-term BP effects rooted in native language experience.
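The idea of updating biphone probabilities from exposure can be illustrated with a toy sketch. This is not the study's model: it assumes biphone probability is a simple relative frequency over counts, and the biphone labels (`"ta"`, `"ka"`), the counts, and the function names are all hypothetical.

```python
from collections import Counter

def biphone_probs(counts):
    """Relative-frequency estimate of each biphone's probability."""
    total = sum(counts.values())
    return {bp: n / total for bp, n in counts.items()}

def preferred(counts, candidates):
    """Return the candidate biphone with the highest estimated probability."""
    probs = biphone_probs(counts)
    return max(candidates, key=lambda bp: probs.get(bp, 0.0))

# Made-up long-term counts: "ta" is the higher-probability sequence.
long_term = Counter({"ta": 80, "ka": 20})
# Made-up skewed exposure: only "ka" is heard.
exposure = Counter({"ka": 100})

before = preferred(long_term, ["ta", "ka"])
after = preferred(long_term + exposure, ["ta", "ka"])
print(before, after)
```

Under these assumed counts, an ambiguous sound would be resolved toward `"ta"` before exposure and toward `"ka"` afterwards, which mirrors the kind of reversal the abstract describes, though only in a schematic way.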
Affiliation(s)
- Jeremy Steffman
  - Linguistics and English Language, The University of Edinburgh, Edinburgh, EH8 9AD, United Kingdom
- Megha Sundara
  - Linguistics, University of California, Los Angeles, California 90095
9
McLaughlin DJ, Van Engen KJ. Exploring effects of social information on talker-independent accent adaptation. JASA Express Letters 2023; 3:125201. [PMID: 38059794 DOI: 10.1121/10.0022536] [Received: 06/14/2023] [Accepted: 11/01/2023] [Indexed: 12/08/2023]
Abstract
The present study examined whether race information about speakers can promote rapid and generalizable perceptual adaptation to second-language accent. First-language English listeners were presented with Cantonese-accented English sentences in speech-shaped noise during a training session with three intermixed talkers, followed by a test session with a novel (i.e., fourth) talker. Participants were assigned to view either three East Asian or three White faces during training, corresponding to each speaker. Results indicated no effect of the social priming manipulation on the training or test sessions, although both groups performed better at test than a control group.
Affiliation(s)
- Drew J McLaughlin
  - Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Gipuzkoa 20018, Spain
  - Department of Psychological & Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130
- Kristin J Van Engen
  - Department of Psychological & Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130
10
Persson A, Jaeger TF. Evaluating normalization accounts against the dense vowel space of Central Swedish. Front Psychol 2023; 14:1165742. [PMID: 37416548 PMCID: PMC10322199 DOI: 10.3389/fpsyg.2023.1165742] [Received: 02/14/2023] [Accepted: 05/23/2023] [Indexed: 07/08/2023] Open
Abstract
Talkers vary in the phonetic realization of their vowels. One influential hypothesis holds that listeners overcome this inter-talker variability through pre-linguistic auditory mechanisms that normalize the acoustic or phonetic cues that form the input to speech recognition. Dozens of competing normalization accounts exist, including both accounts specific to vowel perception and general purpose accounts that can be applied to any type of cue. We add to the cross-linguistic literature on this matter by comparing normalization accounts against a new phonetically annotated vowel database of Swedish, a language with a particularly dense vowel inventory of 21 vowels differing in quality and quantity. We evaluate normalization accounts on how they differ in predicted consequences for perception. The results indicate that the best performing accounts either center or standardize formants by talker. The study also suggests that general purpose accounts perform as well as vowel-specific accounts, and that vowel normalization operates in both temporal and spectral domains.
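As a rough illustration of what "centering" and "standardizing" formants by talker can mean, the sketch below subtracts each talker's mean formant value (centering) and optionally divides by that talker's standard deviation (standardizing, the z-scoring commonly associated with Lobanov normalization). It is a minimal sketch, not the paper's implementation: the function name, the tuple layout, the use of raw Hz values, and the choice of population standard deviation are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

def normalize_by_talker(tokens, standardize=False):
    """tokens: list of (talker, f1, f2) tuples.

    Centers each formant on the talker's own mean; if standardize is True,
    also divides by the talker's (population) standard deviation.
    """
    by_talker = defaultdict(list)
    for talker, f1, f2 in tokens:
        by_talker[talker].append((f1, f2))

    # Per-talker statistics: means and standard deviations of F1 and F2.
    stats = {}
    for talker, values in by_talker.items():
        f1s = [v[0] for v in values]
        f2s = [v[1] for v in values]
        stats[talker] = (mean(f1s), mean(f2s), pstdev(f1s), pstdev(f2s))

    normalized = []
    for talker, f1, f2 in tokens:
        m1, m2, s1, s2 = stats[talker]
        d1, d2 = f1 - m1, f2 - m2
        if standardize:
            # Guard against a talker with no variation in a formant.
            d1 = d1 / s1 if s1 else 0.0
            d2 = d2 / s2 if s2 else 0.0
        normalized.append((talker, d1, d2))
    return normalized

# Hypothetical formant values for two talkers with different ranges.
tokens = [("A", 300, 2200), ("A", 700, 1200),
          ("B", 400, 2600), ("B", 900, 1400)]
centered = normalize_by_talker(tokens)
standardized = normalize_by_talker(tokens, standardize=True)
```

Centering removes only each talker's overall offset, while standardizing additionally removes differences in the spread of each talker's formants, so it needs one more estimated statistic per formant.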
Affiliation(s)
- Anna Persson
  - Department of Swedish Language and Multilingualism, Stockholm University, Stockholm, Sweden
- T. Florian Jaeger
  - Brain and Cognitive Sciences, University of Rochester, Rochester, NY, United States
  - Computer Science, University of Rochester, Rochester, NY, United States