1. Bujok R, Peeters D, Meyer AS, Bosker HR. Beating stress: Evidence for recalibration of word stress perception. Atten Percept Psychophys 2025. PMID: 40394367. DOI: 10.3758/s13414-025-03088-5.
Abstract
Speech is inherently variable, requiring listeners to apply adaptation mechanisms to deal with this variability. One proposed perceptual adaptation mechanism is recalibration, whereby listeners adjust their cognitive representations of speech sounds based on disambiguating contextual information. Most studies of recalibration in speech perception have focused on variability in particular speech segments (e.g., consonants and vowels), and audiovisual speech has mostly been studied using talking heads. However, speech is often accompanied by visual bodily signals such as hand gestures, and is thus multimodal. Moreover, variability in speech extends beyond segmental aspects alone and also affects prosodic aspects, such as lexical stress. It is not yet well understood how listeners adjust their representations of lexical stress patterns to different speakers. In four experiments, we investigated recalibration of lexical stress perception driven by lexico-orthographic information (Experiment 1) and by manual beat gestures (Experiments 2-4). Across experiments, both types of disambiguating information (presented in an audiovisual exposure phase) led listeners to adjust their representations of lexical stress, with lasting consequences for subsequent spoken word recognition (in an audio-only test phase). However, evidence for generalization of this recalibration to new words was found only in the third experiment, suggesting that generalization may be limited. These results highlight that recalibration is a plausible mechanism for suprasegmental speech adaptation in everyday communication and show that even the timing of simple hand gestures can have a lasting effect on auditory speech perception.
Affiliation(s)
- Ronny Bujok: Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; International Max Planck Research School for Language Sciences, MPI for Psycholinguistics, Max Planck Society, Nijmegen, The Netherlands.
- David Peeters: Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Department of Communication and Cognition, TiCC, Tilburg University, Tilburg, The Netherlands.
- Antje S Meyer: Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands.
- Hans Rutger Bosker: Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands.
2. Brown-Schmidt S, Cho SJ, Fenn KM, Trude AM. Modeling spatio-temporal patterns in intensive binary time series eye-tracking data using Generalized Additive Mixed Models. Brain Res 2025; 1854:149511. PMID: 39978529. DOI: 10.1016/j.brainres.2025.149511.
Abstract
This paper introduces and illustrates the use of Generalized Additive Mixed Models (GAMMs) for analyzing intensive binary time-series eye-tracking data. Applying a spatio-temporal GAMM to such data, we show that both fixed condition effects and previously documented temporal contingencies in this type of data vary over time during speech perception. Further, spatial relationships between the point of fixation and the candidate referents on screen modulate the probability of an upcoming target fixation, and this pull (and push) on fixations changes over time as the speech is perceived. The technique not only accounts for the dominant autoregressive patterns typically seen in visual-world eye-tracking data, but also allows modeling crossed random effects (by person and item, as is typical of psycholinguistic datasets) and the complex relationships between space and time that emerge in eye-tracking data. It thereby offers ways to ask and answer new questions about language use and processing.
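As a rough illustration of the model family (not the authors' analysis code), the sketch below fits a spatio-temporal logistic GAM to simulated binary fixation data. The choice of the Python library pygam is an assumption made for illustration; pygam does not support the crossed by-participant and by-item random effects that are central to the paper's GAMMs, so only the fixed smooth structure is shown.

```python
import numpy as np
from pygam import LogisticGAM, s, te

rng = np.random.default_rng(1)
n = 2000
time = rng.uniform(0, 1500, n)   # ms since word onset (invented scale)
dist = rng.uniform(0, 10, n)     # distance of current fixation from target

# Simulated tendency: target fixations grow more likely over time and
# less likely when the eyes are currently far from the target.
p = 1 / (1 + np.exp(-(0.004 * time - 0.3 * dist - 1.5)))
y = rng.binomial(1, p)           # intensive binary time series (0/1 fixations)

X = np.column_stack([time, dist])
# Univariate smooths of time and distance plus a tensor-product
# interaction, mirroring a spatio-temporal smooth structure.
gam = LogisticGAM(s(0) + s(1) + te(0, 1)).fit(X, y)
gam.summary()
```

In a full GAMM, random smooths by participant and item would sit on top of this fixed structure, which is what lets the model absorb the autoregressive dependencies within trials.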
Affiliation(s)
- Sarah Brown-Schmidt: Vanderbilt University, Department of Psychology & Human Development, United States.
- Sun-Joo Cho: Vanderbilt University, Department of Psychology & Human Development, United States.
- Kimberly M Fenn: Michigan State University, Department of Psychology, United States.
- Alison M Trude: University of Illinois at Urbana-Champaign, Department of Psychology, United States.
3. Jesse A. Learning to recognize unfamiliar faces from fine-phonetic detail in visual speech. Atten Percept Psychophys 2025; 87:936-951. PMID: 40113736. DOI: 10.3758/s13414-025-03049-y.
Abstract
How speech is realized varies across talkers but is relatively consistent within a talker. Humans are sensitive to these idiosyncrasies when perceiving auditory speech, but also, in face-to-face communication, when perceiving visual speech. Our recent work has shown that humans can use the talker idiosyncrasies visible in sentence production to rapidly learn to recognize unfamiliar talkers, suggesting that visual speech information supports both speech perception and talker recognition. In learning from sentences, however, learners may focus only on global information about the talker, such as talker-specific realizations of prosody and rate. The present study tested whether human perceivers can learn the identity of a talker based solely on fine-phonetic detail in the dynamic realization of visual speech. Participants learned to identify talkers from point-light displays showing them uttering isolated words. These point-light displays isolated the dynamic speech information while discarding static information about the talker's face. No sound was presented. Feedback was given only during training. The test phase included point-light displays of familiar words from training and of novel words. Participants learned to recognize two and four talkers from the word-level dynamics of visual speech after very little exposure. The established representations allowed talker recognition independent of linguistic content, that is, even from novel words. Spoken words therefore contain sufficient indexical information in their fine-phonetic detail for perceivers to acquire dynamic facial representations of unfamiliar talkers that allow generalization across words. Dynamic representations of talking faces are thus formed for the recognition of unfamiliar faces.
Affiliation(s)
- Alexandra Jesse: Department of Psychological and Brain Sciences, University of Massachusetts, 135 Hicks Way, Amherst, MA 01003, USA.
4. Persson A, Barreda S, Jaeger TF. Comparing accounts of formant normalization against US English listeners' vowel perception. J Acoust Soc Am 2025; 157:1458-1482. PMID: 39998127. DOI: 10.1121/10.0035476.
Abstract
Human speech recognition tends to be robust despite substantial cross-talker variability. Auditory normalization mechanisms, whereby listeners adapt to individual differences in vocal tract physiology, are believed to be critical to this ability. This study investigates the computations involved in such normalization. Two 8-way alternative forced-choice experiments assessed L1 listeners' categorizations across the entire US English vowel space, for both unaltered and synthesized stimuli. Listeners' responses in these experiments were compared against the predictions of 20 influential normalization accounts that differ starkly in the inference and memory capacities they imply for speech perception. These include variants of estimation-free transformations into psycho-acoustic spaces, intrinsic normalizations relative to concurrent acoustic properties, and extrinsic normalizations relative to talker-specific statistics. Listeners' responses were best explained by extrinsic normalization, suggesting that listeners learn and store distributional properties of talkers' speech. Specifically, computationally simple (single-parameter) extrinsic normalization best fit listeners' responses. This simple extrinsic normalization also clearly outperformed Lobanov normalization, a computationally more complex account that remains popular in research on phonetics and phonology, sociolinguistics, typology, and language acquisition.
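To make the contrast concrete, here is a minimal numerical sketch, not the authors' code, of the two families of extrinsic accounts at issue: a computationally simple single-parameter scheme that shifts all formants by one talker-level constant (in the spirit of log-mean normalization), versus Lobanov normalization, which z-scores each formant by talker (two parameters per formant). The toy formant values are invented.

```python
import numpy as np

def logmean_normalize(formants_hz):
    """Single-parameter account: subtract one talker-level constant
    (the mean of all log formants) from every log formant."""
    logf = np.log(formants_hz)
    return logf - logf.mean()

def lobanov_normalize(formants_hz):
    """Lobanov: z-score each formant (column) using that talker's
    mean and standard deviation -- two parameters per formant."""
    f = np.asarray(formants_hz, dtype=float)
    return (f - f.mean(axis=0)) / f.std(axis=0)

# Toy data: rows = vowel tokens from one talker, columns = F1, F2 (Hz)
talker = np.array([[300, 2300], [700, 1200], [500, 1500], [350, 2000]])
print(logmean_normalize(talker))
print(lobanov_normalize(talker))
```

The single-parameter scheme requires estimating only one quantity per talker, which is part of what makes such accounts attractive as models of rapid adaptation.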
Affiliation(s)
- Anna Persson: Swedish Language and Multilingualism, Stockholm University, Stockholm, SE-106 91, Sweden.
- Santiago Barreda: Linguistics, University of California, Davis, California 95616, USA.
- T Florian Jaeger: Brain and Cognitive Sciences, Goergen Institute for Data Science and Artificial Intelligence, University of Rochester, Rochester, New York 14627, USA.
5. Luo Y, Mitterer H, Zhou X, Chen Y. Flexibility and Stability in Lexical Tone Recalibration: Evidence from Tone Perceptual Learning. Lang Speech 2024. PMID: 39696890. DOI: 10.1177/00238309241291536.
Abstract
Listeners adjust their perception of sound categories when confronted with variation in speech. Previous research on speech recalibration has primarily focused on segmental variation, demonstrating that recalibration tends to be specific to individual speakers and situations and often persists over time. In this study, we present findings on the perceptual learning of lexical tone, a suprasegmental feature of Standard Chinese signaled primarily through pitch variations that distinguish morpheme and word meanings. Native speakers of Standard Chinese showed a recalibration of tone category boundaries immediately following exposure to ambiguous tonal pitch contours. However, this recalibration effect weakened significantly after 12 hours. Furthermore, participants trained at night did not exhibit delayed stabilization, a phenomenon commonly observed with sleep-induced consolidation. Our results replicate previous findings and provide new evidence that while the perceptual system can flexibly adapt to real-time sensory input, subsequent consolidation processes, such as those occurring during sleep, may be selective and, under certain conditions, ineffective.
Affiliation(s)
- Yingyi Luo: Institute of Linguistics, Chinese Academy of Social Sciences, China.
- Xiaolin Zhou: College of Psychology and Cognitive Science, Peking University, China.
- Yiya Chen: Leiden University Centre for Linguistics, The Netherlands; Leiden Institute for Brain and Cognition, Leiden University, The Netherlands.
6. Kurumada C, Rivera R, Allen P, Bennetto L. Perception and adaptation of receptive prosody in autistic adolescents. Sci Rep 2024; 14:16409. PMID: 39013983. PMCID: PMC11252140. DOI: 10.1038/s41598-024-66569-x.
Abstract
A fundamental aspect of language processing is inferring others' minds from subtle variations in speech. The same word or sentence can convey different meanings depending on its tempo, timing, and intonation, features often referred to as prosody. Although autistic children and adults are known to experience difficulty in making such inferences, it remains unclear why. We hypothesize that detail-oriented perception in autism may interfere with the inference process if it lacks the adaptivity required to cope with the variability ubiquitous in human speech. Using a novel prosodic continuum that shifts the sentence meaning gradiently from a statement (e.g., "It's raining") to a question (e.g., "It's raining?"), we investigated the perception and adaptation of receptive prosody in autistic adolescents and two groups of non-autistic controls. Autistic adolescents showed attenuated adaptivity in categorizing prosody, although they were equivalent to controls in discrimination accuracy. Combined with recent findings on segmental (e.g., phoneme) recognition, the current results support an emerging research framework in which attenuated flexibility and reduced influence of contextual feedback are a possible source of the deficits that hinder linguistic and social communication in autism.
Affiliation(s)
- Chigusa Kurumada: Brain and Cognitive Sciences, University of Rochester, Rochester, NY 14627, USA.
- Rachel Rivera: Psychology, University of Rochester, Rochester, NY 14627, USA.
- Paul Allen: Psychology, University of Rochester, Rochester, NY 14627, USA; Otolaryngology, University of Rochester Medical Center, Rochester, NY 14642, USA.
- Loisa Bennetto: Psychology, University of Rochester, Rochester, NY 14627, USA.
7. Bass I, Espinoza C, Bonawitz E, Ullman TD. Teaching Without Thinking: Negative Evaluations of Rote Pedagogy. Cogn Sci 2024; 48:e13470. PMID: 38862266. DOI: 10.1111/cogs.13470.
Abstract
When people make decisions, they act in a way that is either automatic ("rote"), or more thoughtful ("reflective"). But do people notice when others are behaving in a rote way, and do they care? We examine the detection of rote behavior and its consequences in U.S. adults, focusing specifically on pedagogy and learning. We establish repetitiveness as a cue for rote behavior (Experiment 1), and find that rote people are seen as worse teachers (Experiment 2). We also find that the more a person's feedback seems similar across groups (indicating greater rote-ness), the more negatively their teaching is evaluated (Experiment 3). A word-embedding analysis of an open-response task shows people naturally cluster rote and reflective teachers into different semantic categories (Experiment 4). We also show that repetitiveness can be decoupled from perceptions of rote-ness given contextual explanation (Experiment 5). Finally, we establish two additional cues to rote behavior that can be tied to quality of teaching (Experiment 6). These results empirically show that people detect and care about scripted behaviors in pedagogy, and suggest an important extension to formal frameworks of social reasoning.
Affiliation(s)
- Ilona Bass: Department of Psychology, Harvard University; Graduate School of Education, Harvard University.
8. Xie X, Jaeger TF, Kurumada C. What we do (not) know about the mechanisms underlying adaptive speech perception: A computational framework and review. Cortex 2023; 166:377-424. PMID: 37506665. DOI: 10.1016/j.cortex.2023.05.003.
Abstract
Speech from unfamiliar talkers can be difficult to comprehend initially. These difficulties tend to dissipate with exposure, sometimes within minutes or less. Adaptivity in response to unfamiliar input is now considered a fundamental property of speech perception, and research over the past two decades has made substantial progress in identifying its characteristics. The mechanisms underlying adaptive speech perception, however, remain unknown. Past work has attributed facilitatory effects of exposure to any one of three qualitatively different hypothesized mechanisms: (1) low-level, pre-linguistic signal normalization, (2) changes in or selection of linguistic representations, or (3) changes in post-perceptual decision-making. Direct comparisons of these hypotheses, or combinations thereof, have been lacking. We describe a general computational framework for adaptive speech perception (ASP) that, for the first time, implements all three mechanisms. We demonstrate how the framework can be used to derive predictions for perception experiments from the acoustic properties of the stimuli. Using this approach, we find that, at the level of data analysis presently employed by most studies in the field, the signature results of influential experimental paradigms do not distinguish between the three mechanisms. This highlights the need for a change in research practices, so that future experiments provide more informative results. We recommend specific changes to experimental paradigms and data analysis. All data and code for this study are shared via OSF, including the R markdown document that this article is generated from and an R library that implements the models we present.
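As a generic illustration of one of the three mechanisms, changes in linguistic representations, the sketch below implements a simple ideal-observer categorization over a one-dimensional cue with Gaussian category distributions. This is an assumption-laden toy, not the authors' ASP framework or their OSF-shared R code; the cue values and category parameters are invented.

```python
import numpy as np
from scipy.stats import norm

def p_category(cue, mu_a, sd_a, mu_b, sd_b, prior_a=0.5):
    """Posterior probability of category A given a 1-D acoustic cue,
    assuming Gaussian cue distributions for each category."""
    like_a = norm.pdf(cue, mu_a, sd_a) * prior_a
    like_b = norm.pdf(cue, mu_b, sd_b) * (1 - prior_a)
    return like_a / (like_a + like_b)

# E.g., a VOT-like cue (ms) for a /b/-/p/ contrast: exposure that shifts
# the listener's /p/ mean downward predicts more /p/ responses to the
# very same cue value -- a representation-change account of adaptation.
print(p_category(25, mu_a=0, sd_a=10, mu_b=50, sd_b=10))  # before exposure
print(p_category(25, mu_a=0, sd_a=10, mu_b=35, sd_b=10))  # after exposure
```

The paper's point is that normalization (transforming the cue) and decision-bias (changing the prior) mechanisms can produce response shifts that look identical to this one at the usual level of analysis.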
Affiliation(s)
- Xin Xie: Language Science, University of California, Irvine, USA.
- T Florian Jaeger: Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA; Computer Science, University of Rochester, Rochester, NY, USA.
- Chigusa Kurumada: Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA.
9. Persson A, Jaeger TF. Evaluating normalization accounts against the dense vowel space of Central Swedish. Front Psychol 2023; 14:1165742. PMID: 37416548. PMCID: PMC10322199. DOI: 10.3389/fpsyg.2023.1165742.
Abstract
Talkers vary in the phonetic realization of their vowels. One influential hypothesis holds that listeners overcome this inter-talker variability through pre-linguistic auditory mechanisms that normalize the acoustic or phonetic cues that form the input to speech recognition. Dozens of competing normalization accounts exist, including both accounts specific to vowel perception and general-purpose accounts that can be applied to any type of cue. We add to the cross-linguistic literature on this matter by comparing normalization accounts against a new phonetically annotated vowel database of Swedish, a language with a particularly dense vowel inventory of 21 vowels differing in quality and quantity. We evaluate normalization accounts on how they differ in predicted consequences for perception. The results indicate that the best-performing accounts either center or standardize formants by talker. The study also suggests that general-purpose accounts perform as well as vowel-specific accounts, and that vowel normalization operates in both temporal and spectral domains.
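In standard notation (a summary of the two best-performing account families, not equations quoted from the paper), centering and standardizing a formant value F_i by talker t can be written as:

```latex
% \bar{F}^{(t)} and s^{(t)} are the talker's mean and standard deviation
% for that formant, estimated from the talker's speech.
\[
\text{centering:}\quad F_i' = F_i - \bar{F}^{(t)}
\qquad\qquad
\text{standardizing (Lobanov):}\quad F_i' = \frac{F_i - \bar{F}^{(t)}}{s^{(t)}}
\]
```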
Affiliation(s)
- Anna Persson: Department of Swedish Language and Multilingualism, Stockholm University, Stockholm, Sweden.
- T. Florian Jaeger: Brain and Cognitive Sciences, University of Rochester, Rochester, NY, United States; Computer Science, University of Rochester, Rochester, NY, United States.
10. Perceptual learning of multiple talkers: Determinants, characteristics, and limitations. Atten Percept Psychophys 2022; 84:2335-2359. PMID: 36076119. DOI: 10.3758/s13414-022-02556-6.
Abstract
Research suggests that listeners simultaneously update talker-specific generative models to reflect structured phonetic variation. Because past investigations exposed listeners to talkers of different genders, it is unknown whether adaptation is talker-specific or rather linked to a broader sociophonetic class. Here, we test determinants of listeners' ability to update and apply talker-specific models for speech perception. In six experiments (n = 480), listeners were first exposed to the speech of two talkers who produced ambiguous fricative energy. The talkers' speech was interleaved during exposure, and lexical context differentially biased interpretation of the ambiguity as either /s/ or /ʃ/ for each talker. At test, listeners categorized tokens from ashi-asi continua, one for each talker. Across conditions and experiments, we manipulated exposure quantity, talker gender, blocked versus interleaved talker structure at test, and the degree to which fricative acoustics differed between talkers. When test was blocked by talker, learning was observed for different- but not same-gender talkers. When talkers were interleaved at test, learning was observed for both different- and same-gender talkers, though it was attenuated when fricative acoustics were constant across talkers. There was no strong evidence that adaptation to multiple talkers required more exposure than adaptation to a single talker. These results suggest that perceptual learning for speech is achieved via a mechanism that represents a context-dependent, cumulative integration of experience with speech input, and they identify critical constraints on listeners' ability to dynamically apply multiple generative models in mixed-talker listening environments.
11. Nenadić F, Tucker BV, Ten Bosch L. Computational Modeling of an Auditory Lexical Decision Experiment Using DIANA. Lang Speech 2022. PMID: 36000386. PMCID: PMC10394956. DOI: 10.1177/00238309221111752.
Abstract
We present an implementation of DIANA, a computational model of spoken word recognition, to model responses collected in the Massive Auditory Lexical Decision (MALD) project. DIANA is an end-to-end model, including an activation and decision component that takes the acoustic signal as input, activates internal word representations, and outputs lexicality judgments and estimated response latencies. Simulation 1 presents the process of creating acoustic models required by DIANA to analyze novel speech input. Simulation 2 investigates DIANA's performance in determining whether the input signal is a word present in the lexicon or a pseudoword. In Simulation 3, we generate estimates of response latency and correlate them with general tendencies in participant responses in MALD data. We find that DIANA performs fairly well in free word recognition and lexical decision. However, the current approach for estimating response latency provides estimates opposite to those found in behavioral data. We discuss these findings and offer suggestions as to what a contemporary model of spoken word recognition should be able to do.
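The sketch below is a deliberately simplified toy of the general activation-plus-decision architecture the abstract describes; it is not DIANA itself, whose activation component is built on trained acoustic models. The lexicon, feature vectors, threshold, and the inverse-activation latency proxy are all invented for illustration.

```python
import numpy as np

# Toy lexicon: stored word templates as feature vectors (invented values).
lexicon = {"cat": np.array([1.0, 0.2, 0.1]),
           "cap": np.array([1.0, 0.3, 0.9])}

def recognize(features, threshold=2.0):
    """Activation component: activations as inverse distances to templates.
    Decision component: lexicality judgment plus a crude latency proxy."""
    acts = {w: 1 / (np.linalg.norm(features - v) + 1e-6)
            for w, v in lexicon.items()}
    best, act = max(acts.items(), key=lambda kv: kv[1])
    is_word = act > threshold
    latency = 1 / act          # weaker evidence -> slower simulated response
    return best, is_word, latency

print(recognize(np.array([1.0, 0.25, 0.15])))  # close to "cat" -> word
print(recognize(np.array([0.0, 5.0, 5.0])))    # far from lexicon -> nonword
```

Even in this toy, the latency proxy makes clear why deriving response times from activation strength is delicate: small changes in how evidence is mapped to latency can invert the predicted direction of effects.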
Affiliation(s)
- Filip Nenadić: University of Alberta, Canada; Singidunum University, Serbia.
12. Bosker HR. Evidence for Selective Adaptation and Recalibration in the Perception of Lexical Stress. Lang Speech 2022; 65:472-490. PMID: 34227417. PMCID: PMC9014674. DOI: 10.1177/00238309211030307.
Abstract
Individuals vary in how they produce speech. This variability affects both the segments (vowels and consonants) and the suprasegmental properties of their speech (prosody). Previous literature has demonstrated that listeners can adapt to variability in how different talkers pronounce the segments of speech. This study shows that listeners can also adapt to variability in how talkers produce lexical stress. Experiment 1 demonstrates a selective adaptation effect in lexical stress perception: repeatedly hearing Dutch trochaic words biased perception of a subsequent lexical stress continuum towards more iamb responses. Experiment 2 demonstrates a recalibration effect in lexical stress perception: when ambiguous suprasegmental cues to lexical stress were disambiguated by lexical orthographic context as signaling a trochaic word in an exposure phase, Dutch participants categorized a subsequent test continuum as more trochee-like. Moreover, the selective adaptation and recalibration effects generalized to novel words not encountered during exposure. Together, the experiments demonstrate that listeners flexibly adapt to variability in the suprasegmental properties of speech as well, expanding our understanding of the utility of listener adaptation in speech perception. The combined outcomes also support an architecture of spoken word recognition involving abstract prosodic representations at a prelexical level of analysis.
Affiliation(s)
- Hans Rutger Bosker: Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH Nijmegen, The Netherlands.
13. Tan M, Xie X, Jaeger TF. Using Rational Models to Interpret the Results of Experiments on Accent Adaptation. Front Psychol 2021; 12:676271. PMID: 34803790. PMCID: PMC8603310. DOI: 10.3389/fpsyg.2021.676271.
Abstract
Exposure to unfamiliar non-native speech tends to improve comprehension. One hypothesis holds that listeners adapt to non-native-accented speech through distributional learning, that is, by inferring the statistics of the talker's phonetic cues. Models based on this hypothesis provide a good fit to incremental changes after exposure to atypical native speech. These models have, however, not previously been applied to non-native accents, which typically differ from native speech in many dimensions. Motivated by an apparent failure to replicate a well-replicated finding from accent adaptation, we use ideal observers to test whether our results can be understood solely on the basis of the statistics of the relevant cue distributions in the native- and non-native-accented speech. The simple computational model we use for this purpose can be applied predictively by other researchers working on similar questions. All code and data are shared.
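As a sketch of the distributional-learning idea behind such ideal observers (illustrative only; the authors share their actual code and data), the following conjugate normal-normal update shows how a listener's belief about a talker's category mean could shift toward the statistics of exposure tokens. All numerical values are invented.

```python
import numpy as np

def update_category_mean(prior_mu, prior_var, tokens, noise_var):
    """Posterior mean/variance of a category's cue mean after observing
    exposure tokens, with known within-category (noise) variance."""
    n = len(tokens)
    post_var = 1 / (1 / prior_var + n / noise_var)
    post_mu = post_var * (prior_mu / prior_var + np.sum(tokens) / noise_var)
    return post_mu, post_var

# Native-accent prior for some cue, updated by non-native exposure tokens:
mu, var = update_category_mean(prior_mu=50, prior_var=100,
                               tokens=np.array([30, 28, 34, 31]),
                               noise_var=64)
print(mu, var)   # posterior mean moves toward the accented tokens
```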
Affiliation(s)
- Maryann Tan: Centre for Research on Bilingualism, Department of Swedish Language & Multilingualism, Stockholm University, Stockholm, Sweden; Brain & Cognitive Sciences, University of Rochester, Rochester, NY, United States.
- Xin Xie: Brain & Cognitive Sciences, University of Rochester, Rochester, NY, United States; Department of Language Science, University of California, Irvine, Irvine, CA, United States.
- T Florian Jaeger: Brain & Cognitive Sciences, University of Rochester, Rochester, NY, United States; Computer Science, University of Rochester, Rochester, NY, United States.
14. Kurumada C, Roettger TB. Thinking probabilistically in the study of intonational speech prosody. Wiley Interdiscip Rev Cogn Sci 2021; 13:e1579. PMID: 34599647. DOI: 10.1002/wcs.1579.
Abstract
Speech prosody, the melodic and rhythmic properties of a language, plays a critical role in our everyday communication. Researchers have identified unique patterns of prosody that segment words and phrases, highlight focal elements in a sentence, and convey holistic meanings and speech acts that interact with the information shared in context. The mapping between the sound and meaning represented in prosody is suggested to be probabilistic: the same physical instance of sounds can support multiple meanings across talkers and contexts, while the same meaning can be encoded in physically distinct sound patterns (e.g., pitch movements). The current overview presents an analysis framework for probing the nature of this probabilistic relationship. Illustrated by examples from the literature and a dataset of German focus marking, we discuss production variability within and across talkers and consider the challenges that this variability imposes on the comprehension system. A better understanding of these challenges, we argue, will illuminate how human perceptual, cognitive, and computational mechanisms may navigate the variability to arrive at a coherent understanding of speech prosody. The current paper is intended as an introduction for those interested in thinking probabilistically about the sound-meaning mapping in prosody. Open questions for future research are discussed, with proposals for examining prosodic production and comprehension within a comprehensive, mathematically motivated framework of probabilistic inference under uncertainty. This article is categorized under: Linguistics > Language in Mind and Brain; Psychology > Language.
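The probabilistic sound-meaning mapping the overview describes can be summarized, in generic Bayesian notation not taken from the paper, as inference over intended meanings m given an observed prosodic signal a:

```latex
% P(a | m) captures production variability within and across talkers;
% P(m) is the contextual prior over meanings/speech acts.
\[
P(m \mid a) \;=\; \frac{P(a \mid m)\, P(m)}{\sum_{m'} P(a \mid m')\, P(m')}
\]
```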
Affiliation(s)
- Chigusa Kurumada: Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York, USA.
- Timo B Roettger: Department of Linguistics & Scandinavian Studies, Universitetet i Oslo, Oslo, Norway.
15. Listeners track talker-specific prosody to deal with talker-variability. Brain Res 2021; 1769:147605. PMID: 34363790. DOI: 10.1016/j.brainres.2021.147605.
Abstract
One of the challenges in speech perception is that listeners must deal with considerable segmental and suprasegmental variability in the acoustic signal due to differences between talkers. Most previous studies have focused on how listeners deal with segmental variability. In this EEG experiment, we investigated whether listeners track talker-specific usage of suprasegmental cues to lexical stress to recognize spoken words correctly. In a three-day training phase, Dutch participants learned to map non-word minimal stress pairs onto different object referents (e.g., USklot meant "lamp"; usKLOT meant "train"). These non-words were produced by two male talkers. Critically, each talker used only one suprasegmental cue to signal stress (e.g., Talker A used only F0 and Talker B only intensity). We expected participants to learn which talker used which cue to signal stress. In the test phase, participants indicated whether spoken sentences including these non-words were correct ("The word for lamp is…"). We found that participants were slower to indicate that a stimulus was correct if the non-word was produced with the unexpected cue (e.g., Talker A using intensity). That is, if in training Talker A used F0 to signal stress, participants experienced a mismatch between predicted and perceived phonological word-forms if, at test, Talker A unexpectedly used intensity to cue stress. In contrast, the N200 amplitude, an event-related potential related to phonological prediction, was not modulated by the cue mismatch. Theoretical implications of these contrasting results are discussed. The behavioral findings illustrate talker-specific prediction of prosodic cues, picked up through perceptual learning during training.
16. Wang L, Beaman CP, Jiang C, Liu F. Perception and Production of Statement-Question Intonation in Autism Spectrum Disorder: A Developmental Investigation. J Autism Dev Disord 2021; 52:3456-3472. PMID: 34355295. PMCID: PMC9296411. DOI: 10.1007/s10803-021-05220-4.
Abstract
Prosody or “melody in speech” in autism spectrum disorder (ASD) is often perceived as atypical. This study examined the perception and production of statement-question intonation in 84 children, adolescents, and adults with and without ASD, and also measured participants' pitch-direction discrimination thresholds. The results suggested that the abilities to discriminate (in both speech and music conditions), identify, and imitate statement-question intonation were intact in individuals with ASD across age cohorts. Sensitivity to pitch direction predicted performance on intonation processing in both groups, which also exhibited similar developmental changes. These findings provide evidence for shared mechanisms in pitch processing between speech and music, as well as for associations between low- and high-level pitch processing and between the perception and production of pitch.
Affiliation(s)
- Li Wang: School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK.
- C Philip Beaman: School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK.
- Cunmei Jiang: Music College, Shanghai Normal University, Shanghai, China.
- Fang Liu: School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK.
17. van der Burght CL, Friederici AD, Goucha T, Hartwigsen G. Pitch accents create dissociable syntactic and semantic expectations during sentence processing. Cognition 2021; 212:104702. PMID: 33857845. DOI: 10.1016/j.cognition.2021.104702.
Abstract
The language system uses syntactic, semantic, and prosodic cues to efficiently guide auditory sentence comprehension. Prosodic cues, such as pitch accents, can build expectations about upcoming sentence elements. This study investigates to what extent the syntactic and semantic expectations generated by pitch accents can be dissociated and, if so, which cues take precedence when contradictory information is present. We used sentences in which one of two nominal constituents was placed in contrastive focus with a third. All noun phrases carried overt syntactic information (case-marking of the determiner) and semantic information (typicality of the noun's thematic role). Two experiments (a sentence comprehension and a sentence completion task) show that focus, marked by pitch accents, established expectations in both the syntactic and the semantic domain. However, only the syntactic expectations, when violated, were strong enough to interfere with sentence comprehension. Furthermore, when contradictory cues occurred in the same sentence, the local syntactic cue (case-marking) took precedence over the semantic cue (thematic role) and overrode previous information cued by prosody. The findings indicate that during auditory sentence comprehension the processing system integrates different sources of information for argument role assignment, yet relies primarily on syntactic information.
Affiliation(s)
- Constantijn L van der Burght: Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, 04103 Leipzig, Germany; Lise Meitner Research Group Cognition and Plasticity, Max Planck Institute for Human Cognitive and Brain Sciences, 04103 Leipzig, Germany.
- Angela D Friederici: Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, 04103 Leipzig, Germany.
- Tomás Goucha: Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, 04103 Leipzig, Germany.
- Gesa Hartwigsen: Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, 04103 Leipzig, Germany; Lise Meitner Research Group Cognition and Plasticity, Max Planck Institute for Human Cognitive and Brain Sciences, 04103 Leipzig, Germany.