1. Rizzi R, Bidelman GM. Functional benefits of continuous vs. categorical listening strategies on the neural encoding and perception of noise-degraded speech. bioRxiv 2024:2024.05.15.594387. [PMID: 38798410; PMCID: PMC11118460; DOI: 10.1101/2024.05.15.594387]
Abstract
Acoustic information in speech changes continuously, yet listeners form discrete perceptual categories to ease the demands of perception. Being a more continuous/gradient (as opposed to discrete/categorical) listener may be further advantageous for understanding speech in noise by increasing perceptual flexibility and resolving ambiguity. The degree to which a listener's responses to a continuum of speech sounds are categorical versus continuous can be quantified using visual analog scaling (VAS) during speech labeling tasks. Here, we recorded event-related brain potentials (ERPs) to vowels along an acoustic-phonetic continuum (/u/ to /a/) while listeners categorized phonemes in both clean and noise conditions. Behavior was assessed using standard two-alternative forced choice (2AFC) and VAS paradigms to evaluate categorization under task structures that promote discrete (2AFC) vs. continuous (VAS) hearing, respectively. Behaviorally, identification curves were steeper under 2AFC vs. VAS categorization but were relatively immune to noise, suggesting robust access to abstract phonetic categories even under signal degradation. Behavioral slopes were positively correlated with listeners' QuickSIN scores, suggesting a behavioral advantage for speech-in-noise comprehension conferred by a gradient listening strategy. At the neural level, electrode-level data revealed that P2 peak amplitudes of the ERPs were modulated by task and noise; responses were larger under VAS vs. 2AFC categorization and showed a larger noise-related latency delay in the VAS vs. 2AFC condition. More gradient responders also had smaller shifts in ERP latency with noise, suggesting their neural encoding of speech was more resilient to noise degradation. Interestingly, source-resolved ERPs showed that more gradient listening was also correlated with stronger neural responses in left superior temporal gyrus. Our results demonstrate that listening strategy (i.e., being a discrete vs. continuous listener) modulates the categorical organization of speech and behavioral success, with continuous/gradient listening being more advantageous for speech-in-noise perception.
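The categoricity measure at issue in this abstract is the slope of the identification (psychometric) function. As a minimal sketch of how such slopes can be estimated and compared across tasks (the continuum steps and response proportions below are hypothetical, not the study's data):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Two-parameter logistic: x0 = category boundary, k = slope."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

steps = np.arange(1, 8)  # a 7-step /u/-/a/ continuum (hypothetical)

# Hypothetical proportions of /a/ responses at each continuum step
p_2afc = np.array([0.02, 0.05, 0.10, 0.55, 0.92, 0.97, 0.99])  # steep, categorical
p_vas = np.array([0.10, 0.20, 0.35, 0.50, 0.68, 0.80, 0.90])   # shallow, gradient

for label, p in [("2AFC", p_2afc), ("VAS", p_vas)]:
    (x0, k), _ = curve_fit(logistic, steps, p, p0=[4.0, 1.0])
    print(f"{label}: boundary = {x0:.2f}, slope k = {k:.2f}")
```

Steeper fitted slopes (larger k) indicate more categorical labeling; per-listener slopes can then be correlated with QuickSIN scores as described above.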
2. Bidelman GM, Bernard F, Skubic K. Hearing in categories aids speech streaming at the "cocktail party". bioRxiv 2024:2024.04.03.587795. [PMID: 38617284; PMCID: PMC11014555; DOI: 10.1101/2024.04.03.587795]
Abstract
Our perceptual system bins elements of the speech signal into categories to make speech perception manageable. Here, we aimed to test whether hearing speech in categories (as opposed to a continuous/gradient fashion) affords yet another benefit to speech recognition: parsing noisy speech at the "cocktail party." We measured speech recognition in a simulated 3D cocktail party environment. We manipulated task difficulty by varying the number of additional maskers presented at other spatial locations in the horizontal soundfield (1-4 talkers) and via forward vs. time-reversed maskers, promoting more and less informational masking (IM), respectively. In separate tasks, we measured isolated phoneme categorization using two-alternative forced choice (2AFC) and visual analog scaling (VAS) tasks designed to promote more/less categorical hearing and thus test putative links between categorization and real-world speech-in-noise skills. We first show that listeners can monitor only ~3 talkers even when up to 5 are present in the soundscape, and that streaming is not related to extended high-frequency hearing thresholds (though QuickSIN scores are). We then confirm that speech streaming accuracy and speed decline with additional competing talkers and amidst forward compared to reversed maskers with added IM. Dividing listeners into "discrete" vs. "continuous" categorizers based on their VAS labeling (i.e., whether responses were binary or continuous judgments), we then show that the degree of IM experienced at the cocktail party is predicted by their degree of categoricity in phoneme labeling; more discrete listeners are less susceptible to IM than their gradient-responding peers. Our results establish a link between speech categorization skills and cocktail party processing, with a categorical (rather than gradient) listening strategy benefiting degraded speech perception. These findings imply that figure-ground deficits common in many disorders might arise through a surprisingly simple mechanism: a failure to properly bin sounds into categories.
Affiliation(s)
- Gavin M. Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA
- Program in Neuroscience, Indiana University, Bloomington, IN, USA
- Cognitive Science Program, Indiana University, Bloomington, IN, USA
- Fallon Bernard
- School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA
- Kimberly Skubic
- School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA
3. Vanden Bosch der Nederlanden CM, Qi X, Sequeira S, Seth P, Grahn JA, Joanisse MF, Hannon EE. Developmental changes in the categorization of speech and song. Dev Sci 2023; 26:e13346. [PMID: 36419407; DOI: 10.1111/desc.13346]
Abstract
Music and language are two fundamental forms of human communication. Many studies examine the development of music- and language-specific knowledge, but few studies compare how listeners know they are listening to music or language. Although we readily differentiate these domains, how we distinguish music and language (and especially speech and song) is not obvious. In two studies, we asked how listeners categorize speech and song. Study 1 used online survey data to illustrate that 4- to 17-year-olds and adults have verbalizable distinctions for speech and song. At all ages, listeners described speech-song differences based on acoustic features, but compared with older children, 4- to 7-year-olds more often used volume to describe differences, suggesting that they are still learning to identify the features most useful for differentiating speech from song. Study 2 used a perceptual categorization task to demonstrate that 4- to 8-year-olds and adults readily categorize speech and song, but this ability improves with age, especially for identifying song. Despite generally rating song as more speech-like, 4- and 6-year-olds rated ambiguous speech-song stimuli as more song-like than did 8-year-olds and adults. Four acoustic features predicted song ratings: F0 instability, utterance duration, harmonicity, and spectral flux. However, 4- and 6-year-olds' song ratings were better predicted by F0 instability than by harmonicity and utterance duration. These studies characterize how children develop conceptual and perceptual understandings of speech and song and suggest that children under age 8 are still learning which features are important for categorizing utterances as speech or song. RESEARCH HIGHLIGHTS: Children and adults conceptually and perceptually categorize speech and song from age 4. Listeners use F0 instability, harmonicity, spectral flux, and utterance duration to determine whether vocal stimuli sound like song. Acoustic cue weighting changes with age, becoming adult-like at age 8 for perceptual categorization and at age 12 for conceptual differentiation. Young children are still learning to categorize speech and song, which leaves open the possibility that music- and language-specific skills are not so domain-specific.
Affiliation(s)
- Xin Qi
- The Brain and Mind Institute, Western University, London, Canada
- Sarah Sequeira
- The Brain and Mind Institute, Western University, London, Canada
- Prakhar Seth
- The Brain and Mind Institute, Western University, London, Canada
- Jessica A Grahn
- The Brain and Mind Institute, Western University, London, Canada
- Department of Psychology, Western University, London, Canada
- Marc F Joanisse
- The Brain and Mind Institute, Western University, London, Canada
- Department of Psychology, Western University, London, Canada
- Erin E Hannon
- Department of Psychology, University of Nevada, Las Vegas, Nevada, USA
4. McMurray B. I'm not sure that curve means what you think it means: Toward a [more] realistic understanding of the role of eye-movement generation in the Visual World Paradigm. Psychon Bull Rev 2023; 30:102-146. [PMID: 35962241; PMCID: PMC10964151; DOI: 10.3758/s13423-022-02143-8]
Abstract
The Visual World Paradigm (VWP) is a powerful experimental paradigm for language research. Listeners respond to speech in a "visual world" containing potential referents of the speech. Fixations to these referents provide insight into the preliminary states of language processing as decisions unfold. The VWP has become the dominant paradigm in psycholinguistics and has been extended to every level of language, to development, and to disorders. Part of its impact comes from the impressive data visualizations, which reveal the millisecond-by-millisecond time course of processing, and advances have been made in developing new analyses that precisely characterize this time course. All theoretical and statistical approaches make the tacit assumption that the time course of fixations is closely related to the underlying activation in the system. However, given the serial nature of fixations and their long refractory period, it is unclear how closely the observed dynamics of the fixation curves are actually coupled to the underlying dynamics of activation. I investigated this assumption with a series of simulations. Each simulation starts with a set of true underlying activation functions and generates simulated fixations using a simple stochastic sampling procedure that respects the sequential nature of fixations. I then analyzed the results to determine the conditions under which the observed fixation curves match the underlying functions, the reliability of the observed data, and the implications for Type I error and power. These simulations demonstrate that even under the simplest fixation-based models, observed fixation curves are systematically biased relative to the underlying activation functions, and they are substantially noisier, with important implications for reliability and power. I then present a potential generative model that may ultimately overcome many of these issues.
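The stochastic sampling idea described here can be caricatured in a few lines: choose a fixation target in proportion to momentary activation, hold it for a refractory period, repeat until the trial ends, and average across trials. A toy sketch under invented assumptions (these activation functions and parameters are not the paper's; the point is only that the averaged fixation curve lags and smooths the underlying activation):

```python
import numpy as np

rng = np.random.default_rng(1)

dt, T = 4, 2000               # ms per sample, trial duration (ms)
t = np.arange(0, T, dt)

# Hypothetical underlying activations: rising target, transient competitor
target = 1 / (1 + np.exp(-(t - 600) / 120))
competitor = 0.4 * np.exp(-((t - 500) ** 2) / (2 * 200 ** 2))
baseline = np.full(t.shape, 0.2)
acts = np.stack([target, competitor, baseline])   # (objects, time)

def simulate_trial(acts, refractory_ms=200):
    """One trial: repeatedly sample a fixation target in proportion to
    momentary activation, then hold that fixation for a refractory period."""
    n_obj, n_t = acts.shape
    fix = np.zeros(n_t, dtype=int)
    i = 0
    while i < n_t:
        p = acts[:, i] / acts[:, i].sum()
        fix[i:i + refractory_ms // dt] = rng.choice(n_obj, p=p)
        i += refractory_ms // dt
    return fix

fixs = np.array([simulate_trial(acts) for _ in range(300)])
prop_target = (fixs == 0).mean(axis=0)   # observed fixation curve

# The observed curve is shifted and shallower relative to `target`,
# illustrating the bias between fixations and activation analyzed here.
print(prop_target[::100].round(2))
```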
Affiliation(s)
- Bob McMurray
- Department of Psychological and Brain Sciences, 278 PBSB, University of Iowa, Iowa City, IA, 52242, USA
- Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA, USA
- Department of Linguistics, University of Iowa, Iowa City, IA, USA
- Department of Otolaryngology, University of Iowa, Iowa City, IA, USA
5. Selective adaptation of German /r/: A role for perceptual saliency. Atten Percept Psychophys 2023; 85:222-233. [PMID: 36477703; PMCID: PMC9816247; DOI: 10.3758/s13414-022-02603-2]
Abstract
In three experiments, we examined selective adaptation of German /r/ depending on the positional and allophonic overlap between adaptors and targets. A previous study had shown that selective adaptation effects with /r/ in Dutch require allophonic overlap between adaptor and target. We aimed to replicate this finding in German, which also has many allophones of /r/. German post-vocalic /r/ is often vocalized, and pre-vocalic /r/ can occur in at least three forms: uvular fricative [ʁ], uvular trill [ʀ], and alveolar trill [r]. We tested selective adaptation between these variants. The critical questions were whether allophonic overlap is necessary for adaptation or whether phonemic overlap is sufficient to generate an adaptation effect. Surprisingly, our results show that both assertions are wrong: adaptation does not require allophonic overlap between adaptors and target, and neither is phonemic overlap sufficient. Even more surprisingly, trilled adaptors led to more adaptation for a uvular-fricative target than uvular-fricative adaptors themselves. We suggest that the perceptual salience of the adaptors may be a hitherto underestimated influence on selective adaptation.
6. McMurray B. The myth of categorical perception. J Acoust Soc Am 2022; 152:3819. [PMID: 36586868; PMCID: PMC9803395; DOI: 10.1121/10.0016614]
Abstract
Categorical perception (CP) is likely the single finding from speech perception with the biggest impact on cognitive science. However, within speech perception, it is widely known to be an artifact of task demands. CP is empirically defined as a relationship between phoneme identification and discrimination. As discrimination tasks do not appear to require categorization, this was thought to support the claim that listeners perceive speech solely in terms of linguistic categories. However, 50 years of work using discrimination tasks, priming, the visual world paradigm, and event-related potentials has rejected the strongest forms of CP and provided little strong evidence for any form of it. This paper reviews the origins and impact of this scientific meme and the work challenging it. It discusses work showing that the encoding of auditory input is largely continuous, not categorical, and describes the modern theoretical synthesis in which listeners preserve fine-grained detail to enable more flexible processing. This synthesis is fundamentally inconsistent with CP. This leads to a different understanding of how to use and interpret the most basic paradigms in speech perception (phoneme identification along a continuum) and has implications for understanding language and hearing disorders, development, and multilingualism.
Affiliation(s)
- Bob McMurray
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, Iowa 52242, USA
7. Kutlu E, Chiu S, McMurray B. Moving away from deficiency models: Gradiency in bilingual speech categorization. Front Psychol 2022; 13:1033825. [PMID: 36507048; PMCID: PMC9730410; DOI: 10.3389/fpsyg.2022.1033825]
Abstract
For much of its history, categorical perception was treated as a foundational theory of speech perception, which suggested that quasi-discrete categorization was a goal of speech perception. This had a profound impact on bilingualism research, which adopted similar tasks as measures of nativeness or native-like processing, implicitly assuming that any deviation from discreteness was a deficit. This is particularly problematic for listeners like heritage speakers, whose language proficiency, both in their heritage language and their majority language, is questioned. However, we now know that in the monolingual listener, speech perception is gradient, and listeners use this gradiency to adjust subphonetic details, recover from ambiguity, and aid learning and adaptation. This calls for new theoretical and methodological approaches to bilingualism. We present the Visual Analogue Scaling task, which avoids the discrete and binary assumptions of categorical perception and can capture gradiency more precisely than other measures. Our goal is to provide bilingualism researchers with new conceptual and empirical tools that can help examine speech categorization in different bilingual communities without forcing their speech categorization into discrete units and without assuming a deficit model.
Affiliation(s)
- Ethan Kutlu
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, United States
- Department of Linguistics, University of Iowa, Iowa City, IA, United States
- Samantha Chiu
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, United States
- Bob McMurray
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, United States
- Department of Linguistics, University of Iowa, Iowa City, IA, United States
8. Winn MB, Wright RA. Reconsidering commonly used stimuli in speech perception experiments. J Acoust Soc Am 2022; 152:1394. [PMID: 36182291; DOI: 10.1121/10.0013415]
Abstract
This paper examines some commonly used stimuli in speech perception experiments and raises questions about their use, or about the interpretations of previous results. The takeaway messages are: 1) the Hillenbrand vowels represent a particular dialect rather than a gold standard, and English vowels contain spectral dynamics that have been largely underappreciated, 2) the /ɑ/ context is very common but not clearly superior as a context for testing consonant perception, 3) /ɑ/ is particularly problematic when testing voice-onset-time perception because it introduces strong confounds in the formant transitions, 4) /dɑ/ is grossly overrepresented in neurophysiological studies and yet is insufficient as a generalized proxy for "speech perception," and 5) digit tests and matrix sentences including the coordinate response measure are systematically insensitive to important patterns in speech perception. Each of these stimulus sets and concepts is described with careful attention to their unique value and also cases where they might be misunderstood or over-interpreted.
Affiliation(s)
- Matthew B Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Richard A Wright
- Department of Linguistics, University of Washington, Seattle, Washington 98195, USA
9. Brown M, Tanenhaus MK, Dilley L. Syllable Inference as a Mechanism for Spoken Language Understanding. Top Cogn Sci 2021; 13:351-398. [PMID: 33780156; DOI: 10.1111/tops.12529]
Abstract
A classic problem in spoken language comprehension is how listeners perceive speech as being composed of discrete words, given the variable time course of information in continuous signals. We propose a syllable inference account of spoken word recognition and segmentation, according to which alternative hierarchical models of syllables, words, and phonemes are dynamically posited and expected to maximally predict incoming sensory input. Generative models are combined with current estimates of context speech rate drawn from neural oscillatory dynamics, which are sensitive to amplitude rises. Over time, models that result in local minima in error between predicted and recently experienced signals give rise to perceptions of hearing words. Three experiments using the visual world eye-tracking paradigm with a picture-selection task tested hypotheses motivated by this framework. Materials were sentences that were acoustically ambiguous in the numbers of syllables, words, and phonemes they contained (cf. English plural constructions, such as "saw (a) raccoon(s) swimming," which have two loci of grammatical information). Time-compressing or expanding the speech materials permitted determination of how temporal information at, or in the context of, each locus affected looks to, and selection of, pictures with a singular or plural referent (e.g., one or more than one raccoon). Supporting our account, listeners probabilistically interpreted identical chunks of speech as consistent with a singular or plural referent to a degree that was based on the chunk's gradient rate in relation to its context. We interpret these results as evidence that arriving temporal information, judged in relation to language model predictions generated from context speech rate evaluated on a continuous scale, informs inferences about syllables, thereby giving rise to perceptual experiences of understanding spoken language as words separated in time.
Affiliation(s)
- Meredith Brown
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York, USA
- Department of Psychiatry and Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts, USA
- Department of Psychology, Tufts University, Medford, Massachusetts, USA
- Michael K Tanenhaus
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York, USA
- School of Psychology, Nanjing Normal University, Nanjing, China
- Laura Dilley
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan, USA
10. Ou J, Yu ACL, Xiang M. Individual Differences in Categorization Gradience As Predicted by Online Processing of Phonetic Cues During Spoken Word Recognition: Evidence From Eye Movements. Cogn Sci 2021; 45:e12948. [PMID: 33682211; DOI: 10.1111/cogs.12948]
Abstract
Recent studies have documented substantial variability among typical listeners in how gradiently they categorize speech sounds, and this variability in categorization gradience may link to how listeners weight different cues in the incoming signal. The present study tested the relationship between categorization gradience and cue weighting across two sets of English contrasts, each varying orthogonally in two acoustic dimensions. Participants performed a four-alternative forced-choice identification task in a visual world paradigm while their eye movements were monitored. We found that (a) greater categorization gradience derived from behavioral identification responses corresponds to larger secondary cue weights derived from eye movements; (b) the relationship between categorization gradience and secondary cue weighting is observed across cues and contrasts, suggesting that categorization gradience may be a consistent within-individual property in speech perception; and (c) listeners who showed greater categorization gradience tend to adopt a buffered processing strategy, especially when cues arrive asynchronously in time.
Affiliation(s)
- Jinghua Ou
- Department of Linguistics, University of Chicago
- Alan C L Yu
- Department of Linguistics, University of Chicago
- Ming Xiang
- Department of Linguistics, University of Chicago
11. Jasmin K, Dick F, Holt LL, Tierney A. Tailored perception: Individuals' speech and music perception strategies fit their perceptual abilities. J Exp Psychol Gen 2020; 149:914-934. [PMID: 31589067; PMCID: PMC7133494; DOI: 10.1037/xge0000688]
Abstract
Perception involves integration of multiple dimensions that often serve overlapping, redundant functions, for example, pitch, duration, and amplitude in speech. Individuals tend to prioritize these dimensions differently (stable, individualized perceptual strategies), but the reason for this has remained unclear. Here we show that perceptual strategies relate to perceptual abilities. In a speech cue weighting experiment (trial N = 990), we first demonstrate that individuals with a severe deficit for pitch perception (congenital amusics; N = 11) categorize linguistic stimuli similarly to controls (N = 11) when the main distinguishing cue is duration, which they perceive normally. In contrast, in a prosodic task where pitch cues are the main distinguishing factor, we show that amusics place less importance on pitch and instead rely more on duration cues, even when pitch differences in the stimuli are large enough for amusics to discern. In a second experiment testing musical and prosodic phrase interpretation (N = 16 amusics; 15 controls), we found that relying on duration allowed amusics to overcome their pitch deficits to perceive speech and music successfully. We conclude that auditory signals, because of their redundant nature, are robust to impairments for specific dimensions, and that optimal speech and music perception strategies depend not only on invariant acoustic dimensions (the physical signal), but on perceptual dimensions whose precision varies across individuals. Computational models of speech perception (indeed, all types of perception involving redundant cues, e.g., vision and touch) should therefore aim to account for the precision of perceptual dimensions and characterize individuals as well as groups.
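Cue weights of the kind measured here are commonly estimated by regressing binary categorization responses on standardized cue values, with the relative coefficient magnitudes indexing how strongly each dimension drives the decision. A minimal sketch with simulated responses (not the study's data), assuming a duration-dominant listener:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 990  # matches the trial count mentioned in the abstract

# Standardized cue values on each trial (hypothetical)
pitch = rng.normal(size=n)
duration = rng.normal(size=n)

# Simulate a listener who relies mostly on duration (amusic-like profile)
p_resp = 1 / (1 + np.exp(-(0.5 * pitch + 2.0 * duration)))
y = rng.random(n) < p_resp

model = LogisticRegression().fit(np.column_stack([pitch, duration]), y)
w_pitch, w_dur = model.coef_[0]

# Normalized weights; a duration-reliant listener shows w_dur >> w_pitch
total = w_pitch + w_dur
print(f"pitch weight: {w_pitch / total:.2f}, duration weight: {w_dur / total:.2f}")
```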
Affiliation(s)
- Fred Dick
- Department of Psychological Sciences
12. Galle ME, Klein-Packard J, Schreiber K, McMurray B. What Are You Waiting For? Real-Time Integration of Cues for Fricatives Suggests Encapsulated Auditory Memory. Cogn Sci 2020; 43. [PMID: 30648798; DOI: 10.1111/cogs.12700]
Abstract
Speech unfolds over time, and the cues for even a single phoneme are rarely available simultaneously. Consequently, to recognize a single phoneme, listeners must integrate material over several hundred milliseconds. Prior work contrasts two accounts: (a) a memory buffer account in which listeners accumulate auditory information in memory and only access higher level representations (i.e., lexical representations) when sufficient information has arrived; and (b) an immediate integration scheme in which lexical representations can be partially activated on the basis of early cues and then updated when more information arises. These studies have uniformly shown evidence for immediate integration for a variety of phonetic distinctions. We attempted to extend this to fricatives, a class of speech sounds which requires not only temporal integration of asynchronous cues (the frication, followed by the formant transitions 150-350 ms later), but also integration across different frequency bands and compensation for contextual factors like coarticulation. Eye movements in the visual world paradigm showed clear evidence for a memory buffer. Results were replicated in five experiments, ruling out methodological factors and tying the release of the buffer to the onset of the vowel. These findings support a general auditory account for speech by suggesting that the acoustic nature of particular speech sounds may have large effects on how they are processed. It also has major implications for theories of auditory and speech perception by raising the possibility of an encapsulated memory buffer in early auditory processing.
Affiliation(s)
- Marcus E Galle
- Department of Psychological and Brain Sciences, University of Iowa
- Bob McMurray
- Department of Psychological and Brain Sciences, University of Iowa
- Department of Communication Sciences and Disorders, University of Iowa
- Department of Linguistics, University of Iowa
- Department of Otolaryngology, University of Iowa
13. Winn MB. Manipulation of voice onset time in speech stimuli: A tutorial and flexible Praat script. J Acoust Soc Am 2020; 147:852. [PMID: 32113256; DOI: 10.1121/10.0000692]
Abstract
Voice onset time (VOT) is an acoustic property of stop consonants that is commonly manipulated in studies of phonetic perception. This paper contains a thorough description of the "progressive cutback and replacement" method of VOT manipulation and a comparison with other VOT manipulation techniques. Other acoustic properties that covary with VOT, such as fundamental frequency and formant transitions, are also discussed, along with considerations for testing VOT perception and its relationship to various other measures of auditory temporal or spectral processing. An implementation of the progressive cutback and replacement method in the Praat scripting language is presented, which is suitable for modifying natural speech for perceptual experiments involving VOT and/or related covarying F0 and intensity cues. Justifications are provided for the stimulus design choices and constraints implemented in the script.
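The published implementation is a Praat script, but the core splice is simple to caricature: excise successively longer stretches of the voiced onset and replace them with equal-duration aspiration noise, keeping total duration constant. A toy numpy sketch under those assumptions (synthetic signals only; the real script additionally splices at zero crossings and manages covarying F0 and intensity cues):

```python
import numpy as np

def progressive_cutback(voiced, aspiration, vot_ms, fs=44100):
    """Toy version of progressive cutback and replacement: overwrite the
    first vot_ms of the voiced token with aspiration noise of the same
    duration, so VOT grows while overall duration stays constant."""
    n = int(fs * vot_ms / 1000)
    out = voiced.copy()
    out[:n] = aspiration[:n]
    return out

fs = 44100
dur = int(0.3 * fs)                       # 300-ms tokens
t = np.arange(dur) / fs
voiced = 0.5 * np.sin(2 * np.pi * 120 * t)                     # stand-in for voicing
aspiration = 0.1 * np.random.default_rng(2).normal(size=dur)   # stand-in aspiration

# A 7-step VOT continuum from 0 to 60 ms
continuum = [progressive_cutback(voiced, aspiration, v, fs)
             for v in np.linspace(0, 60, 7)]
print([round(v) for v in np.linspace(0, 60, 7)])  # VOT steps in ms
```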
Affiliation(s)
- Matthew B Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, 164 Pillsbury Drive Southeast, Minneapolis, Minnesota 55455, USA
14. Hestvik A, Shinohara Y, Durvasula K, Verdonschot RG, Sakai H. Abstractness of human speech sound representations. Brain Res 2020; 1732:146664. [PMID: 31930995; DOI: 10.1016/j.brainres.2020.146664]
Abstract
We argue, based on a study of brain responses to speech sound differences in Japanese, that the memory encoding of functional speech sounds (phonemes) is highly abstract. As an example, we provide evidence for a theory where the consonants /p t k b d g/ are not only made up of symbolic features but are underspecified with respect to voicing or laryngeal features, and where languages differ with respect to which feature value is underspecified. In a previous study we showed that voiced stops are underspecified in English [Hestvik, A., & Durvasula, K. (2016). Neurobiological evidence for voicing underspecification in English. Brain and Language], as shown by asymmetries in Mismatch Negativity responses to /t/ and /d/. In the current study, we test the prediction that the opposite asymmetry should be observed in Japanese if voiceless stops are underspecified in that language. Our results confirm this prediction. This matches a linguistic architecture in which phonemes are highly abstract and do not encode the actual physical characteristics of the corresponding speech sounds, but rather different subsets of abstract distinctive features.
15. Lewis GA, Bidelman GM. Autonomic Nervous System Correlates of Speech Categorization Revealed Through Pupillometry. Front Neurosci 2020; 13:1418. [PMID: 31998068; PMCID: PMC6967406; DOI: 10.3389/fnins.2019.01418]
Abstract
Human perception requires the many-to-one mapping between continuous sensory elements and discrete categorical representations. This grouping operation underlies the phenomenon of categorical perception (CP): the experience of perceiving discrete categories rather than gradual variations in signal input. Speech perception requires CP because acoustic cues do not share constant relations with perceptual-phonetic representations. Beyond facilitating perception of unmasked speech, we reasoned that CP might also aid the extraction of target speech percepts from interfering sound sources (i.e., noise) by generating additional perceptual constancy and reducing listening effort. Specifically, we investigated how noise interference impacts cognitive load and perceptual identification of unambiguous (i.e., categorical) vs. ambiguous stimuli. Listeners classified a speech vowel continuum (/u/-/a/) at various signal-to-noise ratios (SNRs: unmasked, 0, and -5 dB). Continuous recordings of pupil dilation measured processing effort, with larger, later dilations reflecting increased listening demand. Critical comparisons were between time-locked changes in eye data in response to unambiguous tokens (i.e., continuum endpoints) vs. ambiguous tokens (i.e., the continuum midpoint). Unmasked speech elicited faster responses and sharper psychometric functions, which steadily declined in noise. Noise increased pupil dilation across stimulus conditions, but not straightforwardly. Noise-masked speech modulated peak pupil size (i.e., 0 and -5 dB > unmasked). In contrast, peak dilation latency varied with both token and SNR. Interestingly, categorical tokens elicited earlier pupil dilation relative to ambiguous tokens. Our pupillary data suggest that CP reconstructs auditory percepts under challenging listening conditions through interactions between stimulus salience and listeners' internalized effort and/or arousal.
Affiliation(s)
- Gwyneth A Lewis
- Institute for Intelligent Systems, The University of Memphis, Memphis, TN, United States
- School of Communication Sciences and Disorders, The University of Memphis, Memphis, TN, United States
- Gavin M Bidelman
- Institute for Intelligent Systems, The University of Memphis, Memphis, TN, United States
- School of Communication Sciences and Disorders, The University of Memphis, Memphis, TN, United States
- Department of Anatomy and Neurobiology, University of Tennessee Health Sciences Center, Memphis, TN, United States
16. Llompart M, Reinisch E. Imitation in a Second Language Relies on Phonological Categories but Does Not Reflect the Productive Usage of Difficult Sound Contrasts. Lang Speech 2019; 62:594-622. [PMID: 30319031; DOI: 10.1177/0023830918803978]
Abstract
This study investigated the relationship between imitation and both the perception and production abilities of second language (L2) learners for two non-native contrasts differing in their expected degree of difficulty. German learners of English were tested on perceptual categorization, imitation and a word reading task for the difficult English /ɛ/-/æ/ contrast, which tends not to be well encoded in the learners' phonological inventories, and the easy, near-native /i/-/ɪ/ contrast. As expected, within-task comparisons between contrasts revealed more robust perception and better differentiation during production for /i/-/ɪ/ than /ɛ/-/æ/. Imitation also followed this pattern, suggesting that imitation is modulated by the phonological encoding of L2 categories. Moreover, learners' ability to imitate /ɛ/ and /æ/ was related to their perception of that contrast, confirming a tight perception-production link at the phonological level for difficult L2 sound contrasts. However, no relationship was observed between acoustic measures for imitated and read-aloud tokens of /ɛ/ and /æ/. This dissociation is mostly attributed to the influence of inaccurate non-native lexical representations in the word reading task. We conclude that imitation is strongly related to the phonological representation of L2 sound contrasts, but does not need to reflect the learners' productive usage of such non-native distinctions.
17. Li MY, Braze D, Kukona A, Johns CL, Tabor W, Van Dyke JA, Mencl WE, Shankweiler DP, Pugh KR, Magnuson JS. Individual differences in subphonemic sensitivity and phonological skills. J Mem Lang 2019; 107:195-215. [PMID: 31431796; PMCID: PMC6701851; DOI: 10.1016/j.jml.2019.03.008]
Abstract
Many studies have established a link between phonological abilities (indexed by phonological awareness and phonological memory tasks) and typical and atypical reading development. Individuals who perform poorly on phonological assessments have been mostly assumed to have underspecified (or "fuzzy") phonological representations, with typical phonemic categories, but with greater category overlap due to imprecise encoding. An alternative posits that poor readers have overspecified phonological representations, with speech sounds perceived allophonically (phonetically distinct variants of a single phonemic category). On both accounts, mismatch between phonological categories and orthography leads to reading difficulty. Here, we consider the implications of these accounts for online speech processing. We used eye tracking and an individual differences approach to assess sensitivity to subphonemic detail in a community sample of young adults with a wide range of reading-related skills. Subphonemic sensitivity inversely correlated with meta-phonological task performance, consistent with overspecification.
Affiliation(s)
- Monica Y.C. Li
- Department of Psychological Sciences, University of Connecticut, Storrs, CT 06269-1020, USA
- Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269-1272, USA
- Brain Imaging Research Center, University of Connecticut, Storrs, CT 06269-1271, USA
- Haskins Laboratories, 300 George St., New Haven, CT 06510, USA
- David Braze
- Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269-1272, USA
- Haskins Laboratories, 300 George St., New Haven, CT 06510, USA
- Anuenue Kukona
- Haskins Laboratories, 300 George St., New Haven, CT 06510, USA
- School of Applied Social Sciences, De Montfort University, The Gateway, Leicester, LE1 9BH, UK
- Whitney Tabor
- Department of Psychological Sciences, University of Connecticut, Storrs, CT 06269-1020, USA
- Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269-1272, USA
- Haskins Laboratories, 300 George St., New Haven, CT 06510, USA
- Julie A. Van Dyke
- Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269-1272, USA
- Haskins Laboratories, 300 George St., New Haven, CT 06510, USA
- W. Einar Mencl
- Haskins Laboratories, 300 George St., New Haven, CT 06510, USA
- Department of Linguistics, Yale University, New Haven, CT 06520, USA
- Donald P. Shankweiler
- Department of Psychological Sciences, University of Connecticut, Storrs, CT 06269-1020, USA
- Haskins Laboratories, 300 George St., New Haven, CT 06510, USA
- Kenneth R. Pugh
- Department of Psychological Sciences, University of Connecticut, Storrs, CT 06269-1020, USA
- Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269-1272, USA
- Brain Imaging Research Center, University of Connecticut, Storrs, CT 06269-1271, USA
- Haskins Laboratories, 300 George St., New Haven, CT 06510, USA
- Department of Linguistics, Yale University, New Haven, CT 06520, USA
- James S. Magnuson
- Department of Psychological Sciences, University of Connecticut, Storrs, CT 06269-1020, USA
- Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269-1272, USA
- Brain Imaging Research Center, University of Connecticut, Storrs, CT 06269-1271, USA
- Haskins Laboratories, 300 George St., New Haven, CT 06510, USA
18. Lexical processing depends on sublexical processing: Evidence from the visual world paradigm and aphasia. Atten Percept Psychophys 2019; 81:1047-1064. [DOI: 10.3758/s13414-019-01718-3]
19. Getz LM, Toscano JC. Electrophysiological Evidence for Top-Down Lexical Influences on Early Speech Perception. Psychol Sci 2019; 30:830-841. [PMID: 31018103; DOI: 10.1177/0956797619841813]
Abstract
An unresolved issue in speech perception concerns whether top-down linguistic information influences perceptual responses. We addressed this issue using the event-related-potential technique in two experiments that measured cross-modal sequential-semantic priming effects on the auditory N1, an index of acoustic-cue encoding. Participants heard auditory targets (e.g., "potatoes") following associated visual primes (e.g., "MASHED"), neutral visual primes (e.g., "FACE"), or a visual mask (e.g., "XXXX"). Auditory targets began with voiced (/b/, /d/, /g/) or voiceless (/p/, /t/, /k/) stop consonants, an acoustic difference known to yield differences in N1 amplitude. In Experiment 1 (N = 21), semantic context modulated responses to upcoming targets, with smaller N1 amplitudes for semantic associates. In Experiment 2 (N = 29), semantic context changed how listeners encoded sounds: Ambiguous voice-onset times were encoded similarly to the voicing end point elicited by semantic associates. These results are consistent with an interactive model of spoken-word recognition that includes top-down effects on early perception.
Affiliation(s)
- Laura M Getz
- Department of Psychological and Brain Sciences, Villanova University
- Joseph C Toscano
- Department of Psychological and Brain Sciences, Villanova University
20. Winn MB, Kan A, Litovsky RY. Temporal dynamics and uncertainty in binaural hearing revealed by anticipatory eye movements. J Acoust Soc Am 2019; 145:676. [PMID: 30823808; PMCID: PMC6786889; DOI: 10.1121/1.5088591]
Abstract
Accurate perception of binaural cues is essential for left-right sound localization. Much literature focuses on threshold measures of perceptual acuity and accuracy. This study focused on supra-threshold perception using an anticipatory eye movement (AEM) paradigm designed to capture subtle aspects of perception that might not emerge in behavioral-motor responses, such as the accumulation of certainty and rapid revisions in decision-making. Participants heard interaural time differences (ITDs) or interaural level differences in correlated or uncorrelated narrowband noises, respectively. A cartoon ball moved behind an occluder and then emerged from the left or right side, consistent with the binaural cue. Participants anticipated the correct answer (before it appeared) by looking where the ball would emerge. Results showed quicker and more steadfast gaze fixations for stimuli with larger cue magnitudes. More difficult stimuli elicited a wider distribution of saccade times and a greater number of corrective saccades before the final judgment, implying perceptual uncertainty or competition. Cue levels above threshold elicited some wrong-way saccades that were quickly corrected. Saccades to ITDs were earlier and more reliable for low-frequency noises. The AEM paradigm reveals the time course of uncertainty and changes in perceptual decision-making for supra-threshold binaural stimuli even when behavioral responses are consistently correct.
Affiliation(s)
- Matthew B Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, 164 Pillsbury Drive SE, Minneapolis, Minnesota 55455, USA
- Alan Kan
- Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue, Madison, Wisconsin 53705, USA
- Ruth Y Litovsky
- Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue, Madison, Wisconsin 53705, USA
21. Wiener S, Ito K, Speer SR. Early L2 Spoken Word Recognition Combines Input-Based and Knowledge-Based Processing. Lang Speech 2018; 61:632-656. [PMID: 29560782; DOI: 10.1177/0023830918761762]
Abstract
This study examines the perceptual trade-off between knowledge of a language's statistical regularities and reliance on the acoustic signal during L2 spoken word recognition. We test how early learners track and make use of segmental and suprasegmental cues and their relative frequencies during non-native word recognition. English learners of Mandarin were taught an artificial tonal language in which a tone's informativeness for word identification varied according to neighborhood density. The stimuli mimicked Mandarin's uneven distribution of syllable+tone combinations by varying syllable frequency and the probability of particular tones co-occurring with a particular syllable. Use of statistical regularities was measured by four-alternative forced-choice judgments and by eye fixations to target and competitor symbols. Half of the participants were trained on one speaker (low speaker variability), while the other half were trained on four speakers. After four days of learning, the results confirmed that tones are processed according to their informativeness. Eye movements to the newly learned symbols demonstrated that L2 learners use tonal probabilities at an early stage of word recognition, regardless of speaker variability. The amount of variability in the signal, however, influenced the time course of recovery from incorrect anticipatory looks: participants exposed to low speaker variability recovered from incorrect probability-based predictions of tone more rapidly than participants exposed to greater variability. These results motivate two conclusions: early L2 learners track the distribution of segmental and suprasegmental co-occurrences and make predictions accordingly during spoken word recognition; and when the acoustic input is more variable because of multi-speaker input, listeners rely more on their knowledge of tone-syllable co-occurrence frequency distributions and less on the incoming acoustic signal.
Affiliation(s)
- Seth Wiener
- Department of Modern Languages, Carnegie Mellon University, USA
- Kiwako Ito
- Department of Linguistics, The Ohio State University, USA
- Shari R Speer
- Department of Linguistics, The Ohio State University, USA
22. Seedorff M, Oleson J, McMurray B. Detecting when timeseries differ: Using the Bootstrapped Differences of Timeseries (BDOTS) to analyze Visual World Paradigm data (and more). J Mem Lang 2018; 102:55-67. [PMID: 32863563; PMCID: PMC7450631; DOI: 10.1016/j.jml.2018.05.004]
Abstract
In recent decades, major advances in the language sciences have been built on real-time measures of language and cognitive processing, measures like mouse-tracking, event-related potentials, and eye-tracking in the visual world paradigm. These measures yield densely sampled timeseries that can be highly revealing of the dynamics of cognitive processing. However, despite these methodological advances, existing statistical approaches for timeseries analyses have often lagged behind. Here, we present a new statistical approach, the Bootstrapped Differences of Timeseries (BDOTS), that can estimate the precise time window at which two timeseries differ. BDOTS makes minimal assumptions about the error distribution, uses a custom family-wise error correction, and can be flexibly adapted to a variety of applications. This manuscript presents the theoretical basis of the approach, describes implementational issues (in the associated R package), and illustrates the technique with an analysis of an existing dataset. Pitfalls and hazards are also discussed, along with suggestions for reporting in the literature.
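The gist of the method can be sketched without the R package: resample subjects, rebuild the between-condition difference curve on each resample, and flag the window where the difference exceeds a family-wise-corrected bound. The sketch below uses simulated data and a plain Bonferroni-style critical value; the actual BDOTS procedure first fits smooth curves to each subject and derives a less conservative correction from the autocorrelation of the tests:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# Simulated data: (subjects x timepoints) fixation proportions per condition;
# the two conditions begin to diverge at roughly 900 ms
n_sub, n_t = 30, 100
time = np.linspace(0, 2000, n_t)  # ms
effect = 0.15 / (1 + np.exp(-(time - 900) / 80))
cond_a = rng.normal(0.5, 0.1, (n_sub, n_t))
cond_b = rng.normal(0.5, 0.1, (n_sub, n_t)) + effect

# Bootstrap the difference curve by resampling subjects with replacement
n_boot = 2000
boots = np.empty((n_boot, n_t))
for b in range(n_boot):
    idx = rng.integers(0, n_sub, n_sub)
    boots[b] = cond_b[idx].mean(axis=0) - cond_a[idx].mean(axis=0)

diff, se = boots.mean(axis=0), boots.std(axis=0, ddof=1)

# Crude family-wise correction across the n_t tests (BDOTS derives a
# less conservative critical value from the tests' autocorrelation)
z_crit = norm.ppf(1 - 0.05 / (2 * n_t))
sig = np.abs(diff) > z_crit * se
print(f"conditions first differ at ~{time[sig.argmax()]:.0f} ms" if sig.any()
      else "no significant window")
```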
Affiliation(s)
- Bob McMurray
- Dept. of Psychological and Brain Sciences, Dept. of Communication Sciences and Disorders, Dept. of Linguistics, University of Iowa
23. Holt LL, Tierney AT, Guerra G, Laffere A, Dick F. Dimension-selective attention as a possible driver of dynamic, context-dependent re-weighting in speech processing. Hear Res 2018; 366:50-64. [PMID: 30131109; PMCID: PMC6107307; DOI: 10.1016/j.heares.2018.06.014]
Abstract
The contribution of acoustic dimensions to an auditory percept is dynamically adjusted and reweighted based on prior experience about how informative these dimensions are across the long-term and short-term environment. This is especially evident in speech perception, where listeners differentially weight information across multiple acoustic dimensions, and use this information selectively to update expectations about future sounds. The dynamic and selective adjustment of how acoustic input dimensions contribute to perception has made it tempting to conceive of this as a form of non-spatial auditory selective attention. Here, we review several human speech perception phenomena that might be consistent with auditory selective attention although, as of yet, the literature does not definitively support a mechanistic tie. We relate these human perceptual phenomena to illustrative nonhuman animal neurobiological findings that offer informative guideposts in how to test mechanistic connections. We next present a novel empirical approach that can serve as a methodological bridge from human research to animal neurobiological studies. Finally, we describe four preliminary results that demonstrate its utility in advancing understanding of human non-spatial dimension-based auditory selective attention.
Affiliation(s)
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Adam T Tierney
- Department of Psychological Sciences, Birkbeck College, University of London, London, WC1E 7HX, UK
- Centre for Brain and Cognitive Development, Birkbeck College, London, WC1E 7HX, UK
- Giada Guerra
- Department of Psychological Sciences, Birkbeck College, University of London, London, WC1E 7HX, UK
- Centre for Brain and Cognitive Development, Birkbeck College, London, WC1E 7HX, UK
- Aeron Laffere
- Department of Psychological Sciences, Birkbeck College, University of London, London, WC1E 7HX, UK
- Frederic Dick
- Department of Psychological Sciences, Birkbeck College, University of London, London, WC1E 7HX, UK
- Centre for Brain and Cognitive Development, Birkbeck College, London, WC1E 7HX, UK
- Department of Experimental Psychology, University College London, London, WC1H 0AP, UK
24. McMurray B, Danelz A, Rigler H, Seedorff M. Speech categorization develops slowly through adolescence. Dev Psychol 2018; 54:1472-1491. [PMID: 29952600; PMCID: PMC6062449; DOI: 10.1037/dev0000542]
Abstract
The development of the ability to categorize speech sounds is often viewed as occurring primarily during infancy via perceptual learning mechanisms. However, a number of studies suggest that even after infancy, children's categories become more categorical and well defined through about age 12. We investigated the cognitive changes that may be responsible for such development using a visual world paradigm experiment based on McMurray, Tanenhaus, and Aslin (2002). Children from 3 age groups (7-8, 12-13, and 17-18 years) heard a token from either a b/p or an s/ʃ continuum spanning two words (beach/peach, ship/sip) and selected its referent from a screen containing 4 pictures of potential lexical candidates. Eye movements to each object were monitored as a measure of how strongly children were committing to each candidate as perception unfolded in real time. Results showed an ongoing sharpening of speech categories through age 18, which was particularly apparent during the early stages of real-time perception. When analyses specifically targeted within-category sensitivity to continuous detail, children exhibited increasingly gradient categories over development, suggesting that increasing sensitivity to fine-grained detail in the signal enables these more discrete categorizations. Together, these results suggest that speech development is a protracted process in which children's increasing sensitivity to within-category detail in the signal enables increasingly sharp phonetic categories.
Affiliation(s)
- Bob McMurray
- Department of Psychological and Brain Sciences, University of Iowa
- Ani Danelz
- Department of Communication Sciences and Disorders, University of Iowa
25. Kazanina N, Bowers JS, Idsardi W. Phonemes: Lexical access and beyond. Psychon Bull Rev 2018; 25:560-585.
Abstract
Phonemes play a central role in traditional theories as units of speech perception and access codes to lexical representations. Phonemes have two essential properties: they are 'segment-sized' (the size of a consonant or vowel) and abstract (a single phoneme may have different acoustic realisations). Nevertheless, there is a long history of challenging the phoneme hypothesis, with some theorists arguing for differently sized phonological units (e.g., features or syllables) and others rejecting abstract codes in favour of representations that encode detailed acoustic properties of the stimulus. The phoneme hypothesis is the minority view today. We defend the phoneme hypothesis in two complementary ways. First, we show that rejection of phonemes is based on a flawed interpretation of empirical findings. For example, it is commonly argued that the failure to find acoustic invariances for phonemes rules out phonemes. However, the lack of invariance is only a problem on the assumption that speech perception is a bottom-up process. If learned sublexical codes are modified by top-down constraints (which they are), then this argument loses all force. Second, we provide strong positive evidence for phonemes on the basis of linguistic data. Almost all findings that are taken (incorrectly) as evidence against phonemes are based on psycholinguistic studies of single words. However, phonemes were first introduced in linguistics, and the best evidence for phonemes comes from linguistic analyses of complex word forms and sentences. In short, the rejection of phonemes is based on a false analysis and a too-narrow consideration of the relevant data.
Collapse
Affiliation(s)
- Nina Kazanina
- School of Experimental Psychology, University of Bristol, 12a Priory Road, Bristol, BS8 1TU, UK.
- Jeffrey S Bowers
- School of Experimental Psychology, University of Bristol, 12a Priory Road, Bristol, BS8 1TU, UK
- William Idsardi
- Department of Linguistics, University of Maryland, 1401 Marie Mount Hall, College Park, MD, 20742, USA
Collapse
|
26
|
Smith JR, Treat TA, Farmer TA, McMurray B. Dynamic competition account of men's perceptions of women's sexual interest. Cognition 2018; 174:43-54. [PMID: 29407605 DOI: 10.1016/j.cognition.2017.12.016] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2016] [Revised: 12/17/2017] [Accepted: 12/22/2017] [Indexed: 11/18/2022]
Abstract
This work applies a dynamic competition framework of decision making to the domain of sexual perception, which is linked theoretically and empirically to college men's risk for exhibiting sexual coercion and aggression toward female acquaintances. Within a mouse-tracking paradigm, 152 undergraduate men viewed full-body photographs of women who varied in affect (sexual interest or rejection), clothing style (provocative or conservative), and attractiveness, and decided whether each woman currently felt sexually interested or rejecting. Participants' mouse movements were recorded to capture competition dynamics during online processing (throughout the decisional process), and as an index of the final categorical decision (endpoint of the decisional process). Participants completed a measure of Rape-Supportive Attitudes (RSA), a well-established correlate of male-initiated sexual aggression toward female acquaintances. Mixed-effects analyses revealed greater curvature toward the incorrect response on conceptually incongruent trials (e.g., rejecting and dressed provocatively) than on congruent trials (e.g., rejecting and dressed conservatively). This suggests that the two decision alternatives are simultaneously active and compete continuously over time, consistent with a dynamic competition account. Congruence effects also emerged at the decisional endpoint; accuracy was typically lower when stimulus features were incongruent, rather than congruent. RSA potentiated online congruence effects (intermediate states of behavior) but not offline congruence effects (endpoint states of behavior). In a hierarchical regression analysis, online processing indices accounted for unique variability in RSA above and beyond offline accuracy rates. The process-based account of men's sexual-interest judgments ultimately may point to novel targets for prevention strategies designed to reduce acquaintance-initiated sexual aggression on college campuses.
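Curvature in mouse-tracking studies of this kind is commonly summarized as the maximum deviation of the cursor path from the straight start-to-end line; the sketch below illustrates that measure on a hypothetical trajectory (it is not the authors' pipeline, whose specific curvature index may differ):

```python
# Sketch: maximum deviation (MD) of a mouse trajectory from the direct
# start-to-end path, a common curvature index in mouse-tracking work.
# Hypothetical trajectory; not the authors' analysis code.
import numpy as np

def max_deviation(xy):
    """Largest perpendicular distance from the start->end line; the sign
    indicates which side of the line the cursor strayed toward."""
    start, end = xy[0], xy[-1]
    line = end - start
    line_len = np.linalg.norm(line)
    rel = xy - start
    # 2D cross product gives signed perpendicular distance to the line.
    signed = (line[0] * rel[:, 1] - line[1] * rel[:, 0]) / line_len
    return signed[np.argmax(np.abs(signed))]

traj = np.array([[0.0, 0.0], [0.1, 0.4], [0.35, 0.9], [0.9, 1.2], [1.0, 1.5]])
print(f"max deviation = {max_deviation(traj):.3f}")
# Greater MD on incongruent trials indicates that the competing response
# attracted the movement, consistent with dynamic competition.
```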
Collapse
Affiliation(s)
- Jodi R Smith
- University of Iowa, Department of Psychological and Brain Sciences, United States.
- Teresa A Treat
- University of Iowa, Department of Psychological and Brain Sciences, United States
- Thomas A Farmer
- University of Iowa, Department of Psychological and Brain Sciences, United States; University of Iowa, Department of Linguistics, United States
- Bob McMurray
- University of Iowa, Department of Psychological and Brain Sciences, United States; University of Iowa, Department of Linguistics, United States; University of Iowa, Department of Communication Sciences and Disorders, United States; University of Iowa, DeLTA Center, United States
Collapse
|
27
|
Assessment of Spectral and Temporal Resolution in Cochlear Implant Users Using Psychoacoustic Discrimination and Speech Cue Categorization. Ear Hear 2018; 37:e377-e390. [PMID: 27438871 DOI: 10.1097/aud.0000000000000328] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVES This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). The authors hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. The authors further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. DESIGN Nineteen cochlear implant listeners and 10 listeners with normal hearing participated in a suite of tasks that included spectral ripple discrimination, temporal modulation detection, and syllable categorization, which was split into a spectral cue-based task (targeting the /ba/-/da/ contrast) and a timing cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for cochlear implant listeners. RESULTS Cochlear implant users were generally less successful at utilizing both spectral and temporal cues for categorization compared with listeners with normal hearing. For the cochlear implant listener group, spectral ripple discrimination was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. Temporal modulation detection using 100- and 10-Hz-modulated noise was not correlated either with the cochlear implant subjects' categorization of voice onset time or with word recognition. Word recognition was correlated more closely with categorization of the controlled speech cues than with performance on the psychophysical discrimination tasks. CONCLUSIONS When evaluating people with cochlear implants, controlled speech-based stimuli are feasible to use in tests of auditory cue categorization, to complement traditional measures of auditory discrimination. Stimuli based on specific speech cues correspond to counterpart nonlinguistic measures of discrimination, but potentially show better correspondence with speech perception more generally. The ubiquity of the spectral (formant transition) and temporal (voice onset time) stimulus dimensions across languages highlights the potential to use this testing approach even in cases where English is not the native language.
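Quantifying cue sensitivity with logistic regression, as described, amounts to regressing binary category responses on the manipulated cue and reading sensitivity off the fitted slope. An illustrative sketch with simulated responses (all values hypothetical):

```python
# Sketch: logistic regression of categorization responses on a speech cue
# (e.g., voice onset time); the cue coefficient indexes perceptual
# sensitivity. Simulated data; not the study's actual analysis code.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
vot = rng.uniform(0, 60, size=200)              # ms, VOT values on a continuum
true_boundary, true_slope = 30.0, 0.25
p_voiceless = 1 / (1 + np.exp(-true_slope * (vot - true_boundary)))
resp = rng.binomial(1, p_voiceless)             # 1 = "pa", 0 = "ba"

# Large C approximates an unregularized fit.
model = LogisticRegression(C=1e6).fit(vot.reshape(-1, 1), resp)
slope = model.coef_[0, 0]
boundary = -model.intercept_[0] / slope
print(f"fitted slope {slope:.3f} /ms, boundary {boundary:.1f} ms")
# Shallower fitted slopes (as seen in many CI listeners) indicate weaker
# use of the cue for categorization.
```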
Collapse
|
28
|
Xie X, Myers E. Left Inferior Frontal Gyrus Sensitivity to Phonetic Competition in Receptive Language Processing: A Comparison of Clear and Conversational Speech. J Cogn Neurosci 2017; 30:267-280. [PMID: 29160743 DOI: 10.1162/jocn_a_01208] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
The speech signal is rife with variations in phonetic ambiguity. For instance, when talkers speak in a conversational register, they demonstrate less articulatory precision, leading to greater potential for confusability at the phonetic level compared with a clear speech register. Current psycholinguistic models assume that ambiguous speech sounds activate more than one phonological category and that competition at prelexical levels cascades to lexical levels of processing. Imaging studies have shown that the left inferior frontal gyrus (LIFG) is modulated by phonetic competition between simultaneously activated categories, with increases in activation for more ambiguous tokens. Yet, these studies have often used artificially manipulated speech and/or metalinguistic tasks, which arguably may recruit neural regions that are not critical for natural speech recognition. Indeed, a prominent model of speech processing, the dual-stream model, posits that the LIFG is not involved in prelexical processing in receptive language processing. In the current study, we exploited natural variation in phonetic competition in the speech signal to investigate the neural systems sensitive to phonetic competition as listeners engage in a receptive language task. Participants heard nonsense sentences spoken in either a clear or conversational register as neural activity was monitored using fMRI. Conversational sentences contained greater phonetic competition, as estimated by measures of vowel confusability, and these sentences also elicited greater activation in a region in the LIFG. Sentence-level phonetic competition metrics uniquely correlated with LIFG activity as well. This finding is consistent with the hypothesis that the LIFG responds to competition at multiple levels of language processing and that recruitment of this region does not require an explicit phonological judgment.
Collapse
|
29
|
Kapnoula EC, Winn MB, Kong EJ, Edwards J, McMurray B. Evaluating the sources and functions of gradiency in phoneme categorization: An individual differences approach. J Exp Psychol Hum Percept Perform 2017; 43:1594-1611. [PMID: 28406683 PMCID: PMC5561468 DOI: 10.1037/xhp0000410] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
During spoken language comprehension, listeners transform continuous acoustic cues into categories (e.g., /b/ and /p/). While long-standing research suggests that phonetic categories are activated in a gradient way, there are also clear individual differences: more gradient categorization has been linked to various communication impairments such as dyslexia and specific language impairment (Joanisse, Manis, Keating, & Seidenberg, 2000; López-Zamora, Luque, Álvarez, & Cobos, 2012; Serniclaes, Van Heghe, Mousty, Carré, & Sprenger-Charolles, 2004; Werker & Tees, 1987). Crucially, most studies have used 2-alternative forced choice (2AFC) tasks to measure the sharpness of between-category boundaries. Here we propose an alternative paradigm that allows us to measure categorization gradiency in a more direct way. Furthermore, we follow an individual differences approach to (a) link this measure of gradiency to multiple cue integration, (b) explore its relationship to a set of other cognitive processes, and (c) evaluate its role in individuals' ability to perceive speech in noise. Our results provide validation for this new method of assessing phoneme categorization gradiency and offer preliminary insights into how different aspects of speech perception may be linked to each other and to more general cognitive processes.
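In the spirit of the paradigm described, gradiency can be read off how steeply ratings change across the continuum: a sketch comparing a gradient and a discrete responder on hypothetical VAS data (the published analysis uses a related rotated-logistic fit):

```python
# Sketch: index VAS gradiency as the (inverse) steepness of a logistic
# fitted to continuous ratings across a continuum. Shallow slopes mean
# gradient responding; steep slopes mean categorical responding.
# Hypothetical ratings; illustrative only.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, lower, upper, x0, k):
    return lower + (upper - lower) / (1.0 + np.exp(-k * (x - x0)))

steps = np.arange(1, 8)                                  # 7-step continuum
vas_gradient = np.array([5, 18, 34, 50, 66, 81, 95])     # near-linear listener
vas_discrete = np.array([2, 3, 5, 50, 95, 97, 98])       # step-like listener

for label, ratings in [("gradient", vas_gradient), ("discrete", vas_discrete)]:
    (_, _, _, k), _ = curve_fit(logistic, steps, ratings,
                                p0=[0, 100, 4, 1], maxfev=10000)
    print(f"{label} listener: slope k = {k:.2f}")
# The smaller k for the first listener reflects more gradient categorization.
```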
Collapse
Affiliation(s)
- Efthymia C Kapnoula
- Department of Psychological and Brain Sciences, DeLTA Center, University of Iowa
- Matthew B Winn
- Department of Speech and Hearing Sciences, University of Washington
- Jan Edwards
- Department of Communication Sciences and Disorders, Waisman Center, University of Wisconsin-Madison
- Bob McMurray
- Department of Psychological and Brain Sciences, DeLTA Center, University of Iowa
Collapse
|
30
|
Kong EJ, Edwards J. Individual differences in categorical perception of speech: Cue weighting and executive function. JOURNAL OF PHONETICS 2016; 59:40-57. [PMID: 28503007 PMCID: PMC5423668 DOI: 10.1016/j.wocn.2016.08.006] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]
Abstract
This study examined individual differences in categorical perception and the use of multiple acoustic cues in the perception of the stop voicing contrast. Goals were to investigate whether gradiency of speech perception was related to listeners' differential sensitivity to acoustic cues and to individual differences in executive function. The experiment included two speech perception tasks (visual analogue scaling [VAS] and anticipatory eye movement [AEM]) administered to 30 English-speaking adults in two separate experimental sessions. Stimuli were drawn from a /ta/ to /da/ continuum that systematically varied voice onset time (VOT) and f0. Findings were that some listeners had a more gradient pattern of responses on the VAS task; these listeners also showed more sensitivity to f0 on the AEM task. The patterns were consistent for individuals tested on two separate occasions. These results suggest that variability in how categorically listeners perceive speech sounds is consistent and systematic within individuals.
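Differential cue sensitivity of this sort is often summarized as relative cue weights, for example by regressing responses on the standardized cues and comparing coefficients. A hedged sketch with simulated ratings (weights and values are assumptions, not the study's estimates):

```python
# Sketch: relative cue weighting of VOT and f0 estimated by regressing
# ratings on the standardized cues. Simulated data; illustrative only.
import numpy as np

rng = np.random.default_rng(7)
n = 300
vot = rng.uniform(-10, 40, n)                   # ms
f0 = rng.uniform(180, 280, n)                   # Hz at vowel onset
z = lambda v: (v - v.mean()) / v.std()          # standardize each cue
rating = 0.8 * z(vot) + 0.3 * z(f0) + rng.normal(0, 0.5, n)  # "ta"-ness

X = np.column_stack([np.ones(n), z(vot), z(f0)])
beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
w_vot, w_f0 = abs(beta[1]), abs(beta[2])
print(f"normalized weights: VOT {w_vot / (w_vot + w_f0):.2f}, "
      f"f0 {w_f0 / (w_vot + w_f0):.2f}")
# A relatively larger secondary-cue (f0) weight would accompany more
# gradient responding, as in the VAS/AEM pattern described above.
```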
Collapse
Affiliation(s)
- Eun Jong Kong
- Korea Aerospace University, 100, Hanggongdae gil, Hwajeon-dong, Deogyang-gu, Goyang-city, Gyeonggi-do, South Korea 412-791
- Jan Edwards
- University of Wisconsin-Madison, 301 Goodnight Hall, 1975 Willow Dr., Madison, WI 53706, USA
Collapse
|
31
|
Buz E, Tanenhaus MK, Jaeger TF. Dynamically adapted context-specific hyper-articulation: Feedback from interlocutors affects speakers' subsequent pronunciations. JOURNAL OF MEMORY AND LANGUAGE 2016; 89:68-86. [PMID: 27375344 PMCID: PMC4927008 DOI: 10.1016/j.jml.2015.12.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
We ask whether speakers can adapt their productions when feedback from their interlocutors suggests that previous productions were perceptually confusable. To address this question, we use a novel web-based task-oriented paradigm for speech recording, in which participants produce instructions towards a (simulated) partner with naturalistic response times. We manipulate (1) whether a target word with a voiceless plosive (e.g., pill) occurs in the presence of a voiced competitor (bill) or an unrelated word (food) and (2) whether or not the simulated partner occasionally misunderstands the target word. Speakers hyper-articulated the target word when a voiced competitor was present. Moreover, the size of the hyper-articulation effect was nearly doubled when partners occasionally misunderstood the instruction. A novel type of distributional analysis further suggests that hyper-articulation did not change the target of production, but rather reduced the probability of perceptually ambiguous or confusable productions. These results were obtained in the absence of explicit clarification requests, and persisted across words and over trials. Our findings suggest that speakers adapt their pronunciations based on the perceived communicative success of their previous productions in the current environment. We discuss why speakers make adaptive changes to their speech and what mechanisms might underlie speakers' ability to do so.
Collapse
Affiliation(s)
- Esteban Buz
- Department of Brain and Cognitive Sciences, University of Rochester, United States
- Michael K. Tanenhaus
- Department of Brain and Cognitive Sciences, University of Rochester, United States
- Department of Linguistics, University of Rochester, United States
- T. Florian Jaeger
- Department of Brain and Cognitive Sciences, University of Rochester, United States
- Department of Linguistics, University of Rochester, United States
- Department of Computer Science, University of Rochester, United States
Collapse
|
32
|
Roembke T, McMurray B. Observational Word Learning: Beyond Propose-But-Verify and Associative Bean Counting. JOURNAL OF MEMORY AND LANGUAGE 2016; 87:105-127. [PMID: 26858510 PMCID: PMC4742346 DOI: 10.1016/j.jml.2015.09.005] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]
Abstract
Learning new words is difficult. In any naming situation, there are multiple possible interpretations of a novel word. Recent approaches suggest that learners may solve this problem by tracking co-occurrence statistics between words and referents across multiple naming situations (e.g. Yu & Smith, 2007), overcoming the ambiguity in any one situation. Yet, there remains debate around the underlying mechanisms. We conducted two experiments in which learners acquired eight word-object mappings using cross-situational statistics while eye-movements were tracked. These addressed four unresolved questions regarding the learning mechanism. First, eye-movements during learning showed evidence that listeners maintain multiple hypotheses for a given word and bring them all to bear in the moment of naming. Second, trial-by-trial analyses of accuracy suggested that listeners accumulate continuous statistics about word/object mappings, over and above prior hypotheses they have about a word. Third, consistent, probabilistic context can impede learning, as false associations between words and highly co-occurring referents are formed. Finally, a number of factors not previously considered in prior analysis impact observational word learning: knowledge of the foils, spatial consistency of the target object, and the number of trials between presentations of the same word. This evidence suggests that observational word learning may derive from a combination of gradual statistical or associative learning mechanisms and more rapid real-time processes such as competition, mutual exclusivity and even inference or hypothesis testing.
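The co-occurrence tracking under debate can be made concrete as an associative count matrix updated across ambiguous naming trials. A toy sketch (hypothetical words and trials; deliberately simpler than any model tested in the paper):

```python
# Toy sketch of cross-situational statistical word learning: accumulate
# word-referent co-occurrence counts over ambiguous trials, then choose
# the referent with the highest conditional probability. Illustrative only.
import numpy as np

words = ["blicket", "dax", "toma", "wug"]
counts = np.zeros((len(words), len(words)))     # rows: words, cols: objects

# Each trial: one spoken word, several candidate objects on screen.
trials = [("blicket", [0, 1]), ("dax", [1, 2]), ("blicket", [0, 3]),
          ("toma", [2, 3]), ("dax", [1, 0]), ("wug", [3, 2])]
for word, objects in trials:
    counts[words.index(word), objects] += 1     # credit all present objects

probs = counts / counts.sum(axis=1, keepdims=True)
for i, w in enumerate(words):
    print(f"{w} -> object {probs[i].argmax()} (p = {probs[i].max():.2f})")
# Pure counting like this also forms spurious associations with frequently
# co-present foils -- the "consistent context" cost reported above.
```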
Collapse
Affiliation(s)
- Tanja Roembke
- Dept. of Psychological and Brain Sciences, University of Iowa
- Bob McMurray
- Dept. of Psychological and Brain Sciences, Dept. of Communication Sciences and Disorders, and Dept. of Linguistics, University of Iowa
Collapse
|
33
|
McMurray B, Farris-Trimble A, Seedorff M, Rigler H. The Effect of Residual Acoustic Hearing and Adaptation to Uncertainty on Speech Perception in Cochlear Implant Users: Evidence From Eye-Tracking. Ear Hear 2016; 37:e37-51. [PMID: 26317298 PMCID: PMC4717908 DOI: 10.1097/aud.0000000000000207] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
OBJECTIVES While outcomes with cochlear implants (CIs) are generally good, performance can be fragile. The authors examined two factors that are crucial for good CI performance. First, while there is a clear benefit for adding residual acoustic hearing to CI stimulation (typically in low frequencies), it is unclear whether this contributes directly to phonetic categorization. Thus, the authors examined perception of voicing (which uses low-frequency acoustic cues) and fricative place of articulation (s/∫, which does not) in CI users with and without residual acoustic hearing. Second, in speech categorization experiments, CI users typically show shallower identification functions. These are typically interpreted as deriving from noisy encoding of the signal. However, psycholinguistic work suggests shallow slopes may also be a useful way to adapt to uncertainty. The authors thus employed an eye-tracking paradigm to examine this in CI users. DESIGN Participants were 30 CI users (with a variety of configurations) and 22 age-matched normal hearing (NH) controls. Participants heard tokens from six b/p and six s/∫ continua (eight steps) spanning real words (e.g., beach/peach, sip/ship). Participants selected the picture corresponding to the word they heard from a screen containing four items (a b-, p-, s- and ∫-initial item). Eye movements to each object were monitored as a measure of how strongly they were considering each interpretation in the moments leading up to their final percept. RESULTS Mouse-click results (analogous to phoneme identification) for voicing showed a shallower slope for CI users than NH listeners, but no differences between CI users with and without residual acoustic hearing. For fricatives, CI users also showed a shallower slope, but unexpectedly, acoustic + electric listeners showed an even shallower slope. Eye movements showed a gradient response to fine-grained acoustic differences for all listeners. Even considering only trials in which a participant clicked "b" (for example), and accounting for variation in the category boundary, participants made more looks to the competitor ("p") as the voice onset time neared the boundary. CI users showed a similar pattern, but looked to the competitor more than NH listeners, and this was not different at different continuum steps. CONCLUSION Residual acoustic hearing did not improve voicing categorization, suggesting it may not help identify these phonetic cues. The fact that acoustic + electric users showed poorer performance on fricatives was unexpected as they usually show a benefit in standardized perception measures, and as sibilants contain little energy in the low-frequency (acoustic) range. The authors hypothesize that these listeners may overweight acoustic input, and have problems when this is not available (in fricatives). Thus, the benefit (or cost) of acoustic hearing for phonetic categorization may be complex. Eye movements suggest that in both CI and NH listeners, phoneme categorization is not a process of mapping continuous cues to discrete categories. Rather, listeners preserve gradiency as a way to deal with uncertainty. CI listeners appear to adapt to their implant (in part) by amplifying competitor activation to preserve their flexibility in the face of potential misperceptions.
Collapse
Affiliation(s)
- Bob McMurray
- Departments of Psychological and Brain Sciences, Communication Sciences and Disorders, and Linguistics, University of Iowa, Iowa City, Iowa, USA
- Ashley Farris-Trimble
- Department of Linguistics, Simon Fraser University, Burnaby, British Columbia, Canada
- Michael Seedorff
- Department of Biostatistics, University of Iowa, Iowa City, Iowa, USA
- Hannah Rigler
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, Iowa, USA
Collapse
|
34
|
Oleson JJ, Cavanaugh JE, McMurray B, Brown G. Detecting time-specific differences between temporal nonlinear curves: Analyzing data from the visual world paradigm. Stat Methods Med Res 2015; 26:2708-2725. [PMID: 26400088 DOI: 10.1177/0962280215607411] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
In multiple fields of study, time series measured at high frequencies are used to estimate population curves that describe the temporal evolution of some characteristic of interest. These curves are typically nonlinear, and the deviations of each series from the corresponding curve are highly autocorrelated. In this scenario, we propose a procedure to compare the response curves for different groups at specific points in time. The method involves fitting the curves, performing potentially hundreds of serially correlated tests, and appropriately adjusting the overall alpha level of the tests. Our motivating application comes from psycholinguistics and the visual world paradigm. We describe how the proposed technique can be adapted to compare fixation curves within subjects as well as between groups. Our results lead to conclusions beyond the scope of previous analyses.
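The heart of the procedure is many pointwise tests along the fitted curves, with the familywise alpha adjusted for their serial correlation. A deliberately simplified sketch of that logic on simulated fixation curves (the published method is more sophisticated than this modified Bonferroni):

```python
# Simplified sketch of time-specific curve comparison: pointwise Welch
# t-tests between two groups' fixation curves, with alpha adjusted via an
# effective number of independent tests estimated from the lag-1
# autocorrelation of the test statistics. Illustrative only; the published
# procedure is more elaborate than this.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
t = np.linspace(0, 1.5, 151)                        # time (s), 10-ms bins
base = 1 / (1 + np.exp(-(t - 0.6) / 0.12))          # underlying fixation curve
grp_a = base + rng.normal(0, 0.05, (20, t.size))    # 20 subjects per group
grp_b = base * 0.9 + rng.normal(0, 0.05, (18, t.size))

tvals, pvals = stats.ttest_ind(grp_a, grp_b, equal_var=False)

rho = np.corrcoef(tvals[:-1], tvals[1:])[0, 1]      # serial correlation
n_eff = t.size * (1 - rho) / (1 + rho)              # crude effective N
alpha = 1 - (1 - 0.05) ** (1 / max(n_eff, 1))       # modified Bonferroni
sig = pvals < alpha
print(f"adjusted per-test alpha = {alpha:.5f}; "
      f"groups differ in {sig.sum()} of {t.size} time bins")
```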
Collapse
Affiliation(s)
- Jacob J Oleson
- Department of Biostatistics, The University of Iowa, Iowa City, Iowa, USA
- Joseph E Cavanaugh
- Department of Biostatistics, The University of Iowa, Iowa City, Iowa, USA
- Bob McMurray
- Department of Psychology, The University of Iowa, Iowa City, Iowa, USA
- Grant Brown
- Department of Biostatistics, The University of Iowa, Iowa City, Iowa, USA
Collapse
|
35
|
The development of voicing categories: a quantitative review of over 40 years of infant speech perception research. Psychon Bull Rev 2015; 21:884-906. [PMID: 24550074 DOI: 10.3758/s13423-013-0569-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Most research on infant speech categories has relied on measures of discrimination. Such work often employs categorical perception as a linking hypothesis to enable inferences about categorization on the basis of discrimination measures. However, a large number of studies with adults challenge the utility of categorical perception in describing adult speech perception, and this in turn calls into question how to interpret measures of infant speech discrimination. We propose here a parallel channels model of discrimination (built on Pisoni and Tash, 1974, Perception & Psychophysics, 15(2), 285-290), which posits that both a noncategorical or veridical encoding of speech cues and category representations can simultaneously contribute to discrimination. This can thus produce categorical perception effects without positing any warping of the acoustic signal, but it also reframes how we think about infant discrimination and development. We test this model by conducting a quantitative review of 20 studies examining infants' discrimination of voice onset time contrasts. This review suggests that within-category discrimination is surprisingly prevalent even in classic studies and that, averaging across studies, discrimination is related to continuous acoustic distance. It also identifies several methodological factors that may mask our ability to see this. Finally, it suggests that infant discrimination may improve over development, contrary to the commonly held notion of perceptual narrowing. These results are discussed in terms of theories of speech development that may require such continuous sensitivity.
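The parallel channels idea reduces to a simple prediction: discriminability is a weighted sum of continuous acoustic distance and a same/different-category signal. A worked sketch with assumed weights:

```python
# Sketch of a parallel-channels prediction for discrimination: both a
# continuous (veridical acoustic) channel and a categorical channel
# contribute. Weights and boundary are assumed, purely for illustration.

def discriminability(vot1, vot2, boundary=25.0, w_cont=0.04, w_cat=1.0):
    continuous = w_cont * abs(vot1 - vot2)          # acoustic distance
    categorical = w_cat * ((vot1 > boundary) != (vot2 > boundary))
    return continuous + categorical

# Equal 20-ms acoustic separations, within vs. across the category boundary:
print(discriminability(0, 20))    # within-category: 0.8
print(discriminability(15, 35))   # across-category: 0.8 + 1.0 = 1.8
# Cross-boundary pairs are better discriminated (the classic "categorical
# perception" effect) without any warping of the underlying acoustic code,
# and within-category discrimination stays above zero, as in the review.
```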
Collapse
|
36
|
Toscano JC, McMurray B. The time-course of speaking rate compensation: Effects of sentential rate and vowel length on voicing judgments. LANGUAGE, COGNITION AND NEUROSCIENCE 2015; 30:529-543. [PMID: 25780801 PMCID: PMC4358767 DOI: 10.1080/23273798.2014.946427] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]
Abstract
Many sources of context information in speech (such as speaking rate) occur either before or after the phonetic cues they influence, yet there is little work examining the time-course of these effects. Here, we investigate how listeners compensate for preceding sentence rate and subsequent vowel length (a secondary cue that has been used as a proxy for speaking rate) when categorizing words varying in voice-onset time (VOT). Participants selected visual objects in a display while their eye-movements were recorded, allowing us to examine when each source of information had an effect on lexical processing. We found that the effect of VOT preceded that of vowel length, suggesting that each cue is used as it becomes available. In a second experiment, we found that, in contrast, the effect of preceding sentence rate occurred simultaneously with VOT, suggesting that listeners interpret VOT relative to preceding rate.
Collapse
Affiliation(s)
- Joseph C Toscano
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, 405 N Mathews Ave, Urbana, IL 61801
- Bob McMurray
- Dept. of Psychology and Dept. of Communication Sciences & Disorders, University of Iowa, E11 Seashore Hall, Iowa City, IA 52242
Collapse
|
37
|
Chodroff E, Wilson C. Burst spectrum as a cue for the stop voicing contrast in American English. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 136:2762-2772. [PMID: 25373976 DOI: 10.1121/1.4896470] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Voicing contrasts in stop consonants are expressed by a constellation of acoustic cues. This study focused on a spectral cue present at burst onset in American English labial and coronal stops. Spectral shape was examined for word-initial, prevocalic stops of all three places of articulation in a laboratory production study and a large corpus of continuous read speech. Voiceless labial and coronal stops were found to have greater energy at higher frequencies in comparison to homorganic voiced stops, a difference that could not be attributed to aspiration in the voiceless stops or modal phonation in the voiced stops, while no consistent effect was found for dorsal stops. This pattern was found with various methods of spectral estimation (time-averaged and multitaper spectra) and measures of spectral energy concentration (center of gravity and spectral peak) for both linear and auditorily based frequency scales. Perceptual relevance of the spectral cue was tested in laboratory and online experiments with continua created by crossing burst shape and voice onset time. A trading relation was observed such that voiceless identifications were more likely for tokens with higher frequency bursts. Goodness ratings indicated that burst spectrum influences category typicality for voiceless stops even when voice onset time is unambiguous.
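Spectral center of gravity, one of the energy-concentration measures used, is just the power-weighted mean frequency of the burst spectrum. A minimal sketch on a synthetic burst (parameters are illustrative, not the paper's):

```python
# Sketch: spectral center of gravity (CoG) of a stop burst, computed as the
# power-weighted mean frequency over a short window at burst onset.
# Synthetic signal; illustrates the measure, not the paper's pipeline.
import numpy as np

fs = 22050                                        # sampling rate (Hz)
t = np.arange(int(0.010 * fs)) / fs               # 10-ms burst window
rng = np.random.default_rng(0)
# Crude "burst": noise with extra energy near 4 kHz (e.g., a coronal stop).
burst = rng.normal(0, 0.2, t.size) + np.sin(2 * np.pi * 4000 * t)

spectrum = np.fft.rfft(burst * np.hanning(t.size))
power = np.abs(spectrum) ** 2
freqs = np.fft.rfftfreq(t.size, d=1 / fs)
cog = (freqs * power).sum() / power.sum()
print(f"center of gravity = {cog:.0f} Hz")
# Higher CoG at burst onset cues voicelessness for labials and coronals in
# the trading relation with VOT described above.
```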
Collapse
Affiliation(s)
- Eleanor Chodroff
- Department of Cognitive Science, Johns Hopkins University, 3400 North Charles Street, Baltimore, Maryland 21218
- Colin Wilson
- Department of Cognitive Science, Johns Hopkins University, 3400 North Charles Street, Baltimore, Maryland 21218
Collapse
|
38
|
Lin M, Francis AL. Effects of language experience and expectations on attention to consonants and tones in English and Mandarin Chinese. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 136:2827-2838. [PMID: 25373982 DOI: 10.1121/1.4898047] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Both long-term native language experience and immediate linguistic expectations can affect listeners' use of acoustic information when making a phonetic decision. In this study, a Garner selective attention task was used to investigate differences in attention to consonants and tones by American English-speaking listeners (N = 20) and Mandarin Chinese-speaking listeners hearing speech in either American English (N = 17) or Mandarin Chinese (N = 20). To minimize the effects of lexical differences and differences in the linguistic status of pitch across the two languages, stimuli and response conditions were selected such that all tokens constitute legitimate words in both languages and all responses required listeners to make decisions that were linguistically meaningful in their native language. Results showed that regardless of ambient language, Chinese listeners processed consonant and tone in a combined manner, consistent with previous research. In contrast, English listeners treated tones and consonants as perceptually separable. Results are discussed in terms of the role of sub-phonemic differences in acoustic cues across languages, and the linguistic status of consonants and pitch contours in the two languages.
Collapse
Affiliation(s)
- Mengxi Lin
- Linguistics Program, Purdue University, West Lafayette, Indiana 47907-2038
- Alexander L Francis
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907-2038
Collapse
|
39
|
Winter B. Spoken language achieves robustness and evolvability by exploiting degeneracy and neutrality. Bioessays 2014; 36:960-7. [DOI: 10.1002/bies.201400028] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Affiliation(s)
- Bodo Winter
- Cognitive and Information Sciences, University of California, Merced, CA, USA
Collapse
|
40
|
McMurray B, Munson C, Tomblin JB. Individual differences in language ability are related to variation in word recognition, not speech perception: evidence from eye movements. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2014; 57:1344-62. [PMID: 24687026 PMCID: PMC4126886 DOI: 10.1044/2014_jslhr-l-13-0196] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/20/2023]
Abstract
PURPOSE The authors examined speech perception deficits associated with individual differences in language ability, contrasting auditory, phonological, or lexical accounts by asking whether lexical competition is differentially sensitive to fine-grained acoustic variation. METHOD Adolescents with a range of language abilities (N = 74, including 35 impaired) participated in an experiment based on McMurray, Tanenhaus, and Aslin (2002). Participants heard tokens from six 9-step voice onset time (VOT) continua spanning 2 words (beach/peach, beak/peak, etc.) while viewing a screen containing pictures of those words and 2 unrelated objects. Participants selected the referent while eye movements to each picture were monitored as a measure of lexical activation. Fixations were examined as a function of both VOT and language ability. RESULTS Eye movements were sensitive to within-category VOT differences: As VOT approached the boundary, listeners made more fixations to the competing word. This did not interact with language ability, suggesting that language impairment is not associated with differential auditory sensitivity or phonetic categorization. Listeners with poorer language skills showed heightened competitor fixations overall, suggesting a deficit in lexical processes. CONCLUSION Language impairment may be better characterized by a deficit in lexical competition (inability to suppress competing words), rather than differences in phonological categorization or auditory abilities.
Collapse
|
41
|
Collins JA, Olson IR. Knowledge is power: how conceptual knowledge transforms visual cognition. Psychon Bull Rev 2014; 21:843-860.
Abstract
In this review, we synthesize the existing literature demonstrating the dynamic interplay between conceptual knowledge and visual perceptual processing. We consider two theoretical frameworks that demonstrate interactions between processes and brain areas traditionally considered perceptual or conceptual. Specifically, we discuss categorical perception, in which visual objects are represented according to category membership, and highlight studies showing that category knowledge can penetrate early stages of visual analysis. We next discuss the embodied account of conceptual knowledge, which holds that concepts are instantiated in the same neural regions required for specific types of perception and action, and discuss the limitations of this framework. We additionally consider studies showing that gaining abstract semantic knowledge about objects and faces leads to behavioral and electrophysiological changes that are indicative of more efficient stimulus processing. Finally, we consider the role that perceiver goals and motivation may play in shaping the interaction between conceptual and perceptual processing. We hope to demonstrate how pervasive such interactions between motivation, conceptual knowledge, and perceptual processing are in our understanding of the visual environment, and to demonstrate the need for future research aimed at understanding how such interactions arise in the brain.
Collapse
Affiliation(s)
- Jessica A Collins
- Department of Psychology, Temple University, Weiss Hall, 1701 North 13th Street, Philadelphia, PA, 19122, USA
Collapse
|
42
|
Pothos EM, Reppa I. The fickle nature of similarity change as a result of categorization. Q J Exp Psychol (Hove) 2014; 67:2425-38. [PMID: 24902601 DOI: 10.1080/17470218.2014.931977] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
Abstract
Several researchers have reported that learning a particular categorization leads to compatible changes in the similarity structure of the categorized stimuli. The purpose of this study is to examine whether different category structures may lead to greater or less corresponding similarity change. We created six category structures and examined changes in similarity within categories or between categories, as a result of categorization, in between-participant conditions. The best supported hypothesis was that the ease of learning a categorization affects change in within-categories similarity, so that greater (within-categories) similarity change was observed for category structures that were harder to learn.
Collapse
|
43
|
Abstract
Listeners often categorize phonotactically illegal sequences (e.g., /dla/ in English) as phonemically similar legal ones (e.g., /gla/). In an earlier investigation of such an effect in Japanese, Dehaene-Lambertz, Dupoux, and Gout (2000) did not observe a mismatch negativity in response to deviant, illegal sequences, and therefore argued that phonotactics constrain early perceptual processing. In the present study, using a priming paradigm, we compared the event-related potentials elicited by Legal targets (e.g., /gla/) preceded by (1) phonemically distinct Control primes (e.g., /kla/), (2) different tokens of Identity primes (e.g., /gla/), and (3) phonotactically Illegal Test primes (e.g., /dla/). Targets elicited a larger positivity 200-350 ms after onset when preceded by Illegal Test primes or phonemically distinct Control primes, as compared to Identity primes. Later portions of the waveforms (350-600 ms) did not differ for targets preceded by Identity and Illegal Test primes, and the similarity ratings also did not differ in these conditions. These data support a model of speech perception in which veridical representations of phoneme sequences are not only generated during processing, but also are maintained in a manner that affects perceptual processing of subsequent speech sounds.
Collapse
|
44
|
McMurray B, Kovack-Lesh KA, Goodwin D, McEchron W. Infant directed speech and the development of speech perception: enhancing development or an unintended consequence? Cognition 2013; 129:362-78. [PMID: 23973465 PMCID: PMC3874452 DOI: 10.1016/j.cognition.2013.07.015] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2012] [Revised: 07/18/2013] [Accepted: 07/22/2013] [Indexed: 11/30/2022]
Abstract
Infant directed speech (IDS) is a speech register characterized by simpler sentences, a slower rate, and more variable prosody. Recent work has implicated it in more subtle aspects of language development. Kuhl et al. (1997) demonstrated that segmental cues for vowels are affected by IDS in a way that may enhance development: the average locations of the extreme "point" vowels (/a/, /i/ and /u/) are further apart in acoustic space. If infants learn speech categories, in part, from the statistical distributions of such cues, these changes may specifically enhance speech category learning. We revisited this by asking (1) if these findings extend to a new cue (Voice Onset Time, a cue for voicing); (2) whether they extend to the interior vowels which are much harder to learn and/or discriminate; and (3) whether these changes may be an unintended phonetic consequence of factors like speaking rate or prosodic changes associated with IDS. Eighteen caregivers were recorded reading a picture book including minimal pairs for voicing (e.g., beach/peach) and a variety of vowels to either an adult or their infant. Acoustic measurements suggested that VOT was different in IDS, but not in a way that necessarily supports better development, and that these changes are almost entirely due to slower rate of speech of IDS. Measurements of the vowel suggested that in addition to changes in the mean, there was also an increase in variance, and statistical modeling suggests that this may counteract the benefit of any expansion of the vowel space. As a whole this suggests that changes in segmental cues associated with IDS may be an unintended by-product of the slower rate of speech and different prosodic structure, and do not necessarily derive from a motivation to enhance development.
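The statistical point that added variance can cancel an expanded vowel space is easy to verify with a separability index such as d'. A worked sketch with hypothetical first-formant values (not measurements from this study):

```python
# Sketch: why wider category spacing need not help if variance also grows.
# Category separability indexed as d' = |mu1 - mu2| / pooled SD, using
# hypothetical first-formant values for two adjacent vowels in adult-
# directed speech (ADS) vs. infant-directed speech (IDS).
import numpy as np

def d_prime(mu1, sd1, mu2, sd2):
    return abs(mu1 - mu2) / np.sqrt((sd1**2 + sd2**2) / 2)

# ADS: closer means, tighter clusters.
print(f"ADS d' = {d_prime(mu1=500, sd1=60, mu2=700, sd2=60):.2f}")   # 3.33
# IDS: means pushed apart, but token-to-token variability increases too.
print(f"IDS d' = {d_prime(mu1=450, sd1=95, mu2=760, sd2=95):.2f}")   # 3.26
# Despite a ~55% larger mean separation, the IDS categories are no more
# (here, slightly less) separable -- the counteracting effect modeled above.
```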
Collapse
Affiliation(s)
- Bob McMurray
- Dept. of Psychology, University of Iowa, United States; Dept. of Communication Sciences and Disorders, University of Iowa, United States; Dept. of Linguistics, University of Iowa, United States; The Delta Center, University of Iowa, United States.
Collapse
|
45
|
Farris-Trimble A, McMurray B. Test-retest reliability of eye tracking in the visual world paradigm for the study of real-time spoken word recognition. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2013; 56:1328-45. [PMID: 23926331 PMCID: PMC3875834 DOI: 10.1044/1092-4388(2012/12-0145)] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/20/2023]
Abstract
PURPOSE Researchers have begun to use eye tracking in the visual world paradigm (VWP) to study clinical differences in language processing, but the reliability of such laboratory tests has rarely been assessed. In this article, the authors assess test-retest reliability of the VWP for spoken word recognition. METHOD Participants performed an auditory VWP task in repeated sessions and a visual-only VWP task in a third session. The authors performed correlation and regression analyses on several parameters to determine which reflect reliable behavior and which are predictive of behavior in later sessions. RESULTS Results showed that the fixation parameters most closely related to timing and degree of fixations were moderately-to-strongly correlated across days, whereas the parameters related to rate of increase or decrease of fixations to particular items were less strongly correlated. Moreover, when including factors derived from the visual-only task, the performance of the regression model was at least moderately correlated with Day 2 performance on all parameters (R > .30). CONCLUSION The VWP is stable enough (with some caveats) to serve as an individual measure. These findings suggest guidelines for future use of the paradigm and for areas of improvement in both methodology and analysis.
Collapse
|
46
|
Cue-integration and context effects in speech: evidence against speaking-rate normalization. Atten Percept Psychophys 2012; 74:1284-301. [PMID: 22532385 DOI: 10.3758/s13414-012-0306-z] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Listeners are able to accurately recognize speech despite variation in acoustic cues across contexts, such as different speaking rates. Previous work has suggested that listeners use rate information (indicated by vowel length; VL) to modify their use of context-dependent acoustic cues, like voice-onset time (VOT), a primary cue to voicing. We present several experiments and simulations that offer an alternative explanation: that listeners treat VL as a phonetic cue rather than as an indicator of speaking rate, and that they rely on general cue-integration principles to combine information from VOT and VL. We demonstrate that listeners use the two cues independently, that VL is used in both naturally produced and synthetic speech, and that the effects of stimulus naturalness can be explained by a cue-integration model. Together, these results suggest that listeners do not interpret VOT relative to rate information provided by VL and that the effects of speaking rate can be explained by more general cue-integration principles.
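The cue-integration account treats VOT and vowel length (VL) as independent evidence combined additively in log-odds, whereas rate normalization rescales VOT by VL before categorization. A sketch contrasting the two predictions (all weights assumed for illustration):

```python
# Sketch contrasting two accounts of the vowel-length (VL) effect on
# voicing judgments: (1) general cue integration, where VOT and VL
# contribute independent additive evidence; (2) rate normalization, where
# VOT is rescaled by VL before categorization. Weights are assumed,
# purely for illustration.
import numpy as np

def p_voiceless_integration(vot, vl, w_vot=0.30, w_vl=-0.02, bias=-7.0):
    evidence = w_vot * vot + w_vl * vl + bias   # independent, additive cues
    return 1 / (1 + np.exp(-evidence))

def p_voiceless_normalization(vot, vl, w=40.0, bias=-10.0):
    evidence = w * (vot / vl) + bias            # VOT interpreted relative to VL
    return 1 / (1 + np.exp(-evidence))

for vot, vl in [(25, 100), (25, 200), (35, 100), (35, 200)]:   # ms
    print(f"VOT={vot:>2} VL={vl:>3}:  integration "
          f"{p_voiceless_integration(vot, vl):.2f}   normalization "
          f"{p_voiceless_normalization(vot, vl):.2f}")
# Integration predicts a constant (additive, in log-odds) VL effect across
# VOT values; normalization predicts VL's influence scales with VOT itself.
```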
Collapse
|
47
|
|
48
|
Trude AM, Brown-Schmidt S. Talker-specific perceptual adaptation during online speech perception. LANGUAGE AND COGNITIVE PROCESSES 2012. [DOI: 10.1080/01690965.2011.597153] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/16/2022]
|
49
|
Noordenbos MW, Segers E, Serniclaes W, Mitterer H, Verhoeven L. Allophonic mode of speech perception in Dutch children at risk for dyslexia: a longitudinal study. RESEARCH IN DEVELOPMENTAL DISABILITIES 2012; 33:1469-1483. [PMID: 22522205 DOI: 10.1016/j.ridd.2012.03.021] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/22/2011] [Revised: 03/06/2012] [Accepted: 03/06/2012] [Indexed: 05/31/2023]
Abstract
There is ample evidence that individuals with dyslexia have a phonological deficit. A growing body of research also suggests that individuals with dyslexia have problems with categorical perception, as evidenced by weaker discrimination of between-category differences and better discrimination of within-category differences compared to average readers. Whether the categorical perception problems of individuals with dyslexia are a result of their reading problems or a cause has yet to be determined. Whether the observed perception deficit relates to a more general auditory deficit or is specific to speech also has yet to be determined. To shed more light on these issues, the categorical perception abilities of children at risk for dyslexia and chronological age controls were investigated before and after the onset of formal reading instruction in a longitudinal study. Both identification and discrimination data were collected using identical paradigms for speech and non-speech stimuli. Results showed the children at risk for dyslexia to shift from an allophonic mode of perception in kindergarten to a phonemic mode of perception in first grade, while the control group showed a phonemic mode already in kindergarten. The children at risk for dyslexia thus showed an allophonic perception deficit in kindergarten, which was later suppressed by phonemic perception as a result of formal reading instruction in first grade; allophonic perception in kindergarten can thus be treated as a clinical marker for the possibility of later reading problems.
Collapse
Affiliation(s)
- M W Noordenbos
- Behavioural Science Institute, Radboud University Nijmegen, Nijmegen, The Netherlands.
Collapse
|
50
|
Lupyan G, Mirman D, Hamilton R, Thompson-Schill SL. Categorization is modulated by transcranial direct current stimulation over left prefrontal cortex. Cognition 2012; 124:36-49. [PMID: 22578885 PMCID: PMC4114054 DOI: 10.1016/j.cognition.2012.04.002] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2011] [Revised: 04/03/2012] [Accepted: 04/04/2012] [Indexed: 11/23/2022]
Abstract
Humans have an unparalleled ability to represent objects as members of multiple categories. A given object, such as a pillow, may be represented, depending on current task demands, as an instance of something that is soft, as something that contains feathers, as something that is found in bedrooms, or as something that is larger than a toaster. This type of processing requires the individual to dynamically highlight task-relevant properties and abstract over or suppress object properties that, although salient, are not relevant to the task at hand. Neuroimaging and neuropsychological evidence suggests that this ability may depend on cognitive control processes associated with the left inferior prefrontal gyrus. Here, we show that stimulating the left inferior frontal cortex using transcranial direct current stimulation alters performance of healthy subjects on a simple categorization task. Our task required subjects to select pictures matching a description, e.g., "click on all the round things." Cathodal stimulation led to poorer performance on classification trials requiring attention to specific dimensions such as color or shape as opposed to trials that required selecting items belonging to a more thematic category such as objects that hold water. A polarity reversal (anodal stimulation) lowered the threshold for selecting items that were more weakly associated with the target category. These results illustrate the role of frontally-mediated control processes in categorization and suggest potential interactions between categorization, cognitive control, and language.
Collapse
Affiliation(s)
- Gary Lupyan
- Department of Psychology, University of Wisconsin-Madison, 1202 W. Johnson St., Madison, WI 53706, United States.
Collapse
|