1
Viswanathan N, Rinzler A, Kelty-Stephen DG. Compensation for coarticulation despite a midway speaker change: Reassessing effects and implications. PLoS One 2024; 19:e0291992. PMID: 38215074. PMCID: PMC10786362. DOI: 10.1371/journal.pone.0291992.
Abstract
Accounts of speech perception disagree on how listeners demonstrate perceptual constancy despite considerable variation in the speech signal due to speakers' coarticulation. According to the spectral contrast account, listeners' compensation for coarticulation (CfC) results from listeners perceiving the target-segment frequencies differently depending on the contrastive effects exerted by the preceding sound's frequencies. In this study, we reexamine a notable finding that listeners apparently demonstrate perceptual adjustments to coarticulation even when the identity of the speaker (i.e., the "source") changes midway between speech segments. We evaluated these apparent across-talker CfC effects on the rationale that such adjustments to coarticulation would likely be maladaptive for perceiving speech in multi-talker settings. In addition, we evaluated whether such cross-talker adaptations, if detected, were modulated by prior experience. We did so by manipulating the exposure phase of three groups of listeners: (a) merely exposing them to our stimuli, (b) explicitly alerting them to the talker change, or (c) implicitly alerting them to this change. All groups then completed identical test blocks in which we assessed their CfC patterns in within- and across-talker conditions. Our results uniformly demonstrated that, while all three groups showed robust CfC shifts in the within-talker conditions, no such shifts were detected in the across-talker condition. Our results call into question a speaker-neutral explanation for CfC. Broadly, this demonstrates the need to carefully examine the perceptual demands placed on listeners in constrained experimental tasks and to evaluate whether the accounts that derive from such settings scale up to the demands of real-world listening.
Affiliation(s)
- Navin Viswanathan
- Department of Communication Sciences & Disorders, The Pennsylvania State University, State College, Pennsylvania, United States of America
- Haskins Laboratories, New Haven, Connecticut, United States of America
- Ana Rinzler
- Department of Psychology, Rutgers University, New Brunswick, New Jersey, United States of America
- Department of Psychology, State University of New York-New Paltz, New Paltz, New York, United States of America
- Damian G. Kelty-Stephen
- Department of Psychology, State University of New York-New Paltz, New Paltz, New York, United States of America
2
Baykan C, Zhu X, Allenmark F, Shi Z. Influences of temporal order in temporal reproduction. Psychon Bull Rev 2023; 30:2210-2218. PMID: 37291447. PMCID: PMC10728249. DOI: 10.3758/s13423-023-02310-5.
Abstract
Despite the crucial role of complex temporal sequences, such as speech and music, in our everyday lives, our ability to acquire and reproduce these patterns is prone to various contextual biases. In this study, we examined how the temporal order of auditory sequences affects temporal reproduction. Participants were asked to reproduce accelerating, decelerating, or random sequences, each consisting of four intervals, by tapping their fingers. Our results showed that both the mean reproduction and its variability were influenced by the sequential structure and the order of intervals. The mean reproduced interval was assimilated toward the first interval of the sequence, with the lowest mean for decelerating and the highest for accelerating sequences. Additionally, the central tendency bias was affected by the volatility and the last interval of the sequence, resulting in a stronger central tendency in the random and decelerating sequences than in the accelerating sequences. Using Bayesian integration of the ensemble mean of the sequence with the individual durations, weighted by the perceptual uncertainty associated with the sequential structure and position, we were able to accurately predict the behavioral results. The findings highlight the critical role of the temporal order of a sequence in temporal pattern reproduction, with the first interval exerting greater influence on the mean reproduction, and the volatility and the last interval contributing to the perceptual uncertainty of individual intervals and the central tendency bias.
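The Bayesian integration invoked in this abstract amounts, in essence, to reliability-weighted averaging: each interval estimate is pulled toward the ensemble mean of the sequence, and more strongly so when sensory uncertainty is high. A minimal sketch of that idea (the function name and all numeric values are illustrative assumptions, not the authors' fitted model):

```python
def integrate(interval_ms, ensemble_mean_ms, sigma_sensory, sigma_prior):
    """Precision-weighted fusion of a noisy interval estimate with the
    ensemble mean of the sequence (acting as a prior)."""
    precision_sensory = 1.0 / sigma_sensory**2
    precision_prior = 1.0 / sigma_prior**2
    w = precision_sensory / (precision_sensory + precision_prior)
    # Lower w (noisier sensory evidence) means a stronger pull toward the
    # ensemble mean, i.e., a stronger central tendency bias.
    return w * interval_ms + (1.0 - w) * ensemble_mean_ms

# A 900 ms interval in a sequence whose ensemble mean is 600 ms:
est_noisy = integrate(900.0, 600.0, sigma_sensory=150.0, sigma_prior=100.0)
est_precise = integrate(900.0, 600.0, sigma_sensory=50.0, sigma_prior=100.0)
# The noisier estimate regresses further toward the 600 ms ensemble mean.
```

On this account, position-dependent uncertainty (e.g., greater uncertainty for intervals in volatile sequences) yields the stronger central tendency reported for random and decelerating sequences.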
Affiliation(s)
- Cemre Baykan
- General and Experimental Psychology, Department of Psychology, Ludwig Maximilian University of Munich, 80802, Munich, Germany
- Xiuna Zhu
- General and Experimental Psychology, Department of Psychology, Ludwig Maximilian University of Munich, 80802, Munich, Germany
- Fredrik Allenmark
- General and Experimental Psychology, Department of Psychology, Ludwig Maximilian University of Munich, 80802, Munich, Germany
- Zhuanghua Shi
- General and Experimental Psychology, Department of Psychology, Ludwig Maximilian University of Munich, 80802, Munich, Germany
3
Liu W, Wang T, Huang X. The influences of forward context on stop-consonant perception: The combined effects of contrast and acoustic cue activation? J Acoust Soc Am 2023; 154:1903-1920. PMID: 37756574. DOI: 10.1121/10.0021077.
Abstract
The perception of the /da/-/ga/ series, distinguished primarily by the third formant (F3) transition, is affected by many nonspeech and speech sounds. Previous studies mainly investigated the influences of context stimuli with frequency bands located in the F3 region and proposed the account of spectral contrast effects. This study examined the effects of context stimuli with bands not in the F3 region. The results revealed that these non-F3-region stimuli (whether with bands higher or lower than the F3 region) mainly facilitated the identification of /ga/; for example, the stimuli (including frequency-modulated glides, sine-wave tones, filtered sentences, and natural vowels) in the low-frequency band (500-1500 Hz) led to more /ga/ responses than those in the low-F3 region (1500-2500 Hz). It is suggested that in the F3 region, context stimuli may act through spectral contrast effects, while in non-F3 regions, context stimuli might activate the acoustic cues of /g/ and further facilitate the identification of /ga/. The combination of contrast and acoustic cue effects can explain more results concerning the forward context influences on the perception of the /da/-/ga/ series, including the effects of non-F3-region stimuli and the imbalanced influences of context stimuli on /da/ and /ga/ perception.
Affiliation(s)
- Wenli Liu
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, 38 Tongshuo Road, Tianjin 300350, China
- Tianyu Wang
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, 38 Tongshuo Road, Tianjin 300350, China
- Xianjun Huang
- School of Psychology, Capital Normal University, 105 North West 3rd Ring Road, Beijing 100048, China
4
Chen F, Zhang K, Guo Q, Lv J. Development of Achieving Constancy in Lexical Tone Identification With Contextual Cues. J Speech Lang Hear Res 2023; 66:1148-1164. PMID: 36995907. DOI: 10.1044/2022_jslhr-22-00257.
Abstract
PURPOSE The aim of this study was to explore when and how Mandarin-speaking children use contextual cues to normalize speech variability in perceiving lexical tones. Two different cognitive mechanisms underlying speech normalization (lower level acoustic normalization and higher level acoustic-phonemic normalization) were investigated through a lexical tone identification task in nonspeech and speech contexts, respectively. A further aim was to reveal how domain-general cognitive abilities contribute to the development of the speech normalization process. METHOD Ninety-four 5- to 8-year-old Mandarin-speaking children (50 boys, 44 girls) and 24 young adults (14 men, 10 women) were asked to identify ambiguous Mandarin high-level and mid-rising tones in either speech or nonspeech contexts. We also tested participants' pitch sensitivity with a nonlinguistic pitch discrimination task and their working memory with the digit span task. RESULTS Higher level acoustic-phonemic normalization of lexical tones emerged at the age of 6 years and was relatively stable thereafter. However, lower level acoustic normalization was less stable across different ages. Neither pitch sensitivity nor working memory affected children's lexical tone normalization. CONCLUSIONS Mandarin-speaking children above 6 years of age successfully achieved constancy in lexical tone normalization based on speech contextual cues. The perceptual normalization of lexical tones was not affected by pitch sensitivity or working memory capacity.
Affiliation(s)
- Fei Chen
- School of Foreign Languages, Hunan University, Changsha, China
- Kaile Zhang
- Centre for Cognitive and Brain Sciences, University of Macau, China
- Qingqing Guo
- School of Foreign Languages, Hunan University, Changsha, China
- Jia Lv
- School of Foreign Languages and Literature, Wuhan University, China
5
Shorey AE, Stilp CE. Short-term, not long-term, average spectra of preceding sentences bias consonant categorization. J Acoust Soc Am 2023; 153:2426. PMID: 37092945. PMCID: PMC10119874. DOI: 10.1121/10.0017862.
Abstract
Speech sound perception is influenced by the spectral properties of surrounding sounds. For example, listeners perceive /g/ (lower F3 onset) more often after sounds with prominent high-F3 frequencies and perceive /d/ (higher F3 onset) more often after sounds with prominent low-F3 frequencies. These biases are known as spectral contrast effects (SCEs). Much of this work examined differences between long-term average spectra (LTAS) of preceding sounds and target speech sounds. Post hoc analyses by Stilp and Assgari [(2021) Atten. Percept. Psychophys. 83(6) 2694-2708] revealed that spectra of the last 475 ms of precursor sentences, not the entire LTAS, best predicted biases in consonant categorization. Here, the influences of proximal (last 500 ms) versus distal (before the last 500 ms) portions of precursor sentences on subsequent consonant categorization were compared. Sentences emphasized different frequency regions in each temporal window (e.g., distal low-F3 emphasis, proximal high-F3 emphasis, and vice versa) naturally or via filtering. In both cases, shifts in consonant categorization were produced in accordance with spectral properties of the proximal window. This was replicated when the distal window did not emphasize either frequency region, but the proximal window did. Results endorse closer consideration of patterns of spectral energy over time in preceding sounds, not just their LTAS.
Affiliation(s)
- Anya E Shorey
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
6
Apfelbaum KS, Kutlu E, McMurray B, Kapnoula EC. Don't force it! Gradient speech categorization calls for continuous categorization tasks. J Acoust Soc Am 2022; 152:3728. PMID: 36586841. PMCID: PMC9894657. DOI: 10.1121/10.0015201.
Abstract
Research on speech categorization and phoneme recognition has relied heavily on tasks in which participants listen to stimuli from a speech continuum and are asked to either classify each stimulus (identification) or discriminate between them (discrimination). Such tasks rest on assumptions about how perception maps onto discrete responses that have not been thoroughly investigated. Here, we identify critical challenges in the link between these tasks and theories of speech categorization. In particular, we show that patterns that have traditionally been linked to categorical perception could arise despite continuous underlying perception and that patterns that run counter to categorical perception could arise despite underlying categorical perception. We describe an alternative measure of speech perception using a visual analog scale that better differentiates between processes at play in speech categorization, and we review some recent findings that show how this task can be used to better inform our theories.
Affiliation(s)
- Keith S Apfelbaum
- Department of Psychological and Brain Sciences, G60 Psychological and Brain Sciences Building, University of Iowa, Iowa City, Iowa 52242-1407, USA
- Ethan Kutlu
- Department of Psychological and Brain Sciences, G60 Psychological and Brain Sciences Building, University of Iowa, Iowa City, Iowa 52242-1407, USA
- Bob McMurray
- Department of Psychological and Brain Sciences, G60 Psychological and Brain Sciences Building, University of Iowa, Iowa City, Iowa 52242-1407, USA
- Efthymia C Kapnoula
- BCBL, Basque Center on Cognition, Brain and Language, Mikeletegi 69, 20009 Donostia, Spain
7
Stilp CE, Shorey AE, King CJ. Nonspeech sounds are not all equally good at being nonspeech. J Acoust Soc Am 2022; 152:1842. PMID: 36182316. DOI: 10.1121/10.0014174.
Abstract
Perception of speech sounds has a long history of being compared to perception of nonspeech sounds, with rich and enduring debates regarding how closely they share similar underlying processes. In many instances, perception of nonspeech sounds is directly compared to that of speech sounds without a clear explanation of how related these sounds are to the speech they are selected to mirror (or not mirror). While the extreme acoustic variability of speech sounds is well documented, this variability is bounded by the common source of a human vocal tract. Nonspeech sounds do not share a common source, and as such, exhibit even greater acoustic variability than that observed for speech. This increased variability raises important questions about how well perception of a given nonspeech sound might resemble or model perception of speech sounds. Here, we offer a brief review of extremely diverse nonspeech stimuli that have been used in the efforts to better understand perception of speech sounds. The review is organized according to increasing spectrotemporal complexity: random noise, pure tones, multitone complexes, environmental sounds, music, speech excerpts that are not recognized as speech, and sinewave speech. Considerations are offered for stimulus selection in nonspeech perception experiments moving forward.
Affiliation(s)
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Anya E Shorey
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Caleb J King
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
8
Si C, Zhang C, Lau P, Yang Y, Li B. Modelling representations in speech normalization of prosodic cues. Sci Rep 2022; 12:14635. PMID: 36030274. PMCID: PMC9420126. DOI: 10.1038/s41598-022-18838-w.
Abstract
The lack-of-invariance problem in speech perception refers to the fundamental question of how listeners deal with differences in speech sounds produced by various speakers. The current study is the first to test the contributions of mentally stored distributional information to the normalization of prosodic cues. We started by modelling distributions of acoustic cues from a speech corpus and proceeded to conduct three experiments using both naturally produced lexical tones with estimated distributions and manipulated lexical tones with f0 values generated from simulated distributions. State-of-the-art statistical techniques were used to examine the effects of the distribution parameters on normalization and on the identification curves with respect to each parameter. Based on the significant effects of the distribution parameters, we propose a probabilistic parametric representation (PPR) that integrates knowledge from previously established distributions of speakers with their indexical information. PPR is still accessed during speech perception even when contextual information is present. We also discuss the normalization of speech signals produced by an unfamiliar talker, with and without contexts, and the access of long-term stored representations.
Affiliation(s)
- Chen Si
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
- Hong Kong Polytechnic University-Peking University Research Centre on Chinese Linguistics, Kowloon, Hong Kong SAR, China
- Research Centre for Language, Cognition, and Neuroscience, University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Caicai Zhang
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
- Hong Kong Polytechnic University-Peking University Research Centre on Chinese Linguistics, Kowloon, Hong Kong SAR, China
- Research Centre for Language, Cognition, and Neuroscience, University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Puiyin Lau
- Department of Statistics and Actuarial Science, University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Yike Yang
- Department of Chinese Language and Literature, Hong Kong Shue Yan University, North Point, Hong Kong SAR, China
- Bei Li
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
9
Reese H, Reinisch E. Cognitive load does not increase reliance on speaker information in phonetic categorization. JASA Express Lett 2022; 2:055203. PMID: 36154071. DOI: 10.1121/10.0009895.
Abstract
Past research on speech perception has shown that speaker information, such as gender, affects phoneme categorization. Additionally, studies on listening under divided attention have argued that cognitive load decreases attention to phonetic detail and increases reliance on higher-level cues such as lexical information. This study examines the processing of speaker information under divided attention. The results of two perception experiments indicate that additional cognitive load does not increase listeners' reliance on the gender of the speaker during phoneme categorization tasks. This suggests that the processing of speaker information may pattern with lower-level acoustic rather than higher-level lexical information.
Affiliation(s)
- Helen Reese
- Acoustics Research Institute, Austrian Academy of Sciences, Vienna, 1040, Austria
- Eva Reinisch
- Acoustics Research Institute, Austrian Academy of Sciences, Vienna, 1040, Austria
10
Haigh SM, Brosseau P, Eack SM, Leitman DI, Salisbury DF, Behrmann M. Hyper-Sensitivity to Pitch and Poorer Prosody Processing in Adults With Autism: An ERP Study. Front Psychiatry 2022; 13:844830. PMID: 35693971. PMCID: PMC9174755. DOI: 10.3389/fpsyt.2022.844830.
Abstract
Individuals with autism typically experience a range of symptoms, including abnormal sensory sensitivities. However, there are conflicting reports on the sensory profiles that characterize the sensory experience in autism, which often depend on the type of stimulus. Here, we examine early auditory processing of simple changes in pitch and later auditory processing of more complex emotional utterances. We measured electroencephalography in 24 adults with autism and 28 controls. First, tones (1046.5 Hz/C6, 1108.7 Hz/C#6, or 1244.5 Hz/D#6) were repeated three or nine times before the pitch changed. Second, utterances of delight or frustration were repeated three or six times before the emotion changed. In response to the simple pitched tones, the autism group exhibited larger mismatch negativity (MMN) after nine standards compared to controls and produced greater trial-to-trial variability (TTV). In response to the prosodic utterances, the autism group showed smaller P3 responses when delight changed to frustration compared to controls. There was no significant correlation between ERPs to pitch and ERPs to prosody. Together, this suggests that early auditory processing is hyper-sensitive in autism whereas later processing of prosodic information is hypo-sensitive. The impact the different sensory profiles have on perceptual experience in autism may be key to identifying behavioral treatments to reduce symptoms.
Affiliation(s)
- Sarah M Haigh
- Department of Psychology and Institute for Neuroscience, University of Nevada, Reno, NV, United States
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, United States
- Pat Brosseau
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, United States
- Shaun M Eack
- School of Social Work, University of Pittsburgh, Pittsburgh, PA, United States
- David I Leitman
- Division of Translational Research, National Institute of Mental Health, Bethesda, MD, United States
- Dean F Salisbury
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Marlene Behrmann
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, United States
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
11
Zhang K, Peng G. The time course of normalizing speech variability in vowels. Brain Lang 2021; 222:105028. PMID: 34597904. DOI: 10.1016/j.bandl.2021.105028.
Abstract
To achieve perceptual constancy, listeners utilize contextual cues to normalize speech variabilities in speakers. The present study tested the time course of this cognitive process with an event-related potential (ERP) experiment. The first neurophysiological evidence of speech normalization is observed in P2 (130-250 ms), which is functionally related to phonetic and phonological processes. Furthermore, the normalization process was found to ease lexical retrieval, as indexed by smaller N400 (350-470 ms) after larger P2. A cross-language vowel perception task was carried out to further specify whether normalization was processed in the phonetic and/or phonological stage(s). It was found that both phonetic and phonological cues in the speech context contributed to vowel normalization. The results suggest that vowel normalization in the speech context can be observed in the P2 time window and largely overlaps with phonetic and phonological processes.
Affiliation(s)
- Kaile Zhang
- Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region
- Gang Peng
- Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Boulevard, Shenzhen 518055, China
12
Tao R, Zhang K, Peng G. Music Does Not Facilitate Lexical Tone Normalization: A Speech-Specific Perceptual Process. Front Psychol 2021; 12:717110. PMID: 34777097. PMCID: PMC8585521. DOI: 10.3389/fpsyg.2021.717110.
Abstract
Listeners utilize immediate contexts to efficiently normalize variable vocal streams into standard phonological units. However, researchers have debated whether non-speech contexts can also serve as valid cues for speech normalization, with the two sides proposing a general-auditory hypothesis and a speech-specific hypothesis to explain the underlying mechanisms. A possible confounding factor behind this inconsistency is listeners' perceptual familiarity with the contexts, as the non-speech contexts used were perceptually unfamiliar to listeners. In this study, we examined this confounding factor by recruiting a group of native Cantonese speakers with substantial musical training and a control group with minimal musical training. Participants performed lexical tone judgment tasks in three contextual conditions: speech, non-speech, and music. Both groups were familiar with the speech context and unfamiliar with the non-speech context, and the musician group was more familiar with the music context than the non-musician group. The results evidenced a lexical tone normalization process in the speech context but not in the non-speech or music contexts. More importantly, musicians did not outperform non-musicians in any contextual condition even though musicians are experienced at pitch perception, indicating no noticeable transfer in pitch perception from the music domain to the linguistic domain for tonal language speakers. The findings show that even high familiarity with a non-linguistic context cannot elicit an effective lexical tone normalization process, supporting the speech-specific basis of the perceptual normalization process.
Affiliation(s)
- Gang Peng
- Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
13
Stilp CE. Parameterizing spectral contrast effects in vowel categorization using noise contexts. J Acoust Soc Am 2021; 150:2806. PMID: 34717452. DOI: 10.1121/10.0006657.
Abstract
When spectra differ between earlier (context) and later (target) sounds, listeners perceive larger spectral changes than are physically present. When context sounds (e.g., a sentence) possess relatively higher frequencies, the target sound (e.g., a vowel sound) is perceived as possessing relatively lower frequencies, and vice versa. These spectral contrast effects (SCEs) are pervasive in auditory perception, but studies traditionally employed contexts with high spectrotemporal variability that made it difficult to understand exactly when context spectral properties biased perception. Here, contexts were speech-shaped noise divided into four consecutive 500-ms epochs. Contexts were filtered to amplify low-F1 (100-400 Hz) or high-F1 (550-850 Hz) frequencies to encourage target perception of /ɛ/ ("bet") or /ɪ/ ("bit"), respectively, via SCEs. Spectral peaks in the context ranged from its initial epoch(s) to its entire duration (onset paradigm), ranged from its final epoch(s) to its entire duration (offset paradigm), or were present for only one epoch (single paradigm). SCE magnitudes increased as spectral-peak durations increased and/or occurred later in the context (closer to the target). Contrary to predictions, brief early spectral peaks still biased subsequent target categorization. Results are compared to related experiments using speech contexts, and physiological and/or psychoacoustic idiosyncrasies of the noise contexts are considered.
Affiliation(s)
- Christian E Stilp
- Department of Psychological and Brain Sciences, 317 Life Sciences Building, University of Louisville, Louisville, Kentucky 40292, USA
14
Contributions of natural signal statistics to spectral context effects in consonant categorization. Atten Percept Psychophys 2021; 83:2694-2708. PMID: 33987821. DOI: 10.3758/s13414-021-02310-4.
Abstract
Speech perception, like all perception, takes place in context. Recognition of a given speech sound is influenced by the acoustic properties of surrounding sounds. When the spectral composition of earlier (context) sounds (e.g., a sentence with more energy at lower third formant [F3] frequencies) differs from that of a later (target) sound (e.g., consonant with intermediate F3 onset frequency), the auditory system magnifies this difference, biasing target categorization (e.g., towards higher-F3-onset /d/). Historically, these studies used filters to force context stimuli to possess certain spectral compositions. Recently, these effects were produced using unfiltered context sounds that already possessed the desired spectral compositions (Stilp & Assgari, 2019, Attention, Perception, & Psychophysics, 81, 2037-2052). Here, this natural signal statistics approach is extended to consonant categorization (/g/-/d/). Context sentences were either unfiltered (already possessing the desired spectral composition) or filtered (to imbue specific spectral characteristics). Long-term spectral characteristics of unfiltered contexts were poor predictors of shifts in consonant categorization, but short-term characteristics (last 475 ms) were excellent predictors. This diverges from vowel data, where long-term and shorter-term intervals (last 1,000 ms) were equally strong predictors. Thus, time scale plays a critical role in how listeners attune to signal statistics in the acoustic environment.
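The comparison at the heart of this abstract, the long-term average spectrum of a whole precursor sentence versus the spectrum of only its final portion, can be sketched by averaging framewise magnitude spectra over each window. The following is an illustrative sketch only (the synthetic tone, sampling rate, frame length, and helper name are assumptions, not the study's analysis code):

```python
import numpy as np

def average_spectrum(signal, n_fft=1024):
    """Average magnitude spectrum over consecutive Hann-windowed frames
    (a simple long-term average spectrum when applied to the full signal)."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft]
              for i in range(0, len(signal) - n_fft + 1, n_fft)]
    return np.mean([np.abs(np.fft.rfft(f * window)) for f in frames], axis=0)

sr = 16000
t = np.arange(2 * sr) / sr                 # 2 s synthetic "precursor"
precursor = np.sin(2 * np.pi * 300 * t)    # stand-in for a context sentence

long_term = average_spectrum(precursor)                      # entire LTAS
short_term = average_spectrum(precursor[-int(0.475 * sr):])  # last 475 ms
```

Asking whether the long-term or the short-term summary better predicts categorization shifts is then a regression problem over these two spectral vectors.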
15
Brown M, Tanenhaus MK, Dilley L. Syllable Inference as a Mechanism for Spoken Language Understanding. Top Cogn Sci 2021; 13:351-398. [PMID: 33780156] [DOI: 10.1111/tops.12529]
Abstract
A classic problem in spoken language comprehension is how listeners perceive speech as being composed of discrete words, given the variable time-course of information in continuous signals. We propose a syllable inference account of spoken word recognition and segmentation, according to which alternative hierarchical models of syllables, words, and phonemes are dynamically posited, which are expected to maximally predict incoming sensory input. Generative models are combined with current estimates of context speech rate drawn from neural oscillatory dynamics, which are sensitive to amplitude rises. Over time, models which result in local minima in error between predicted and recently experienced signals give rise to perceptions of hearing words. Three experiments using the visual world eye-tracking paradigm with a picture-selection task tested hypotheses motivated by this framework. Materials were sentences that were acoustically ambiguous in numbers of syllables, words, and phonemes they contained (cf. English plural constructions, such as "saw (a) raccoon(s) swimming," which have two loci of grammatical information). Time-compressing, or expanding, speech materials permitted determination of how temporal information at, or in the context of, each locus affected looks to, and selection of, pictures with a singular or plural referent (e.g., one or more than one raccoon). Supporting our account, listeners probabilistically interpreted identical chunks of speech as consistent with a singular or plural referent to a degree that was based on the chunk's gradient rate in relation to its context. We interpret these results as evidence that arriving temporal information, judged in relation to language model predictions generated from context speech rate evaluated on a continuous scale, informs inferences about syllables, thereby giving rise to perceptual experiences of understanding spoken language as words separated in time.
Affiliation(s)
- Meredith Brown
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York, USA; Department of Psychiatry and Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts, USA; Department of Psychology, Tufts University, Medford, Massachusetts, USA
- Michael K Tanenhaus
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York, USA; School of Psychology, Nanjing Normal University, Nanjing, China
- Laura Dilley
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan, USA
16
Richards VM, Tisby MK, Suzuki-Gill EN, Shen Y. Sub-optimal construction of an auditory profile from temporally distributed spectral information. J Acoust Soc Am 2021; 149:1567. [PMID: 33765831] [PMCID: PMC7943247] [DOI: 10.1121/10.0003646]
Abstract
When spectral components of a complex sound are presented not simultaneously but distributed over time, human listeners can still, to a degree, perceptually recover the spectral profile of the sound. This capability of integrating spectral information over time was investigated using a cued informational masking paradigm. Listeners detected a 1-kHz pure tone in a simultaneous masker composed of six random-frequency tones drawn on every trial. The spectral profile of the masker was cued using a precursor sound that consisted of a sequence of 50-ms bursts, separated by inter-burst intervals of 100 ms. Each burst in the precursor consisted of pure tones at the masker frequencies with tones appearing at each of the masker frequencies at different presentation probabilities. As the presentation probability increased in different conditions, the detectability of the target improved, indicating reliable precursor cuing regarding the spectral content of the masker. For many listeners, performance did not significantly improve as the number of precursor bursts increased from 2 to 16, indicating inefficient integration of information beyond 2 bursts. Additional analyses suggest that when intensity of the bursts is relatively constant, the contribution of the precursor is dominated by information in the initial burst.
Affiliation(s)
- Virginia M Richards
- Department of Cognitive Sciences, University of California, Irvine, California 92687, USA
- Mariel Kazuko Tisby
- Department of Cognitive Sciences, University of California, Irvine, California 92687, USA
- Eli N Suzuki-Gill
- Department of Cognitive Sciences, University of California, Irvine, California 92687, USA
- Yi Shen
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington 98105, USA
17
Bosker HR, Sjerps MJ, Reinisch E. Spectral contrast effects are modulated by selective attention in "cocktail party" settings. Atten Percept Psychophys 2020; 82:1318-1332. [PMID: 31338824] [PMCID: PMC7303055] [DOI: 10.3758/s13414-019-01824-2]
Abstract
Speech sounds are perceived relative to spectral properties of surrounding speech. For instance, target words that are ambiguous between /bɪt/ (with low F1) and /bɛt/ (with high F1) are more likely to be perceived as "bet" after a "low F1" sentence, but as "bit" after a "high F1" sentence. However, it is unclear how these spectral contrast effects (SCEs) operate in multi-talker listening conditions. Recently, Feng and Oxenham (J.Exp.Psychol.-Hum.Percept.Perform. 44(9), 1447-1457, 2018b) reported that selective attention affected SCEs to a small degree, using two simultaneously presented sentences produced by a single talker. The present study assessed the role of selective attention in more naturalistic "cocktail party" settings, with 200 lexically unique sentences, 20 target words, and different talkers. Results indicate that selective attention to one talker in one ear (while ignoring another talker in the other ear) modulates SCEs in such a way that only the spectral properties of the attended talker influence target perception. However, SCEs were much smaller in multi-talker settings (Experiment 2) than in single-talker settings (Experiment 1). Therefore, the influence of SCEs on speech comprehension in more naturalistic settings (i.e., with competing talkers) may be smaller than estimated based on studies without competing talkers.
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH, Nijmegen, The Netherlands.
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands.
| | - Matthias J Sjerps
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
| | - Eva Reinisch
- Institute of Phonetics and Speech Processing, Ludwig Maximilian University Munich, Munich, Germany
- Institute of General Linguistics, Ludwig Maximilian University Munich, Munich, Germany
18
Stilp CE. Evaluating peripheral versus central contributions to spectral context effects in speech perception. Hear Res 2020; 392:107983. [PMID: 32464456] [DOI: 10.1016/j.heares.2020.107983]
Abstract
Perception of a sound is influenced by spectral properties of surrounding sounds. When frequencies are absent in a preceding acoustic context before being introduced in a subsequent target sound, detection of those frequencies is facilitated via an auditory enhancement effect (EE). When spectral composition differs across a preceding context and subsequent target sound, those differences are perceptually magnified and perception shifts via a spectral contrast effect (SCE). Each effect is thought to receive contributions from peripheral and central neural processing, but the relative contributions are unclear. The present experiments manipulated ear of presentation to elucidate the degrees to which peripheral and central processes contributed to each effect in speech perception. In Experiment 1, EE and SCE magnitudes in consonant categorization were substantially diminished through contralateral presentation of contexts and targets compared to ipsilateral or bilateral presentations. In Experiment 2, spectrally complementary contexts were presented dichotically followed by the target in only one ear. This arrangement was predicted to produce context effects peripherally and cancel them centrally, but the competing contralateral context minimally decreased effect magnitudes. Results confirm peripheral and central contributions to EEs and SCEs in speech perception, but both effects appear to be primarily due to peripheral processing.
Affiliation(s)
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, KY, 40292, USA.
19
Chodroff E, Wilson C. Acoustic-phonetic and auditory mechanisms of adaptation in the perception of sibilant fricatives. Atten Percept Psychophys 2020; 82:2027-2048. [PMID: 31875314] [PMCID: PMC7297833] [DOI: 10.3758/s13414-019-01894-2]
Abstract
Listeners are highly proficient at adapting to contextual variation when perceiving speech. In the present study, we examined the effects of brief speech and nonspeech contexts on the perception of sibilant fricatives. We explored three theoretically motivated accounts of contextual adaptation, based on phonetic cue calibration, phonetic covariation, and auditory contrast. Under the cue calibration account, listeners adapt by estimating a talker-specific average for each phonetic cue or dimension; under the cue covariation account, listeners adapt by exploiting consistencies in how the realization of speech sounds varies across talkers; under the auditory contrast account, adaptation results from (partial) masking of spectral components that are shared by adjacent stimuli. The spectral center of gravity, a phonetic cue to fricative identity, was manipulated for several types of context sound: /z/-initial syllables, /v/-initial syllables, and white noise matched in long-term average spectrum (LTAS) to the /z/-initial stimuli. Listeners' perception of the /s/-/ʃ/ contrast was significantly influenced by /z/-initial syllables and LTAS-matched white noise stimuli, but not by /v/-initial syllables. No significant difference in adaptation was observed between exposure to /z/-initial syllables and matched white noise stimuli, and speech did not have a considerable advantage over noise when the two were presented consecutively within a context. The pattern of findings is most consistent with the auditory contrast account of short-term perceptual adaptation. The cue covariation account makes accurate predictions for speech contexts, but not for nonspeech contexts or for the absence of a speech-versus-nonspeech difference.
Affiliation(s)
- Eleanor Chodroff
- Department of Language and Linguistic Science, University of York, Heslington, York, YO10 5DD, UK.
- Colin Wilson
- Department of Cognitive Science, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD, 21218, USA
20
Abstract
Perception of sounds occurs in the context of surrounding sounds. When spectral properties differ between earlier (context) and later (target) sounds, categorization of later sounds becomes biased through spectral contrast effects (SCEs). Past research has shown SCEs to bias categorization of speech and music alike. Recent studies have extended SCEs to naturalistic listening conditions, in which the inherent spectral composition of (unfiltered) sentences biased speech categorization. Here, we tested whether natural (unfiltered) music would similarly bias categorization of French horn and tenor saxophone targets. Preceding contexts were either solo performances of the French horn or tenor saxophone (unfiltered; one second in duration in Experiment 1, or three seconds in duration in Experiment 2) or a string quintet processed to emphasize frequencies in the horn or saxophone (filtered; one second in duration). SCEs biased categorization, producing more "saxophone" responses following horn or horn-like contexts and vice versa. One-second filtered contexts produced SCEs as in previous studies, but one-second unfiltered contexts did not. Three-second unfiltered contexts biased perception, but to a lesser degree than filtered contexts did. These results extend SCEs in musical instrument categorization to everyday listening conditions.
21
Llompart M, Reinisch E. Imitation in a Second Language Relies on Phonological Categories but Does Not Reflect the Productive Usage of Difficult Sound Contrasts. Lang Speech 2019; 62:594-622. [PMID: 30319031] [DOI: 10.1177/0023830918803978]
Abstract
This study investigated the relationship between imitation and both the perception and production abilities of second language (L2) learners for two non-native contrasts differing in their expected degree of difficulty. German learners of English were tested on perceptual categorization, imitation and a word reading task for the difficult English /ɛ/-/æ/ contrast, which tends not to be well encoded in the learners' phonological inventories, and the easy, near-native /i/-/ɪ/ contrast. As expected, within-task comparisons between contrasts revealed more robust perception and better differentiation during production for /i/-/ɪ/ than /ɛ/-/æ/. Imitation also followed this pattern, suggesting that imitation is modulated by the phonological encoding of L2 categories. Moreover, learners' ability to imitate /ɛ/ and /æ/ was related to their perception of that contrast, confirming a tight perception-production link at the phonological level for difficult L2 sound contrasts. However, no relationship was observed between acoustic measures for imitated and read-aloud tokens of /ɛ/ and /æ/. This dissociation is mostly attributed to the influence of inaccurate non-native lexical representations in the word reading task. We conclude that imitation is strongly related to the phonological representation of L2 sound contrasts, but does not need to reflect the learners' productive usage of such non-native distinctions.
22
Stilp C. Acoustic context effects in speech perception. Wiley Interdiscip Rev Cogn Sci 2019; 11:e1517. [PMID: 31453667] [DOI: 10.1002/wcs.1517]
Abstract
The extreme acoustic variability of speech is well established, which makes the proficiency of human speech perception all the more impressive. Speech perception, like perception in any modality, is relative to context, and this provides a means to normalize the acoustic variability in the speech signal. Acoustic context effects in speech perception have been widely documented, but a clear understanding of how these effects relate to each other across stimuli, timescales, and acoustic domains is lacking. Here we review the influences that spectral context, temporal context, and spectrotemporal context have on speech perception. Studies are organized in terms of whether the context precedes the target (forward effects) or follows it (backward effects), and whether the context is adjacent to the target (proximal) or temporally removed from it (distal). Special cases where proximal and distal contexts have competing influences on perception are also considered. Across studies, a common theme emerges: acoustic differences between contexts and targets are perceptually magnified, producing contrast effects that facilitate perception of target sounds and words. This indicates enhanced sensitivity to changes in the acoustic environment, which maximizes the amount of potential information that can be transmitted to the perceiver. This article is categorized under: Linguistics > Language in Mind and Brain; Psychology > Perception and Psychophysics.
Affiliation(s)
- Christian Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky
23
Stilp CE. Auditory enhancement and spectral contrast effects in speech perception. J Acoust Soc Am 2019; 146:1503. [PMID: 31472539] [DOI: 10.1121/1.5120181]
Abstract
The auditory system is remarkably sensitive to changes in the acoustic environment. This is exemplified by two classic effects of preceding spectral context on perception. In auditory enhancement effects (EEs), the absence and subsequent insertion of a frequency component increases its salience. In spectral contrast effects (SCEs), spectral differences between earlier and later (target) sounds are perceptually magnified, biasing target sound categorization. These effects have been suggested to be related, but have largely been studied separately. Here, EEs and SCEs are demonstrated using the same speech materials. In Experiment 1, listeners categorized vowels (/ɪ/-/ɛ/) or consonants (/d/-/g/) following a sentence processed by a bandpass or bandstop filter (vowel tasks: 100-400 or 550-850 Hz; consonant tasks: 1700-2700 or 2700-3700 Hz). Bandpass filtering produced SCEs and bandstop filtering produced EEs, with effect magnitudes significantly correlated at the individual differences level. In Experiment 2, context sentences were processed by variable-depth notch filters in these frequency regions (-5 to -20 dB). EE magnitudes increased at larger notch depths, growing linearly in consonant categorization. This parallels previous research where SCEs increased linearly for larger spectral peaks in the context sentence. These results link EEs and SCEs, as both shape speech categorization in orderly ways.
Affiliation(s)
- Christian E Stilp
- 317 Life Sciences Building, University of Louisville, Louisville, Kentucky 40292, USA
24
Sjerps MJ, Fox NP, Johnson K, Chang EF. Speaker-normalized sound representations in the human auditory cortex. Nat Commun 2019; 10:2465. [PMID: 31165733] [PMCID: PMC6549175] [DOI: 10.1038/s41467-019-10365-z]
Abstract
The acoustic dimensions that distinguish speech sounds (like the vowel differences in "boot" and "boat") also differentiate speakers' voices. Therefore, listeners must normalize across speakers without losing linguistic information. Past behavioral work suggests an important role for auditory contrast enhancement in normalization: preceding context affects listeners' perception of subsequent speech sounds. Here, using intracranial electrocorticography in humans, we investigate whether and how such context effects arise in auditory cortex. Participants identified speech sounds that were preceded by phrases from two different speakers whose voices differed along the same acoustic dimension as target words (the lowest resonance of the vocal tract). In every participant, target vowels evoke a speaker-dependent neural response that is consistent with the listener's perception, and which follows from a contrast enhancement model. Auditory cortex processing thus displays a critical feature of normalization, allowing listeners to extract meaningful content from the voices of diverse speakers.
Affiliation(s)
- Matthias J Sjerps
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Kapittelweg 29, Nijmegen, 6525 EN, The Netherlands
- Max Planck Institute for Psycholinguistics, Wundtlaan 1, Nijmegen, 6525 XD, Netherlands
| | - Neal P Fox
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California, 94158, USA
| | - Keith Johnson
- Department of Linguistics, University of California, Berkeley, 1203 Dwinelle Hall #2650, Berkeley, California, 94720, USA
| | - Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California, 94158, USA.
- Weill Institute for Neurosciences, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California, 94158, USA.
26
Perceptual sensitivity to spectral properties of earlier sounds during speech categorization. Atten Percept Psychophys 2019; 80:1300-1310. [PMID: 29492759] [DOI: 10.3758/s13414-018-1488-9]
Abstract
Speech perception is heavily influenced by surrounding sounds. When spectral properties differ between earlier (context) and later (target) sounds, this can produce spectral contrast effects (SCEs) that bias perception of later sounds. For example, when context sounds have more energy in low-F1 frequency regions, listeners report more high-F1 responses to a target vowel, and vice versa. SCEs have been reported using various approaches for a wide range of stimuli, but most often, large spectral peaks were added to the context to bias speech categorization. This obscures the lower limit of perceptual sensitivity to spectral properties of earlier sounds, i.e., when SCEs begin to bias speech categorization. Listeners categorized vowels (/ɪ/-/ɛ/, Experiment 1) or consonants (/d/-/g/, Experiment 2) following a context sentence with little spectral amplification (+1 to +4 dB) in frequency regions known to produce SCEs. In both experiments, +3 and +4 dB amplification in key frequency regions of the context produced SCEs, but lesser amplification was insufficient to bias performance. This establishes a lower limit of perceptual sensitivity where spectral differences across sounds can bias subsequent speech categorization. These results are consistent with proposed adaptation-based mechanisms that potentially underlie SCEs in auditory perception.
Significance statement
Recent sounds can change what speech sounds we hear later. This can occur when the average frequency composition of earlier sounds differs from that of later sounds, biasing how they are perceived. These "spectral contrast effects" are widely observed when sounds' frequency compositions differ substantially. We reveal the lower limit of these effects, as +3 dB amplification of key frequency regions in earlier sounds was enough to bias categorization of the following vowel or consonant sound. Speech categorization being biased by very small spectral differences across sounds suggests that spectral contrast effects occur frequently in everyday speech perception.
28
Comparing speech and nonspeech context effects across timescales in coarticulatory contexts. Atten Percept Psychophys 2019; 80:316-324. [PMID: 29134576] [DOI: 10.3758/s13414-017-1449-8]
Abstract
Context effects are ubiquitous in speech perception and reflect the ability of human listeners to successfully perceive highly variable speech signals. In the study of how listeners compensate for coarticulatory variability, past studies have used similar effects for speech and for tone analogues of speech as strong support for speech-neutral, general auditory mechanisms for compensation for coarticulation. In this manuscript, we revisit compensation for coarticulation by replacing standard button-press responses with mouse-tracking responses and examining both standard geometric measures of uncertainty as well as newer information-theoretic measures that separate fast from slow mouse movements. We found that when our analyses were restricted to end-state responses, tone and speech contexts appeared to produce similar effects. However, a more detailed time-course analysis revealed systematic differences between speech and tone contexts, such that listeners' responses to speech contexts, but not to tone contexts, changed across the experimental session. Analyses of the time course of effects within trials using mouse tracking indicated that speech contexts elicited fewer x-position flips but greater area under the curve (AUC) and maximum deviation (MD), and that they did so in the slower portions of mouse-tracking movements. Our results indicate critical differences between the time courses of speech and nonspeech context effects and suggest that general auditory explanations, motivated by their apparent similarity, should be reexamined.
29
Assgari AA, Theodore RM, Stilp CE. Variability in talkers' fundamental frequencies shapes context effects in speech perception. J Acoust Soc Am 2019; 145:1443. [PMID: 31067942] [DOI: 10.1121/1.5093638]
Abstract
The perception of any given sound is influenced by surrounding sounds. When successive sounds differ in their spectral compositions, these differences may be perceptually magnified, resulting in spectral contrast effects (SCEs). For example, listeners are more likely to perceive /ɪ/ (low F1) following sentences with higher F1 frequencies; listeners are also more likely to perceive /ɛ/ (high F1) following sentences with lower F1 frequencies. Previous research showed that SCEs for vowel categorization were attenuated when sentence contexts were spoken by different talkers [Assgari and Stilp. (2015). J. Acoust. Soc. Am. 138(5), 3023-3032], but the locus of this diminished contextual influence was not specified. Here, three experiments examined implications of variable talker acoustics for SCEs in the categorization of /ɪ/ and /ɛ/. The results showed that SCEs were smaller when the mean fundamental frequency (f0) of context sentences was highly variable across talkers compared to when mean f0 was more consistent, even when talker gender was held constant. In contrast, SCE magnitudes were not influenced by variability in mean F1. These findings suggest that talker variability attenuates SCEs due to diminished consistency of f0 as a contextual influence. Connections between these results and talker normalization are considered.
Affiliation(s)
- Ashley A Assgari
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Rachel M Theodore
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06828, USA
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
30
Bosker HR. Putting Laurel and Yanny in context. J Acoust Soc Am 2018; 144:EL503. [PMID: 30599655] [DOI: 10.1121/1.5070144]
Abstract
Recently, the world's attention was caught by an audio clip that was perceived as "Laurel" or "Yanny." Opinions were sharply split: many could not believe others heard something different from their perception. However, a crowd-source experiment with >500 participants shows that it is possible to make people hear Laurel, where they previously heard Yanny, by manipulating the preceding acoustic context. This study is the first not only to reveal within-listener variation in Laurel/Yanny percepts, but also to demonstrate contrast effects for global spectral information in larger frequency regions. Thus, it highlights the intricacies of human perception underlying these social media phenomena.
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
31
Holt LL, Tierney AT, Guerra G, Laffere A, Dick F. Dimension-selective attention as a possible driver of dynamic, context-dependent re-weighting in speech processing. Hear Res 2018; 366:50-64. [PMID: 30131109] [PMCID: PMC6107307] [DOI: 10.1016/j.heares.2018.06.014]
Abstract
The contribution of acoustic dimensions to an auditory percept is dynamically adjusted and reweighted based on prior experience about how informative these dimensions are across the long-term and short-term environment. This is especially evident in speech perception, where listeners differentially weight information across multiple acoustic dimensions, and use this information selectively to update expectations about future sounds. The dynamic and selective adjustment of how acoustic input dimensions contribute to perception has made it tempting to conceive of this as a form of non-spatial auditory selective attention. Here, we review several human speech perception phenomena that might be consistent with auditory selective attention although, as of yet, the literature does not definitively support a mechanistic tie. We relate these human perceptual phenomena to illustrative nonhuman animal neurobiological findings that offer informative guideposts in how to test mechanistic connections. We next present a novel empirical approach that can serve as a methodological bridge from human research to animal neurobiological studies. Finally, we describe four preliminary results that demonstrate its utility in advancing understanding of human non-spatial dimension-based auditory selective attention.
Affiliation(s)
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Adam T Tierney
- Department of Psychological Sciences, Birkbeck College, University of London, London WC1E 7HX, UK; Centre for Brain and Cognitive Development, Birkbeck College, London WC1E 7HX, UK
- Giada Guerra
- Department of Psychological Sciences, Birkbeck College, University of London, London WC1E 7HX, UK; Centre for Brain and Cognitive Development, Birkbeck College, London WC1E 7HX, UK
- Aeron Laffere
- Department of Psychological Sciences, Birkbeck College, University of London, London WC1E 7HX, UK
- Frederic Dick
- Department of Psychological Sciences, Birkbeck College, University of London, London WC1E 7HX, UK; Centre for Brain and Cognitive Development, Birkbeck College, London WC1E 7HX, UK; Department of Experimental Psychology, University College London, London WC1H 0AP, UK
32
Pike CD, Kriengwatana BP. Vocal tract constancy in birds and humans. Behav Processes 2018; 163:99-112. PMID: 30145277; DOI: 10.1016/j.beproc.2018.08.001.
Abstract
Humans perceive speech as being relatively stable despite acoustic variation caused by vocal tract (VT) differences between speakers. Humans use perceptual 'vocal tract normalisation' (VTN) and other processes to achieve this stability. Similarity in vocal apparatus/acoustics between birds and humans means that birds might also experience VT variation. This has the potential to impede bird communication. No known studies have explicitly examined this, but a number of studies show perceptual stability or 'perceptual constancy' in birds similar to that seen in humans when dealing with VT variation. This review explores similarities between birds and humans and concludes that birds show sufficient evidence of perceptual constancy to warrant further research in this area. Future work should 1) quantify the multiple sources of variation in bird vocalisations, including, but not limited to VT variations, 2) determine whether vocalisations are perniciously disrupted by any of these and 3) investigate how birds reduce variation to maintain perceptual constancy and perceptual efficiency.
Affiliation(s)
- Cleopatra Diana Pike
- School of Psychology and Neuroscience, University of St Andrews, St Mary's Quad, South Street, St Andrews, Fife KY16 9JP, UK
- Buddhamas Pralle Kriengwatana
- School of Psychology and Neuroscience, University of St Andrews, St Mary's Quad, South Street, St Andrews, Fife KY16 9JP, UK
33
Nourski KV, Steinschneider M, Rhone AE, Kawasaki H, Howard MA, Banks MI. Processing of auditory novelty across the cortical hierarchy: An intracranial electrophysiology study. Neuroimage 2018; 183:412-424. PMID: 30114466; DOI: 10.1016/j.neuroimage.2018.08.027.
Abstract
Under the predictive coding hypothesis, specific spatiotemporal patterns of cortical activation are postulated to occur during sensory processing as expectations generate feedback predictions and prediction errors generate feedforward signals. Establishing experimental evidence for this information flow within cortical hierarchy has been difficult, especially in humans, due to spatial and temporal limitations of non-invasive measures of cortical activity. This study investigated cortical responses to auditory novelty using the local/global deviant paradigm, which engages the hierarchical network underlying auditory predictive coding over short ('local deviance'; LD) and long ('global deviance'; GD) time scales. Electrocorticographic responses to auditory stimuli were obtained in neurosurgical patients from regions of interest (ROIs) including auditory, auditory-related and prefrontal cortex. LD and GD effects were assayed in averaged evoked potential (AEP) and high gamma (70-150 Hz) signals, the former likely dominated by local synaptic currents and the latter largely reflecting local spiking activity. AEP LD effects were distributed across all ROIs, with the greatest percentage of significant sites in core and non-core auditory cortex. High gamma LD effects were localized primarily to auditory cortex in the superior temporal plane and on the lateral surface of the superior temporal gyrus (STG). LD effects exhibited progressively longer latencies in core, non-core, auditory-related and prefrontal cortices, consistent with feedforward signaling. The spatial distribution of AEP GD effects overlapped that of LD effects, but high gamma GD effects were more restricted to non-core areas. High gamma GD effects had shortest latencies in STG and preceded AEP GD effects in most ROIs. This latency profile, along with the paucity of high gamma GD effects in the superior temporal plane, suggests that the STG plays a prominent role in initiating novelty detection signals over long time scales. Thus, the data demonstrate distinct patterns of information flow in human cortex associated with auditory novelty detection over multiple time scales.
Affiliation(s)
- Kirill V Nourski
- Department of Neurosurgery, The University of Iowa, Iowa City, IA 52242, USA; Iowa Neuroscience Institute, The University of Iowa, Iowa City, IA 52242, USA
- Mitchell Steinschneider
- Departments of Neurology and Neuroscience, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Ariane E Rhone
- Department of Neurosurgery, The University of Iowa, Iowa City, IA 52242, USA
- Hiroto Kawasaki
- Department of Neurosurgery, The University of Iowa, Iowa City, IA 52242, USA
- Matthew A Howard
- Department of Neurosurgery, The University of Iowa, Iowa City, IA 52242, USA; Iowa Neuroscience Institute, The University of Iowa, Iowa City, IA 52242, USA; Pappajohn Biomedical Institute, The University of Iowa, Iowa City, IA 52242, USA
- Matthew I Banks
- Department of Anesthesiology and Neuroscience, University of Wisconsin - Madison, Madison, WI 53705, USA
34
Feng L, Oxenham AJ. Auditory enhancement and the role of spectral resolution in normal-hearing listeners and cochlear-implant users. J Acoust Soc Am 2018; 144:552. PMID: 30180692; PMCID: PMC6072550; DOI: 10.1121/1.5048414.
Abstract
Detection of a target tone in a simultaneous multi-tone masker can be improved by preceding the stimulus with the masker alone. The mechanisms underlying this auditory enhancement effect may enable the efficient detection of new acoustic events and may help to produce perceptual constancy under varying acoustic conditions. Previous work in cochlear-implant (CI) users has suggested reduced or absent enhancement, due perhaps to poor spatial resolution in the cochlea. This study used a supra-threshold enhancement paradigm that in normal-hearing listeners results in large enhancement effects, exceeding 20 dB. Results from vocoder simulations using normal-hearing listeners showed that near-normal enhancement was observed if the simulated spread of excitation was limited to spectral slopes no shallower than 24 dB/oct. No significant enhancement was observed on average in CI users with their clinical monopolar stimulation strategy. The variability in enhancement between CI users, and between electrodes in a single CI user, could not be explained by the spread of excitation, as estimated from auditory nerve evoked potentials. Enhancement remained small, but did reach statistical significance, under the narrower partial-tripolar stimulation strategy. The results suggest that enhancement may be at least partially restored by improvements in the spatial resolution of current CIs.
Affiliation(s)
- Lei Feng
- Department of Psychology, University of Minnesota, N218 Elliott Hall, 75 East River Parkway, Minneapolis, Minnesota 55455, USA
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, N218 Elliott Hall, 75 East River Parkway, Minneapolis, Minnesota 55455, USA
35
Gabay Y, Holt LL. Short-term adaptation to sound statistics is unimpaired in developmental dyslexia. PLoS One 2018; 13:e0198146. PMID: 29879142; PMCID: PMC5991687; DOI: 10.1371/journal.pone.0198146.
Abstract
Developmental dyslexia is presumed to arise from phonological impairments. Accordingly, people with dyslexia show speech perception deficits taken as indication of impoverished phonological representations. However, the nature of speech perception deficits in those with dyslexia remains elusive. Specifically, there is no agreement as to whether speech perception deficits arise from speech-specific processing impairments, or from general auditory impairments that might be either specific to temporal processing or more general. Recent studies show that general auditory referents such as the Long Term Average Spectrum (LTAS, the distribution of acoustic energy across the duration of a sound sequence) affect speech perception. Here we examine the impact of the LTAS of sounds preceding a target on phoneme categorization to assess the nature of putative general auditory impairments associated with dyslexia. Dyslexic and typical listeners categorized speech targets varying perceptually from /ga/ to /da/, preceded by speech and nonspeech tone contexts varying in LTAS. Results revealed a spectrally contrastive influence of the preceding context LTAS on speech categorization, with a larger magnitude effect for nonspeech compared to speech precursors. Importantly, there was no difference in the presence or magnitude of the effects across dyslexia and control groups. These results demonstrate an aspect of general auditory processing that is spared in dyslexia, available to support phonemic processing when speech is presented in context.
Affiliation(s)
- Yafit Gabay
- Department of Special Education, University of Haifa, Haifa, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, Haifa, Israel
- Lori L. Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, United States of America
Collapse
|
36
|
Kleinschmidt DF. Structure in talker variability: How much is there and how much can it help? LANGUAGE, COGNITION AND NEUROSCIENCE 2018; 34:43-68. [PMID: 30619905 PMCID: PMC6320234 DOI: 10.1080/23273798.2018.1500698] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/09/2023]
Abstract
One of the persistent puzzles in understanding human speech perception is how listeners cope with talker variability. One thing that might help listeners is structure in talker variability: rather than varying randomly, talkers of the same gender, dialect, age, etc. tend to produce language in similar ways. Listeners are sensitive to this covariation between linguistic variation and socio-indexical variables. In this paper I present new techniques based on ideal observer models to quantify (1) the amount and type of structure in talker variation (the informativity of a grouping variable), and (2) how useful such structure can be for robust speech recognition in the face of talker variability (the utility of a grouping variable). I demonstrate these techniques in two phonetic domains, word-initial stop voicing and vowel identity, and show that these domains have different amounts and types of talker variability, consistent with previous, impressionistic findings. An R package (phondisttools) accompanies this paper, and the source and data are available from osf.io/zv6e3.
Affiliation(s)
- Dave F. Kleinschmidt
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA
37
Perceptual averaging of line length: Effects of concurrent digit memory load. Atten Percept Psychophys 2017; 79:2510-2522. DOI: 10.3758/s13414-017-1388-4.
38
Bauer B. Does Stevens's Power Law for Brightness Extend to Perceptual Brightness Averaging? Psychol Rec 2017. DOI: 10.1007/bf03395657.
39
Stilp CE. Acoustic Context Alters Vowel Categorization in Perception of Noise-Vocoded Speech. J Assoc Res Otolaryngol 2017; 18:465-481. PMID: 28281035; PMCID: PMC5418160; DOI: 10.1007/s10162-017-0615-y.
Abstract
Normal-hearing listeners' speech perception is widely influenced by spectral contrast effects (SCEs), where perception of a given sound is biased away from stable spectral properties of preceding sounds. Despite this influence, it is not clear how these contrast effects affect speech perception for cochlear implant (CI) users whose spectral resolution is notoriously poor. This knowledge is important for understanding how CIs might better encode key spectral properties of the listening environment. Here, SCEs were measured in normal-hearing listeners using noise-vocoded speech to simulate poor spectral resolution. Listeners heard a noise-vocoded sentence where low-F1 (100-400 Hz) or high-F1 (550-850 Hz) frequency regions were amplified to encourage "eh" (/ɛ/) or "ih" (/ɪ/) responses to the following target vowel, respectively. This was done by filtering with +20 dB (experiment 1a) or +5 dB gain (experiment 1b) or filtering using 100 % of the difference between spectral envelopes of /ɛ/ and /ɪ/ endpoint vowels (experiment 2a) or only 25 % of this difference (experiment 2b). SCEs influenced identification of noise-vocoded vowels in each experiment at every level of spectral resolution. In every case but one, SCE magnitudes exceeded those reported for full-spectrum speech, particularly when spectral peaks in the preceding sentence were large (+20 dB gain, 100 % of the spectral envelope difference). Even when spectral resolution was insufficient for accurate vowel recognition, SCEs were still evident. Results are suggestive of SCEs influencing CI users' speech perception as well, encouraging further investigation of CI users' sensitivity to acoustic context.
Affiliation(s)
- Christian E Stilp
- University of Louisville, 317 Life Sciences Building, Louisville, KY 40292, USA
40
Shigeno S. Effects of Auditory and Visual Priming on the Identification of Spoken Words. Percept Mot Skills 2017; 124:549-563. PMID: 28361660; DOI: 10.1177/0031512516684459.
Abstract
This study examined the effects of preceding contextual stimuli, either auditory or visual, on the identification of spoken target words. Fifty-one participants (29% males, 71% females; mean age = 24.5 years, SD = 8.5) were divided into three groups: no context, auditory context, and visual context. All target stimuli were spoken words masked with white noise. The relationships between the context and target stimuli were as follows: identical word, similar word, and unrelated word. Participants presented with context experienced a sequence of six context stimuli in the form of either spoken words or photographs. Auditory and visual context conditions produced similar results, but the auditory context aided word identification more than the visual context in the similar word relationship. We discuss these results in the light of top-down processing, motor theory, and the phonological system of language.
Affiliation(s)
- Sumi Shigeno
- Department of Psychology, College of Education, Psychology and Human Studies, Aoyama Gakuin University, Tokyo, Japan
41
Stilp CE, Assgari AA. Consonant categorization exhibits a graded influence of surrounding spectral context. J Acoust Soc Am 2017; 141:EL153. PMID: 28253661; DOI: 10.1121/1.4974769.
Abstract
When spectral properties differ across successive sounds, this difference is perceptually magnified, resulting in spectral contrast effects (SCEs). Recently, Stilp, Anderson, and Winn [(2015) J. Acoust. Soc. Am. 137(6), 3466-3476] revealed that SCEs are graded: more prominent spectral peaks in preceding sounds produced larger SCEs (i.e., category boundary shifts) in categorization of subsequent vowels. Here, a similar relationship between spectral context and SCEs was replicated in categorization of voiced stop consonants. By generalizing this relationship across consonants and vowels, different spectral cues, and different frequency regions, acute and graded sensitivity to spectral context appears to be pervasive in speech perception.
Affiliation(s)
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Ashley A Assgari
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
42
Kleinschmidt DF, Jaeger TF. Re-examining selective adaptation: Fatiguing feature detectors, or distributional learning? Psychon Bull Rev 2016; 23:678-691. PMID: 26438255; PMCID: PMC4821823; DOI: 10.3758/s13423-015-0943-z.
Abstract
When a listener hears many good examples of a /b/ in a row, they are less likely to classify other sounds on, e.g., a /b/-to-/d/ continuum as /b/. This phenomenon is known as selective adaptation and is a well-studied property of speech perception. Traditionally, selective adaptation is seen as a mechanistic property of the speech perception system, and attributed to fatigue in acoustic-phonetic feature detectors. However, recent developments in our understanding of non-linguistic sensory adaptation and higher-level adaptive plasticity in speech perception and language comprehension suggest that it is time to re-visit the phenomenon of selective adaptation. We argue that selective adaptation is better thought of as a computational property of the speech perception system. Drawing on a common thread in recent work on both non-linguistic sensory adaptation and plasticity in language comprehension, we furthermore propose that selective adaptation can be seen as a consequence of distributional learning across multiple levels of representation. This proposal opens up new questions for research on selective adaptation itself, and also suggests that selective adaptation can be an important bridge between work on adaptation in low-level sensory systems and the complicated plasticity of the adult language comprehension system.
Affiliation(s)
- Dave F Kleinschmidt
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA
- T Florian Jaeger
- Departments of Brain and Cognitive Sciences, Computer Science, and Linguistics, University of Rochester, Rochester, NY, USA
43
Wang N, Oxenham AJ. Effects of auditory enhancement on the loudness of masker and target components. Hear Res 2016; 333:150-156. PMID: 26805025; DOI: 10.1016/j.heares.2016.01.012.
Abstract
Auditory enhancement refers to the observation that the salience of one spectral region (the "signal") of a broadband sound can be enhanced and can "pop out" from the remainder of the sound (the "masker") if it is preceded by the broadband sound without the signal. The present study investigated auditory enhancement as an effective change in loudness, to determine whether it reflects a change in the loudness of the signal, the masker, or both. In the first experiment, the 500-ms precursor, an inharmonic complex with logarithmically spaced components, was followed after a 50-ms gap by the 100-ms signal or masker alone, the loudness of which was compared with that of the same signal or masker presented 2 s later. In the second experiment, the loudness of the signal embedded in the masker was assessed with and without a precursor using the same method, as was the loudness of the entire signal-plus-masker complex. The results suggest that the precursor does not affect the loudness of the signal or the masker alone, but enhances the loudness of the signal in the presence of the masker, while leaving the loudness of the surrounding masker unaffected. The results are consistent with an explanation based on "adaptation of inhibition" [Viemeister and Bacon (1982). J. Acoust. Soc. Am. 71, 1502-1507].
Affiliation(s)
- Ningyuan Wang
- Department of Psychology, University of Minnesota, Minneapolis, MN 55455, USA
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, Minneapolis, MN 55455, USA
44
Assgari AA, Stilp CE. Talker information influences spectral contrast effects in speech categorization. J Acoust Soc Am 2015; 138:3023-3032. PMID: 26627776; DOI: 10.1121/1.4934559.
Abstract
Spectral contrast effects, the perceptual magnification of spectral differences between sounds, have been widely shown to influence speech categorization. However, whether talker information alters spectral contrast effects was recently debated [Laing, Liu, Lotto, and Holt, Front. Psychol. 3, 1-9 (2012)]. Here, contributions of reliable spectral properties, between-talker and within-talker variability to spectral contrast effects in vowel categorization were investigated. Listeners heard sentences in three conditions (One Talker/One Sentence, One Talker/200 Sentences, 200 Talkers/200 Sentences) followed by a target vowel (varying from /ɪ/-/ɛ/ in F1, spoken by a single talker). Low-F1 or high-F1 frequency regions in the sentences were amplified to encourage /ɛ/ or /ɪ/ responses, respectively. When sentences contained large reliable spectral peaks (+20 dB; experiment 1), all contrast effect magnitudes were comparable. Talker information did not alter contrast effects following large spectral peaks, which were likely attributed to an external source (e.g., communication channel) rather than talkers. When sentences contained modest reliable spectral peaks (+5 dB; experiment 2), contrast effects were smaller following 200 Talkers/200 Sentences compared to single-talker conditions. Constant recalibration to new talkers reduced listeners' sensitivity to modest spectral peaks, diminishing contrast effects. Results bridge conflicting reports of whether talker information influences spectral contrast effects in speech categorization.
Affiliation(s)
- Ashley A Assgari
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
45
Apfelbaum KS, McMurray B. Relative cue encoding in the context of sophisticated models of categorization: Separating information from categorization. Psychon Bull Rev 2015; 22:916-943. PMID: 25475048; PMCID: PMC4621273; DOI: 10.3758/s13423-014-0783-2.
Abstract
Traditional studies of human categorization often treat the processes of encoding features and cues as peripheral to the question of how stimuli are categorized. However, in domains where the features and cues are less transparent, how information is encoded prior to categorization may constrain our understanding of the architecture of categorization. This is particularly true in speech perception, where acoustic cues to phonological categories are ambiguous and influenced by multiple factors. Here, it is crucial to consider the joint contributions of the information in the input and the categorization architecture. We contrasted accounts that argue for raw acoustic information encoding with accounts that posit that cues are encoded relative to expectations, and investigated how two categorization architectures, exemplar models and back-propagation parallel distributed processing models, deal with each kind of information. Relative encoding, akin to predictive coding, is a form of noise reduction, so it can be expected to improve model accuracy; however, like predictive coding, the use of relative encoding in speech perception by humans is controversial, so results are compared to patterns of human performance, rather than on the basis of overall accuracy. We found that, for both classes of models, in the vast majority of parameter settings, relative cues greatly helped the models approximate human performance. This suggests that expectation-relative processing is a crucial precursor step in phoneme categorization, and that understanding the information content is essential to understanding categorization processes.
Affiliation(s)
- Keith S Apfelbaum
- Department of Psychology, Ohio State University, Psychology Building, 1835 Neil Ave, Columbus, OH 43210, USA
46
Stilp CE, Anderson PW, Winn MB. Predicting contrast effects following reliable spectral properties in speech perception. J Acoust Soc Am 2015; 137:3466-3476. PMID: 26093434; DOI: 10.1121/1.4921600.
Abstract
Vowel perception is influenced by precursor sounds that are resynthesized to shift frequency regions [Ladefoged and Broadbent (1957). J. Acoust. Soc. Am. 29(1), 98-104] or filtered to emphasize narrow [Kiefte and Kluender (2008). J. Acoust. Soc. Am. 123(1), 366-376] or broad frequency regions [Watkins (1991). J. Acoust. Soc. Am. 90(6), 2942-2955]. Spectral differences between filtered precursors and vowel targets are perceptually enhanced, producing spectral contrast effects (e.g., emphasizing spectral properties of /ɪ/ in the precursor elicited more /ɛ/ responses to an /ɪ/-/ɛ/ vowel continuum, and vice versa). Historically, precursors have been processed by high-gain filters, resulting in prominent stable long-term spectral properties. Perceptual sensitivity to subtler but equally reliable spectral properties is unknown. Here, precursor sentences were processed by filters of variable bandwidths and different gains, then followed by vowel sounds varying from /ɪ/-/ɛ/. Contrast effects were widely observed, including when filters had only 100-Hz bandwidth or +5 dB gain. Average filter power was a good predictor of the magnitudes of contrast effects, revealing a close linear correspondence between the prominence of a reliable spectral property and the size of shifts in perceptual responses. High sensitivity to subtle spectral regularities suggests contrast effects are not limited to high-power filters, and thus may be more pervasive in speech perception than previously thought.
Affiliation(s)
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Paul W Anderson
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Matthew B Winn
- Department of Surgery, Waisman Center, University of Wisconsin, Madison, Wisconsin 53706, USA
47
Abstract
Perceptual hysteresis can be defined as the enduring influence of the recent past on current perception. Here, hysteresis was investigated in a basic auditory task: pitch comparisons between successive tones. On each trial, listeners were presented with pairs of tones and asked to report the direction of subjective pitch shift, as either "up" or "down." All tones were complexes known as Shepard tones (Shepard, 1964), which comprise several frequency components at octave multiples of a base frequency. The results showed that perceptual judgments were determined both by stimulus-related factors (the interval ratio between the base frequencies within a pair) and by recent context (the intervals in the two previous trials). When tones were presented in ordered sequences, for which the frequency interval between tones was varied in a progressive manner, strong hysteresis was found. In particular, ambiguous stimuli that led to equal probabilities of "up" and "down" responses within a randomized context were almost fully determined within an ordered context. Moreover, hysteresis did not act on the direction of the reported pitch shift, but rather on the perceptual representation of each tone. Thus, hysteresis could be observed within sequences in which listeners varied between "up" and "down" responses, enabling us to largely rule out confounds related to response bias. The strength of the perceptual hysteresis observed suggests that the ongoing context may have a substantial influence on fundamental aspects of auditory perception, such as how we perceive the changes in pitch between successive sounds.
48
Maddox WT, Chandrasekaran B. Tests of a Dual-systems Model of Speech Category Learning. Bilingualism: Language and Cognition 2014; 17:709-728. PMID: 25264426; PMCID: PMC4171735; DOI: 10.1017/s1366728913000783.
Abstract
In the visual domain, more than two decades of work posits the existence of dual category learning systems. The reflective system uses working memory to develop and test rules for classifying in an explicit fashion. The reflexive system operates by implicitly associating perception with actions that lead to reinforcement. Dual-systems models posit that in learning natural categories, learners initially use the reflective system and with practice, transfer control to the reflexive system. The role of reflective and reflexive systems in second language (L2) speech learning has not been systematically examined. Here monolingual, native speakers of American English were trained to categorize Mandarin tones produced by multiple talkers. Our computational modeling approach demonstrates that learners use reflective and reflexive strategies during tone category learning. Successful learners use talker-dependent, reflective analysis early in training and reflexive strategies by the end of training. Our results demonstrate that dual-learning systems are operative in L2 speech learning. Critically, learner strategies directly relate to individual differences in category learning success.
49
|
Wang N, Oxenham AJ. Spectral motion contrast as a speech context effect. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 136:1237. [PMID: 25190397 PMCID: PMC4165225 DOI: 10.1121/1.4892771] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/30/2014] [Revised: 07/11/2014] [Accepted: 07/21/2014] [Indexed: 06/03/2023]
Abstract
Spectral contrast effects may help "normalize" the incoming sound and produce perceptual constancy in the face of the variable acoustics produced by different rooms, talkers, and backgrounds. Recent studies have concentrated on the after-effects produced by the long-term average power spectrum. The present study examined contrast effects based on spectral motion, analogous to visual-motion after-effects. In experiment 1, the existence of spectral-motion after-effects with word-length inducers was established by demonstrating that the identification of the direction of a target spectral glide was influenced by the spectral motion of a preceding inducer glide. In experiment 2, the target glide was replaced with a synthetic sine-wave speech sound, including a formant transition. The speech category boundary was shifted by the presence and direction of the inducer glide. Finally, in experiment 3, stimuli based on synthetic sine-wave speech sounds were used as both context and target stimuli to show that the spectral-motion after-effects could occur even with inducers with relatively short speech-like durations and small frequency excursions. The results suggest that spectral motion may play a complementary role to the long-term average power spectrum in inducing speech context effects.
Affiliation(s)
- Ningyuan Wang, Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
- Andrew J Oxenham, Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455
50
|
Van Engen KJ, Peelle JE. Listening effort and accented speech. Front Hum Neurosci 2014; 8:577. [PMID: 25140140 PMCID: PMC4122174 DOI: 10.3389/fnhum.2014.00577] [Citation(s) in RCA: 74] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2014] [Accepted: 07/14/2014] [Indexed: 11/25/2022] Open
Affiliation(s)
- Jonathan E. Peelle, Department of Otolaryngology, Washington University in St. Louis, St. Louis, MO, USA