1. Gao Z, Oxenham AJ. Adaptation to sentences and melodies when making judgments along a voice-nonvoice continuum. Atten Percept Psychophys 2025; 87:1022-1032. PMID: 40000570. DOI: 10.3758/s13414-025-03030-9.
Abstract
Adaptation to constant or repetitive sensory signals serves to improve detection of novel events in the environment and to encode incoming information more efficiently. Within the auditory modality, contrastive adaptation effects have been observed within a number of categories, including voice and musical instrument type. A recent study found contrastive perceptual shifts between voice and instrument categories following repetitive presentation of adaptors consisting of either vowels or instrument tones. The current study tested the generalizability of adaptation along a voice-instrument continuum, using more ecologically valid adaptors. Participants were presented with an adaptor followed by an ambiguous voice-instrument target, created by generating a 10-step morphed continuum between pairs of vowel and instrument sounds. Listeners' categorization of the target sounds was shifted contrastively by a spoken sentence or instrumental melody adaptor, regardless of whether the adaptor and the target shared the same speaker gender or similar pitch range (Experiment 1). However, no significant contrastive adaptation was observed when nonspeech vocalizations or nonpitched percussion sounds were used as the adaptors (Experiment 2). The results suggest that adaptation between voice and nonvoice categories does not rely on exact repetition of simple stimuli, nor does it solely reflect the result of a sound being categorized as being human or nonhuman sourced. The outcomes suggest future directions for determining the precise spectro-temporal properties of sounds that induce these voice-instrument contrastive adaptation effects.
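The paper does not specify its morphing algorithm, so the sketch below is only a rough illustration of how a 10-step voice-instrument continuum could be built: it interpolates STFT magnitudes between a vowel and an instrument tone and resynthesizes each step with Griffin-Lim. File names, sample rate, and the crude duration equating are all assumptions.

```python
# Illustrative 10-step voice-instrument morph: interpolate STFT magnitudes
# and resynthesize with Griffin-Lim. Not the paper's actual algorithm.
import numpy as np
import librosa
import soundfile as sf

vowel, sr = librosa.load("vowel.wav", sr=22050)   # hypothetical input files
tone, _ = librosa.load("tone.wav", sr=22050)
n = min(len(vowel), len(tone))                    # crude duration equating

V = np.abs(librosa.stft(vowel[:n]))               # magnitude spectrograms
T = np.abs(librosa.stft(tone[:n]))

for step, w in enumerate(np.linspace(0.0, 1.0, 10)):
    S = (1.0 - w) * V + w * T                     # linear spectral interpolation
    y = librosa.griffinlim(S)                     # invert magnitudes to audio
    sf.write(f"morph_step{step:02d}.wav", y, sr)
```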
Affiliation(s)
- Zi Gao
- Department of Psychology, University of Minnesota-Twin Cities, 75 E River Rd, Minneapolis, MN, 55455, USA.
- Andrew J Oxenham
- Department of Psychology, University of Minnesota-Twin Cities, 75 E River Rd, Minneapolis, MN, 55455, USA.
2. Chen F, Kuang C, Wang L, Chen X. Context effects on lexical tone categorization in quiet and noisy conditions by young, middle-aged, and older individuals. J Acoust Soc Am 2025; 157:1795-1806. PMID: 40084902. DOI: 10.1121/10.0036146.
Abstract
Previous studies focused on how contexts affect the recognition of lexical tones, primarily among healthy young adults in a quiet environment. However, little is known about how senescence and cognitive decline influence lexical tone normalization in adverse listening conditions. This study aims to explore how F0 shifts of the preceding context affect lexical tone identification across different age groups in quiet and noisy conditions. Twenty-two Mandarin-speaking young adults, 22 middle-aged adults, and 21 older adults with mild cognitive impairment (MCI) participated in tone identification tasks with and without speech contexts. The identification tasks with contexts were conducted in quiet and babble noise with signal-to-noise ratios (SNRs) set at 5 and 0 dB. Results showed that contextual F0 cues exerted an equal impact on lexical tone normalization across all three age groups in the quiet environment. Nevertheless, under SNRs of 5 and 0 dB, noise nullified such an effect. Moreover, working memory was negatively correlated with the size of lexical tone normalization in the older group. These findings suggest that context effects on Mandarin tone normalization tend to be resistant to senescence and MCI but susceptible to babble noise, offering further insights into the cognitive processing mechanisms underlying speech normalization.
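The babble-noise conditions reduce to scaling noise against the speech level before mixing. A minimal sketch of mixing at the study's 5 and 0 dB SNRs follows; file names and calibration details are assumptions, not the authors' procedure.

```python
# Mix a stimulus with babble noise at a target SNR (5 or 0 dB, as in the
# study). File names are hypothetical; calibration details are assumed.
import numpy as np
import soundfile as sf

speech, fs = sf.read("tone_stimulus.wav")
babble, _ = sf.read("babble.wav")
babble = babble[:len(speech)]                    # trim noise to stimulus length

def mix_at_snr(signal, noise, snr_db):
    """Scale noise so 10*log10(P_signal / P_noise) equals snr_db, then add."""
    p_sig = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise

for snr in (5, 0):
    sf.write(f"stimulus_snr{snr}dB.wav", mix_at_snr(speech, babble, snr), fs)
```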
Affiliation(s)
- Fei Chen
- School of Foreign Languages, Hunan University, Changsha 410082, China
- Chen Kuang
- School of Foreign Languages, Hunan University, Changsha 410082, China
- Liping Wang
- School of Foreign Languages, Hunan University, Changsha 410082, China
- Xiaoxiang Chen
- School of Foreign Languages, Hunan University, Changsha 410082, China
3. King CJ, Sharpe CM, Shorey AE, Stilp CE. The effects of variability on context effects and psychometric function slopes in speaking rate normalization. J Acoust Soc Am 2024; 155:2099-2113. PMID: 38483206. PMCID: PMC10942802. DOI: 10.1121/10.0025292.
Abstract
Acoustic context influences speech perception, but contextual variability restricts this influence. Assgari and Stilp [J. Acoust. Soc. Am. 138, 3023-3032 (2015)] demonstrated that when categorizing vowels, variability in who spoke the preceding context sentence on each trial but not the sentence contents diminished the resulting spectral contrast effects (perceptual shifts in categorization stemming from spectral differences between sounds). Yet, how such contextual variability affects temporal contrast effects (TCEs) (also known as speaking rate normalization; categorization shifts stemming from temporal differences) is unknown. Here, stimuli were the same context sentences and conditions (one talker saying one sentence, one talker saying 200 sentences, 200 talkers saying 200 sentences) used in Assgari and Stilp [J. Acoust. Soc. Am. 138, 3023-3032 (2015)], but set to fast or slow speaking rates to encourage perception of target words as "tier" or "deer," respectively. In Experiment 1, sentence variability and talker variability each diminished TCE magnitudes; talker variability also produced shallower psychometric function slopes. In Experiment 2, when speaking rates were matched across the 200-sentences conditions, neither TCE magnitudes nor slopes differed across conditions. In Experiment 3, matching slow and fast rates across all conditions failed to produce equal TCEs and slopes everywhere. Results suggest a complex interplay between acoustic, talker, and sentence variability in shaping TCEs in speech perception.
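The psychometric function slopes compared across conditions are typically estimated by fitting a sigmoid to categorization proportions. A minimal sketch with made-up response data and a logistic form, which is one common choice; the paper's exact fitting procedure may differ.

```python
# Fit a logistic psychometric function to hypothetical "deer" proportions
# across a 7-step continuum; k is the slope parameter compared here.
import numpy as np
from scipy.optimize import curve_fit

steps = np.arange(1, 8)                                        # continuum steps
p_deer = np.array([0.05, 0.10, 0.25, 0.55, 0.80, 0.92, 0.97])  # made-up data

def logistic(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

(x0, k), _ = curve_fit(logistic, steps, p_deer, p0=[4.0, 1.0])
print(f"category boundary at step {x0:.2f}; slope k = {k:.2f}")
```

A shallower fitted k corresponds to the flatter categorization functions reported under talker variability.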
Affiliation(s)
- Caleb J King
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Chloe M Sharpe
- School of Psychology, Xavier University, Cincinnati, Ohio 45207, USA
- Anya E Shorey
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
4. Liu W, Wang T, Huang X. The influences of forward context on stop-consonant perception: The combined effects of contrast and acoustic cue activation? J Acoust Soc Am 2023; 154:1903-1920. PMID: 37756574. DOI: 10.1121/10.0021077.
Abstract
The perception of the /da/-/ga/ series, distinguished primarily by the third formant (F3) transition, is affected by many nonspeech and speech sounds. Previous studies mainly investigated the influences of context stimuli with frequency bands located in the F3 region and proposed the account of spectral contrast effects. This study examined the effects of context stimuli with bands not in the F3 region. The results revealed that these non-F3-region stimuli (whether with bands higher or lower than the F3 region) mainly facilitated the identification of /ga/; for example, the stimuli (including frequency-modulated glides, sine-wave tones, filtered sentences, and natural vowels) in the low-frequency band (500-1500 Hz) led to more /ga/ responses than those in the low-F3 region (1500-2500 Hz). It is suggested that in the F3 region, context stimuli may act through spectral contrast effects, while in non-F3 regions, context stimuli might activate the acoustic cues of /g/ and further facilitate the identification of /ga/. The combination of contrast and acoustic cue effects can explain more results concerning the forward context influences on the perception of the /da/-/ga/ series, including the effects of non-F3-region stimuli and the imbalanced influences of context stimuli on /da/ and /ga/ perception.
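Band-limited context stimuli such as the 500-1500 Hz sentences can be produced with a standard band-pass filter. A sketch assuming a zero-phase Butterworth design and a mono input file; the study's actual filter parameters are not specified here.

```python
# Band-limit a context stimulus to 500-1500 Hz with a zero-phase
# Butterworth band-pass filter (assumed design; file name hypothetical).
import soundfile as sf
from scipy.signal import butter, sosfiltfilt

x, fs = sf.read("context_sentence.wav")          # assumes a mono file
sos = butter(4, [500, 1500], btype="bandpass", fs=fs, output="sos")
y = sosfiltfilt(sos, x)                          # forward-backward: no phase shift
sf.write("context_500_1500Hz.wav", y, fs)
```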
Affiliation(s)
- Wenli Liu
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, 38 Tongshuo Road, Tianjin 300350, China
- Tianyu Wang
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, 38 Tongshuo Road, Tianjin 300350, China
- Xianjun Huang
- School of Psychology, Capital Normal University, 105 North West 3rd Ring Road, Beijing 100048, China
5. Kahloon L, Shorey AE, King CJ, Stilp CE. Clear speech promotes speaking rate normalization. JASA Express Lett 2023; 3:055205. PMID: 37219432. PMCID: PMC11303017. DOI: 10.1121/10.0019499.
Abstract
When speaking in noisy conditions or to a hearing-impaired listener, talkers often use clear speech, which is typically slower than conversational speech. In other research, changes in speaking rate affect speech perception through speaking rate normalization: Slower context sounds encourage perception of subsequent sounds as faster, and vice versa. Here, on each trial, listeners heard a context sentence before the target word (which varied from "deer" to "tier"). Clear and slowed conversational context sentences elicited more "deer" responses than conversational sentences, consistent with rate normalization. Changing speaking styles aids speech intelligibility but might also produce other outcomes that alter sound/word recognition.
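However the slowed-conversational condition was originally implemented, a phase-vocoder time stretch approximates slowing a sentence without shifting its pitch. A minimal sketch; the file name and stretch factor are hypothetical.

```python
# Slow a conversational context sentence with phase-vocoder time
# stretching (approximation; file name and rate are hypothetical).
import librosa
import soundfile as sf

y, sr = librosa.load("conversational_context.wav", sr=None)
slowed = librosa.effects.time_stretch(y, rate=0.67)   # ~1.5x the duration
sf.write("slowed_context.wav", slowed, sr)
```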
Affiliation(s)
- Lilah Kahloon
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Anya E Shorey
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Caleb J King
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
6. Shorey AE, Stilp CE. Short-term, not long-term, average spectra of preceding sentences bias consonant categorization. J Acoust Soc Am 2023; 153:2426. PMID: 37092945. PMCID: PMC10119874. DOI: 10.1121/10.0017862.
Abstract
Speech sound perception is influenced by the spectral properties of surrounding sounds. For example, listeners perceive /g/ (lower F3 onset) more often after sounds with prominent high-F3 frequencies and perceive /d/ (higher F3 onset) more often after sounds with prominent low-F3 frequencies. These biases are known as spectral contrast effects (SCEs). Much of this work examined differences between long-term average spectra (LTAS) of preceding sounds and target speech sounds. Post hoc analyses by Stilp and Assgari [(2021) Atten. Percept. Psychophys. 83(6) 2694-2708] revealed that spectra of the last 475 ms of precursor sentences, not the entire LTAS, best predicted biases in consonant categorization. Here, the influences of proximal (last 500 ms) versus distal (before the last 500 ms) portions of precursor sentences on subsequent consonant categorization were compared. Sentences emphasized different frequency regions in each temporal window (e.g., distal low-F3 emphasis, proximal high-F3 emphasis, and vice versa) naturally or via filtering. In both cases, shifts in consonant categorization were produced in accordance with spectral properties of the proximal window. This was replicated when the distal window did not emphasize either frequency region, but the proximal window did. Results endorse closer consideration of patterns of spectral energy over time in preceding sounds, not just their LTAS.
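The proximal/distal comparison hinges on averaging spectra over different temporal windows of the precursor. A minimal sketch of splitting a sentence 500 ms from its end and computing each portion's average spectrum; the windowing and averaging choices below are assumptions, not the authors' exact analysis.

```python
# Split a precursor sentence into distal and proximal (final 500 ms)
# portions and compute each portion's time-averaged spectrum in dB.
import numpy as np
import librosa

y, sr = librosa.load("precursor_sentence.wav", sr=None)  # hypothetical file
split = len(y) - int(0.5 * sr)                           # 500 ms from the end
distal, proximal = y[:split], y[split:]

def avg_spectrum_db(segment):
    mag = np.abs(librosa.stft(segment, n_fft=2048))
    return 20 * np.log10(mag.mean(axis=1) + 1e-12)       # average over time

ltas_distal = avg_spectrum_db(distal)
ltas_proximal = avg_spectrum_db(proximal)
freqs = librosa.fft_frequencies(sr=sr, n_fft=2048)       # bin frequencies (Hz)
```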
Affiliation(s)
- Anya E Shorey
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
7. Stilp CE, Assgari AA. Contributions of natural signal statistics to spectral context effects in consonant categorization. Atten Percept Psychophys 2021; 83:2694-2708. PMID: 33987821. DOI: 10.3758/s13414-021-02310-4.
Abstract
Speech perception, like all perception, takes place in context. Recognition of a given speech sound is influenced by the acoustic properties of surrounding sounds. When the spectral composition of earlier (context) sounds (e.g., a sentence with more energy at lower third formant [F3] frequencies) differs from that of a later (target) sound (e.g., consonant with intermediate F3 onset frequency), the auditory system magnifies this difference, biasing target categorization (e.g., towards higher-F3-onset /d/). Historically, these studies used filters to force context stimuli to possess certain spectral compositions. Recently, these effects were produced using unfiltered context sounds that already possessed the desired spectral compositions (Stilp & Assgari, 2019, Attention, Perception, & Psychophysics, 81, 2037-2052). Here, this natural signal statistics approach is extended to consonant categorization (/g/-/d/). Context sentences were either unfiltered (already possessing the desired spectral composition) or filtered (to imbue specific spectral characteristics). Long-term spectral characteristics of unfiltered contexts were poor predictors of shifts in consonant categorization, but short-term characteristics (last 475 ms) were excellent predictors. This diverges from vowel data, where long-term and shorter-term intervals (last 1,000 ms) were equally strong predictors. Thus, time scale plays a critical role in how listeners attune to signal statistics in the acoustic environment.
8.
Abstract
Perception of sounds occurs in the context of surrounding sounds. When spectral properties differ between earlier (context) and later (target) sounds, categorization of the later sounds becomes biased through spectral contrast effects (SCEs). Past research has shown that SCEs bias categorization of speech and music alike. Recent studies have extended SCEs to naturalistic listening conditions, in which the inherent spectral composition of (unfiltered) sentences biased speech categorization. Here, we tested whether natural (unfiltered) music would similarly bias categorization of French horn and tenor saxophone targets. Preceding contexts were either solo performances of the French horn or tenor saxophone (unfiltered; 1-s duration in Experiment 1, 3-s duration in Experiment 2) or a string quintet processed to emphasize frequencies in the horn or saxophone (filtered; 1-s duration). Both approaches produced SCEs, eliciting more "saxophone" responses following horn or horn-like contexts and vice versa. One-second filtered contexts produced SCEs as in previous studies, but one-second unfiltered contexts did not. Three-second unfiltered contexts biased perception, but to a lesser degree than filtered contexts did. These results extend SCEs in musical instrument categorization to everyday listening conditions.
9.
Abstract
Speech perception is challenged by indexical variability. A litany of studies on talker normalization have demonstrated that hearing multiple talkers incurs processing costs (e.g., lower accuracy, increased response time) compared to hearing a single talker. However, when reframing these studies in terms of stimulus structure, it is evident that past tests of multiple-talker (i.e., low structure) and single-talker (i.e., high structure) conditions are not representative of the graded nature of indexical variation in the environment. Here we tested the hypothesis that processing costs incurred by multiple-talker conditions would abate given increased stimulus structure. We tested this hypothesis by manipulating the degree to which talkers' voices differed acoustically (Experiment 1) and also the frequency with which talkers' voices changed (Experiment 2) in multiple-talker conditions. Listeners performed a speeded classification task for words containing vowels that varied in acoustic-phonemic ambiguity. In Experiment 1, response times progressively decreased as acoustic variability among talkers' voices decreased. In Experiment 2, blocking talkers within mixed-talker conditions led to more similar response times among single-talker and multiple-talker conditions. Neither result interacted with acoustic-phonemic ambiguity of the target vowels. Thus, the results showed that indexical structure mediated the processing costs incurred by hearing different talkers. This is consistent with the Efficient Coding Hypothesis, which proposes that sensory and perceptual processing are facilitated by stimulus structure. Defining the roles and limits of stimulus structure on speech perception is an important direction for future research.
10. Stilp C. Acoustic context effects in speech perception. Wiley Interdiscip Rev Cogn Sci 2019; 11:e1517. PMID: 31453667. DOI: 10.1002/wcs.1517.
Abstract
The extreme acoustic variability of speech is well established, which makes the proficiency of human speech perception all the more impressive. Speech perception, like perception in any modality, is relative to context, and this provides a means to normalize the acoustic variability in the speech signal. Acoustic context effects in speech perception have been widely documented, but a clear understanding of how these effects relate to each other across stimuli, timescales, and acoustic domains is lacking. Here we review the influences that spectral context, temporal context, and spectrotemporal context have on speech perception. Studies are organized in terms of whether the context precedes the target (forward effects) or follows it (backward effects), and whether the context is adjacent to the target (proximal) or temporally removed from it (distal). Special cases where proximal and distal contexts have competing influences on perception are also considered. Across studies, a common theme emerges: acoustic differences between contexts and targets are perceptually magnified, producing contrast effects that facilitate perception of target sounds and words. This indicates enhanced sensitivity to changes in the acoustic environment, which maximizes the amount of potential information that can be transmitted to the perceiver. This article is categorized under: Linguistics > Language in Mind and Brain; Psychology > Perception and Psychophysics.
Affiliation(s)
- Christian Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky
11. Stilp CE. Auditory enhancement and spectral contrast effects in speech perception. J Acoust Soc Am 2019; 146:1503. PMID: 31472539. DOI: 10.1121/1.5120181.
Abstract
The auditory system is remarkably sensitive to changes in the acoustic environment. This is exemplified by two classic effects of preceding spectral context on perception. In auditory enhancement effects (EEs), the absence and subsequent insertion of a frequency component increases its salience. In spectral contrast effects (SCEs), spectral differences between earlier and later (target) sounds are perceptually magnified, biasing target sound categorization. These effects have been suggested to be related, but have largely been studied separately. Here, EEs and SCEs are demonstrated using the same speech materials. In Experiment 1, listeners categorized vowels (/ɪ/-/ɛ/) or consonants (/d/-/g/) following a sentence processed by a bandpass or bandstop filter (vowel tasks: 100-400 or 550-850 Hz; consonant tasks: 1700-2700 or 2700-3700 Hz). Bandpass filtering produced SCEs and bandstop filtering produced EEs, with effect magnitudes significantly correlated at the individual differences level. In Experiment 2, context sentences were processed by variable-depth notch filters in these frequency regions (-5 to -20 dB). EE magnitudes increased at larger notch depths, growing linearly in consonant categorization. This parallels previous research where SCEs increased linearly for larger spectral peaks in the context sentence. These results link EEs and SCEs, as both shape speech categorization in orderly ways.
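A variable-depth notch like that in Experiment 2 can be sketched by splitting the signal into band-stop and band-pass paths and re-adding the band at reduced gain. This is an assumed implementation (Butterworth filters, hypothetical file name), not necessarily the one used in the experiment.

```python
# Variable-depth notch: split into band-stop and band-pass paths, then
# re-add the band at reduced gain (-5 to -20 dB). Assumed implementation.
import soundfile as sf
from scipy.signal import butter, sosfiltfilt

x, fs = sf.read("context_sentence.wav")          # assumes a mono file
band = [1700, 2700]                              # consonant-task region (Hz)
bs = butter(4, band, btype="bandstop", fs=fs, output="sos")
bp = butter(4, band, btype="bandpass", fs=fs, output="sos")
stop_path = sosfiltfilt(bs, x)
pass_path = sosfiltfilt(bp, x)

for depth_db in (-5, -10, -15, -20):
    gain = 10 ** (depth_db / 20)                 # linear gain for the notched band
    y = stop_path + gain * pass_path             # band attenuated by |depth_db| dB
    sf.write(f"notch_{abs(depth_db)}dB.wav", y, fs)
```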
Affiliation(s)
- Christian E Stilp
- 317 Life Sciences Building, University of Louisville, Louisville, Kentucky 40292, USA
12. Long-standing problems in speech perception dissolve within an information-theoretic perspective. Atten Percept Psychophys 2019; 81:861-883. PMID: 30937673. DOI: 10.3758/s13414-019-01702-x.
Abstract
An information-theoretic framework is proposed to have the potential to dissolve (rather than attempt to solve) multiple long-standing problems concerning speech perception. By this view, speech perception can be reframed as a series of processes through which sensitivity to information (that which changes and/or is unpredictable) becomes increasingly sophisticated and shaped by experience. Problems concerning the appropriate objects of perception (gestures vs. sounds), rate normalization, variance consequent to articulation, and talker normalization are reframed, or even dissolved, within this information-theoretic framework. Application of discriminative models founded on information theory provides a productive approach to answering questions concerning the perception of speech, and perception most broadly.