1. Bidelman GM, Bernard F, Skubic K. Hearing in categories and speech perception at the "cocktail party". PLoS One 2025; 20:e0318600. PMID: 39883695; PMCID: PMC11781644; DOI: 10.1371/journal.pone.0318600.
Abstract
We aimed to test whether hearing speech in phonetic categories (as opposed to in a continuous/gradient fashion) affords benefits to "cocktail party" speech perception. We measured speech perception performance (recognition, localization, and source monitoring) in a simulated 3D cocktail party environment. We manipulated task difficulty by varying the number of additional maskers presented at other spatial locations in the horizontal soundfield (1-4 talkers) and via forward vs. time-reversed maskers, the latter promoting a release from masking. In separate tasks, we measured isolated phoneme categorization using two-alternative forced choice (2AFC) and visual analog scaling (VAS) tasks designed to promote more/less categorical hearing and thus test putative links between categorization and real-world speech-in-noise skills. We first show that cocktail party speech recognition accuracy and speed decline with additional competing talkers and amidst forward compared to reverse maskers. Dividing listeners into "discrete" vs. "continuous" categorizers based on their VAS labeling (i.e., whether responses were binary or continuous judgments), we then show that the degree of release from masking experienced at the cocktail party is predicted by their degree of categoricity in phoneme labeling and not by high-frequency audiometric thresholds; more discrete listeners make less effective use of time-reversal and show less release from masking than their gradient-responding peers. Our results suggest a link between speech categorization skills and cocktail party processing, with a gradient (rather than discrete) listening strategy benefiting degraded speech perception. These findings suggest that less flexibility in binning sounds into categories may be one factor that contributes to figure-ground deficits.
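The two listener-level quantities linked in this abstract, phoneme-labeling gradiency from the VAS task and release from masking from the cocktail-party task, can each be reduced to a single number per participant and then correlated. The sketch below is one plausible way to do so, assuming VAS labels scaled to [0, 1], an endpoint-based gradiency index, and a Spearman correlation; these choices and all variable names are illustrative assumptions, not the study's analysis pipeline.

```python
import numpy as np
from scipy import stats

def masking_release(acc_forward, acc_reversed):
    """Release from masking: recognition gain (proportion correct or percentage
    points) when forward maskers are replaced by time-reversed maskers."""
    return np.asarray(acc_reversed) - np.asarray(acc_forward)

def vas_gradiency(vas_responses, edge=0.1):
    """Crude gradiency index for visual-analog-scale labels in [0, 1]: the
    proportion of responses placed away from the scale endpoints.
    Binary ("discrete") responders score near 0; gradient responders score higher."""
    r = np.asarray(vas_responses)
    return np.mean((r > edge) & (r < 1 - edge))

# Synthetic example: one gradiency score and one masking-release score per listener,
# then a rank correlation testing whether gradiency predicts release from masking.
rng = np.random.default_rng(0)
gradiency = np.array([vas_gradiency(rng.beta(a, a, 100)) for a in rng.uniform(0.2, 5, 20)])
release = 5 + 10 * gradiency + rng.normal(0, 2, 20)   # fabricated values for illustration only
rho, p = stats.spearmanr(gradiency, release)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```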
Affiliation(s)
- Gavin M. Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, Indiana, United States of America
- Program in Neuroscience, Indiana University, Bloomington, Indiana, United States of America
- Cognitive Science Program, Indiana University, Bloomington, Indiana, United States of America
- Fallon Bernard
- School of Communication Sciences & Disorders, University of Memphis, Memphis, Tennessee, United States of America
- Kimberly Skubic
- School of Communication Sciences & Disorders, University of Memphis, Memphis, Tennessee, United States of America
2. Bidelman GM, York A, Pearson C. Neural correlates of phonetic categorization under auditory (phoneme) and visual (grapheme) modalities. Neuroscience 2025; 565:182-191. PMID: 39631659; DOI: 10.1016/j.neuroscience.2024.11.079.
Abstract
This study assessed the neural mechanisms and relative saliency of categorization for speech sounds and comparable graphemes (i.e., visual letters) of the same phonetic label. Given that linguistic experience shapes categorical processing, and letter-speech sound matching plays a crucial role during early reading acquisition, we hypothesized sound phoneme and visual grapheme tokens representing the same linguistic identity might recruit common neural substrates, despite originating from different sensory modalities. Behavioral and neuroelectric brain responses (ERPs) were acquired as participants categorized stimuli from sound (phoneme) and homologous letter (grapheme) continua, each spanning a /da/-/ga/ gradient. Behaviorally, listeners were faster and showed stronger categorization for phonemes compared to graphemes. At the neural level, multidimensional scaling of the EEG revealed responses self-organized in a categorical fashion such that tokens clustered within their respective modality beginning ∼150-250 ms after stimulus onset. Source-resolved ERPs further revealed modality-specific and overlapping brain regions supporting phonetic categorization. Left inferior frontal gyrus and auditory cortex showed stronger responses for sound category members compared to phonetically ambiguous tokens, whereas early visual cortices paralleled this categorical organization for graphemes. Auditory and visual categorization also recruited common visual association areas in extrastriate cortex but in opposite hemispheres (auditory = left; visual = right). Our findings reveal that both auditory and visual sensory cortices support categorical organization for phonetic labels within their respective modalities. However, a partial overlap in phoneme and grapheme processing among occipital brain areas implies the presence of an isomorphic, domain-general mapping for phonetic categories in the dorsal visual system.
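The multidimensional scaling (MDS) step described above takes pairwise dissimilarities between token-level EEG responses and embeds them in a low-dimensional space where category members can be inspected for clustering. A minimal sketch follows; the correlation-distance metric, scikit-learn's MDS, and the random response matrix are assumptions for illustration, not the study's exact analysis.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# token_erps: one averaged response waveform per continuum token (rows), e.g.
# phoneme and grapheme steps along a /da/-/ga/ gradient; random data stand in here.
rng = np.random.default_rng(1)
token_erps = rng.normal(size=(10, 500))   # (n_tokens, n_time_samples)

# Pairwise dissimilarity between token responses (1 - Pearson correlation).
dissim = squareform(pdist(token_erps, metric="correlation"))

# Metric MDS into two dimensions; nearby points indicate similar neural responses,
# so category- or modality-wise clustering can be read off the embedding.
coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(dissim)
print(coords.shape)   # (10, 2)
```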
Affiliation(s)
- Gavin M Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA; Program in Neuroscience, Indiana University, Bloomington, IN, USA; Cognitive Science Program, Indiana University, Bloomington, IN, USA.
- Ashleigh York
- School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA; University of Mississippi Medical Center, Jackson, MS, USA
- Claire Pearson
- School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA
3. Rizzi R, Bidelman GM. Functional benefits of continuous vs. categorical listening strategies on the neural encoding and perception of noise-degraded speech. Brain Res 2024; 1844:149166. PMID: 39151718; PMCID: PMC11399885; DOI: 10.1016/j.brainres.2024.149166.
Abstract
Acoustic information in speech changes continuously, yet listeners form discrete perceptual categories to ease the demands of perception. Being a more continuous/gradient as opposed to a more discrete/categorical listener may be further advantageous for understanding speech in noise by increasing perceptual flexibility and resolving ambiguity. The degree to which a listener's responses to a continuum of speech sounds are categorical versus continuous can be quantified using visual analog scaling (VAS) during speech labeling tasks. Here, we recorded event-related brain potentials (ERPs) to vowels along an acoustic-phonetic continuum (/u/ to /a/) while listeners categorized phonemes in both clean and noise conditions. Behavior was assessed using standard two alternative forced choice (2AFC) and VAS paradigms to evaluate categorization under task structures that promote discrete vs. continuous hearing, respectively. Behaviorally, identification curves were steeper under 2AFC vs. VAS categorization but were relatively immune to noise, suggesting robust access to abstract, phonetic categories even under signal degradation. Behavioral slopes were correlated with listeners' QuickSIN scores; shallower slopes corresponded with better speech in noise performance, suggesting a perceptual advantage to noise degraded speech comprehension conferred by a more gradient listening strategy. At the neural level, P2 amplitudes and latencies of the ERPs were modulated by task and noise; VAS responses were larger and showed greater noise-related latency delays than 2AFC responses. More gradient responders had smaller shifts in ERP latency with noise, suggesting their neural encoding of speech was more resilient to noise degradation. Interestingly, source-resolved ERPs showed that more gradient listening was also correlated with stronger neural responses in left superior temporal gyrus. Our results demonstrate that listening strategy modulates the categorical organization of speech and behavioral success, with more continuous/gradient listening being advantageous to sentential speech in noise perception.
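The identification-curve slope that carried the behavioral correlation here is typically obtained by fitting a sigmoid to the proportion of one category label across continuum steps. A minimal sketch, assuming a two-parameter logistic and a Pearson correlation against QuickSIN-style scores; the fitted form, starting values, and all numbers are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy import stats

def logistic(x, x0, k):
    """Two-parameter logistic: x0 = category boundary, k = identification slope."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

# One listener's 2AFC identification function along a 7-step /u/-/a/ continuum.
steps = np.arange(1, 8)
p_a = np.array([0.02, 0.05, 0.15, 0.50, 0.85, 0.95, 0.99])   # proportion of "/a/" responses
(x0, k), _ = curve_fit(logistic, steps, p_a, p0=[4.0, 1.0])
print(f"boundary = {x0:.2f}, slope = {k:.2f}")

# Across listeners, the slope-QuickSIN relationship reported above reduces to a
# correlation between per-listener slopes and speech-in-noise scores (fabricated here).
slopes = np.array([2.1, 3.5, 1.2, 4.0, 2.8])
quicksin = np.array([1.0, 2.5, 0.5, 3.0, 1.8])
r, p = stats.pearsonr(slopes, quicksin)
print(f"r = {r:.2f}, p = {p:.3f}")
```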
Affiliation(s)
- Rose Rizzi
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA; Program in Neuroscience, Indiana University, Bloomington, IN, USA
- Gavin M Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA; Program in Neuroscience, Indiana University, Bloomington, IN, USA; Cognitive Science Program, Indiana University, Bloomington, IN, USA.
4. Bidelman GM, York A, Pearson C. Neural correlates of phonetic categorization under auditory (phoneme) and visual (grapheme) modalities. bioRxiv [Preprint] 2024:2024.07.24.604940. PMID: 39211275; PMCID: PMC11361091; DOI: 10.1101/2024.07.24.604940.
Abstract
We tested whether the neural mechanisms of phonetic categorization are specific to speech sounds or generalize to graphemes (i.e., visual letters) of the same phonetic label. Given that linguistic experience shapes categorical processing, and letter-speech sound matching plays a crucial role during early reading acquisition, we hypothesized sound phoneme and visual grapheme tokens representing the same linguistic identity might recruit common neural substrates, despite originating from different sensory modalities. Behavioral and neuroelectric brain responses (ERPs) were acquired as participants categorized stimuli from sound (phoneme) and homologous letter (grapheme) continua, each spanning a /da/-/ga/ gradient. Behaviorally, listeners were faster and showed stronger categorization for phonemes compared to graphemes. At the neural level, multidimensional scaling of the EEG revealed responses self-organized in a categorical fashion such that tokens clustered within their respective modality beginning ∼150-250 ms after stimulus onset. Source-resolved ERPs further revealed modality-specific and overlapping brain regions supporting phonetic categorization. Left inferior frontal gyrus and auditory cortex showed stronger responses for sound category members compared to phonetically ambiguous tokens, whereas early visual cortices paralleled this categorical organization for graphemes. Auditory and visual categorization also recruited common visual association areas in extrastriate cortex but in opposite hemispheres (auditory = left; visual = right). Our findings reveal that both auditory and visual sensory cortices support categorical organization for phonetic labels within their respective modalities. However, a partial overlap in phoneme and grapheme processing among occipital brain areas implies the presence of an isomorphic, domain-general mapping for phonetic categories in the dorsal visual system.
5. Rizzi R, Bidelman GM. Functional benefits of continuous vs. categorical listening strategies on the neural encoding and perception of noise-degraded speech. bioRxiv [Preprint] 2024:2024.05.15.594387. PMID: 38798410; PMCID: PMC11118460; DOI: 10.1101/2024.05.15.594387.
Abstract
Acoustic information in speech changes continuously, yet listeners form discrete perceptual categories to ease the demands of perception. Being a more continuous/gradient as opposed to a discrete/categorical listener may be further advantageous for understanding speech in noise by increasing perceptual flexibility and resolving ambiguity. The degree to which a listener's responses to a continuum of speech sounds are categorical versus continuous can be quantified using visual analog scaling (VAS) during speech labeling tasks. Here, we recorded event-related brain potentials (ERPs) to vowels along an acoustic-phonetic continuum (/u/ to /a/) while listeners categorized phonemes in both clean and noise conditions. Behavior was assessed using standard two-alternative forced choice (2AFC) and VAS paradigms to evaluate categorization under task structures that promote discrete (2AFC) vs. continuous (VAS) hearing, respectively. Behaviorally, identification curves were steeper under 2AFC vs. VAS categorization but were relatively immune to noise, suggesting robust access to abstract, phonetic categories even under signal degradation. Behavioral slopes were positively correlated with listeners' QuickSIN scores, suggesting a behavioral advantage for speech-in-noise comprehension conferred by a gradient listening strategy. At the neural level, electrode-level data revealed that P2 peak amplitudes of the ERPs were modulated by task and noise; responses were larger under VAS vs. 2AFC categorization and showed a larger noise-related latency delay in the VAS vs. 2AFC condition. More gradient responders also had smaller shifts in ERP latency with noise, suggesting their neural encoding of speech was more resilient to noise degradation. Interestingly, source-resolved ERPs showed that more gradient listening was also correlated with stronger neural responses in left superior temporal gyrus. Our results demonstrate that listening strategy (i.e., being a discrete vs. continuous listener) modulates the categorical organization of speech and behavioral success, with continuous/gradient listening being more advantageous to speech-in-noise perception.
6. Bidelman GM, Bernard F, Skubic K. Hearing in categories aids speech streaming at the "cocktail party". bioRxiv [Preprint] 2024:2024.04.03.587795. PMID: 38617284; PMCID: PMC11014555; DOI: 10.1101/2024.04.03.587795.
Abstract
Our perceptual system bins elements of the speech signal into categories to make speech perception manageable. Here, we aimed to test whether hearing speech in categories (as opposed to in a continuous/gradient fashion) affords yet another benefit to speech recognition: parsing noisy speech at the "cocktail party." We measured speech recognition in a simulated 3D cocktail party environment. We manipulated task difficulty by varying the number of additional maskers presented at other spatial locations in the horizontal soundfield (1-4 talkers) and via forward vs. time-reversed maskers, promoting more and less informational masking (IM), respectively. In separate tasks, we measured isolated phoneme categorization using two-alternative forced choice (2AFC) and visual analog scaling (VAS) tasks designed to promote more/less categorical hearing and thus test putative links between categorization and real-world speech-in-noise skills. We first show that listeners can only monitor up to ~3 talkers despite up to 5 in the soundscape and that streaming is not related to extended high-frequency hearing thresholds (though QuickSIN scores are). We then confirm that speech streaming accuracy and speed decline with additional competing talkers and amidst forward compared to reverse maskers with added IM. Dividing listeners into "discrete" vs. "continuous" categorizers based on their VAS labeling (i.e., whether responses were binary or continuous judgments), we then show that the degree of IM experienced at the cocktail party is predicted by their degree of categoricity in phoneme labeling; more discrete listeners are less susceptible to IM than their gradient-responding peers. Our results establish a link between speech categorization skills and cocktail party processing, with a categorical (rather than gradient) listening strategy benefiting degraded speech perception. These findings imply that figure-ground deficits common in many disorders might arise through a surprisingly simple mechanism: a failure to properly bin sounds into categories.
Affiliation(s)
- Gavin M. Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA
- Program in Neuroscience, Indiana University, Bloomington, IN, USA
- Cognitive Science Program, Indiana University, Bloomington, IN, USA
- Fallon Bernard
- School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA
- Kimberly Skubic
- School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA
7. Carter JA, Bidelman GM. Perceptual warping exposes categorical representations for speech in human brainstem responses. Neuroimage 2023; 269:119899. PMID: 36720437; PMCID: PMC9992300; DOI: 10.1016/j.neuroimage.2023.119899.
Abstract
The brain transforms continuous acoustic events into discrete category representations to downsample the speech signal for our perceptual-cognitive systems. Such phonetic categories are highly malleable, and their percepts can change depending on surrounding stimulus context. Previous work suggests this acoustic-phonetic mapping and perceptual warping of speech emerge in the brain no earlier than auditory cortex. Here, we examined whether these auditory-category phenomena inherent to speech perception occur even earlier in the human brain, at the level of the auditory brainstem. We recorded speech-evoked frequency following responses (FFRs) during a task designed to induce more/less warping of listeners' perceptual categories depending on stimulus presentation order of a speech continuum (random, forward, backward directions). We used a novel clustered stimulus paradigm to rapidly record the high trial counts needed for FFRs concurrent with active behavioral tasks. We found serial stimulus order caused perceptual shifts (hysteresis) near listeners' category boundary, confirming identical speech tokens are perceived differentially depending on stimulus context. Critically, we further show neural FFRs during active (but not passive) listening are enhanced for prototypical vs. category-ambiguous tokens and are biased in the direction of listeners' phonetic label even for acoustically identical speech stimuli. These findings were not observed in the stimulus acoustics nor in model FFR responses generated via a computational model of cochlear and auditory nerve transduction, confirming a central origin to the effects. Our data reveal FFRs carry category-level information and suggest top-down processing actively shapes the neural encoding and categorization of speech at subcortical levels. These findings suggest the acoustic-phonetic mapping and perceptual warping in speech perception occur surprisingly early along the auditory neuroaxis, which might aid understanding by reducing ambiguity inherent to the speech signal.
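Frequency-following responses are usually quantified in the spectral domain: epochs are averaged and the amplitude near the voice fundamental (F0) is compared across conditions (e.g., prototypical vs. category-ambiguous tokens). The sketch below illustrates that step only; the sampling rate, F0, trial counts, and signal amplitudes are placeholder assumptions, not the study's recording parameters.

```python
import numpy as np

def ffr_f0_amplitude(epochs, fs, f0, bw=5.0):
    """Average FFR epochs and return the peak spectral amplitude within +/- bw Hz of F0.

    epochs : (n_trials, n_samples) array of speech-evoked responses
    fs     : sampling rate in Hz
    f0     : fundamental frequency of the evoking vowel in Hz
    """
    avg = epochs.mean(axis=0)
    spec = np.abs(np.fft.rfft(avg)) / len(avg)
    freqs = np.fft.rfftfreq(len(avg), d=1.0 / fs)
    band = (freqs >= f0 - bw) & (freqs <= f0 + bw)
    return spec[band].max()

# Fabricated comparison: a "prototypical" token evoking stronger F0 phase-locking
# than an "ambiguous" token, embedded in the same background noise.
rng = np.random.default_rng(2)
fs, f0, n = 10000, 100.0, 2000
t = np.arange(n) / fs
proto = rng.normal(0, 1, (1000, n)) + 0.5 * np.sin(2 * np.pi * f0 * t)
ambig = rng.normal(0, 1, (1000, n)) + 0.3 * np.sin(2 * np.pi * f0 * t)
print(ffr_f0_amplitude(proto, fs, f0) > ffr_f0_amplitude(ambig, fs, f0))   # expected: True
```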
Affiliation(s)
- Jared A Carter
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, USA; Division of Clinical Neuroscience, School of Medicine, Hearing Sciences - Scottish Section, University of Nottingham, Glasgow, Scotland, UK
- Gavin M Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA; Program in Neuroscience, Indiana University, Bloomington, IN, USA.
8. Bidelman GM, Carter JA. Continuous dynamics in behavior reveal interactions between perceptual warping in categorization and speech-in-noise perception. Front Neurosci 2023; 17:1032369. PMID: 36937676; PMCID: PMC10014819; DOI: 10.3389/fnins.2023.1032369.
Abstract
Introduction: Spoken language comprehension requires listeners to map continuous features of the speech signal to discrete category labels. Categories are, however, malleable to surrounding context and stimulus precedence; listeners' percepts can dynamically shift depending on the sequencing of adjacent stimuli, resulting in a warping of the heard phonetic category. Here, we investigated whether such perceptual warping, which amplifies categorical hearing, might alter speech processing in noise-degraded listening scenarios.
Methods: We measured continuous dynamics in perception and category judgments of an acoustic-phonetic vowel gradient via mouse tracking. Tokens were presented in serial vs. random orders to induce more/less perceptual warping while listeners categorized continua in clean and noise conditions.
Results: Listeners' responses were faster and their mouse trajectories closer to the ultimate behavioral selection (marked visually on the screen) in serial vs. random order, suggesting increased perceptual attraction to category exemplars. Interestingly, order effects emerged earlier and persisted later in the trial time course when categorizing speech in noise.
Discussion: These data describe interactions between perceptual warping in categorization and speech-in-noise perception: warping strengthens the behavioral attraction to relevant speech categories, making listeners more decisive (though not necessarily more accurate) in their decisions about both clean and noise-degraded speech.
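In mouse-tracking studies, "attraction" toward a response option is commonly summarized by how far each cursor trajectory deviates from the straight line joining the start position and the chosen response; smaller deviations indicate more direct, decisive movements. A minimal sketch of that deviation measure is below, with coordinates and names that are illustrative assumptions rather than this paper's processing pipeline.

```python
import numpy as np

def max_deviation(xy, start, end):
    """Maximum perpendicular distance of one trial's cursor trajectory from the
    straight start-to-response line; smaller values = more direct movements.

    xy    : (n_samples, 2) array of cursor positions
    start : (2,) starting position
    end   : (2,) position of the selected response option
    """
    start, end = np.asarray(start, float), np.asarray(end, float)
    line = end - start
    rel = np.asarray(xy, float) - start
    # Perpendicular distance via the 2-D cross product |line x rel| / |line|.
    cross = line[0] * rel[:, 1] - line[1] * rel[:, 0]
    return np.max(np.abs(cross)) / np.linalg.norm(line)

# Example trial: a trajectory that bows toward a competing response option.
x = np.linspace(0, 1, 50)
traj = np.column_stack([x, x + 0.3 * np.sin(np.pi * x)])
print(max_deviation(traj, start=(0, 0), end=(1, 1)))   # ~0.21 for this synthetic path
```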
Affiliation(s)
- Gavin M. Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, United States
- Program in Neuroscience, Indiana University, Bloomington, IN, United States
- Jared A. Carter
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States
- Hearing Sciences – Scottish Section, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, Glasgow, United Kingdom
9. Carter JA, Buder EH, Bidelman GM. Nonlinear dynamics in auditory cortical activity reveal the neural basis of perceptual warping in speech categorization. JASA Express Lett 2022; 2:045201. PMID: 35434716; PMCID: PMC8984957; DOI: 10.1121/10.0009896.
Abstract
Surrounding context influences speech listening, resulting in dynamic shifts to category percepts. To examine its neural basis, event-related potentials (ERPs) were recorded during vowel identification with continua presented in random, forward, and backward orders to induce perceptual warping. Behaviorally, sequential (vs. random) stimulus delivery shifted individual listeners' categorical boundary, revealing perceptual warping (biasing) of the heard phonetic category dependent on recent stimulus history. ERPs revealed later (∼300 ms) activity localized to superior temporal and middle/inferior frontal gyri that predicted listeners' hysteresis/enhanced contrast magnitudes. Findings demonstrate that interactions between frontotemporal brain regions govern top-down, stimulus history effects on speech categorization.
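The hysteresis and enhanced-contrast magnitudes these ERPs predicted are behavioral quantities: shifts in the 50% category boundary when the same continuum is presented in forward vs. backward order. A minimal sketch, assuming logistic fits to each presentation order; the continuum values, response proportions, and sign convention are illustrative assumptions, not the study's data or exact method.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Two-parameter logistic psychometric function (x0 = boundary, k = slope)."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

def boundary(steps, p_resp):
    """50% crossover point (category boundary) from a logistic fit."""
    (x0, _k), _ = curve_fit(logistic, steps, p_resp, p0=[np.mean(steps), 1.0])
    return x0

steps = np.arange(1, 8)
p_forward  = np.array([0.01, 0.03, 0.10, 0.30, 0.70, 0.95, 0.99])   # fabricated
p_backward = np.array([0.02, 0.08, 0.35, 0.75, 0.92, 0.98, 0.99])   # fabricated

# Hysteresis: the percept lags the direction of travel through the continuum, so the
# two orders yield different crossover points; their difference is its magnitude.
print(boundary(steps, p_forward) - boundary(steps, p_backward))
```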
Affiliation(s)
- Jared A Carter
- Institute for Intelligent Systems, University of Memphis, Memphis, Tennessee 38152, USA
- Eugene H Buder
- School of Communication Sciences and Disorders, University of Memphis, Memphis, Tennessee 38152, USA
- Gavin M Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, Indiana 47408, USA