1. Xie Z. Subcortical responses to continuous speech under bimodal divided attention. J Neurophysiol 2025; 133:1216-1221. [PMID: 40098452] [DOI: 10.1152/jn.00039.2025]
Abstract
Everyday speech perception often occurs in multimodal environments, requiring listeners to divide attention across sensory modalities to prioritize relevant information. Although this division of attention modestly reduces cortical encoding of natural continuous speech, its impact on subcortical processing remains unclear. To investigate this, we used an audiovisual dual-task paradigm to manipulate bimodal divided attention. Participants completed a primary visual memory task (low or high cognitive load) while simultaneously performing a secondary task of listening to audiobook segments. Sixteen young adults with normal hearing completed these tasks while their EEG signals were recorded. In a third condition, participants performed only the listening task. Subcortical responses to the audiobook segments were analyzed using temporal response functions (TRFs), which predicted EEG responses from speech predictors derived from auditory nerve models. Across all conditions, TRFs displayed a prominent peak at ∼8 ms, resembling the wave V peak of auditory brainstem responses, indicating subcortical origins. No significant differences in latencies or amplitudes of this peak, nor in TRF prediction correlations, were observed between conditions. These findings provide no evidence that bimodal divided attention affects the subcortical processing of continuous speech, indicating that its effects may be restricted to cortical levels.

NEW & NOTEWORTHY: This study shows that auditory subcortical processing of natural continuous speech remains unaffected when attention is divided across auditory and visual modalities. These findings indicate that the influence of crossmodal attention on the processing of natural continuous speech may be restricted to cortical levels.
Affiliation(s)
- Zilong Xie
- School of Communication Science and Disorders, Florida State University, Tallahassee, Florida, United States
2. Polonenko MJ, Maddox RK. The Effect of Speech Masking on the Human Subcortical Response to Continuous Speech. eNeuro 2025; 12:ENEURO.0561-24.2025. [PMID: 40127932] [PMCID: PMC11974362] [DOI: 10.1523/eneuro.0561-24.2025]
Abstract
Auditory masking, the interference of the encoding and processing of an acoustic stimulus imposed by one or more competing stimuli, is nearly omnipresent in daily life and presents a critical barrier to many listeners, including people with hearing loss, users of hearing aids and cochlear implants, and people with auditory processing disorders. The perceptual aspects of masking have been actively studied for several decades, and particular emphasis has been placed on masking of speech by other speech sounds. The neural effects of such masking, especially at the subcortical level, have been much less studied, in large part due to the technical limitations of making such measurements. Recent work has allowed estimation of the auditory brainstem response (ABR), whose characteristic waves are linked to specific subcortical areas, to naturalistic speech. In this study, we used those techniques to measure the encoding of speech stimuli that were masked by one or more simultaneous other speech stimuli. We presented listeners with simultaneous speech from one, two, three, or five simultaneous talkers, corresponding to a range of signal-to-noise ratios (clean, 0, -3, and -6 dB), and derived the ABR to each talker in the mixture. Each talker in a mixture was treated in turn as a target sound masked by other talkers, making the response quicker to acquire. We found consistently across listeners that ABR Wave V amplitudes decreased and latencies increased as the number of competing talkers increased.
Affiliation(s)
- Melissa J Polonenko
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota, 55455
- Departments of Biomedical Engineering and Neuroscience, University of Rochester, Rochester, New York, 14627
- Ross K Maddox
- Departments of Biomedical Engineering and Neuroscience, University of Rochester, Rochester, New York, 14627
- Kresge Hearing Research Institute, Department of Otolaryngology - Head and Neck Surgery, University of Michigan, Ann Arbor, Michigan, 48109
3. Haupt T, Rosenkranz M, Bleichner MG. Exploring Relevant Features for EEG-Based Investigation of Sound Perception in Naturalistic Soundscapes. eNeuro 2025; 12:ENEURO.0287-24.2024. [PMID: 39753371] [PMCID: PMC11747973] [DOI: 10.1523/eneuro.0287-24.2024]
Abstract
A comprehensive analysis of everyday sound perception can be achieved using electroencephalography (EEG) with the concurrent acquisition of information about the environment. While extensive research has been dedicated to speech perception, the complexities of auditory perception within everyday environments, specifically the types of information and the key features to extract, remain less explored. Our study aims to systematically investigate the relevance of different feature categories: discrete sound-identity markers, general cognitive state information, and acoustic representations, including discrete sound onset, the envelope, and mel-spectrogram. Using continuous data analysis, we contrast different features in terms of their predictive power for unseen data and thus their distinct contributions to explaining neural data. For this, we analyze data from a complex audio-visual motor task using a naturalistic soundscape. The results demonstrated that the feature sets that explained the most neural variability combined highly detailed acoustic features with a comprehensive description of specific sound onsets. They also showed that established features can be applied to complex soundscapes. Crucially, for the discrete features, the outcome hinged on excluding periods devoid of sound onsets from the analysis. Our study highlights the importance of describing the soundscape comprehensively, using acoustic and non-acoustic aspects, to fully understand the dynamics of sound perception in complex situations. This approach can serve as a foundation for future studies aiming to investigate sound perception in natural settings.
Affiliation(s)
- Thorge Haupt
- Neurophysiology of Everyday Life Group, Department of Psychology, Carl von Ossietzky Universität Oldenburg, Oldenburg 26129, Germany
- Marc Rosenkranz
- Neurophysiology of Everyday Life Group, Department of Psychology, Carl von Ossietzky Universität Oldenburg, Oldenburg 26129, Germany
- Martin G Bleichner
- Neurophysiology of Everyday Life Group, Department of Psychology, Carl von Ossietzky Universität Oldenburg, Oldenburg 26129, Germany
- Research Center for Neurosensory Science, Carl von Ossietzky Universität Oldenburg, Oldenburg 26129, Germany
4. Polonenko MJ, Maddox RK. The effect of speech masking on the human subcortical response to continuous speech. bioRxiv [Preprint] 2024:2024.12.10.627771. [PMID: 39713441] [PMCID: PMC11661217] [DOI: 10.1101/2024.12.10.627771]
Abstract
Auditory masking, the interference of the encoding and processing of an acoustic stimulus imposed by one or more competing stimuli, is nearly omnipresent in daily life, and presents a critical barrier to many listeners, including people with hearing loss, users of hearing aids and cochlear implants, and people with auditory processing disorders. The perceptual aspects of masking have been actively studied for several decades, and particular emphasis has been placed on masking of speech by other speech sounds. The neural effects of such masking, especially at the subcortical level, have been much less studied, in large part due to the technical limitations of making such measurements. Recent work has allowed estimation of the auditory brainstem response (ABR), whose characteristic waves are linked to specific subcortical areas, to naturalistic speech. In this study, we used those techniques to measure the encoding of speech stimuli that were masked by one or more simultaneous other speech stimuli. We presented listeners with simultaneous speech from one, two, three, or five simultaneous talkers, corresponding to a range of signal-to-noise ratios (SNR; clean, 0, -3, and -6 dB), and derived the ABR to each talker in the mixture. Each talker in a mixture was treated in turn as a target sound masked by other talkers, making the response quicker to acquire. We found consistently across listeners that ABR wave V amplitudes decreased and latencies increased as the number of competing talkers increased.

Significance statement: Trying to listen to someone speak in a noisy setting is a common challenge for most people, due to auditory masking. Masking has been studied extensively at the behavioral level, and more recently in the cortex using EEG and other neurophysiological methods. Much less is known, however, about how masking affects speech encoding in the subcortical auditory system.
Here we presented listeners with mixtures of simultaneous speech streams ranging from one to five talkers. We used recently developed tools for measuring subcortical speech encoding to determine how the encoding of each speech stream was impacted by the masker speech. We show that the subcortical response to masked speech becomes smaller and increasingly delayed as the masking becomes more severe.
Affiliation(s)
- Melissa J Polonenko
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, MN
- Departments of Biomedical Engineering and Neuroscience, University of Rochester, Rochester, NY
- Ross K Maddox
- Kresge Hearing Research Institute, Department of Otolaryngology - Head and Neck Surgery, University of Michigan, Ann Arbor, MI
- Departments of Biomedical Engineering and Neuroscience, University of Rochester, Rochester, NY
5. Polonenko MJ, Maddox RK. Fundamental frequency predominantly drives talker differences in auditory brainstem responses to continuous speech. JASA Express Lett 2024; 4:114401. [PMID: 39504231] [PMCID: PMC11558516] [DOI: 10.1121/10.0034329]
Abstract
Deriving human neural responses to natural speech is now possible, but the responses to male- and female-uttered speech have been shown to differ. These talker differences may complicate interpretations or restrict experimental designs geared toward more realistic communication scenarios. This study found that when a male talker and a female talker had the same fundamental frequency, auditory brainstem responses (ABRs) were very similar. Those responses became smaller and later with increasing fundamental frequency, as did click ABRs with increasing stimulus rates. Modeled responses suggested that the speech and click ABR differences were reasonably predicted by peripheral and brainstem processing of stimulus acoustics.
Affiliation(s)
- Melissa J Polonenko
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Departments of Biomedical Engineering and Neuroscience, University of Rochester, Rochester, New York 14642, USA
- Ross K Maddox
- Departments of Biomedical Engineering and Neuroscience, University of Rochester, Rochester, New York 14642, USA
- Kresge Hearing Research Institute, Department of Otolaryngology Head and Neck Surgery, University of Michigan, Ann Arbor, Michigan 48109, USA
6. Kulasingham JP, Innes-Brown H, Enqvist M, Alickovic E. Level-Dependent Subcortical Electroencephalography Responses to Continuous Speech. eNeuro 2024; 11:ENEURO.0135-24.2024. [PMID: 39142822] [DOI: 10.1523/eneuro.0135-24.2024]
Abstract
The auditory brainstem response (ABR) is a measure of subcortical activity in response to auditory stimuli. The wave V peak of the ABR depends on the stimulus intensity level, and has been widely used for clinical hearing assessment. Conventional methods estimate the ABR by averaging electroencephalography (EEG) responses to short, unnatural stimuli such as clicks. Recent work has moved toward more ecologically relevant continuous speech stimuli using linear deconvolution models called temporal response functions (TRFs). Investigating whether the TRF waveform changes with stimulus intensity is a crucial step toward the use of natural speech stimuli for hearing assessments involving subcortical responses. Here, we develop methods to estimate level-dependent subcortical TRFs using EEG data collected from 21 participants listening to continuous speech presented at 4 different intensity levels. We find that level-dependent changes can be detected in the wave V peak of the subcortical TRF for almost all participants, and are consistent with level-dependent changes in click-ABR wave V. We also investigate the most suitable peripheral auditory model to generate predictors for level-dependent subcortical TRFs and find that simple gammatone filterbanks perform the best. Additionally, around 6 min of data may be sufficient for detecting level-dependent effects and wave V peaks above the noise floor for speech segments with higher intensity. Finally, we show a proof-of-concept that level-dependent subcortical TRFs can be detected even for the inherent intensity fluctuations in natural continuous speech.
Affiliation(s)
- Joshua P Kulasingham
- Automatic Control, Department of Electrical Engineering, Linköping University, 581 83 Linköping, Sweden
- Hamish Innes-Brown
- Eriksholm Research Centre, DK-3070 Snekkersten, Denmark
- Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Martin Enqvist
- Automatic Control, Department of Electrical Engineering, Linköping University, 581 83 Linköping, Sweden
- Emina Alickovic
- Automatic Control, Department of Electrical Engineering, Linköping University, 581 83 Linköping, Sweden
- Eriksholm Research Centre, DK-3070 Snekkersten, Denmark
7. Bolt E, Giroud N. Auditory Encoding of Natural Speech at Subcortical and Cortical Levels Is Not Indicative of Cognitive Decline. eNeuro 2024; 11:ENEURO.0545-23.2024. [PMID: 38658138] [PMCID: PMC11082929] [DOI: 10.1523/eneuro.0545-23.2024]
Abstract
More and more patients worldwide are diagnosed with dementia, which emphasizes the urgent need for early detection markers. In this study, we built on the auditory hypersensitivity theory of a previous study, which postulated that responses to auditory input in the subcortex as well as the cortex are enhanced in cognitive decline, and examined auditory encoding of natural continuous speech at both neural levels for its indicative potential for cognitive decline. We recruited study participants aged 60 years and older, who were divided into two groups based on the Montreal Cognitive Assessment: one group with low scores (n = 19, participants with signs of cognitive decline) and a control group (n = 25). Participants completed an audiometric assessment, and we then recorded their electroencephalography while they listened to an audiobook and click sounds. We derived temporal response functions and evoked potentials from the data and examined response amplitudes for their potential to predict cognitive decline, controlling for hearing ability and age. Contrary to our expectations, no evidence of auditory hypersensitivity was observed in participants with signs of cognitive decline; response amplitudes were comparable in both cognitive groups. Moreover, the combination of response amplitudes showed no predictive value for cognitive decline. These results challenge the proposed hypothesis and emphasize the need for further research to identify reliable auditory markers for the early detection of cognitive decline.
Affiliation(s)
- Elena Bolt
- Computational Neuroscience of Speech and Hearing, Department of Computational Linguistics, University of Zurich, Zurich 8050, Switzerland
- International Max Planck Research School on the Life Course (IMPRS LIFE), University of Zurich, Zurich 8050, Switzerland
- Nathalie Giroud
- Computational Neuroscience of Speech and Hearing, Department of Computational Linguistics, University of Zurich, Zurich 8050, Switzerland
- International Max Planck Research School on the Life Course (IMPRS LIFE), University of Zurich, Zurich 8050, Switzerland
- Language & Medicine Centre Zurich, Competence Centre of Medical Faculty and Faculty of Arts and Sciences, University of Zurich, Zurich 8050, Switzerland
8. Kulasingham JP, Bachmann FL, Eskelund K, Enqvist M, Innes-Brown H, Alickovic E. Predictors for estimating subcortical EEG responses to continuous speech. PLoS One 2024; 19:e0297826. [PMID: 38330068] [PMCID: PMC10852227] [DOI: 10.1371/journal.pone.0297826]
Abstract
Perception of sounds and speech involves structures in the auditory brainstem that rapidly process ongoing auditory stimuli. The role of these structures in speech processing can be investigated by measuring their electrical activity using scalp-mounted electrodes. However, typical analysis methods involve averaging neural responses to many short repetitive stimuli that bear little relevance to daily listening environments. Recently, subcortical responses to more ecologically relevant continuous speech were detected using linear encoding models. These methods estimate the temporal response function (TRF), which is a regression model that minimises the error between the measured neural signal and a predictor derived from the stimulus. Using predictors that model the highly non-linear peripheral auditory system may improve linear TRF estimation accuracy and peak detection. Here, we compare predictors from both simple and complex peripheral auditory models for estimating brainstem TRFs on electroencephalography (EEG) data from 24 participants listening to continuous speech. We also investigate the data length required for estimating subcortical TRFs, and find that around 12 minutes of data is sufficient for clear wave V peaks (>3 dB SNR) to be seen in nearly all participants. Interestingly, predictors derived from simple filterbank-based models of the peripheral auditory system yield TRF wave V peak SNRs that are not significantly different from those estimated using a complex model of the auditory nerve, provided that the nonlinear effects of adaptation in the auditory system are appropriately modelled. Crucially, computing predictors from these simpler models is more than 50 times faster compared to the complex model. This work paves the way for efficient modelling and detection of subcortical processing of continuous speech, which may lead to improved diagnosis metrics for hearing impairment and assistive hearing technology.
Affiliation(s)
- Joshua P. Kulasingham
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Martin Enqvist
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Hamish Innes-Brown
- Eriksholm Research Centre, Snekkersten, Denmark
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Emina Alickovic
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Eriksholm Research Centre, Snekkersten, Denmark