1. Watson S, Dau T, Hjortkjær J. Task-dependent Modulation Masking of 4 Hz Envelope Following Responses. J Cogn Neurosci 2025; 37:1189-1201. PMID: 39918897. DOI: 10.1162/jocn_a_02309.
Abstract
The perception and recognition of natural sounds, like speech, rely on the processing of slow amplitude modulations. Perception can be hindered by interfering modulations at similar rates, a phenomenon known as modulation masking. Cortical envelope following responses (EFRs) are highly sensitive to these slow modulations, but it is unclear how modulation masking impacts these cortical envelope responses. To dissociate stimulus-driven and attention-driven effects, we recorded EEG responses to 4 Hz modulated noise in a two-way factorial design, varying the level of modulation masking and intermodal attention. Auditory stimuli contained one of three random masking bands in the stimulus envelope, at various proximities in modulation frequency to the 4 Hz target, or an unmasked reference condition. During EEG recordings, the same stimuli were presented while participants performed either an auditory or a visual change detection task. Attention to the auditory modality resulted in a general enhancement of sustained EFRs to the 4 Hz target. In the visual task condition only, 4 Hz EFR power systematically decreased with increasing modulation masking, consistent with psychophysical masking patterns. During the auditory task, however, the 4 Hz EFRs were unaffected by masking and remained strong even at the highest degrees of masking. Rather than indicating a general bottom-up modulation-selective process, these results indicate that the masking of cortical envelope responses interacts with attention. We propose that auditory attention allows robust tracking of masked envelopes, possibly through a form of glimpsing of the target, whereas envelope responses to task-irrelevant auditory stimuli reflect stimulus salience.
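The key dependent measure here is spectral power at the 4 Hz target modulation rate. Below is a minimal sketch of how an envelope-following response at 4 Hz might be quantified from epoched EEG; the data are synthetic and the sampling rate, epoch length, and neighbour-bin choices are illustrative assumptions, not the authors' pipeline.

```python
# Minimal sketch (synthetic data): quantify 4 Hz envelope-following response
# power from epoched EEG. Parameters are illustrative assumptions.
import numpy as np

fs = 256                              # sampling rate (Hz), assumed
n_epochs, n_samples = 60, fs * 4      # 60 epochs of 4 s -> 0.25 Hz resolution
rng = np.random.default_rng(0)
t = np.arange(n_samples) / fs

# Synthetic EEG: weak phase-locked 4 Hz response buried in noise
eeg = 0.2 * np.sin(2 * np.pi * 4 * t) + rng.standard_normal((n_epochs, n_samples))

# Average epochs first so only phase-locked (evoked) activity survives,
# then read off spectral power at the 4 Hz bin.
evoked = eeg.mean(axis=0)
spectrum = np.abs(np.fft.rfft(evoked)) ** 2 / n_samples
freqs = np.fft.rfftfreq(n_samples, d=1 / fs)
target = np.argmin(np.abs(freqs - 4.0))

# Nearby bins provide a noise-floor estimate for an SNR-like measure
neighbours = [target - 3, target - 2, target + 2, target + 3]
snr = spectrum[target] / spectrum[neighbours].mean()
print(f"4 Hz EFR power: {spectrum[target]:.3f} (SNR vs neighbours: {snr:.1f})")
```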
2. Wang L, Chen F. EEG responses to onset-edge and steady-state segments of continuous speech under selective auditory attention modulation. Hear Res 2025; 463:109298. PMID: 40344751. DOI: 10.1016/j.heares.2025.109298.
Abstract
Electroencephalography (EEG) signals provide valuable insights into the neural mechanisms of speech perception. However, it remains unclear how neural responses dynamically align with continuous speech under selective attention modulation in complex auditory environments. This study examined the evoked and induced EEG responses, their correlations with speech features, and their cortical distributions for the target and ignored speech in two-talker competing scenarios. Results showed that selective attention increased the evoked EEG power for the target speech compared to the ignored speech. In contrast, the induced power showed no significant differences. Low-frequency EEG power and phase responses reliably indexed the identification of the target speech amid competing streams. Cortical distribution analyses revealed that the evoked power differences between the target and ignored speech were concentrated in the central and parietal cortices. Significant induced power differences between the target and ignored speech were present only at the onset-edge segments in the left temporal cortex. Comparisons between onset-edge and steady-state segments showed evoked power differences in the right central and temporal cortices and induced power differences in the frontal cortex for the ignored speech. No significant differences in cortical distribution were observed between the onset-edge and steady-state segments for the target speech. These findings underscore the distinct contributions of the evoked and induced neural activities, and of their cortical distributions, to selective auditory attention and segment-specific speech perception.
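The evoked/induced distinction hinges on phase-locking across trials. A minimal sketch of one common decomposition follows, on synthetic trials; evoked power is taken from the trial average and induced power from the residual single-trial activity (the study's actual time-frequency pipeline is not reproduced here).

```python
# Minimal sketch of one common evoked/induced decomposition (synthetic data):
# evoked power = power of the trial average; induced power = average power of
# single trials after subtracting the evoked response.
import numpy as np
from scipy.signal import welch

fs = 128
rng = np.random.default_rng(1)
trials = rng.standard_normal((40, fs * 2))          # 40 trials, 2 s each
t = np.arange(fs * 2) / fs
trials += 0.5 * np.sin(2 * np.pi * 5 * t)           # phase-locked 5 Hz component

evoked = trials.mean(axis=0)                        # phase-locked part
residual = trials - evoked                          # non-phase-locked part

f, evoked_pow = welch(evoked, fs=fs, nperseg=fs)
_, induced_pow = welch(residual, fs=fs, nperseg=fs, axis=-1)
induced_pow = induced_pow.mean(axis=0)              # average over trials

band = (f >= 4) & (f <= 8)                          # e.g. a theta band
print(f"theta evoked power:  {evoked_pow[band].mean():.4f}")
print(f"theta induced power: {induced_pow[band].mean():.4f}")
```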
Affiliation(s)
- Lei Wang
- School of Electronics and Communication Engineering, Guangzhou University, Guangzhou, China.
- Fei Chen
- Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China.
3. Polonenko MJ, Maddox RK. The Effect of Speech Masking on the Human Subcortical Response to Continuous Speech. eNeuro 2025; 12:ENEURO.0561-24.2025. PMID: 40127932. PMCID: PMC11974362. DOI: 10.1523/eneuro.0561-24.2025.
Abstract
Auditory masking-the interference of the encoding and processing of an acoustic stimulus imposed by one or more competing stimuli-is nearly omnipresent in daily life and presents a critical barrier to many listeners, including people with hearing loss, users of hearing aids and cochlear implants, and people with auditory processing disorders. The perceptual aspects of masking have been actively studied for several decades, and particular emphasis has been placed on masking of speech by other speech sounds. The neural effects of such masking, especially at the subcortical level, have been much less studied, in large part due to the technical limitations of making such measurements. Recent work has allowed estimation of the auditory brainstem response (ABR), whose characteristic waves are linked to specific subcortical areas, to naturalistic speech. In this study, we used those techniques to measure the encoding of speech stimuli that were masked by one or more other simultaneous speech stimuli. We presented listeners with speech from one, two, three, or five simultaneous talkers, corresponding to a range of signal-to-noise ratios (clean, 0, -3, and -6 dB), and derived the ABR to each talker in the mixture. Each talker in a mixture was treated in turn as a target sound masked by the other talkers, making the response quicker to acquire. We found consistently across listeners that ABR wave V amplitudes decreased and latencies increased as the number of competing talkers increased.
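Deriving an ABR to each talker in a running-speech mixture rests on deconvolving the EEG against a stimulus-derived regressor. The toy sketch below illustrates regularised frequency-domain deconvolution on synthetic signals; the half-wave-rectified regressor, regularisation constant, and latencies are assumptions for illustration, and the published method involves additional preprocessing not reproduced here.

```python
# Toy sketch (synthetic data) of regularised frequency-domain deconvolution,
# the general idea behind deriving responses to continuous speech.
import numpy as np

fs = 1000
rng = np.random.default_rng(2)
speech = rng.standard_normal(fs * 60)            # stand-in for speech audio
regressor = np.maximum(speech, 0.0)              # assumed: half-wave rectification

# Simulate EEG as the regressor convolved with a brainstem-like kernel + noise
true_kernel = np.zeros(50)
true_kernel[7] = 1.0                             # a "wave" at ~7 ms latency
eeg = np.convolve(regressor, true_kernel)[: len(regressor)]
eeg += 5 * rng.standard_normal(len(eeg))

# Deconvolution: divide the cross-spectrum by the (regularised) regressor power
X, Y = np.fft.rfft(regressor), np.fft.rfft(eeg)
kernel_est = np.fft.irfft(Y * np.conj(X) / (np.abs(X) ** 2 + 1e3))

lags_ms = np.arange(20) / fs * 1000
peak = lags_ms[np.argmax(kernel_est[:20])]
print(f"estimated response peak near {peak:.0f} ms (true: 7 ms)")
```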
Affiliation(s)
- Melissa J Polonenko
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota, 55455
- Departments of Biomedical Engineering and Neuroscience, University of Rochester, Rochester, New York, 14627
- Ross K Maddox
- Departments of Biomedical Engineering and Neuroscience, University of Rochester, Rochester, New York, 14627
- Kresge Hearing Research Institute, Department of Otolaryngology - Head and Neck Surgery, University of Michigan, Ann Arbor, Michigan, 48109
4. Herrmann B. Enhanced neural speech tracking through noise indicates stochastic resonance in humans. eLife 2025; 13:RP100830. PMID: 40100253. PMCID: PMC11919254. DOI: 10.7554/elife.100830.
Abstract
Neural activity in auditory cortex tracks the amplitude-onset envelope of continuous speech, but recent work counterintuitively suggests that neural tracking increases when speech is masked by background noise, despite reduced speech intelligibility. Noise-related amplification could indicate that stochastic resonance - the response facilitation through noise - supports neural speech tracking, but a comprehensive account is lacking. In five human electroencephalography experiments, the current study demonstrates a generalized enhancement of neural speech tracking due to minimal background noise. Results show that (1) neural speech tracking is enhanced for speech masked by background noise at very high signal-to-noise ratios (~30 dB SNR) where speech is highly intelligible; (2) this enhancement is independent of attention; (3) it generalizes across different stationary background maskers, but is strongest for 12-talker babble; and (4) it is present for headphone and free-field listening, suggesting that the neural-tracking enhancement generalizes to real-life listening. The work paints a clear picture that minimal background noise enhances the neural representation of the speech onset-envelope, suggesting that stochastic resonance contributes to neural speech tracking. The work further highlights non-linearities of neural tracking induced by background noise that make its use as a biological marker for speech processing challenging.
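Stochastic resonance itself is straightforward to simulate: a sub-threshold signal passed through a hard threshold is transmitted best at an intermediate noise level. The following illustrative sketch (not the study's analysis) shows the characteristic inverted-U effect.

```python
# Illustrative sketch of stochastic resonance: a weak signal passed through a
# threshold is transmitted better when a moderate amount of noise is added.
import numpy as np

rng = np.random.default_rng(3)
fs, dur = 1000, 10
t = np.arange(fs * dur) / fs
signal = 0.4 * np.sin(2 * np.pi * 2 * t)        # sub-threshold 2 Hz signal
threshold = 1.0

for noise_sd in [0.0, 0.3, 0.6, 2.0]:
    noisy = signal + noise_sd * rng.standard_normal(len(t))
    out = (noisy > threshold).astype(float)     # hard-threshold "neuron"
    # Correlation with the clean signal indexes how well it is transmitted
    r = np.corrcoef(out, signal)[0, 1] if out.std() > 0 else 0.0
    print(f"noise sd {noise_sd:.1f}: output-signal correlation r = {r:.2f}")
```

Transmission is zero without noise, peaks at a moderate noise level, and degrades again when noise dominates, mirroring the inverted-U logic the abstract invokes.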
Affiliation(s)
- Björn Herrmann
- Rotman Research Institute, Baycrest Academy for Research and Education, Toronto, Canada
- Department of Psychology, University of Toronto, Toronto, Canada
5. Borges H, Zaar J, Alickovic E, Christensen C, Kidmose P. Speech Reception Threshold Estimation via EEG-Based Continuous Speech Envelope Reconstruction. Eur J Neurosci 2025; 61:e70083. PMID: 40145625. PMCID: PMC11948451. DOI: 10.1111/ejn.70083.
Abstract
This study investigates the potential of speech reception threshold (SRT) estimation through electroencephalography (EEG)-based envelope reconstruction techniques with continuous speech. Additionally, we investigate the influence of the stimuli's signal-to-noise ratio (SNR) on the temporal response function (TRF). Twenty young normal-hearing participants listened to audiobook excerpts with varying background noise levels while EEG was recorded. A linear decoder was trained to reconstruct the speech envelope from the EEG data. Reconstruction accuracy was calculated as the Pearson's correlation between the reconstructed and actual speech envelopes. An EEG-based SRT estimate (SRTneuro) was obtained as the midpoint of a sigmoid function fitted to the reconstruction accuracy versus SNR data points. Additionally, the TRF was estimated at each SNR level, followed by a statistical analysis to reveal significant effects of SNR level on the latencies and amplitudes of the most prominent components. The SRTneuro was within 3 dB of the behavioral SRT for all participants. The TRF analysis showed a significant latency decrease for N1 and P2 and a significant increase in amplitude magnitude for N1 and P2 with increasing SNR. The results suggest that both envelope reconstruction accuracy and the TRF components are influenced by changes in SNR, indicating that they may be linked to the same underlying neural process.
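The SRTneuro step is a curve fit to accuracy-versus-SNR points. A minimal sketch with hypothetical accuracy values follows; the four-parameter sigmoid is one common parametrisation and is assumed here, not taken from the paper.

```python
# Minimal sketch (hypothetical values): fit a sigmoid to reconstruction
# accuracy vs SNR and take its midpoint as the EEG-based SRT (SRTneuro).
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(snr, floor, ceil, midpoint, slope):
    return floor + (ceil - floor) / (1 + np.exp(-slope * (snr - midpoint)))

snr_db = np.array([-12.0, -9.0, -6.0, -3.0, 0.0, 3.0])
# Hypothetical Pearson r between reconstructed and actual envelopes per SNR
accuracy = np.array([0.02, 0.04, 0.09, 0.16, 0.19, 0.20])

p0 = [accuracy.min(), accuracy.max(), -5.0, 1.0]   # initial guess
params, _ = curve_fit(sigmoid, snr_db, accuracy, p0=p0, maxfev=10000)
srt_neuro = params[2]                              # midpoint of the fitted sigmoid
print(f"SRTneuro estimate: {srt_neuro:.1f} dB SNR")
```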
Affiliation(s)
- Heidi B. Borges
- Eriksholm Research Centre, Snekkersten, Denmark
- Department of Electrical and Computer Engineering, Aarhus University, Aarhus, Denmark
- Johannes Zaar
- Eriksholm Research Centre, Snekkersten, Denmark
- Hearing Systems, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
- Emina Alickovic
- Eriksholm Research Centre, Snekkersten, Denmark
- Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Preben Kidmose
- Department of Electrical and Computer Engineering, Aarhus University, Aarhus, Denmark
6. Christensen RK, Studer F, Barkat TR. Background white noise increases neuronal activity by reducing membrane fluctuations and slow-wave oscillations in auditory cortex. Prog Neurobiol 2025; 246:102720. PMID: 39863149. DOI: 10.1016/j.pneurobio.2025.102720.
Abstract
The brain faces the challenging task of preserving a consistent portrayal of the external world in the face of disruptive sensory inputs. What alterations occur in sensory representation amidst noise, and how does brain activity adapt to it? Although it has previously been shown that background white noise (WN) decreases responses to salient sounds, a mechanistic understanding of the brain processes responsible for such changes is lacking. We investigated the effect of background WN on neuronal spiking activity, membrane potential, and network oscillations in the mouse central auditory system. We found that, in addition to increasing background spiking activity in the auditory cortex and thalamus, background WN decreases neural activity fluctuations, as reflected in the membrane potential of single neurons and the local field potential. Blocking acetylcholine signaling in the auditory cortex eliminated the WN-dependent increase in background activity as well as the shift in slow-wave oscillations. Together, our observations show that background WN is not filtered away along the auditory pathway, but rather drives sustained changes in cortical activity that can be reverted by blocking cholinergic inputs.
Affiliation(s)
- Florian Studer
- Department of Biomedicine, University of Basel, Klingelbergstrasse 50, Basel 4056, Switzerland
- Tania Rinaldi Barkat
- Department of Biomedicine, University of Basel, Klingelbergstrasse 50, Basel 4056, Switzerland.
7. Federici A, Fantoni M, Pavani F, Handjaras G, Bednaya E, Martinelli A, Berto M, Trabalzini F, Ricciardi E, Nava E, Orzan E, Bianchi B, Bottari D. Resilience and vulnerability of neural speech tracking after hearing restoration. Commun Biol 2025; 8:343. PMID: 40025189. PMCID: PMC11873316. DOI: 10.1038/s42003-025-07788-4.
Abstract
The role of early auditory experience in the development of neural speech tracking remains an open question. To address this issue, we measured neural speech tracking in children who either had or lacked functional hearing during their first year of life and whose hearing was later restored with cochlear implants (CIs), as well as in hearing controls (HC). Neural tracking in children with CIs is unaffected by the absence of perinatal auditory experience. CI users and HC exhibit a similar neural tracking magnitude at short timescales of brain activity. However, neural tracking is delayed in CI users, and its timing depends on the age of hearing restoration. Conversely, at longer timescales, speech tracking is dampened in participants using CIs, thereby accounting for their speech comprehension deficits. These findings highlight the resilience of sensory processing in speech tracking while also demonstrating the vulnerability of higher-level processing to the lack of early auditory experience.
Affiliation(s)
- Marta Fantoni
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- Francesco Pavani
- Centro Interdipartimentale Mente/Cervello-CIMEC, University of Trento, Trento, Italy
- Centro Interuniversitario di Ricerca "Cognizione Linguaggio e Sordità"-CIRCLeS, University of Trento, Trento, Italy
- Evgenia Bednaya
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- Alice Martinelli
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- IRCCS Fondazione Stella Maris, Pisa, Italy
- Martina Berto
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- Franco Trabalzini
- IRCCS Meyer, Azienda Ospedaliero-Universitaria Meyer, Firenze, Italy
- Elena Nava
- University of Milano-Bicocca, Milano, Italy
- Eva Orzan
- IRCCS Materno Infantile Burlo Garofolo, Trieste, Italy
- Benedetta Bianchi
- IRCCS Meyer, Azienda Ospedaliero-Universitaria Meyer, Firenze, Italy
- Davide Bottari
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
8. Reisinger P, Gillis M, Suess N, Vanthornhout J, Haider CL, Hartmann T, Hauswald A, Schwarz K, Francart T, Weisz N. Neural Speech Tracking Contribution of Lip Movements Predicts Behavioral Deterioration When the Speaker's Mouth Is Occluded. eNeuro 2025; 12:ENEURO.0368-24.2024. PMID: 39819839. PMCID: PMC11801124. DOI: 10.1523/eneuro.0368-24.2024.
Abstract
Observing lip movements of a speaker facilitates speech understanding, especially in challenging listening situations. Converging evidence from neuroscientific studies shows stronger neural responses to audiovisual stimuli compared with audio-only stimuli. However, the interindividual variability of this contribution of lip movement information, and its consequences for behavior, are unknown. We analyzed source-localized magnetoencephalographic responses from 29 normal-hearing participants (12 females) listening to audiovisual speech, both with and without the speaker wearing a surgical face mask, and in the presence or absence of a distractor speaker. Using temporal response functions to quantify neural speech tracking, we show that neural responses to lip movements are, in general, enhanced when speech is challenging. After controlling for speech acoustics, we show that lip movements contribute to enhanced neural speech tracking, particularly when a distractor speaker is present. However, the extent of this visual contribution to neural speech tracking varied greatly among participants. Probing the behavioral relevance, we demonstrate that individuals who show a higher contribution of lip movements in terms of neural speech tracking show a stronger drop in comprehension and an increase in perceived difficulty when the mouth is occluded by a surgical face mask. In contrast, no effect was found when the mouth was not occluded. We provide novel insights into how the contribution of lip movements in terms of neural speech tracking varies among individuals, and into its behavioral relevance, revealing negative consequences when visual speech is absent. Our results also offer potential implications for objective assessments of audiovisual speech perception.
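Temporal response functions of this kind are typically estimated by regularised linear regression on time-lagged stimulus features. A minimal forward-TRF sketch on synthetic data follows; the single envelope feature, lag range, and ridge parameter are illustrative assumptions (the study models lip movements and acoustics together).

```python
# Minimal sketch (synthetic data): forward TRF via ridge regression on a
# time-lagged speech envelope. Lag range and regularisation are assumptions.
import numpy as np
from sklearn.linear_model import Ridge

fs = 64
rng = np.random.default_rng(4)
envelope = np.abs(rng.standard_normal(fs * 120))       # stand-in envelope

# Ground-truth response: a smooth bump peaking at ~60 ms
lags = np.arange(16)                                   # 0 .. ~250 ms at 64 Hz
true_trf = np.exp(-0.5 * ((lags - 4) / 1.5) ** 2)
eeg = np.convolve(envelope, true_trf)[: len(envelope)]
eeg += 2 * rng.standard_normal(len(eeg))

# Design matrix: one column per lag (X[t, k] = envelope[t - k])
X = np.stack([np.roll(envelope, k) for k in lags], axis=1)
X[: len(lags)] = 0                                     # drop wrap-around rows

trf = Ridge(alpha=100.0).fit(X, eeg).coef_             # one weight per lag
print(f"estimated TRF peak at {np.argmax(trf) / fs * 1000:.0f} ms (true ~62 ms)")
```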
Affiliation(s)
- Patrick Reisinger
- Department of Psychology, Centre for Cognitive Neuroscience, Paris-Lodron-University of Salzburg, Salzburg 5020, Austria
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven 3000, Belgium
- Nina Suess
- Department of Psychology, Centre for Cognitive Neuroscience, Paris-Lodron-University of Salzburg, Salzburg 5020, Austria
- Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven 3000, Belgium
- Chandra Leon Haider
- Department of Psychology, Centre for Cognitive Neuroscience, Paris-Lodron-University of Salzburg, Salzburg 5020, Austria
- Thomas Hartmann
- Department of Psychology, Centre for Cognitive Neuroscience, Paris-Lodron-University of Salzburg, Salzburg 5020, Austria
- Anne Hauswald
- Department of Psychology, Centre for Cognitive Neuroscience, Paris-Lodron-University of Salzburg, Salzburg 5020, Austria
- Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven 3000, Belgium
- Nathan Weisz
- Department of Psychology, Centre for Cognitive Neuroscience, Paris-Lodron-University of Salzburg, Salzburg 5020, Austria
- Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University Salzburg, Salzburg 5020, Austria
9. Levy O, Korisky A, Zvilichovsky Y, Zion Golumbic E. The Neurophysiological Costs of Learning in a Noisy Classroom: An Ecological Virtual Reality Study. J Cogn Neurosci 2025; 37:300-316. PMID: 39348110. DOI: 10.1162/jocn_a_02249.
Abstract
Many real-life situations can be extremely noisy, which makes it difficult to understand what people say. Here, we introduce a novel audiovisual virtual reality experimental platform to study the behavioral and neurophysiological consequences of background noise on processing continuous speech in highly realistic environments. We focus on a context where the ability to understand speech is particularly important: the classroom. Participants (n = 32) experienced sitting in a virtual reality classroom and were told to pay attention to a virtual teacher giving a lecture. Trials were either quiet or contained background construction noise, emitted from outside the classroom window. Two realistic types of noise were used: continuous drilling and intermittent air hammers. Alongside behavioral outcomes, we measured several neurophysiological metrics, including neural activity (EEG), eye-gaze and skin conductance (galvanic skin response). Our results confirm the detrimental effect of background noise. Construction noise, and particularly intermittent noise, was associated with reduced behavioral performance, reduced neural tracking of the teacher's speech and an increase in skin conductance, although it did not have a significant effect on alpha-band oscillations or eye-gaze patterns. These results demonstrate the neurophysiological costs of learning in noisy environments and emphasize the role of temporal dynamics in speech-in-noise perception. The finding that intermittent noise was more disruptive than continuous noise supports a "habituation" rather than "glimpsing" hypothesis of speech-in-noise processing. These results also underscore the importance of increasing the ecological relevance of neuroscientific research and of considering acoustic, temporal, and semantic features of realistic stimuli, as well as the cognitive demands of real-life environments.
10. Schüller A, Mücke A, Riegel J, Reichenbach T. The Impact of Selective Attention and Musical Training on the Cortical Speech Tracking in the Delta and Theta Frequency Bands. J Cogn Neurosci 2025; 37:464-481. PMID: 39509103. DOI: 10.1162/jocn_a_02275.
Abstract
Oral communication regularly takes place amidst background noise, requiring the ability to selectively attend to a target speech stream. Musical training has been shown to be beneficial for this task. Regarding the underlying neural mechanisms, recent studies have shown that the speech envelope is tracked by neural activity in auditory cortex, which plays a role in the neural processing of speech, including speech in noise. The neural tracking occurs predominantly in two frequency bands, the delta and the theta bands. However, much regarding the specifics of these neural responses, as well as their modulation through musical training, still remains unclear. Here, we investigated the delta- and theta-band cortical tracking of the speech envelope of target and distractor speech using magnetoencephalography (MEG) recordings. We assessed both musicians and nonmusicians to explore potential differences between these groups. The cortical speech tracking was quantified by source-reconstructing the MEG data and subsequently relating the speech envelope in a certain frequency band to the MEG data using linear models. We found the theta-band tracking to be dominated by early responses with comparable magnitudes for target and distractor speech, whereas the delta-band tracking exhibited both earlier and later responses that were modulated by selective attention. Almost no significant differences emerged in the neural responses between musicians and nonmusicians. Our findings show that only the speech tracking in the delta band, but not in the theta band, contributes to selective attention, and that this mechanism is essentially unaffected by musical training.
Affiliation(s)
- Annika Mücke
- Friedrich-Alexander-Universität Erlangen-Nürnberg
11. Cooper JK, Vanthornhout J, van Wieringen A, Francart T. Objectively Measuring Audiovisual Effects in Noise Using Virtual Human Speakers. Trends Hear 2025; 29:23312165251333528. PMID: 40221967. PMCID: PMC12033406. DOI: 10.1177/23312165251333528.
Abstract
Speech intelligibility in challenging listening environments relies on the integration of audiovisual cues. Measuring the effectiveness of audiovisual integration in such environments can be difficult due to their complexity. The Audiovisual True-to-Life Assessment of Auditory Rehabilitation (AVATAR) is a paradigm developed to provide an ecological environment that captures both the audio and visual aspects of speech intelligibility measures. Previous research has shown that the benefit from audiovisual cues can be measured using behavioral (e.g., word recognition) and electrophysiological (e.g., neural tracking) measures. The current research examines whether, when using the AVATAR paradigm, electrophysiological measures of speech intelligibility yield outcomes similar to behavioral measures. We hypothesized that visual cues would enhance both the behavioral and electrophysiological scores as the signal-to-noise ratio (SNR) of the speech signal decreased. Twenty young (18-25 years old) normal-hearing participants (1 male and 19 female) took part in our study. For the behavioral experiment, we administered lists of sentences using an adaptive procedure to estimate a speech reception threshold (SRT). For the electrophysiological experiment, we administered 35 lists of sentences randomized across five SNR levels (silence, 0, -3, -6, and -9 dB) and two visual conditions (audio-only and audiovisual). We used a neural tracking decoder to measure the reconstruction accuracies for each participant. Most participants had higher reconstruction accuracies in the audiovisual condition than in the audio-only condition at moderate to high levels of noise. We found that the electrophysiological measure may correlate with the behavioral measure of audiovisual benefit.
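Reconstruction accuracy of this kind comes from a backward decoder. A minimal sketch follows, using a simple train/test split and, for brevity, no time lags; real decoders typically include a window of EEG lags and cross-validation.

```python
# Minimal sketch (synthetic data): backward decoder reconstructing the speech
# envelope from multichannel EEG, scored as the Pearson correlation with the
# actual envelope. Lag windows and cross-validation are omitted for brevity.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

fs = 64
rng = np.random.default_rng(5)
envelope = np.abs(rng.standard_normal(fs * 300))
# Synthetic 8-channel EEG carrying a noisy copy of the envelope
mixing = rng.standard_normal(8)
eeg = np.outer(envelope, mixing) + 3 * rng.standard_normal((len(envelope), 8))

split = len(envelope) // 2                       # simple train/test split
decoder = Ridge(alpha=10.0).fit(eeg[:split], envelope[:split])
reconstruction = decoder.predict(eeg[split:])
r, _ = pearsonr(reconstruction, envelope[split:])
print(f"reconstruction accuracy: r = {r:.2f}")
```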
Affiliation(s)
- Tom Francart
- ExpORL, Department of Neurosciences, KU Leuven, Leuven, Belgium
12. Dieudonné B, Decruy L, Vanthornhout J. Neural tracking of the speech envelope predicts binaural unmasking. Eur J Neurosci 2025; 61:e16638. PMID: 39653384. DOI: 10.1111/ejn.16638.
Abstract
Binaural unmasking is the remarkable phenomenon whereby it is substantially easier to detect a signal in noise when the interaural parameters of the signal differ from those of the noise - a useful mechanism in so-called cocktail party scenarios. In this study, we investigated the effect of binaural unmasking on neural tracking of the speech envelope. We measured EEG in 8 participants who listened to speech in noise at a fixed signal-to-noise ratio, in two conditions: one where speech and noise had the same interaural phase difference (both speech and noise having an opposite waveform across ears, SπNπ), and one where the interaural phase difference of the speech was different from that of the noise (only the speech having an opposite waveform across ears, SπN). We measured a clear benefit of binaural unmasking in behavioural speech understanding scores, accompanied by increased neural tracking of the speech envelope. Moreover, analysis of the temporal response functions revealed that binaural unmasking also resulted in decreased peak latencies and increased peak amplitudes. Our results are consistent with previous research using auditory evoked potentials and steady-state responses to quantify binaural unmasking at cortical levels. Moreover, they confirm that neural tracking of speech is associated with speech understanding, even when the acoustic signal-to-noise ratio is kept constant. From a clinical perspective, these results offer the potential for the objective evaluation of binaural speech understanding mechanisms, and the objective detection of pathologies sensitive to binaural processing, such as asymmetric hearing loss, auditory neuropathy and age-related deficits.
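The two dichotic conditions differ only in which signal is phase-inverted in one ear. A minimal stimulus-construction sketch with placeholder signals follows; the SNR value and noise type are assumptions for illustration, not the study's calibrated material.

```python
# Minimal sketch (placeholder signals): constructing SpiNpi (speech and noise
# both inverted in one ear) and SpiN (only the speech inverted) conditions.
import numpy as np

fs = 16000
rng = np.random.default_rng(6)
speech = rng.standard_normal(fs * 2)   # stand-in for a speech waveform
noise = rng.standard_normal(fs * 2)    # stand-in for the masking noise

snr_db = -5                            # assumed fixed SNR
noise *= np.std(speech) / np.std(noise) * 10 ** (-snr_db / 20)

# SpiNpi: speech and noise share the same interaural phase relation
left_spinpi = speech + noise
right_spinpi = -speech + -noise        # both inverted in the right ear

# SpiN: the speech is interaurally out of phase with the noise
left_spin = speech + noise
right_spin = -speech + noise           # only the speech is inverted

stimuli = {"SpiNpi": (left_spinpi, right_spinpi), "SpiN": (left_spin, right_spin)}
print({name: ears[0].shape for name, ears in stimuli.items()})
```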
Affiliation(s)
- Benjamin Dieudonné
- Experimental Otorhinolaryngology, Department of Neurosciences, KU Leuven - University of Leuven, Leuven, Belgium
- Lien Decruy
- Experimental Otorhinolaryngology, Department of Neurosciences, KU Leuven - University of Leuven, Leuven, Belgium
- Jonas Vanthornhout
- Experimental Otorhinolaryngology, Department of Neurosciences, KU Leuven - University of Leuven, Leuven, Belgium
13. Polonenko MJ, Maddox RK. The effect of speech masking on the human subcortical response to continuous speech. bioRxiv [Preprint] 2024:2024.12.10.627771. PMID: 39713441. PMCID: PMC11661217. DOI: 10.1101/2024.12.10.627771.
Abstract
Auditory masking-the interference of the encoding and processing of an acoustic stimulus imposed by one or more competing stimuli-is nearly omnipresent in daily life, and presents a critical barrier to many listeners, including people with hearing loss, users of hearing aids and cochlear implants, and people with auditory processing disorders. The perceptual aspects of masking have been actively studied for several decades, and particular emphasis has been placed on masking of speech by other speech sounds. The neural effects of such masking, especially at the subcortical level, have been much less studied, in large part due to the technical limitations of making such measurements. Recent work has allowed estimation of the auditory brainstem response (ABR), whose characteristic waves are linked to specific subcortical areas, to naturalistic speech. In this study, we used those techniques to measure the encoding of speech stimuli that were masked by one or more other simultaneous speech stimuli. We presented listeners with speech from one, two, three, or five simultaneous talkers, corresponding to a range of signal-to-noise ratios (SNR; clean, 0, -3, and -6 dB), and derived the ABR to each talker in the mixture. Each talker in a mixture was treated in turn as a target sound masked by the other talkers, making the response quicker to acquire. We found consistently across listeners that ABR wave V amplitudes decreased and latencies increased as the number of competing talkers increased.
Significance statement: Trying to listen to someone speak in a noisy setting is a common challenge for most people, due to auditory masking. Masking has been studied extensively at the behavioral level, and more recently in the cortex using EEG and other neurophysiological methods. Much less is known, however, about how masking affects speech encoding in the subcortical auditory system. Here we presented listeners with mixtures of simultaneous speech streams ranging from one to five talkers. We used recently developed tools for measuring subcortical speech encoding to determine how the encoding of each speech stream was impacted by the masker speech. We show that the subcortical response to masked speech becomes smaller and increasingly delayed as the masking becomes more severe.
Affiliation(s)
- Melissa J Polonenko
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, MN
- Departments of Biomedical Engineering and Neuroscience, University of Rochester, Rochester, NY
- Ross K Maddox
- Kresge Hearing Research Institute, Department of Otolaryngology - Head and Neck Surgery, University of Michigan, Ann Arbor, MI
- Departments of Biomedical Engineering and Neuroscience, University of Rochester, Rochester, NY
14. Cai S, Li P, Li H. A Bio-Inspired Spiking Attentional Neural Network for Attentional Selection in the Listening Brain. IEEE Trans Neural Netw Learn Syst 2024; 35:17387-17397. PMID: 37585329. DOI: 10.1109/tnnls.2023.3303308.
Abstract
Humans show a remarkable ability to solve the cocktail party problem. Decoding auditory attention from brain signals is a major step toward the development of bionic ears emulating human capabilities. Electroencephalography (EEG)-based auditory attention detection (AAD) has attracted considerable interest recently. Despite much progress, the performance of traditional AAD decoders remains to be improved, especially in low-latency settings. State-of-the-art AAD decoders based on deep neural networks generally lack the intrinsic temporal coding ability of biological networks. In this study, we first propose a bio-inspired spiking attentional neural network, denoted BSAnet, for decoding auditory attention. BSAnet is capable of exploiting the temporal dynamics of EEG signals using biologically plausible neurons and an attentional mechanism. Experiments on two publicly available datasets confirm the superior performance of BSAnet over other state-of-the-art systems across various evaluation conditions. Moreover, BSAnet imitates realistic brain-like information processing, through which we show the advantage of brain-inspired computational models.
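The "biologically plausible neurons" in such networks are commonly leaky integrate-and-fire units. The sketch below simulates generic LIF dynamics for a single neuron; it is an illustration of the building block, not the BSAnet implementation.

```python
# Illustrative sketch of leaky integrate-and-fire (LIF) dynamics, the kind of
# biologically plausible unit spiking networks build on (not BSAnet code).
import numpy as np

dt, tau, v_thresh, v_reset = 1.0, 20.0, 1.0, 0.0   # step (ms), time constant (ms)
rng = np.random.default_rng(7)
current = 0.06 + 0.04 * rng.standard_normal(500)   # noisy input per step (a.u.)

v, spikes = 0.0, []
for step, i_in in enumerate(current):
    v += dt / tau * (-v) + i_in                    # leak toward 0, plus input
    if v >= v_thresh:                              # threshold crossing
        spikes.append(step * dt)                   # record spike time (ms)
        v = v_reset                                # reset membrane potential
print(f"{len(spikes)} spikes in {len(current) * dt:.0f} ms")
```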
15. Hicks JM, McDermott JH. Noise schemas aid hearing in noise. Proc Natl Acad Sci U S A 2024; 121:e2408995121. PMID: 39546566. PMCID: PMC11588100. DOI: 10.1073/pnas.2408995121.
Abstract
Human hearing is robust to noise, but the basis of this robustness is poorly understood. Several lines of evidence are consistent with the idea that the auditory system adapts to sound components that are stable over time, potentially achieving noise robustness by suppressing noise-like signals. Yet background noise often provides behaviorally relevant information about the environment and thus seems unlikely to be completely discarded by the auditory system. Motivated by this observation, we explored whether noise robustness might instead be mediated by internal models of noise structure that could facilitate the separation of background noise from other sounds. We found that detection, recognition, and localization in real-world background noise were better for foreground sounds positioned later in a noise excerpt, with performance improving over the initial second of exposure to a noise. These results are consistent with both adaptation-based and model-based accounts (adaptation increases over time and online noise estimation should benefit from acquiring more samples). However, performance was also robust to interruptions in the background noise and was enhanced for intermittently recurring backgrounds, neither of which would be expected from known forms of adaptation. Additionally, the performance benefit observed for foreground sounds occurring later within a noise excerpt was reduced for recurring noises, suggesting that a noise representation is built up during exposure to a new background noise and then maintained in memory. These findings suggest that noise robustness is supported by internal models-"noise schemas"-that are rapidly estimated, stored over time, and used to estimate other concurrent sounds.
Affiliation(s)
- Jarrod M. Hicks
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA 02139
- Center for Brains Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139
- Josh H. McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA 02139
- Center for Brains Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Boston, MA 02115
16. Mehraram R, De Clercq P, Kries J, Vandermosten M, Francart T. Functional connectivity of stimulus-evoked brain responses to natural speech in post-stroke aphasia. J Neural Eng 2024; 21:066010. PMID: 39500050. DOI: 10.1088/1741-2552/ad8ef9.
Abstract
Objective. One out of three stroke patients develops a language processing impairment known as aphasia. The need for ecological validity of the existing diagnostic tools motivates research on biomarkers, such as stimulus-evoked brain responses. With the aim of enhancing the physiological interpretation of the latter, we used EEG to investigate how functional brain network patterns associated with the neural response to natural speech are affected in persons with post-stroke chronic aphasia. Approach. EEG was recorded from 24 healthy controls and 40 persons with aphasia while they listened to a story. Stimulus-evoked brain responses at all scalp regions were measured as neural envelope tracking in the delta (0.5-4 Hz), theta (4-8 Hz) and low-gamma bands (30-49 Hz) using mutual information. Functional connectivity between neural-tracking signals was measured, and the Network-Based Statistics toolbox was used to: (1) assess the added value of neural tracking vs EEG time series, (2) test between-group differences and (3) investigate any association with language performance in aphasia. Graph theory was also used to investigate topological alterations in aphasia. Main results. Functional connectivity was higher when assessed from neural tracking compared to EEG time series. Persons with aphasia showed weaker low-gamma-band left-hemispheric connectivity, and graph theory-based results showed greater network segregation and higher region-specific node strength. The aphasia group also exhibited a correlation between delta-band connectivity within the left pre-frontal region and language performance. Significance. We demonstrated the added value of combining brain connectomics with neural-tracking measurement when investigating natural speech processing in post-stroke aphasia. The higher sensitivity of this approach to language-related brain circuits favors its use as an informative biomarker for the assessment of aphasia.
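The tracking quantity here is the mutual information between a band-limited EEG signal and the speech envelope. A minimal histogram-based sketch on synthetic data follows; published work typically uses bias-corrected estimators (e.g. Gaussian-copula MI) rather than this naive binning.

```python
# Minimal sketch (synthetic data): histogram-based mutual information between
# an EEG signal and the speech envelope, with a shuffle baseline.
import numpy as np

def mutual_information(x, y, bins=16):
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                     # joint probability
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)     # marginals
    nonzero = pxy > 0
    return np.sum(pxy[nonzero]
                  * np.log2(pxy[nonzero] / (px[:, None] * py[None, :])[nonzero]))

rng = np.random.default_rng(8)
envelope = rng.standard_normal(10000)
eeg = 0.6 * envelope + 0.8 * rng.standard_normal(10000)   # partially coupled
print(f"MI(EEG; envelope) = {mutual_information(eeg, envelope):.3f} bits")
print(f"MI after shuffling = "
      f"{mutual_information(rng.permutation(eeg), envelope):.3f} bits")
```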
Affiliation(s)
- Ramtin Mehraram
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Pieter De Clercq
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Jill Kries
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Maaike Vandermosten
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
17. Anderson AJ, Davis C, Lalor EC. Deep-learning models reveal how context and listener attention shape electrophysiological correlates of speech-to-language transformation. PLoS Comput Biol 2024; 20:e1012537. PMID: 39527649. PMCID: PMC11581396. DOI: 10.1371/journal.pcbi.1012537.
Abstract
To transform continuous speech into words, the human brain must resolve variability across utterances in intonation, speech rate, volume, accents and so on. A promising approach to explaining this process has been to model electroencephalogram (EEG) recordings of brain responses to speech. Contemporary models typically invoke context-invariant speech categories (e.g. phonemes) as an intermediary representational stage between sounds and words. However, such models may not capture the complete picture because they do not model the brain mechanism that categorizes sounds and consequently may overlook associated neural representations. By providing end-to-end accounts of speech-to-text transformation, new deep-learning systems could enable more complete brain models. We model EEG recordings of audiobook comprehension with the deep-learning speech recognition system Whisper. We find that (1) Whisper provides a self-contained EEG model of an intermediary representational stage that reflects elements of prelexical and lexical representation and prediction; (2) EEG modeling is more accurate when informed by 5-10 s of speech context, which traditional context-invariant categorical models do not encode; (3) deep Whisper layers encoding linguistic structure were more accurate EEG models of selectively attended speech in two-speaker "cocktail party" listening conditions than early layers encoding acoustics. No such layer-depth advantage was observed for unattended speech, consistent with a more superficial level of linguistic processing in the brain.
Affiliation(s)
- Andrew J. Anderson
- Department of Neurology, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Department of Neurosurgery, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York, United States of America
- Chris Davis
- Western Sydney University, The MARCS Institute for Brain, Behaviour and Development, Westmead Innovation Quarter, Westmead, New South Wales, Australia
- Edmund C. Lalor
- Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York, United States of America
- Department of Biomedical Engineering, University of Rochester, Rochester, New York, United States of America
- Center for Visual Science, University of Rochester, Rochester, New York, United States of America
18. He X, Raghavan VS, Mesgarani N. Distinct roles of SNR, speech intelligibility, and attentional effort on neural speech tracking in noise. bioRxiv [Preprint] 2024:2024.10.10.616515. PMID: 39416110. PMCID: PMC11483070. DOI: 10.1101/2024.10.10.616515.
Abstract
Robust neural encoding of speech in noise is influenced by several factors, including signal-to-noise ratio (SNR), speech intelligibility (SI), and attentional effort (AE). Yet, the interaction and distinct role of these factors remain unclear. In this study, fourteen native English speakers performed selective speech listening tasks at various SNR levels while EEG responses were recorded. Attentional performance was assessed using a repeated word detection task, and attentional effort was inferred from subjects' gaze velocity. Results indicate that both SNR and SI enhance neural tracking of target speech, with distinct effects influenced by the previously overlooked role of attentional effort. Specifically, at high levels of SI, increasing SNR leads to reduced attentional effort, which in turn decreases neural speech tracking. Our findings highlight the importance of differentiating the roles of SNR, SI, and AE in neural speech processing and advance our understanding of how noisy speech is processed in the auditory pathway.
Affiliation(s)
- Xiaomin He
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Vinay S Raghavan
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
19. Hausfeld L, Hamers IMH, Formisano E. fMRI speech tracking in primary and non-primary auditory cortex while listening to noisy scenes. Commun Biol 2024; 7:1217. PMID: 39349723. PMCID: PMC11442455. DOI: 10.1038/s42003-024-06913-z.
Abstract
Invasive and non-invasive electrophysiological measurements during "cocktail-party"-like listening indicate that neural activity in the human auditory cortex (AC) "tracks" the envelope of relevant speech. However, due to limited coverage and/or spatial resolution, the distinct contribution of primary and non-primary areas remains unclear. Here, using 7-Tesla fMRI, we measured brain responses of participants attending to one speaker, in the presence and absence of another speaker. Through voxel-wise modeling, we observed envelope tracking in bilateral Heschl's gyrus (HG), right middle superior temporal sulcus (mSTS) and left temporo-parietal junction (TPJ), despite the signal's sluggish nature and slow temporal sampling. Neurovascular activity correlated positively (HG) or negatively (mSTS, TPJ) with the envelope. Further analyses comparing the similarity between spatial response patterns in the single speaker and concurrent speakers conditions and envelope decoding indicated that tracking in HG reflected both relevant and (to a lesser extent) non-relevant speech, while mSTS represented the relevant speech signal. Additionally, in mSTS, the similarity strength correlated with the comprehension of relevant speech. These results indicate that the fMRI signal tracks cortical responses and attention effects related to continuous speech and support the notion that primary and non-primary AC process ongoing speech in a push-pull of acoustic and linguistic information.
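For fMRI, envelope tracking has to account for the sluggish haemodynamics. A minimal sketch: convolve the envelope with a canonical double-gamma HRF, resample to the TR, and correlate with a voxel time series. All data and parameter choices below are synthetic illustrations, not the study's voxel-wise modeling.

```python
# Minimal sketch (synthetic data): envelope tracking in fMRI via HRF
# convolution, downsampling to the TR, and voxel-wise correlation.
import numpy as np
from scipy.stats import gamma, pearsonr

fs, tr, dur = 10, 2.0, 600                     # envelope rate (Hz), TR (s), s
rng = np.random.default_rng(9)
envelope = np.abs(rng.standard_normal(int(fs * dur)))

# Canonical double-gamma HRF sampled at the envelope rate
t = np.arange(0, 30, 1 / fs)
hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6
hrf /= hrf.sum()

predictor = np.convolve(envelope, hrf)[: len(envelope)]
predictor = predictor[:: int(fs * tr)]          # one value per TR

# Synthetic voxel: tracks the predictor plus noise
voxel = predictor + 0.5 * predictor.std() * rng.standard_normal(len(predictor))
r, p = pearsonr(predictor, voxel)
print(f"voxel-envelope tracking: r = {r:.2f}, p = {p:.1e}")
```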
Affiliation(s)
- Lars Hausfeld
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD, Maastricht, The Netherlands
- Maastricht Brain Imaging Centre, 6200 MD, Maastricht, The Netherlands
- Iris M H Hamers
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD, Maastricht, The Netherlands
- Department of Biomedical Sciences of Cells & Systems, Section Cognitive Neurosciences, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Elia Formisano
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD, Maastricht, The Netherlands
- Maastricht Brain Imaging Centre, 6200 MD, Maastricht, The Netherlands
- Maastricht Centre for Systems Biology, Faculty of Science and Engineering, 6200 MD, Maastricht, The Netherlands
20. Baus C, Millan I, Chen XJ, Blanco-Elorrieta E. Exploring the Interplay Between Language Comprehension and Cortical Tracking: The Bilingual Test Case. Neurobiol Lang (Camb) 2024; 5:484-496. PMCID: PMC11192516. DOI: 10.1162/nol_a_00141.
Abstract
Cortical tracking, the synchronization of brain activity to linguistic rhythms, is a well-established phenomenon. However, its nature has been heavily contested: Is it purely epiphenomenal, or does it play a fundamental role in speech comprehension? Previous research has used intelligibility manipulations to examine this topic. Here, we instead varied listeners' language comprehension skills while keeping the auditory stimulus constant. To do so, we tested 22 native English speakers and 22 Spanish/Catalan bilinguals learning English as a second language (SL) in an EEG cortical entrainment experiment and correlated the responses with the magnitude of the N400 component in a semantic comprehension task. As expected, native listeners effectively tracked sentential, phrasal, and syllabic linguistic structures. In contrast, SL listeners exhibited limitations in tracking sentential structures but successfully tracked phrasal and syllabic rhythms. Importantly, the amplitude of the neural entrainment correlated with the amplitude of the detection of semantic incongruities in SL listeners, showing a direct connection between tracking and the ability to understand speech. Together, these findings shed light on the interplay between language comprehension and cortical tracking, identifying neural entrainment as a fundamental principle of speech comprehension.
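In this kind of frequency-tagging design, tracking of sentential, phrasal, and syllabic structure shows up as spectral peaks at the corresponding presentation rates. The sketch below assumes rates of 1, 2, and 4 Hz and synthetic EEG; it illustrates only the peak-over-neighbours logic, not the study's analysis.

```python
# Minimal sketch (synthetic EEG, assumed rates): entrainment to sentential
# (1 Hz), phrasal (2 Hz) and syllabic (4 Hz) structure as spectral peaks.
import numpy as np

fs, dur = 100, 120
rng = np.random.default_rng(10)
t = np.arange(fs * dur) / fs
eeg = (0.30 * np.sin(2 * np.pi * 4 * t)        # syllabic rate
       + 0.15 * np.sin(2 * np.pi * 2 * t)      # phrasal rate
       + 0.08 * np.sin(2 * np.pi * 1 * t)      # sentential rate
       + rng.standard_normal(len(t)))

spectrum = np.abs(np.fft.rfft(eeg)) / len(eeg)
freqs = np.fft.rfftfreq(len(eeg), 1 / fs)
for label, f0 in [("sentential", 1.0), ("phrasal", 2.0), ("syllabic", 4.0)]:
    k = np.argmin(np.abs(freqs - f0))
    noise_floor = spectrum[[k - 2, k - 1, k + 1, k + 2]].mean()
    print(f"{label} ({f0} Hz): peak/neighbour ratio = {spectrum[k] / noise_floor:.1f}")
```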
Affiliation(s)
- Cristina Baus
- Department of Cognition, Development and Educational Psychology, University of Barcelona, Barcelona, Spain
- Institute of Neurosciences, University of Barcelona, Barcelona, Spain
- Esti Blanco-Elorrieta
- Department of Psychology, New York University, New York, NY, USA
- Department of Neural Science, New York University, New York, NY, USA
21. Simon A, Bech S, Loquet G, Østergaard J. Cortical linear encoding and decoding of sounds: Similarities and differences between naturalistic speech and music listening. Eur J Neurosci 2024; 59:2059-2074. PMID: 38303522. DOI: 10.1111/ejn.16265.
Abstract
Linear models are becoming increasingly popular for investigating brain activity in response to continuous and naturalistic stimuli. In the context of auditory perception, these predictive models can be 'encoding', when stimulus features are used to predict brain activity, or 'decoding', when neural features are used to reconstruct the audio stimuli. These linear models are a central component of some brain-computer interfaces that can be integrated into hearing assistive devices (e.g., hearing aids). Such advanced neurotechnologies have been widely investigated when listening to speech stimuli but rarely when listening to music. Recent attempts at neural tracking of music show that the reconstruction performances are reduced compared with speech decoding. The present study investigates the performance of stimulus reconstruction and electroencephalogram prediction (decoding and encoding models) based on the cortical entrainment of temporal variations of the audio stimuli, for both music and speech listening. Three hypotheses that may explain differences between speech and music stimulus reconstruction were tested to assess the importance of speech-specific acoustic and linguistic factors. While the results obtained with encoding models suggest different underlying cortical processing between speech and music listening, no differences were found in terms of reconstruction of the stimuli or the cortical data. The results suggest that envelope-based linear modelling can be used to study both speech and music listening, despite the differences in the underlying cortical mechanisms.
Affiliation(s)
- Adèle Simon
- Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
- Research Department, Bang & Olufsen A/S, Struer, Denmark
- Søren Bech
- Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
- Research Department, Bang & Olufsen A/S, Struer, Denmark
- Gérard Loquet
- Department of Audiology and Speech Pathology, University of Melbourne, Melbourne, Victoria, Australia
- Jan Østergaard
- Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
22. Kim SG, De Martino F, Overath T. Linguistic modulation of the neural encoding of phonemes. Cereb Cortex 2024; 34:bhae155. PMID: 38687241. PMCID: PMC11059272. DOI: 10.1093/cercor/bhae155.
Abstract
Speech comprehension entails the neural mapping of the acoustic speech signal onto learned linguistic units. This acousto-linguistic transformation is bi-directional, whereby higher-level linguistic processes (e.g. semantics) modulate the acoustic analysis of individual linguistic units. Here, we investigated the cortical topography and linguistic modulation of the most fundamental linguistic unit, the phoneme. We presented natural speech and "phoneme quilts" (pseudo-randomly shuffled phonemes) in either a familiar (English) or unfamiliar (Korean) language to native English speakers while recording functional magnetic resonance imaging. This allowed us to dissociate the contribution of acoustic vs. linguistic processes toward phoneme analysis. We show (i) that the acoustic analysis of phonemes is modulated by linguistic analysis and (ii) that, for this modulation, both acoustic and phonetic information need to be incorporated. These results suggest that the linguistic modulation of cortical sensitivity to phoneme classes minimizes prediction error during natural speech perception, thereby aiding speech comprehension in challenging listening situations.
Affiliation(s)
- Seung-Goo Kim
- Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
- Federico De Martino
- Faculty of Psychology and Neuroscience, University of Maastricht, Universiteitssingel 40, 6229 ER Maastricht, Netherlands
- Tobias Overath
- Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Duke Institute for Brain Sciences, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Center for Cognitive Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
Collapse
|
23
|
Li Z, Zhang D. How does the human brain process noisy speech in real life? Insights from the second-person neuroscience perspective. Cogn Neurodyn 2024; 18:371-382. [PMID: 38699619 PMCID: PMC11061069 DOI: 10.1007/s11571-022-09924-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2022] [Revised: 11/20/2022] [Accepted: 12/19/2022] [Indexed: 01/07/2023] Open
Abstract
Comprehending speech in the presence of background noise is of great importance for human life. In the past decades, a large body of psychological, cognitive, and neuroscientific research has explored the neurocognitive mechanisms of speech-in-noise comprehension. However, limited by the low ecological validity of the speech stimuli and experimental paradigms, as well as insufficient attention to higher-order linguistic and extralinguistic processes, much remains unknown about how the brain processes noisy speech in real-life scenarios. A recently emerging approach, the second-person neuroscience approach, provides a novel conceptual framework. It measures the neural activity of both the speaker and the listener and estimates speaker-listener neural coupling, taking the speaker's production-related neural activity as a standardized reference. The second-person approach not only promotes the use of naturalistic speech but also allows free communication between speaker and listener, as in a close-to-life context. In this review, we first briefly review previous findings on how the brain processes speech in noise; we then introduce the principles and advantages of the second-person neuroscience approach and discuss its implications for unraveling the linguistic and extralinguistic processes during speech-in-noise comprehension; finally, we conclude by proposing some critical issues and calling for more research interest in the second-person approach, which would further extend present knowledge about how people comprehend speech in noise.
Affiliation(s)
- Zhuoran Li
  - Department of Psychology, School of Social Sciences, Tsinghua University, Room 334, Mingzhai Building, Beijing 100084, China
  - Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Dan Zhang
  - Department of Psychology, School of Social Sciences, Tsinghua University, Room 334, Mingzhai Building, Beijing 100084, China
  - Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
|
24
|
Ershaid H, Lizarazu M, McLaughlin D, Cooke M, Simantiraki O, Koutsogiannaki M, Lallier M. Contributions of listening effort and intelligibility to cortical tracking of speech in adverse listening conditions. Cortex 2024; 172:54-71. [PMID: 38215511 DOI: 10.1016/j.cortex.2023.11.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2023] [Revised: 09/05/2023] [Accepted: 11/14/2023] [Indexed: 01/14/2024]
Abstract
Cortical tracking of speech is vital for speech segmentation and is linked to speech intelligibility. However, there is no clear consensus as to whether reduced intelligibility leads to a decrease or an increase in cortical speech tracking, warranting further investigation of the factors influencing this relationship. One such factor is listening effort, defined as the cognitive resources necessary for speech comprehension, and reported to have a strong negative correlation with speech intelligibility. Yet, no studies have examined the relationship between speech intelligibility, listening effort, and cortical tracking of speech. The aim of the present study was thus to examine these factors in quiet and in distinct adverse listening conditions. Forty-nine normal-hearing adults listened to sentences produced casually, presented in quiet and in two adverse listening conditions: cafeteria noise and reverberant speech. Electrophysiological responses were recorded with electroencephalography, and listening effort was estimated subjectively using self-reported scores and objectively using pupillometry. Results indicated varying impacts of adverse conditions on intelligibility, listening effort, and cortical tracking of speech, depending on the preservation of the speech temporal envelope. The more distorted envelope in the reverberant condition led to higher listening effort, as reflected in higher subjective scores, increased pupil diameter, and stronger cortical tracking of speech in the delta band. These findings suggest that using measures of listening effort in addition to those of intelligibility is useful for interpreting cortical tracking of speech results. Moreover, the reading and phonological skills of participants were positively correlated with listening effort in the cafeteria condition, suggesting a special role of expert language skills in processing speech in this noisy condition. Implications for future research and theories linking atypical cortical tracking of speech and reading disorders are further discussed.
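Because the interpretation above hinges on the speech temporal envelope, a short sketch of how such an envelope is typically extracted and restricted to the delta band may help. The toy signal, rates, and filter settings below are assumptions for illustration, not the study's pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, fs, lo, hi, order=3):
    """Zero-phase Butterworth band-pass filter."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

fs_audio = 16000
t = np.arange(5 * fs_audio) / fs_audio
# Toy "speech": a 200 Hz carrier amplitude-modulated at a syllable-like 3 Hz rate
audio = np.sin(2 * np.pi * 200 * t) * (1 + 0.8 * np.sin(2 * np.pi * 3 * t))

env = np.abs(hilbert(audio))     # broadband amplitude envelope
env = env[::160]                 # crude decimation to 100 Hz (a real pipeline
                                 # would low-pass filter before downsampling)
delta_env = bandpass(env, 100, 0.5, 4.0)   # delta-band envelope used for tracking
print(len(delta_env))
```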
Affiliation(s)
- Hadeel Ershaid
  - Basque Center on Cognition, Brain and Language, San Sebastian, Spain
- Mikel Lizarazu
  - Basque Center on Cognition, Brain and Language, San Sebastian, Spain
- Drew McLaughlin
  - Basque Center on Cognition, Brain and Language, San Sebastian, Spain
- Martin Cooke
  - Ikerbasque, Basque Science Foundation, Bilbao, Spain
- Marie Lallier
  - Basque Center on Cognition, Brain and Language, San Sebastian, Spain; Ikerbasque, Basque Science Foundation, Bilbao, Spain
|
25
|
Zoefel B, Kösem A. Neural tracking of continuous acoustics: properties, speech-specificity and open questions. Eur J Neurosci 2024; 59:394-414. [PMID: 38151889 DOI: 10.1111/ejn.16221] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Revised: 11/17/2023] [Accepted: 11/22/2023] [Indexed: 12/29/2023]
Abstract
Human speech is a particularly relevant acoustic stimulus for our species, due to its role in information transmission during communication. Speech is inherently a dynamic signal, and a recent line of research has focused on neural activity that follows the temporal structure of speech. We review findings that characterise neural dynamics in the processing of continuous acoustics and that allow us to compare these dynamics with temporal aspects of human speech. We highlight properties and constraints shared by both neural and speech dynamics, suggesting that auditory neural systems are optimised to process human speech. We then discuss the speech-specificity of neural dynamics and their potential mechanistic origins, and summarise open questions in the field.
Affiliation(s)
- Benedikt Zoefel
  - Centre de Recherche Cerveau et Cognition (CerCo), CNRS UMR 5549, Toulouse, France
  - Université de Toulouse III Paul Sabatier, Toulouse, France
- Anne Kösem
  - Lyon Neuroscience Research Center (CRNL), INSERM U1028, Bron, France
|
26
|
Gao J, Chen H, Fang M, Ding N. Original speech and its echo are segregated and separately processed in the human brain. PLoS Biol 2024; 22:e3002498. [PMID: 38358954 PMCID: PMC10868781 DOI: 10.1371/journal.pbio.3002498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Accepted: 01/15/2024] [Indexed: 02/17/2024] Open
Abstract
Speech recognition crucially relies on slow temporal modulations (<16 Hz) in speech. Recent studies, however, have demonstrated that long-delay echoes, which are common during online conferencing, can eliminate crucial temporal modulations in speech without affecting speech intelligibility. Here, we investigated the underlying neural mechanisms. MEG experiments demonstrated that cortical activity can effectively track the temporal modulations eliminated by an echo, which cannot be fully explained by basic neural adaptation mechanisms. Furthermore, cortical responses to echoic speech can be better explained by a model that segregates speech from its echo than by a model that encodes echoic speech as a whole. The speech segregation effect was observed even when attention was diverted, but disappeared when segregation cues, i.e., the speech fine structure, were removed. These results strongly suggest that, through mechanisms such as stream segregation, the auditory system can build an echo-insensitive representation of the speech envelope, which can support reliable speech recognition.
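The comb-filtering effect behind the claim that a long-delay echo can eliminate specific temporal modulations is easy to reproduce: adding a copy of a signal delayed by T seconds cancels envelope modulations at 1/(2T) Hz. The carrier, delay, and modulation rate below are chosen so the cancellation is exact; all values are illustrative assumptions, not the study's stimuli.

```python
import numpy as np
from scipy.signal import hilbert

fs = 16000
t = np.arange(int(4 * fs)) / fs
delay = 0.25                    # long-delay echo (s); 1/(2*delay) = 2 Hz is cancelled
f_mod = 1.0 / (2 * delay)
carrier = np.sin(2 * np.pi * 500 * t)   # 500 Hz: integer number of cycles per delay
dry = carrier * (1 + np.sin(2 * np.pi * f_mod * t))

d = int(delay * fs)
echoic = dry.copy()
echoic[d:] += dry[:-d]                  # superimpose the echo

def mod_power(x, f):
    """Magnitude of the envelope's Fourier component at modulation frequency f."""
    env = np.abs(hilbert(x))
    env = env - env.mean()
    ref = np.exp(-2j * np.pi * f * np.arange(len(env)) / fs)
    return np.abs(env @ ref) / len(env)

print("2 Hz modulation, dry:    %.3f" % mod_power(dry, f_mod))
print("2 Hz modulation, echoic: %.3f" % mod_power(echoic, f_mod))
```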
Affiliation(s)
- Jiaxin Gao
  - Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Honghua Chen
  - Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Mingxuan Fang
  - Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nai Ding
  - Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
  - Nanhu Brain-computer Interface Institute, Hangzhou, China
  - The State key Lab of Brain-Machine Intelligence; The MOE Frontier Science Center for Brain Science & Brain-machine Integration, Zhejiang University, Hangzhou, China
|
27
|
Frei V, Schmitt R, Meyer M, Giroud N. Processing of Visual Speech Cues in Speech-in-Noise Comprehension Depends on Working Memory Capacity and Enhances Neural Speech Tracking in Older Adults With Hearing Impairment. Trends Hear 2024; 28:23312165241287622. [PMID: 39444375 PMCID: PMC11520018 DOI: 10.1177/23312165241287622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2024] [Revised: 08/21/2024] [Accepted: 09/11/2024] [Indexed: 10/25/2024] Open
Abstract
Comprehending speech in noise (SiN) poses a challenge for older hearing-impaired listeners, requiring auditory and working memory resources. Visual speech cues provide additional sensory information supporting speech understanding, while the extent of such visual benefit is characterized by large variability, which might be accounted for by individual differences in working memory capacity (WMC). In the current study, we investigated behavioral and neurofunctional (i.e., neural speech tracking) correlates of auditory and audio-visual speech comprehension in babble noise and their associations with WMC. Healthy older adults with hearing impairment quantified by pure-tone threshold averages (31.85-57 dB, N = 67) listened to sentences in babble noise in audio-only, visual-only and audio-visual speech modalities and performed a pattern matching and a comprehension task, while electroencephalography (EEG) was recorded. Behaviorally, no significant difference in task performance was observed across modalities. However, we did find a significant association between individual working memory capacity and task performance, suggesting a more complex interplay between audio-visual speech cues, working memory capacity and real-world listening tasks. Furthermore, we found that visual speech presentation was accompanied by increased cortical tracking of the speech envelope, particularly in a right-hemispheric auditory topographical cluster. Post hoc, we investigated the potential relationships between behavioral performance and neural speech tracking but were not able to establish a significant association. Overall, our results show an increase in neurofunctional correlates of speech associated with congruent visual speech cues, specifically in a right auditory cluster, suggesting multisensory integration.
Affiliation(s)
- Vanessa Frei
  - Computational Neuroscience of Speech and Hearing, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
  - International Max Planck Research School for the Life Course: Evolutionary and Ontogenetic Dynamics (LIFE), Berlin, Germany
- Raffael Schmitt
  - Computational Neuroscience of Speech and Hearing, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
  - International Max Planck Research School for the Life Course: Evolutionary and Ontogenetic Dynamics (LIFE), Berlin, Germany
  - Competence Center Language & Medicine, Center of Medical Faculty and Faculty of Arts and Sciences, University of Zurich, Zurich, Switzerland
- Martin Meyer
  - Competence Center Language & Medicine, Center of Medical Faculty and Faculty of Arts and Sciences, University of Zurich, Zurich, Switzerland
  - University of Zurich, University Research Priority Program Dynamics of Healthy Aging, Zurich, Switzerland
  - Center for Neuroscience Zurich, University and ETH of Zurich, Zurich, Switzerland
  - Evolutionary Neuroscience of Language, Department of Comparative Language Science, University of Zurich, Zurich, Switzerland
  - Cognitive Psychology Unit, Alpen-Adria University, Klagenfurt, Austria
- Nathalie Giroud
  - Computational Neuroscience of Speech and Hearing, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
  - International Max Planck Research School for the Life Course: Evolutionary and Ontogenetic Dynamics (LIFE), Berlin, Germany
  - Competence Center Language & Medicine, Center of Medical Faculty and Faculty of Arts and Sciences, University of Zurich, Zurich, Switzerland
  - Center for Neuroscience Zurich, University and ETH of Zurich, Zurich, Switzerland
|
28
|
Karunathilake IMD, Kulasingham JP, Simon JZ. Neural tracking measures of speech intelligibility: Manipulating intelligibility while keeping acoustics unchanged. Proc Natl Acad Sci U S A 2023; 120:e2309166120. [PMID: 38032934 PMCID: PMC10710032 DOI: 10.1073/pnas.2309166120] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Accepted: 10/21/2023] [Indexed: 12/02/2023] Open
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle the effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise-vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (nondegraded) version of the speech. This intermediate priming, which generates a "pop-out" percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate temporal response functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. mTRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex, in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
Affiliation(s)
- Jonathan Z. Simon
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742
  - Department of Biology, University of Maryland, College Park, MD 20742
  - Institute for Systems Research, University of Maryland, College Park, MD 20742
|
29
|
Tuckute G, Feather J, Boebinger D, McDermott JH. Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions. PLoS Biol 2023; 21:e3002366. [PMID: 38091351 PMCID: PMC10718467 DOI: 10.1371/journal.pbio.3002366] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2022] [Accepted: 10/06/2023] [Indexed: 12/18/2023] Open
Abstract
Models that predict brain responses to stimuli provide one measure of understanding of a sensory system and have many potential applications in science and engineering. Deep artificial neural networks have emerged as the leading such predictive models of the visual system but are less explored in audition. Prior work provided examples of audio-trained neural networks that produced good predictions of auditory cortical fMRI responses and exhibited correspondence between model stages and brain regions, but left it unclear whether these results generalize to other neural network models and, thus, how to further improve models in this domain. We evaluated model-brain correspondence for publicly available audio neural network models along with in-house models trained on 4 different tasks. Most tested models outpredicted standard spectrotemporal filter-bank models of auditory cortex and exhibited systematic model-brain correspondence: Middle stages best predicted primary auditory cortex, while deep stages best predicted non-primary cortex. However, some state-of-the-art models produced substantially worse brain predictions. Models trained to recognize speech in background noise produced better brain predictions than models trained to recognize speech in quiet, potentially because hearing in noise imposes constraints on biological auditory representations. The training task influenced the prediction quality for specific cortical tuning properties, with best overall predictions resulting from models trained on multiple tasks. The results generally support the promise of deep neural networks as models of audition, though they also indicate that current models do not explain auditory cortical responses in their entirety.
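The model-stage-to-brain-region comparison described here is commonly operationalised as regularised regression from network activations to voxel responses, scored on held-out stimuli. The sketch below runs that recipe on synthetic data; all shapes, names, and the train/test split are assumptions rather than the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stim, n_units, n_vox = 120, 50, 10
# Fake activations from three model stages; stage_1 is made to "drive" the voxels
acts = {f"stage_{k}": rng.standard_normal((n_stim, n_units)) for k in range(3)}
voxels = acts["stage_1"] @ rng.standard_normal((n_units, n_vox))
voxels += 0.5 * rng.standard_normal((n_stim, n_vox))

def ridge_predictivity(X, Y, lam=10.0, n_train=80):
    """Fit voxel-wise ridge maps on a train split; return median held-out correlation."""
    W = np.linalg.solve(X[:n_train].T @ X[:n_train] + lam * np.eye(X.shape[1]),
                        X[:n_train].T @ Y[:n_train])
    pred = X[n_train:] @ W
    return float(np.median([np.corrcoef(pred[:, v], Y[n_train:, v])[0, 1]
                            for v in range(Y.shape[1])]))

for name, X in acts.items():
    print(name, round(ridge_predictivity(X, voxels), 3))  # stage_1 should win
```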
Affiliation(s)
- Greta Tuckute
  - Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
  - Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Jenelle Feather
  - Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
  - Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Dana Boebinger
  - Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
  - Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
  - Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, United States of America
  - University of Rochester Medical Center, Rochester, New York, United States of America
- Josh H. McDermott
  - Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America
  - Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
  - Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, United States of America
|
30
|
Zhang X, Li J, Li Z, Hong B, Diao T, Ma X, Nolte G, Engel AK, Zhang D. Leading and following: Noise differently affects semantic and acoustic processing during naturalistic speech comprehension. Neuroimage 2023; 282:120404. [PMID: 37806465 DOI: 10.1016/j.neuroimage.2023.120404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2023] [Revised: 08/19/2023] [Accepted: 10/05/2023] [Indexed: 10/10/2023] Open
Abstract
Despite the distortion of speech signals caused by unavoidable noise in daily life, our ability to comprehend speech in noisy environments is relatively stable. However, the neural mechanisms underlying reliable speech-in-noise comprehension remain to be elucidated. The present study investigated the neural tracking of acoustic and semantic speech information during noisy naturalistic speech comprehension. Participants listened to narrative audio recordings mixed with spectrally matched stationary noise at three signal-to-noise ratio (SNR) levels (no noise, 3 dB, -3 dB), and 60-channel electroencephalography (EEG) signals were recorded. A temporal response function (TRF) method was employed to derive event-related-like responses to the continuous speech stream at both the acoustic and the semantic levels. Whereas the amplitude envelope of the naturalistic speech was taken as the acoustic feature, word entropy and word surprisal were extracted via natural language processing methods as two semantic features. Theta-band frontocentral TRF responses to the acoustic feature were observed at around 400 ms following speech fluctuation onset over all three SNR levels, and the response latencies were more delayed with increasing noise. Delta-band frontal TRF responses to the semantic feature of word entropy were observed at around 200 to 600 ms leading speech fluctuation onset over all three SNR levels. The response latencies became more leading with increasing noise and decreasing speech comprehension and intelligibility. While the following responses to speech acoustics were consistent with previous studies, our study revealed the robustness of leading responses to speech semantics, which suggests a possible predictive mechanism at the semantic level for maintaining reliable speech comprehension in noisy environments.
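Word surprisal, one of the two semantic features used here, is -log p(word | context), and word entropy is the uncertainty of the next-word distribution. The following is a minimal bigram illustration; the toy corpus and add-alpha smoothing are assumptions, whereas the study derived its features with its own natural language processing methods.

```python
import numpy as np
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = len(unigrams)

def surprisal(prev, word, alpha=0.1):
    """Bigram surprisal -log2 p(word | prev), with add-alpha smoothing."""
    p = (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)
    return -np.log2(p)

def entropy(prev, alpha=0.1):
    """Entropy of the (smoothed) next-word distribution given prev."""
    ps = [(bigrams[(prev, w)] + alpha) / (unigrams[prev] + alpha * vocab)
          for w in unigrams]
    return -sum(p * np.log2(p) for p in ps)

print(surprisal("the", "cat"))   # frequent continuation -> low surprisal
print(surprisal("the", "ran"))   # unattested continuation -> high surprisal
print(entropy("the"))            # uncertainty about the upcoming word
```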
Affiliation(s)
- Xinmiao Zhang
  - Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Jiawei Li
  - Department of Education and Psychology, Freie Universität Berlin, Berlin 14195, Federal Republic of Germany
- Zhuoran Li
  - Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Bo Hong
  - Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China; Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Tongxiang Diao
  - Department of Otolaryngology, Head and Neck Surgery, Peking University People's Hospital, Beijing 100044, China
- Xin Ma
  - Department of Otolaryngology, Head and Neck Surgery, Peking University People's Hospital, Beijing 100044, China
- Guido Nolte
  - Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Federal Republic of Germany
- Andreas K Engel
  - Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Federal Republic of Germany
- Dan Zhang
  - Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
|
31
|
Wang B, Xu X, Niu Y, Wu C, Wu X, Chen J. EEG-based auditory attention decoding with audiovisual speech for hearing-impaired listeners. Cereb Cortex 2023; 33:10972-10983. [PMID: 37750333 DOI: 10.1093/cercor/bhad325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2023] [Revised: 08/21/2023] [Accepted: 08/22/2023] [Indexed: 09/27/2023] Open
Abstract
Auditory attention decoding (AAD) can be used to determine the attended speaker during an auditory selective attention task. However, the auditory factors modulating AAD remain unclear for hearing-impaired (HI) listeners. In this study, scalp electroencephalogram (EEG) was recorded with an auditory selective attention paradigm, in which HI listeners were instructed to attend to one of two simultaneous speech streams with or without congruent visual input (articulation movements), and at a high or low target-to-masker ratio (TMR). Meanwhile, behavioral hearing tests (i.e. audiogram, speech reception threshold, temporal modulation transfer function) were used to assess listeners' individual auditory abilities. The results showed that both visual input and increasing TMR could significantly enhance the cortical tracking of the attended speech and AAD accuracy. Further analysis revealed that the audiovisual (AV) gain in attended speech cortical tracking was significantly correlated with listeners' auditory amplitude modulation (AM) sensitivity, and the TMR gain in attended speech cortical tracking was significantly correlated with listeners' hearing thresholds. Temporal response function analysis revealed that subjects with higher AM sensitivity demonstrated more AV gain over the right occipitotemporal and bilateral frontocentral scalp electrodes.
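At its core, correlation-based AAD labels the attended talker as the one whose envelope best matches the envelope reconstructed from EEG within a decision window. A minimal sketch on synthetic envelopes follows; the window length, rates, and variable names are assumptions, not this study's decoder.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, seconds = 64, 30
n = fs * seconds                               # one decision window
env_a = rng.standard_normal(n)                 # speaker A envelope (stand-in)
env_b = rng.standard_normal(n)                 # speaker B envelope (stand-in)
recon = 0.3 * env_a + rng.standard_normal(n)   # envelope reconstructed from EEG

def decode_attention(recon, env_a, env_b):
    """Label the attended speaker as the one whose envelope correlates more
    with the EEG-reconstructed envelope."""
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return ("A" if r_a > r_b else "B", r_a, r_b)

print(decode_attention(recon, env_a, env_b))
```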
Affiliation(s)
- Bo Wang
  - Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Xiran Xu
  - Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Yadong Niu
  - Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Chao Wu
  - School of Nursing, Peking University, Beijing 100191, China
- Xihong Wu
  - Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
  - National Biomedical Imaging Center, College of Future Technology, Beijing 100871, China
- Jing Chen
  - Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
  - National Biomedical Imaging Center, College of Future Technology, Beijing 100871, China
|
32
|
Tan SHJ, Kalashnikova M, Di Liberto GM, Crosse MJ, Burnham D. Seeing a Talking Face Matters: Gaze Behavior and the Auditory-Visual Speech Benefit in Adults' Cortical Tracking of Infant-directed Speech. J Cogn Neurosci 2023; 35:1741-1759. [PMID: 37677057 DOI: 10.1162/jocn_a_02044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/09/2023]
Abstract
In face-to-face conversations, listeners gather visual speech information from a speaker's talking face that enhances their perception of the incoming auditory speech signal. This auditory-visual (AV) speech benefit is evident even in quiet environments but is stronger in situations that require greater listening effort such as when the speech signal itself deviates from listeners' expectations. One example is infant-directed speech (IDS) presented to adults. IDS has exaggerated acoustic properties that are easily discriminable from adult-directed speech (ADS). Although IDS is a speech register that adults typically use with infants, no previous neurophysiological study has directly examined whether adult listeners process IDS differently from ADS. To address this, the current study simultaneously recorded EEG and eye-tracking data from adult participants as they were presented with auditory-only (AO), visual-only, and AV recordings of IDS and ADS. Eye-tracking data were recorded because looking behavior to the speaker's eyes and mouth modulates the extent of AV speech benefit experienced. Analyses of cortical tracking accuracy revealed that cortical tracking of the speech envelope was significant in AO and AV modalities for IDS and ADS. However, the AV speech benefit [i.e., AV > (A + V)] was only present for IDS trials. Gaze behavior analyses indicated differences in looking behavior during IDS and ADS trials. Surprisingly, looking behavior to the speaker's eyes and mouth was not correlated with cortical tracking accuracy. Additional exploratory analyses indicated that attention to the whole display was negatively correlated with cortical tracking accuracy of AO and visual-only trials in IDS. Our results underscore the nuances involved in the relationship between neurophysiological AV speech benefit and looking behavior.
Affiliation(s)
- Sok Hui Jessica Tan
  - The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
  - Science of Learning in Education Centre, Office of Education Research, National Institute of Education, Nanyang Technological University, Singapore
- Marina Kalashnikova
  - The Basque Center on Cognition, Brain and Language
  - IKERBASQUE, Basque Foundation for Science
- Giovanni M Di Liberto
  - ADAPT Centre, School of Computer Science and Statistics, Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland
- Michael J Crosse
  - SEGOTIA, Galway, Ireland
  - Trinity Center for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- Denis Burnham
  - The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
|
33
|
Van Hirtum T, Somers B, Dieudonné B, Verschueren E, Wouters J, Francart T. Neural envelope tracking predicts speech intelligibility and hearing aid benefit in children with hearing loss. Hear Res 2023; 439:108893. [PMID: 37806102 DOI: 10.1016/j.heares.2023.108893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Revised: 09/01/2023] [Accepted: 09/27/2023] [Indexed: 10/10/2023]
Abstract
Early assessment of hearing aid benefit is crucial, as the extent to which hearing aids provide audible speech information predicts speech and language outcomes. A growing body of research has proposed neural envelope tracking as an objective measure of speech intelligibility, particularly for individuals unable to provide reliable behavioral feedback. However, its potential for evaluating speech intelligibility and hearing aid benefit in children with hearing loss remains unexplored. In this study, we investigated neural envelope tracking in children with permanent hearing loss through two separate experiments. EEG data were recorded while children listened to age-appropriate stories (Experiment 1) or an animated movie (Experiment 2) under aided and unaided conditions (using personal hearing aids) at multiple stimulus intensities. Neural envelope tracking was evaluated using a linear decoder reconstructing the speech envelope from the EEG in the delta band (0.5-4 Hz). Additionally, we calculated temporal response functions (TRFs) to investigate the spatio-temporal dynamics of the response. In both experiments, neural tracking increased with increasing stimulus intensity, but only in the unaided condition. In the aided condition, neural tracking remained stable across a wide range of intensities, as long as speech intelligibility was maintained. Similarly, TRF amplitudes increased with increasing stimulus intensity in the unaided condition, while in the aided condition significant differences were found in TRF latency rather than TRF amplitude. This suggests that decreasing stimulus intensity does not necessarily impact neural tracking. Furthermore, the use of personal hearing aids significantly enhanced neural envelope tracking, particularly in challenging speech conditions that would be inaudible when unaided. Finally, we found a strong correlation between neural envelope tracking and behaviorally measured speech intelligibility for both narrated stories (Experiment 1) and movie stimuli (Experiment 2). Altogether, these findings indicate that neural envelope tracking could be a valuable tool for predicting speech intelligibility benefits derived from personal hearing aids in hearing-impaired children. Incorporating narrated stories or engaging movies expands the accessibility of these methods even in clinical settings, offering new avenues for using objective speech measures to guide pediatric audiology decision-making.
Affiliation(s)
- Tilde Van Hirtum
  - KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
- Ben Somers
  - KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
- Benjamin Dieudonné
  - KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
- Eline Verschueren
  - KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
- Jan Wouters
  - KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
- Tom Francart
  - KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
|
34
|
An H, Lee J, Suh MW, Lim Y. Neural correlation of speech envelope tracking for background noise in normal hearing. Front Neurosci 2023; 17:1268591. [PMID: 37916182 PMCID: PMC10616241 DOI: 10.3389/fnins.2023.1268591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2023] [Accepted: 10/04/2023] [Indexed: 11/03/2023] Open
Abstract
Everyday speech communication often occurs in environments with background noise, and the impact of noise on speech recognition can vary depending on factors such as noise type, noise intensity, and the listener's hearing ability. However, the extent to which neural mechanisms in speech understanding are influenced by different types and levels of noise remains unknown. This study aims to investigate whether individuals exhibit distinct neural responses and attention strategies depending on noise conditions. We recorded electroencephalography (EEG) data from 20 participants with normal hearing (13 males) and evaluated both neural tracking of speech envelopes and behavioral performance in speech understanding in the presence of varying types of background noise. Participants engaged in an EEG experiment consisting of two separate sessions. The first session involved listening to a 12-min story presented binaurally without any background noise. In the second session, speech understanding scores were measured using matrix sentences presented under speech-shaped noise (SSN) and story-noise background conditions at noise levels corresponding to the sentence recognition score (SRS). We observed differences in neural envelope correlation depending on noise type but not on its level. Interestingly, the impact of noise type on the variation in envelope tracking was more significant among participants with higher speech perception scores, while those with lower scores exhibited similar envelope correlations regardless of the noise condition. The findings suggest that even individuals with normal hearing may adopt different strategies to understand speech in challenging listening environments, depending on the type of noise.
Affiliation(s)
- HyunJung An
  - Center for Intelligent and Interactive Robotics, Korea Institute of Science and Technology, Seoul, Republic of Korea
- JeeWon Lee
  - Center for Intelligent and Interactive Robotics, Korea Institute of Science and Technology, Seoul, Republic of Korea
  - Department of Electronic and Electrical Engineering, Ewha Womans University, Seoul, Republic of Korea
- Myung-Whan Suh
  - Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Hospital, Seoul, Republic of Korea
- Yoonseob Lim
  - Center for Intelligent and Interactive Robotics, Korea Institute of Science and Technology, Seoul, Republic of Korea
  - Department of HY-KIST Bio-convergence, Hanyang University, Seoul, Republic of Korea
|
35
|
Karunathilake ID, Kulasingham JP, Simon JZ. Neural Tracking Measures of Speech Intelligibility: Manipulating Intelligibility while Keeping Acoustics Unchanged. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.18.541269. [PMID: 37292644 PMCID: PMC10245672 DOI: 10.1101/2023.05.18.541269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography (MEG) recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (non-degraded) version of the speech. This intermediate priming, which generates a 'pop-out' percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate Temporal Response Functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. TRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming, but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex (PFC), in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
Affiliation(s)
- Jonathan Z. Simon
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA
  - Department of Biology, University of Maryland, College Park, MD 20742, USA
  - Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
|
36
|
Mohammadi Y, Graversen C, Østergaard J, Andersen OK, Reichenbach T. Phase-locking of Neural Activity to the Envelope of Speech in the Delta Frequency Band Reflects Differences between Word Lists and Sentences. J Cogn Neurosci 2023; 35:1301-1311. [PMID: 37379482 DOI: 10.1162/jocn_a_02016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/30/2023]
Abstract
The envelope of a speech signal is tracked by neural activity in the cerebral cortex. The cortical tracking occurs mainly in two frequency bands, theta (4-8 Hz) and delta (1-4 Hz). Tracking in the faster theta band has been mostly associated with lower-level acoustic processing, such as the parsing of syllables, whereas the slower tracking in the delta band relates to higher-level linguistic information of words and word sequences. However, much regarding the more specific association between cortical tracking and acoustic as well as linguistic processing remains to be uncovered. Here, we recorded EEG responses to both meaningful sentences and random word lists in different levels of signal-to-noise ratios (SNRs) that lead to different levels of speech comprehension as well as listening effort. We then related the neural signals to the acoustic stimuli by computing the phase-locking value (PLV) between the EEG recordings and the speech envelope. We found that the PLV in the delta band increases with increasing SNR for sentences but not for the random word lists, showing that the PLV in this frequency band reflects linguistic information. When attempting to disentangle the effects of SNR, speech comprehension, and listening effort, we observed a trend that the PLV in the delta band might reflect listening effort rather than the other two variables, although the effect was not statistically significant. In summary, our study shows that the PLV in the delta band reflects linguistic information and might be related to listening effort.
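The phase-locking value used here has a compact definition: the mean resultant length of the phase difference between two band-limited signals, PLV = |mean(exp(i(phi_x - phi_y)))|. Below is a self-contained toy computation; the band edges, rates, and simulated lag are assumptions, not the study's parameters.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, fs, lo, hi, order=3):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def plv(x, y, fs, lo=1.0, hi=4.0):
    """Phase-locking value |mean(exp(i(phi_x - phi_y)))| in a band (delta here)."""
    phi_x = np.angle(hilbert(bandpass(x, fs, lo, hi)))
    phi_y = np.angle(hilbert(bandpass(y, fs, lo, hi)))
    return np.abs(np.mean(np.exp(1j * (phi_x - phi_y))))

fs = 100
rng = np.random.default_rng(2)
envelope = rng.standard_normal(fs * 60)                    # toy speech envelope
eeg = np.roll(envelope, 15) + rng.standard_normal(fs * 60) # lagged copy + noise
print("delta-band PLV:", plv(eeg, envelope, fs))
```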
|
37
|
Tewarie PKB, Hindriks R, Lai YM, Sotiropoulos SN, Kringelbach M, Deco G. Non-reversibility outperforms functional connectivity in characterisation of brain states in MEG data. Neuroimage 2023; 276:120186. [PMID: 37268096 DOI: 10.1016/j.neuroimage.2023.120186] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2023] [Revised: 04/27/2023] [Accepted: 05/22/2023] [Indexed: 06/04/2023] Open
Abstract
Characterising brain states during tasks is common practice for many neuroscientific experiments using electrophysiological modalities such as electroencephalography (EEG) and magnetoencephalography (MEG). Brain states are often described in terms of oscillatory power and correlated brain activity, i.e. functional connectivity. It is, however, not unusual to observe weak task-induced functional connectivity alterations in the presence of strong task-induced power modulations using classical time-frequency representation of the data. Here, we propose that non-reversibility, or the temporal asymmetry in functional interactions, may be more sensitive for characterising task-induced brain states than functional connectivity. As a second step, we explore causal mechanisms of non-reversibility in MEG data using whole-brain computational models. We include working memory, motor, language tasks and resting-state data from participants of the Human Connectome Project (HCP). Non-reversibility is derived from the lagged amplitude envelope correlation (LAEC), and is based on asymmetry of the forward and reversed cross-correlations of the amplitude envelopes. Using random forests, we find that non-reversibility outperforms functional connectivity in the identification of task-induced brain states. Non-reversibility shows especially better sensitivity to capture bottom-up gamma-induced brain states across all tasks, but also alpha band associated brain states. Using whole-brain computational models we find that asymmetry in the effective connectivity and axonal conduction delays play a major role in shaping non-reversibility across the brain. Our work paves the way for better sensitivity in characterising brain states during both bottom-up as well as top-down modulation in future neuroscientific experiments.
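A minimal sketch of the core idea, not the authors' exact pipeline: compare lagged correlations of amplitude envelopes computed on the forward and time-reversed series; a directed interaction yields an asymmetry between the two. All signals and parameters below are toy assumptions.

```python
import numpy as np
from scipy.signal import hilbert

def lagged_corr(a, b, lag):
    """Pearson correlation between a(t) and b(t + lag), lag > 0."""
    return np.corrcoef(a[:-lag], b[lag:])[0, 1]

def non_reversibility(x, y, lag):
    """Asymmetry between forward and time-reversed lagged envelope correlations."""
    ex, ey = np.abs(hilbert(x)), np.abs(hilbert(y))     # amplitude envelopes
    forward = lagged_corr(ex, ey, lag)
    backward = lagged_corr(ex[::-1], ey[::-1], lag)     # same lag, reversed time
    return abs(forward - backward)

fs = 250
rng = np.random.default_rng(3)
x = rng.standard_normal(fs * 60)
y = np.roll(x, 25) + rng.standard_normal(fs * 60)       # y lags x: directed flow
print("non-reversibility:", non_reversibility(x, y, lag=25))
```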
Affiliation(s)
- Prejaas K B Tewarie
  - Center for Brain and Cognition, Computational Neuroscience Group, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Spain; Clinical Neurophysiology Group, University of Twente, Enschede, The Netherlands; Department of Neurology, Amsterdam UMC, Amsterdam, the Netherlands; Sir Peter Mansfield Imaging Centre, School of Physics, University of Nottingham, Nottingham, United Kingdom
- Rikkert Hindriks
  - Department of Mathematics, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Yi Ming Lai
  - Sir Peter Mansfield Imaging Centre, School of Medicine, University of Nottingham, United Kingdom
- Stamatios N Sotiropoulos
  - Sir Peter Mansfield Imaging Centre, School of Medicine, University of Nottingham, United Kingdom; NIHR Biomedical Research Centre, University of Nottingham, Nottingham University Hospitals NHS Trust, Nottingham, UK
- Morten Kringelbach
  - Centre for Eudaimonia and Human Flourishing, Linacre College, University of Oxford, Oxford, UK; Center for Music in the Brain, Department of Clinical Medicine, Aarhus University, Aarhus, Denmark; Department of Psychiatry, University of Oxford, Oxford, UK
- Gustavo Deco
  - Center for Brain and Cognition, Computational Neuroscience Group, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Spain; Institució Catalana de la Recerca i Estudis Avançats (ICREA), Barcelona, Spain; Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
|
38
|
Abbasi O, Steingräber N, Chalas N, Kluger DS, Gross J. Spatiotemporal dynamics characterise spectral connectivity profiles of continuous speaking and listening. PLoS Biol 2023; 21:e3002178. [PMID: 37478152 DOI: 10.1371/journal.pbio.3002178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2023] [Accepted: 05/31/2023] [Indexed: 07/23/2023] Open
Abstract
Speech production and perception are fundamental processes of human cognition that both rely on intricate processing mechanisms that are still poorly understood. Here, we study these processes by using magnetoencephalography (MEG) to comprehensively map connectivity of regional brain activity within the brain and to the speech envelope during continuous speaking and listening. Our results reveal not only a partly shared neural substrate for both processes but also a dissociation in space, delay, and frequency. Neural activity in motor and frontal areas is coupled to succeeding speech in the delta band (1 to 3 Hz), whereas coupling in the theta range follows speech in temporal areas during speaking. Neural connectivity results showed a separation of bottom-up and top-down signalling in distinct frequency bands during speaking. Here, we show that frequency-specific connectivity channels for bottom-up and top-down signalling support continuous speaking and listening. These findings further shed light on the complex interplay between different brain regions involved in speech production and perception.
Affiliation(s)
- Omid Abbasi
  - Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Nadine Steingräber
  - Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Nikos Chalas
  - Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
  - Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Daniel S Kluger
  - Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
  - Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Joachim Gross
  - Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
  - Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
|
39
|
Karunathilake IMD, Dunlap JL, Perera J, Presacco A, Decruy L, Anderson S, Kuchinsky SE, Simon JZ. Effects of aging on cortical representations of continuous speech. J Neurophysiol 2023; 129:1359-1377. [PMID: 37096924 PMCID: PMC10202479 DOI: 10.1152/jn.00356.2022] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2022] [Revised: 04/04/2023] [Accepted: 04/20/2023] [Indexed: 04/26/2023] Open
Abstract
Understanding speech in a noisy environment is crucial in day-to-day interactions and yet becomes more challenging with age, even in healthy aging. Age-related changes in the neural mechanisms that enable speech-in-noise listening have been investigated previously; however, the extent to which age affects the timing and fidelity of encoding of target and interfering speech streams is not well understood. Using magnetoencephalography (MEG), we investigated how continuous speech is represented in auditory cortex in the presence of interfering speech in younger and older adults. Cortical representations were obtained from neural responses that time-locked to the speech envelopes, using speech envelope reconstruction and temporal response functions (TRFs). TRFs showed three prominent peaks corresponding to auditory cortical processing stages: early (∼50 ms), middle (∼100 ms), and late (∼200 ms). Older adults showed exaggerated speech envelope representations compared with younger adults. Temporal analysis revealed both that the age-related exaggeration starts as early as ∼50 ms and that older adults needed a substantially longer integration time window to achieve their better reconstruction of the speech envelope. As expected, with increased speech masking, envelope reconstruction for the attended talker decreased and all three TRF peaks were delayed, with aging contributing additionally to the reduction. Interestingly, for older adults the late peak was delayed, suggesting that this late peak may receive contributions from multiple sources. Together, these results suggest that there are several mechanisms at play compensating for age-related temporal processing deficits at several stages, but which are not able to fully reestablish unimpaired speech perception. NEW & NOTEWORTHY: We observed age-related changes in cortical temporal processing of continuous speech that may be related to older adults' difficulty in understanding speech in noise. These changes occur in both timing and strength of the speech representations at different cortical processing stages and depend on both noise condition and selective attention. Critically, their dependence on noise condition changes dramatically among the early, middle, and late cortical processing stages, underscoring how aging differentially affects these stages.
Affiliation(s)
- I M Dushyanthi Karunathilake
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States
- Jason L Dunlap
  - Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
- Janani Perera
  - Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
- Alessandro Presacco
  - Institute for Systems Research, University of Maryland, College Park, Maryland, United States
- Lien Decruy
  - Institute for Systems Research, University of Maryland, College Park, Maryland, United States
- Samira Anderson
  - Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
- Stefanie E Kuchinsky
  - Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, Maryland, United States
- Jonathan Z Simon
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States
  - Institute for Systems Research, University of Maryland, College Park, Maryland, United States
  - Department of Biology, University of Maryland, College Park, Maryland, United States
|
40
|
Bellur A, Thakkar K, Elhilali M. Explicit-memory multiresolution adaptive framework for speech and music separation. EURASIP JOURNAL ON AUDIO, SPEECH, AND MUSIC PROCESSING 2023; 2023:20. [PMID: 37181589 PMCID: PMC10169896 DOI: 10.1186/s13636-023-00286-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/23/2022] [Accepted: 04/21/2023] [Indexed: 05/16/2023]
Abstract
The human auditory system employs a number of principles to facilitate the selection of perceptually separated streams from a complex sound mixture. The brain leverages multi-scale redundant representations of the input and uses memory (or priors) to guide the selection of a target sound from the input mixture. Moreover, feedback mechanisms refine the memory constructs resulting in further improvement of selectivity of a particular sound object amidst dynamic backgrounds. The present study proposes a unified end-to-end computational framework that mimics these principles for sound source separation applied to both speech and music mixtures. While the problems of speech enhancement and music separation have often been tackled separately due to constraints and specificities of each signal domain, the current work posits that common principles for sound source separation are domain-agnostic. In the proposed scheme, parallel and hierarchical convolutional paths map input mixtures onto redundant but distributed higher-dimensional subspaces and utilize the concept of temporal coherence to gate the selection of embeddings belonging to a target stream abstracted in memory. These explicit memories are further refined through self-feedback from incoming observations in order to improve the system's selectivity when faced with unknown backgrounds. The model yields stable outcomes of source separation for both speech and music mixtures and demonstrates benefits of explicit memory as a powerful representation of priors that guide information selection from complex inputs.
Affiliation(s)
- Ashwin Bellur
  - Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
- Karan Thakkar
  - Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
- Mounya Elhilali
  - Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
|
41
|
Van Hirtum T, Somers B, Verschueren E, Dieudonné B, Francart T. Delta-band neural envelope tracking predicts speech intelligibility in noise in preschoolers. Hear Res 2023; 434:108785. [PMID: 37172414 DOI: 10.1016/j.heares.2023.108785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Revised: 04/24/2023] [Accepted: 05/05/2023] [Indexed: 05/15/2023]
Abstract
Behavioral tests are currently the gold standard for measuring speech intelligibility. However, these tests can be difficult to administer in young children due to factors such as motivation, linguistic knowledge, and cognitive skills. It has been shown that measures of neural envelope tracking can be used to predict speech intelligibility and overcome these issues. However, their potential as an objective measure of speech intelligibility in noise remains to be investigated in preschool children. Here, we evaluated neural envelope tracking as a function of signal-to-noise ratio (SNR) in fourteen 5-year-old children. We examined EEG responses to natural, continuous speech presented at different SNRs ranging from -8 dB SNR (very difficult) to 8 dB SNR (very easy). As expected, delta-band (0.5-4 Hz) tracking increased with increasing stimulus SNR. However, this increase was not strictly monotonic, as neural tracking reached a plateau between 0 and 4 dB SNR, similar to the behavioral speech intelligibility outcomes. These findings indicate that neural tracking in the delta band remains stable as long as the acoustic degradation of the speech signal does not produce significant changes in speech intelligibility. Theta-band tracking (4-8 Hz), on the other hand, was drastically reduced and more easily affected by noise in children, making it less reliable as a measure of speech intelligibility. By contrast, neural envelope tracking in the delta band was directly associated with behavioral measures of speech intelligibility. This suggests that neural envelope tracking in the delta band is a valuable tool for evaluating speech-in-noise intelligibility in preschoolers, highlighting its potential as an objective measure of speech perception in difficult-to-test populations.
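As a rough illustration of the measure, a minimal sketch of single-channel delta-band envelope tracking follows; the simple correlation metric and variable names are assumptions (the study's actual decoding pipeline was more elaborate).

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def delta_tracking(eeg, audio, fs):
    """Correlate 0.5-4 Hz filtered EEG with the 0.5-4 Hz speech envelope.

    eeg   : (samples,) single-channel EEG
    audio : (samples,) speech waveform sampled at the same rate fs
    """
    b, a = butter(2, [0.5, 4.0], btype="bandpass", fs=fs)
    envelope = np.abs(hilbert(audio))      # broadband amplitude envelope
    env_delta = filtfilt(b, a, envelope)   # restrict envelope to the delta band
    eeg_delta = filtfilt(b, a, eeg)        # restrict EEG to the delta band
    return np.corrcoef(eeg_delta, env_delta)[0, 1]

Tracking as a function of SNR would then be a mapping such as {snr: delta_tracking(eeg[snr], audio[snr], fs) for snr in snrs}.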
Affiliation(s)
- Tilde Van Hirtum
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium.
- Ben Somers
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
- Eline Verschueren
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
- Benjamin Dieudonné
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
- Tom Francart
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
42
Park JJ, Baek SC, Suh MW, Choi J, Kim SJ, Lim Y. The effect of topic familiarity and volatility of auditory scene on selective auditory attention. Hear Res 2023; 433:108770. [PMID: 37104990 DOI: 10.1016/j.heares.2023.108770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Revised: 04/06/2023] [Accepted: 04/15/2023] [Indexed: 04/29/2023]
Abstract
Selective auditory attention has been shown to modulate the cortical representation of speech. This effect is well documented in acoustically challenging environments. However, the influence of top-down factors, in particular topic familiarity, on this process remains unclear, despite evidence that semantic information can promote speech-in-noise perception. In addition, whereas individual features forming a static listening condition have been studied extensively, dynamic and irregular changes of auditory scenes (volatile listening environments) have received less attention. To address these gaps, we explored the influence of topic familiarity and volatile listening on the selective auditory attention process during dichotic listening using electroencephalography. When stories with unfamiliar topics were presented, participants' comprehension was severely degraded. However, their cortical activity selectively tracked the speech of the target story well. This implies that topic familiarity hardly influences the neural index of speech tracking, possibly when bottom-up information is sufficient. However, when the listening environment was volatile and listeners had to re-engage with new speech whenever the auditory scene changed, the neural correlates of the attended speech were degraded. In particular, the cortical response to the attended speech and the spatial asymmetry of the responses to left and right attention were significantly attenuated around 100-200 ms after speech onset. These findings suggest that volatile listening environments can adversely affect the modulatory effect of selective attention, possibly by hampering proper attention due to increased perceptual load.
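The attended-versus-ignored contrast used in such dichotic paradigms is commonly scored by checking which talker's envelope an EEG-based reconstruction matches best. A hedged sketch of that decision rule (the reconstruction step and all names are assumed, not this paper's exact analysis):

import numpy as np

def attended_talker(reconstructed_env, env_talker_a, env_talker_b):
    """Label the attended stream by envelope-reconstruction correlation."""
    r_a = np.corrcoef(reconstructed_env, env_talker_a)[0, 1]
    r_b = np.corrcoef(reconstructed_env, env_talker_b)[0, 1]
    return "talker_a" if r_a > r_b else "talker_b"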
Affiliation(s)
- Jonghwa Jeonglok Park
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, Seoul 08826, South Korea
- Seung-Cheol Baek
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
- Myung-Whan Suh
- Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Hospital, Seoul 03080, South Korea
- Jongsuk Choi
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of AI Robotics, KIST School, Korea University of Science and Technology, Seoul 02792, South Korea
- Sung June Kim
- Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, Seoul 08826, South Korea
- Yoonseob Lim
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of HY-KIST Bio-convergence, Hanyang University, Seoul 04763, South Korea.
43
Willmore BDB, King AJ. Adaptation in auditory processing. Physiol Rev 2023.
Abstract
Adaptation is an essential feature of auditory neurons, which reduces their responses to unchanging and recurring sounds and allows their response properties to be matched to the constantly changing statistics of sounds that reach the ears. As a consequence, processing in the auditory system highlights novel or unpredictable sounds and produces an efficient representation of the vast range of sounds that animals can perceive by continually adjusting the sensitivity and, to a lesser extent, the tuning properties of neurons to the most commonly encountered stimulus values. Together with attentional modulation, adaptation to sound statistics also helps to generate neural representations of sound that are tolerant to background noise and therefore plays a vital role in auditory scene analysis. In this review, we consider the diverse forms of adaptation that are found in the auditory system in terms of the processing levels at which they arise, the underlying neural mechanisms, and their impact on neural coding and perception. We also ask what the dynamics of adaptation, which can occur over multiple timescales, reveal about the statistical properties of the environment. Finally, we examine how adaptation to sound statistics is influenced by learning and experience and changes as a result of aging and hearing loss.
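A didactic sketch of the core principle reviewed here: responses are divisively normalized by a running estimate of the stimulus level, so sensitivity continually re-matches the input statistics. The toy model and its time constant are assumptions for illustration, not a fitted neural model.

import numpy as np

def adaptive_response(stimulus, tau=200.0):
    """Divisive gain adaptation for a 1-D stimulus array (leaky level tracker)."""
    level = abs(float(stimulus[0]))
    out = np.empty(len(stimulus))
    alpha = 1.0 / tau                      # leaky-integrator rate
    for t, s in enumerate(stimulus):
        level += alpha * (abs(s) - level)  # running estimate of stimulus level
        out[t] = s / (level + 1e-12)       # divisive normalization
    return out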
Affiliation(s)
- Ben D. B. Willmore
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Andrew J. King
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
44
Kösem A, Dai B, McQueen JM, Hagoort P. Neural tracking of speech envelope does not unequivocally reflect intelligibility. Neuroimage 2023; 272:120040. [PMID: 36935084 DOI: 10.1016/j.neuroimage.2023.120040] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2022] [Revised: 03/13/2023] [Accepted: 03/15/2023] [Indexed: 03/19/2023] Open
Abstract
During listening, brain activity tracks the rhythmic structures of speech signals. Here, we directly dissociated the contribution of neural envelope tracking to the processing of speech acoustic cues from that related to linguistic processing. We examined the neural changes associated with the comprehension of noise-vocoded (NV) speech using magnetoencephalography (MEG). Participants listened to NV sentences in a three-phase training paradigm: (1) pre-training, where NV stimuli were barely comprehended; (2) training, with exposure to the original clear version of each speech stimulus; and (3) post-training, where the same stimuli gained intelligibility from the training phase. Using this paradigm, we tested whether the neural response to a speech signal was modulated by its intelligibility without any change in its acoustic structure. To test the influence of spectral degradation on neural envelope tracking independently of training, participants listened to two types of NV sentences (4-band and 2-band NV speech) but were only trained to understand 4-band NV speech. Significant changes in neural tracking were observed in the delta range in relation to the acoustic degradation of speech. However, we failed to find a direct effect of intelligibility on the neural tracking of the speech envelope in both the theta and delta ranges, in both auditory region-of-interest and whole-brain sensor-space analyses. This suggests that acoustics greatly influence the neural tracking response to the speech envelope, and that caution is needed when choosing control signals for speech-brain tracking analyses, given that a slight change in acoustic parameters can have strong effects on the neural tracking response.
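For reference, the noise vocoding described here can be sketched in a few lines: band-pass the speech, extract each band's envelope, and use it to modulate band-limited noise. The band count and corner frequencies below are assumptions, not the study's exact parameters.

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_bands=4, lo=100.0, hi=8000.0):
    """Replace each band's fine structure with envelope-modulated noise."""
    speech = np.asarray(speech, dtype=float)
    edges = np.geomspace(lo, hi, n_bands + 1)   # log-spaced band edges
    rng = np.random.default_rng(0)
    out = np.zeros_like(speech)
    for k in range(n_bands):
        sos = butter(4, [edges[k], edges[k + 1]], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, speech)
        env = np.abs(hilbert(band))             # band envelope
        carrier = sosfiltfilt(sos, rng.standard_normal(len(speech)))
        out += env * carrier                    # modulate band-limited noise
    return out / (np.max(np.abs(out)) + 1e-12)  # normalize peak amplitude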
Affiliation(s)
- Anne Kösem
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands; Lyon Neuroscience Research Center (CRNL), CoPhy Team, INSERM U1028, 69500 Bron, France.
- Bohan Dai
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands
- James M McQueen
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands
- Peter Hagoort
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands
45
De Clercq P, Vanthornhout J, Vandermosten M, Francart T. Beyond linear neural envelope tracking: a mutual information approach. J Neural Eng 2023; 20. [PMID: 36812597 DOI: 10.1088/1741-2552/acbe1d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2022] [Accepted: 02/22/2023] [Indexed: 02/24/2023]
Abstract
Objective. The human brain tracks the temporal envelope of speech, which contains essential cues for speech understanding. Linear models are the most common tool to study neural envelope tracking. However, information on how speech is processed can be lost since nonlinear relations are precluded. Analysis based on mutual information (MI), on the other hand, can detect both linear and nonlinear relations and is gradually becoming more popular in the field of neural envelope tracking. Yet, several different approaches to calculating MI are applied with no consensus on which approach to use. Furthermore, the added value of nonlinear techniques remains a subject of debate in the field. The present paper aims to resolve these open questions. Approach. We analyzed electroencephalography (EEG) data of participants listening to continuous speech and applied MI analyses and linear models. Main results. Comparing the different MI approaches, we conclude that results are most reliable and robust using the Gaussian copula approach, which first transforms the data to standard Gaussians. With this approach, the MI analysis is a valid technique for studying neural envelope tracking. Like linear models, it allows spatial and temporal interpretations of speech processing, peak latency analyses, and applications to multiple EEG channels combined. In a final analysis, we tested whether nonlinear components were present in the neural response to the envelope by first removing all linear components in the data. We robustly detected nonlinear components on the single-subject level using the MI analysis. Significance. We demonstrate that the human brain processes speech in a nonlinear way. Unlike linear models, the MI analysis detects such nonlinear relations, proving its added value for neural envelope tracking. In addition, the MI analysis retains the spatial and temporal characteristics of speech processing, an advantage lost when using more complex (nonlinear) deep neural networks.
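A minimal sketch of the Gaussian copula MI estimate named above, for two 1-D variables (e.g., the speech envelope and one EEG channel at a fixed lag); the function names are assumptions and this is illustrative, not the authors' code.

import numpy as np
from scipy.stats import norm, rankdata

def copnorm(x):
    """Rank-transform to uniform, then map through the inverse normal CDF."""
    return norm.ppf(rankdata(x) / (len(x) + 1))

def gcmi(x, y):
    """MI (in bits) between two 1-D variables under a Gaussian copula."""
    gx, gy = copnorm(x), copnorm(y)
    r = np.corrcoef(gx, gy)[0, 1]
    return -0.5 * np.log2(1 - r ** 2)  # Gaussian MI from the correlation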
Affiliation(s)
- Pieter De Clercq
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Maaike Vandermosten
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
46
Aljarboa GS, Bell SL, Simpson DM. Detecting cortical responses to continuous running speech using EEG data from only one channel. Int J Audiol 2023; 62:199-208. [PMID: 35152811 DOI: 10.1080/14992027.2022.2035832] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
OBJECTIVE To explore the detection of cortical responses to continuous speech using a single EEG channel and, in particular, to compare detection rates and times using a cross-correlation approach and parameters extracted from the temporal response function (TRF). DESIGN EEG from 32 channels was recorded whilst presenting 25 min of continuous English speech. Detection parameters were the cross-correlation between speech and EEG (XCOR), the peak value and power of the TRF filter (TRF-peak and TRF-power), and the correlation between the TRF-predicted and true EEG (TRF-COR). A bootstrap analysis was used to determine the statistical significance of responses. Different electrode configurations were compared: using single channels Cz or Fz, or selecting the channels with the highest correlation value. STUDY SAMPLE Seventeen native English-speaking subjects with mild-to-moderate hearing loss. RESULTS Significant cortical responses were detected from all subjects at channel Fz with XCOR and TRF-COR. Detection time was lower for XCOR (mean = 4.8 min) than for the TRF parameters (best: TRF-COR, mean = 6.4 min), with significant time differences between XCOR and both TRF-peak and TRF-power. Analysing multiple EEG channels and testing the channels with the highest correlation between envelope and EEG reduced detection sensitivity compared to Fz alone. CONCLUSIONS Cortical responses to continuous speech can be detected from a single channel, with recording times that may be suitable for clinical application.
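A sketch of the single-channel detection logic, assuming a peak cross-correlation statistic tested against a circular-shift bootstrap null; the parameter choices and names are illustrative, not the study's exact procedure.

import numpy as np

def xcorr_detect(eeg, envelope, max_lag, n_null=1000, alpha=0.05, seed=0):
    """Return True if the peak |cross-correlation| beats the null quantile.

    eeg, envelope : equal-length 1-D arrays; max_lag in samples.
    """
    def peak_xcorr(sig):
        return max(abs(np.corrcoef(np.roll(sig, lag), eeg)[0, 1])
                   for lag in range(-max_lag, max_lag + 1))

    observed = peak_xcorr(envelope)
    rng = np.random.default_rng(seed)
    # Null: recompute the statistic after random circular shifts (slow but transparent)
    null = [peak_xcorr(np.roll(envelope, int(rng.integers(len(envelope)))))
            for _ in range(n_null)]
    return observed > np.quantile(null, 1 - alpha)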
Affiliation(s)
- Ghadah S Aljarboa
- Institute of Sound and Vibration Research, University of Southampton, Southampton, UK; Communication Sciences, Princess Nora bint Abdul Rahman University, Riyadh, Saudi Arabia
- Steve L Bell
- Institute of Sound and Vibration Research, University of Southampton, Southampton, UK
- David M Simpson
- Institute of Sound and Vibration Research, University of Southampton, Southampton, UK
47
Chen YP, Schmidt F, Keitel A, Rösch S, Hauswald A, Weisz N. Speech intelligibility changes the temporal evolution of neural speech tracking. Neuroimage 2023; 268:119894. [PMID: 36693596 DOI: 10.1016/j.neuroimage.2023.119894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2022] [Revised: 12/13/2022] [Accepted: 01/20/2023] [Indexed: 01/22/2023] Open
Abstract
Listening to speech with poor signal quality is challenging. Neural speech tracking of degraded speech has been used to advance our understanding of how brain processes and speech intelligibility are interrelated. However, the temporal dynamics of neural speech tracking and their relation to speech intelligibility are not clear. In the present MEG study, we exploited temporal response functions (TRFs), which have been used to describe the time course of speech tracking, on a gradient from intelligible to unintelligible degraded speech. In addition, we used inter-related facets of neural speech tracking (e.g., speech envelope reconstruction, speech-brain coherence, and components of broadband coherence spectra) to corroborate our TRF findings. Our TRF analysis yielded marked, temporally differential effects of vocoding: ∼50-110 ms (M50TRF), ∼175-230 ms (M200TRF), and ∼315-380 ms (M350TRF). Reduced intelligibility went along with large increases in the early peak responses (M50TRF) but strongly reduced responses in M200TRF. In the late responses (M350TRF), the maximum response occurred for degraded speech that was still comprehensible and then declined with reduced intelligibility. Furthermore, we related the TRF components to our other neural "tracking" measures and found that M50TRF and M200TRF play differential roles in the shifting center frequency of the broadband coherence spectra. Overall, our study highlights the importance of time-resolved computation of neural speech tracking and decomposition of coherence spectra, and provides a better understanding of degraded speech processing.
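For context, forward TRFs of the kind analyzed here are commonly estimated by ridge regression on time-lagged copies of the stimulus envelope. A minimal single-channel sketch (the lag range and regularization value are assumed):

import numpy as np

def fit_trf(envelope, eeg, fs, tmin=-0.1, tmax=0.5, lam=1e3):
    """Estimate a TRF mapping the stimulus envelope to one EEG/MEG channel."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    n = len(envelope)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):          # lagged copies of the envelope
        if lag >= 0:
            X[lag:, j] = envelope[:n - lag]
        else:
            X[:lag, j] = envelope[-lag:]
    w = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
    return lags / fs, w                     # time axis (s) and TRF weights

Peaks of w around 0.05-0.11 s, 0.175-0.23 s, and 0.315-0.38 s would correspond to the M50TRF, M200TRF, and M350TRF windows discussed above.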
Affiliation(s)
- Ya-Ping Chen
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria.
- Fabian Schmidt
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria
- Anne Keitel
- Psychology, School of Social Sciences, University of Dundee, DD1 4HN Dundee, UK
- Sebastian Rösch
- Department of Otorhinolaryngology, Paracelsus Medical University, 5020 Salzburg, Austria
- Anne Hauswald
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria
- Nathan Weisz
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria; Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, 5020 Salzburg, Austria
48
Mischler G, Keshishian M, Bickel S, Mehta AD, Mesgarani N. Deep neural networks effectively model neural adaptation to changing background noise and suggest nonlinear noise filtering methods in auditory cortex. Neuroimage 2023; 266:119819. [PMID: 36529203 PMCID: PMC10510744 DOI: 10.1016/j.neuroimage.2022.119819] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2022] [Revised: 11/28/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022] Open
Abstract
The human auditory system displays a robust capacity to adapt to sudden changes in background noise, allowing continuous speech comprehension despite changes in the background environment. However, despite comprehensive studies characterizing this ability, the computations that underlie this process are not well understood. The first step towards understanding a complex system is to propose a suitable model, but the classical and easily interpreted model for the auditory system, the spectro-temporal receptive field (STRF), cannot match the nonlinear neural dynamics involved in noise adaptation. Here, we utilize a deep neural network (DNN) to model neural adaptation to noise, illustrating its effectiveness at reproducing the complex dynamics at the level of both individual electrodes and the cortical population. By closely inspecting the model's STRF-like computations over time, we find that the model alters both the gain and the shape of its receptive field when adapting to a sudden noise change. We show that the DNN model's gain changes allow it to perform adaptive gain control, while the spectro-temporal change creates noise filtering by altering the inhibitory region of the model's receptive field. Further, we find that models of electrodes in nonprimary auditory cortex also exhibit noise-filtering changes in their excitatory regions, suggesting differences in noise-filtering mechanisms along the cortical hierarchy. These findings demonstrate the capability of deep neural networks to model complex neural adaptation and offer new hypotheses about the computations the auditory cortex performs to enable noise-robust speech perception in real-world, dynamic environments.
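To illustrate the modeling contrast, a toy PyTorch stand-in follows: a linear STRF is a single convolution applied to the spectrogram, while added depth and nonlinearity give a DNN the flexibility described above. The architecture is an assumption for illustration, not the paper's network.

import torch
import torch.nn as nn

class ToySTRFNet(nn.Module):
    """Maps a (batch, 1, freq, time) spectrogram to one electrode's response."""
    def __init__(self, n_freq=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(9, 9), padding=(4, 4)),
            nn.ReLU(),                                 # nonlinearity a linear STRF lacks
            nn.Conv2d(8, 1, kernel_size=(n_freq, 1)),  # collapse the frequency axis
        )

    def forward(self, spec):
        return self.net(spec).squeeze(1).squeeze(1)    # (batch, time)

A linear STRF is recovered as the special case with one convolution and no ReLU.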
Affiliation(s)
- Gavin Mischler
- Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States
- Menoua Keshishian
- Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States
- Stephan Bickel
- Hofstra Northwell School of Medicine, Manhasset, New York, United States
- Ashesh D Mehta
- Hofstra Northwell School of Medicine, Manhasset, New York, United States
- Nima Mesgarani
- Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States.
49
Niesen M, Bourguignon M, Bertels J, Vander Ghinst M, Wens V, Goldman S, De Tiège X. Cortical tracking of lexical speech units in a multi-talker background is immature in school-aged children. Neuroimage 2023; 265:119770. [PMID: 36462732 DOI: 10.1016/j.neuroimage.2022.119770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Revised: 11/09/2022] [Accepted: 11/23/2022] [Indexed: 12/03/2022] Open
Abstract
Children have more difficulty perceiving speech in noise than adults. Whether this difficulty relates to an immature processing of prosodic or linguistic elements of the attended speech is still unclear. To address the impact of noise on linguistic processing per se, we assessed how babble noise impacts the cortical tracking of intelligible speech devoid of prosody in school-aged children and adults. Twenty adults and twenty children (7-9 years) listened to synthesized French monosyllabic words presented at 2.5 Hz, either randomly or in 4-word hierarchical structures wherein 2 words formed a phrase at 1.25 Hz, and 2 phrases formed a sentence at 0.625 Hz, with or without babble noise. Neuromagnetic responses to words, phrases and sentences were identified and source-localized. Children and adults displayed significant cortical tracking of words in all conditions, and of phrases and sentences only when words formed meaningful sentences. In children compared with adults, the cortical tracking was lower for all linguistic units in conditions without noise. In the presence of noise, the cortical tracking was similarly reduced for sentence units in both groups, but remained stable for phrase units. Critically, when there was noise, adults increased the cortical tracking of monosyllabic words in the inferior frontal gyri and supratemporal auditory cortices but children did not. This study demonstrates that the difficulties of school-aged children in understanding speech in a multi-talker background might be partly due to an immature tracking of lexical but not supra-lexical linguistic units.
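The hierarchical tracking measure used here amounts to frequency tagging: reading out spectral power at the word (2.5 Hz), phrase (1.25 Hz), and sentence (0.625 Hz) rates. A minimal sketch for one channel, with names assumed:

import numpy as np

def tagged_power(signal, fs, rates=(2.5, 1.25, 0.625)):
    """Spectral power at each presentation rate for one MEG/EEG channel."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return {r: spectrum[np.argmin(np.abs(freqs - r))] for r in rates}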
Affiliation(s)
- Maxime Niesen
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), Hôpital Universitaire de Bruxelles (HUB), CUB Hôpital Erasme, Department of Otorhinolaryngology, 1070 Brussels, Belgium.
- Mathieu Bourguignon
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), UNI-ULB Neuroscience Institute, Laboratory of Neurophysiology and Movement Biomechanics, 1070 Brussels, Belgium; BCBL, Basque Center on Cognition, Brain and Language, 20009 San Sebastian, Spain
- Julie Bertels
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), UNI-ULB Neuroscience Institute, Cognition and Computation group, ULBabyLab - Consciousness, Brussels, Belgium
- Marc Vander Ghinst
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), Hôpital Universitaire de Bruxelles (HUB), CUB Hôpital Erasme, Department of Otorhinolaryngology, 1070 Brussels, Belgium
- Vincent Wens
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), Hôpital Universitaire de Bruxelles (HUB), CUB Hôpital Erasme, Department of translational Neuroimaging, 1070 Brussels, Belgium
- Serge Goldman
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), Hôpital Universitaire de Bruxelles (HUB), CUB Hôpital Erasme, Department of Nuclear Medicine, 1070 Brussels, Belgium
- Xavier De Tiège
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), Hôpital Universitaire de Bruxelles (HUB), CUB Hôpital Erasme, Department of translational Neuroimaging, 1070 Brussels, Belgium
50
Understanding why infant-directed speech supports learning: A dynamic attention perspective. Dev Rev 2022. [DOI: 10.1016/j.dr.2022.101047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]