1. Yasmin S, Irsik VC, Johnsrude IS, Herrmann B. The effects of speech masking on neural tracking of acoustic and semantic features of natural speech. Neuropsychologia 2023:108584. PMID: 37169066. DOI: 10.1016/j.neuropsychologia.2023.108584.
Abstract
Listening environments contain background sounds that mask speech and lead to communication challenges. Sensitivity to slow acoustic fluctuations in speech can help segregate speech from background noise. Semantic context can also facilitate speech perception in noise, for example, by enabling prediction of upcoming words. However, not much is known about how different degrees of background masking affect the neural processing of acoustic and semantic features during naturalistic speech listening. In the current electroencephalography (EEG) study, participants listened to engaging, spoken stories masked at different levels of multi-talker babble to investigate how neural activity in response to acoustic and semantic features changes with acoustic challenges, and how such effects relate to speech intelligibility. The pattern of neural response amplitudes associated with both acoustic and semantic speech features across masking levels was U-shaped, such that amplitudes were largest for moderate masking levels. This U-shape may be due to increased attentional focus when speech comprehension is challenging, but manageable. The latency of the neural responses increased linearly with increasing background masking, and neural latency change associated with acoustic processing most closely mirrored the changes in speech intelligibility. Finally, tracking responses related to semantic dissimilarity remained robust until severe speech masking (-3 dB SNR). The current study reveals that neural responses to acoustic features are highly sensitive to background masking and decreasing speech intelligibility, whereas neural responses to semantic features are relatively robust, suggesting that individuals track the meaning of the story well even in moderate background sound.
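The masking manipulation described above can be illustrated with a short, generic sketch (Python; the sampling rate, filter cutoff, and signals are placeholders, and this is not the authors' pipeline): mix speech with multi-talker babble at a target SNR and extract the slow amplitude envelope that envelope-tracking analyses typically use.

import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

fs = 16000  # assumed sampling rate (Hz)

def mix_at_snr(speech, babble, snr_db):
    """Scale the babble so the speech-to-babble power ratio equals snr_db, then mix."""
    gain = np.sqrt(np.mean(speech ** 2) / (np.mean(babble ** 2) * 10 ** (snr_db / 10)))
    return speech + gain * babble

def slow_envelope(x, fs, cutoff_hz=8.0):
    """Hilbert-magnitude envelope, low-passed to keep only slow fluctuations."""
    env = np.abs(hilbert(x))
    sos = butter(4, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, env)

# Toy signals stand in for a spoken story and a babble masker.
rng = np.random.default_rng(0)
speech = rng.standard_normal(fs * 2)
babble = rng.standard_normal(fs * 2)
mixture = mix_at_snr(speech, babble, snr_db=-3.0)  # severe masking level, as in the abstract
envelope = slow_envelope(mixture, fs)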
Affiliation(s)
- Sonia Yasmin
- Department of Psychology & the Brain and Mind Institute, The University of Western Ontario, London, ON, N6A 3K7, Canada
- Vanessa C Irsik
- Department of Psychology & the Brain and Mind Institute, The University of Western Ontario, London, ON, N6A 3K7, Canada
- Ingrid S Johnsrude
- Department of Psychology & the Brain and Mind Institute, The University of Western Ontario, London, ON, N6A 3K7, Canada; School of Communication and Speech Disorders, The University of Western Ontario, London, ON, N6A 5B7, Canada
- Björn Herrmann
- Rotman Research Institute, Baycrest, Toronto, ON, M6A 2E1, Canada; Department of Psychology, University of Toronto, Toronto, ON, M5S 1A1, Canada
2. Zhou D, Zhang G, Dang J, Unoki M, Liu X. Detection of Brain Network Communities During Natural Speech Comprehension From Functionally Aligned EEG Sources. Front Comput Neurosci 2022; 16:919215. PMID: 35874316. PMCID: PMC9301328. DOI: 10.3389/fncom.2022.919215.
Abstract
In recent years, electroencephalography (EEG) studies of speech comprehension have been extended from controlled paradigms to natural paradigms. Under the hypothesis that the brain can be approximated as a linear time-invariant system, the neural response to natural speech has been investigated extensively using temporal response functions (TRFs). However, most studies have modeled TRFs in the electrode space, which is a mixture of brain sources and thus cannot fully reveal the functional mechanisms underlying speech comprehension. In this paper, we propose methods for investigating the brain networks of natural speech comprehension using TRFs on the basis of EEG source reconstruction. We first propose a functional hyper-alignment method with an additive average method to reduce EEG noise. We then reconstruct neural sources within the brain from the EEG signals, estimate TRFs from the speech stimuli to the source areas, and investigate the brain networks in the neural source space using a community detection method. To evaluate TRF-based brain networks, EEG data were recorded in story-listening tasks with normal speech and time-reversed speech. To obtain reliable brain network structures, we detected TRF-based communities at multiple scales. As a result, the proposed functional hyper-alignment method effectively reduced the noise caused by individual settings in the EEG experiment and thus improved the accuracy of source reconstruction. The brain networks detected for normal speech comprehension were clearly distinct from those for non-semantically driven (time-reversed speech) audio processing. Our results indicate that the proposed source-space TRFs reflect the cognitive processing of spoken language and that the multi-scale community detection method is a powerful tool for investigating brain networks.
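As a toy illustration of the community-detection step only (the similarity measure, threshold, and data below are placeholders, not the paper's multi-scale procedure), one can build a graph from pairwise similarities between source-space TRFs and extract communities with a modularity-based method:

import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(1)
n_sources, n_lags = 20, 48
trfs = rng.standard_normal((n_sources, n_lags))  # one TRF per reconstructed source area

similarity = np.corrcoef(trfs)                   # pairwise similarity between source TRFs
np.fill_diagonal(similarity, 0.0)

G = nx.Graph()
G.add_nodes_from(range(n_sources))
threshold = 0.2                                  # keep only stronger links (placeholder value)
for i in range(n_sources):
    for j in range(i + 1, n_sources):
        if similarity[i, j] > threshold:
            G.add_edge(i, j, weight=similarity[i, j])

# Sweeping a modularity resolution parameter is one simple way to obtain communities at multiple scales.
communities = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in communities])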
Affiliation(s)
- Di Zhou
- School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
- Gaoyan Zhang
- College of Intelligence and Computing, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, China
- *Correspondence: Gaoyan Zhang
- Jianwu Dang
- School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
- College of Intelligence and Computing, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, China
- Masashi Unoki
- School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
- Xin Liu
- School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
3. Cohen MX. A tutorial on generalized eigendecomposition for denoising, contrast enhancement, and dimension reduction in multichannel electrophysiology. Neuroimage 2021; 247:118809. PMID: 34906717. DOI: 10.1016/j.neuroimage.2021.118809.
Abstract
The goal of this paper is to present a theoretical and practical introduction to generalized eigendecomposition (GED), which is a robust and flexible framework used for dimension reduction and source separation in multichannel signal processing. In cognitive electrophysiology, GED is used to create spatial filters that maximize a researcher-specified contrast. For example, one may wish to exploit an assumption that different sources have different frequency content, or that sources vary in magnitude across experimental conditions. GED is fast and easy to compute, performs well in simulated and real data, and is easily adaptable to a variety of specific research goals. This paper introduces GED in a way that ties together myriad individual publications and applications of GED in electrophysiology, and provides sample MATLAB and Python code that can be tested and adapted. Practical considerations and issues that often arise in applications are discussed.
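The paper provides its own MATLAB and Python sample code; the fragment below is only a generic sketch of the core computation (random placeholder data). A GED spatial filter w maximizes the ratio w'Sw / w'Rw between the covariance S of the data one wants to enhance and a reference covariance R, and is given by the top generalized eigenvector of (S, R):

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
n_chan, n_time = 32, 5000
signal_data = rng.standard_normal((n_chan, n_time))     # e.g., narrowband-filtered data
reference_data = rng.standard_normal((n_chan, n_time))  # e.g., broadband data

S = np.cov(signal_data)
R = np.cov(reference_data)
R += 0.01 * np.mean(np.diag(R)) * np.eye(n_chan)  # shrinkage regularization, a common practical step

evals, evecs = eigh(S, R)      # generalized eigendecomposition; eigenvalues in ascending order
w = evecs[:, -1]               # spatial filter associated with the largest eigenvalue
component = w @ signal_data    # component time series
pattern = S @ w                # forward model (topography) for interpreting the filter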
Affiliation(s)
- Michael X Cohen
- Donders Centre for Medical Neuroscience, Radboud University Medical Center, the Netherlands.
4. Zuk NJ, Murphy JW, Reilly RB, Lalor EC. Envelope reconstruction of speech and music highlights stronger tracking of speech at low frequencies. PLoS Comput Biol 2021; 17:e1009358. PMID: 34534211. PMCID: PMC8480853. DOI: 10.1371/journal.pcbi.1009358.
Abstract
The human brain tracks amplitude fluctuations of both speech and music, which reflects acoustic processing in addition to the encoding of higher-order features and one's cognitive state. Comparing neural tracking of speech and music envelopes can elucidate stimulus-general mechanisms, but direct comparisons are confounded by differences in their envelope spectra. Here, we use a novel method of frequency-constrained reconstruction of stimulus envelopes using EEG recorded during passive listening. We expected to see music reconstruction match speech in a narrow range of frequencies, but instead we found that speech was reconstructed better than music for all frequencies we examined. Additionally, models trained on all stimulus types performed as well or better than the stimulus-specific models at higher modulation frequencies, suggesting a common neural mechanism for tracking speech and music. However, speech envelope tracking at low frequencies, below 1 Hz, was associated with increased weighting over parietal channels, which was not present for the other stimuli. Our results highlight the importance of low-frequency speech tracking and suggest an origin from speech-specific processing in the brain.
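A hedged sketch of backward (reconstruction) modelling of this general kind follows; the band limits, lag range, and regularization are placeholders rather than the paper's settings. The stimulus envelope is band-limited, regressed on time-lagged EEG with ridge regularization, and the reconstruction is scored by correlation.

import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandlimit(x, fs, lo, hi):
    """Band-pass filter the envelope to a chosen modulation-frequency band."""
    sos = butter(2, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def lagged(eeg, n_lags):
    """Stack lagged copies of every channel into one design matrix."""
    n_time, n_chan = eeg.shape
    X = np.zeros((n_time, n_chan * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * n_chan:(lag + 1) * n_chan] = eeg[: n_time - lag]
    return X

fs = 64
n_lags = int(0.25 * fs)                                            # 0-250 ms of EEG lags
rng = np.random.default_rng(3)
envelope = bandlimit(rng.standard_normal(fs * 120), fs, 0.5, 1.0)  # e.g., the <1 Hz band
eeg = rng.standard_normal((fs * 120, 32))                          # 32-channel EEG, same length

X = lagged(eeg, n_lags)
w = np.linalg.solve(X.T @ X + 100.0 * np.eye(X.shape[1]), X.T @ envelope)  # ridge solution
reconstruction = X @ w
score = np.corrcoef(reconstruction, envelope)[0, 1]  # reconstruction accuracy in this band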
Affiliation(s)
- Nathaniel J. Zuk
- Department of Electronic & Electrical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Dublin, Ireland
- Department of Biomedical Engineering, University of Rochester, Rochester, New York, United States of America
- Del Monte Institute of Neuroscience, University of Rochester Medical Center, Rochester, New York, United States of America
- Jeremy W. Murphy
- Department of Electronic & Electrical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Richard B. Reilly
- Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Dublin, Ireland
- Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Edmund C. Lalor
- Department of Electronic & Electrical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Department of Biomedical Engineering, University of Rochester, Rochester, New York, United States of America
- Del Monte Institute of Neuroscience, University of Rochester Medical Center, Rochester, New York, United States of America
5. Kuruvila I, Can Demir K, Fischer E, Hoppe U. Inference of the Selective Auditory Attention Using Sequential LMMSE Estimation. IEEE Trans Biomed Eng 2021; 68:3501-3512. PMID: 33891545. DOI: 10.1109/tbme.2021.3075337.
Abstract
Attentive listening in a multispeaker environment such as a cocktail party requires suppression of the interfering speakers and of the surrounding noise. People with normal hearing perform remarkably well in such situations. Analysis of cortical signals using electroencephalography (EEG) has revealed that EEG signals track the envelope of the attended speech more strongly than that of the interfering speech. This has enabled the development of algorithms that can decode the selective attention of a listener in controlled experimental settings. However, these algorithms often require long trial durations and computationally expensive calibration to obtain a reliable inference of attention. In this paper, we present a novel framework to decode the attention of a listener within trial durations on the order of two seconds. It comprises three modules: 1) dynamic estimation of the temporal response functions (TRFs) in every trial using a sequential linear minimum mean squared error (LMMSE) estimator, 2) extraction of the N1-P2 peak of the estimated TRF, which serves as a marker of the attentional state, and 3) a probabilistic measure of the attentional state obtained using a support vector machine followed by logistic regression. The efficacy of the proposed decoding framework was evaluated using EEG data collected from 27 subjects. The total number of electrodes required to infer attention was four: one for signal estimation, one for noise estimation, and two serving as the reference and ground electrodes. Our results make further progress towards the realization of neuro-steered hearing aids.
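The trial-by-trial estimation idea can be sketched generically as a recursive linear-MMSE update (a standard recursive estimator with placeholder noise variance, prior, and toy data, not the authors' exact algorithm):

import numpy as np

def lmmse_update(w, P, x, y, noise_var):
    """One sequential LMMSE step for the linear model y = x.w + noise."""
    Px = P @ x
    k = Px / (x @ Px + noise_var)   # gain vector
    w = w + k * (y - x @ w)         # update the TRF estimate
    P = P - np.outer(k, Px)         # update the error covariance
    return w, P

fs = 64
n_lags = int(0.3 * fs)
rng = np.random.default_rng(4)
stim = rng.standard_normal(fs * 10)                            # attended-speech envelope
true_trf = np.hanning(n_lags)                                  # toy ground-truth TRF
eeg = np.convolve(stim, true_trf)[: len(stim)] + 0.5 * rng.standard_normal(len(stim))

w = np.zeros(n_lags)
P = 10.0 * np.eye(n_lags)                                      # broad prior on the TRF
for t in range(n_lags, len(stim)):
    x = stim[t - n_lags + 1: t + 1][::-1]                      # most recent samples first
    w, P = lmmse_update(w, P, x, eeg[t], noise_var=0.25)
# w now approximates the TRF; its N1-P2-like peaks could then be extracted and
# passed to a classifier, as the abstract describes.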
6.
Abstract
Brain activity pattern recognition from EEG or MEG signal analysis is one of the most important methods in cognitive neuroscience. The supFunSim library is a new MATLAB toolbox that generates accurate EEG forward models and implements a collection of spatial filters for EEG source reconstruction, including the linearly constrained minimum-variance (LCMV), eigenspace LCMV, nulling (NL), and minimum-variance pseudo-unbiased reduced-rank (MV-PURE) filters in various versions. It also enables source-level directed connectivity analysis using the partial directed coherence (PDC) measure. The supFunSim library is based on the well-known FieldTrip toolbox for EEG and MEG analysis and is written using the object-oriented programming paradigm. The resulting modularity of the toolbox makes it easily extensible. This paper gives a complete overview of the toolbox from both developer and end-user perspectives, including a description of the installation process and use cases.
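As an illustration of the kind of spatial filtering such a toolbox implements, the textbook LCMV beamformer for a single source leadfield is sketched below in generic numpy code (this is not supFunSim's implementation; the leadfield and data are placeholders):

import numpy as np

rng = np.random.default_rng(5)
n_chan = 64
L = rng.standard_normal((n_chan, 3))          # leadfield of one source, three orientations
data = rng.standard_normal((n_chan, 10000))   # sensor-space EEG
C = np.cov(data)
C += 0.05 * np.trace(C) / n_chan * np.eye(n_chan)   # regularize the sensor covariance

Cinv = np.linalg.inv(C)
W = np.linalg.solve(L.T @ Cinv @ L, L.T @ Cinv)      # unit-gain LCMV filter, shape (3, n_chan)
source_timecourses = W @ data                        # reconstructed source activity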
7. Velasco-Álvarez F, Fernández-Rodríguez Á, Medina-Juliá MT, Ron-Angevin R. Speech stream segregation to control an ERP-based auditory BCI. J Neural Eng 2021; 18. PMID: 33470970. DOI: 10.1088/1741-2552/abdd44.
Abstract
OBJECTIVE: The use of natural sounds in auditory brain-computer interfaces (BCIs) has been shown to improve classification results and usability. Some auditory BCIs are based on stream segregation, in which subjects must attend to one audio stream and ignore the other(s); these streams include some kind of stimuli to be detected. In this work we focus on event-related potentials (ERPs) and study whether providing intelligible content in each audio stream could help users concentrate on the desired stream and thus better attend to the target stimuli while ignoring the non-target ones. APPROACH: In addition to a control condition, two experimental conditions, based on selective attention and the cocktail party effect, were tested using two simultaneous, spatialized audio streams: (i) in condition A2, auditory stimuli (single syllables) were overlaid on a background of natural speech in each stream; (ii) in condition A3, brief alterations of the natural flow of each speech stream were used as stimuli. MAIN RESULTS: Both experimental proposals improved on the control condition (single words as stimuli without a speech background), both in a cross-validation analysis of the calibration data and in the online test. The ERP responses were also more discriminable for the two proposals than for the control condition. The results of subjective questionnaires support the better usability of the first experimental condition. SIGNIFICANCE: The use of natural speech as background improves stream segregation in an ERP-based auditory BCI (with significant results in the performance metrics, the ERP waveforms, and the preference parameter in subjective questionnaires). Future work on ERP-based stream segregation should study the use of natural speech in combination with easily perceived but not distracting stimuli.
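The calibration analysis mentioned above can be illustrated with a generic, hedged sketch (shrinkage LDA on toy epochs with cross-validated AUC; this is not the authors' classifier, data, or feature set):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n_epochs, n_chan, n_times = 200, 16, 77          # e.g., 0-600 ms epochs at 128 Hz
X = rng.standard_normal((n_epochs, n_chan, n_times))
y = rng.integers(0, 2, n_epochs)                 # 1 = target stimulus, 0 = non-target
X[y == 1, :, 20:40] += 0.3                       # toy "ERP" added to the target epochs

clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
scores = cross_val_score(clf, X.reshape(n_epochs, -1), y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {scores.mean():.2f}")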
Affiliation(s)
- Francisco Velasco-Álvarez
- Department of Electronic Technology, Universidad de Málaga, E.T.S.I. Telecomunicación, Campus de Teatinos s/n, Málaga, 29071, Spain
- Álvaro Fernández-Rodríguez
- Department of Electronic Technology, Universidad de Málaga, E.T.S.I. Telecomunicación, Campus de Teatinos s/n, Málaga, 29071, Spain
- M Teresa Medina-Juliá
- Department of Electronic Technology, Universidad de Málaga, E.T.S.I. Telecomunicación, Campus de Teatinos s/n, Málaga, 29071, Spain
- Ricardo Ron-Angevin
- Department of Electronic Technology, Universidad de Málaga, E.T.S.I. Telecomunicación, Campus de Teatinos s/n, Málaga, 29071, Spain