1
Duan T, Wang Z, Li F, Doretto G, Adjeroh DA, Yin Y, Tao C. Online continual decoding of streaming EEG signal with a balanced and informative memory buffer. Neural Netw 2024; 176:106338. [PMID: 38692190 DOI: 10.1016/j.neunet.2024.106338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 03/20/2024] [Accepted: 04/23/2024] [Indexed: 05/03/2024]
Abstract
Electroencephalography (EEG) based Brain Computer Interface (BCI) systems play a significant role in facilitating how individuals with neurological impairments effectively interact with their environment. In real-world applications of BCI systems for clinical assistance and rehabilitation training, the EEG classifier often needs to learn on sequentially arriving subjects in an online manner. As patterns of EEG signals can differ significantly between subjects, the EEG classifier can easily erase knowledge of previously learnt subjects after learning on later ones while decoding in an online streaming scenario, a problem known as catastrophic forgetting. In this work, we tackle this problem with a memory-based approach, which considers the following conditions: (1) subjects arrive sequentially in an online manner, with no large-scale dataset available for joint training beforehand, (2) data volume from the different subjects could be imbalanced, (3) the decoding difficulty of the sequential streaming signal varies, (4) continual classification for a long time is required. This online sequential EEG decoding problem is more challenging than classic cross-subject EEG decoding as there is no large-scale training data from the different subjects available beforehand. The proposed model keeps a small balanced memory buffer during sequential learning, with memory data dynamically selected based on joint consideration of data volume and informativeness. Furthermore, for the more general scenario where subject identity is unknown to the EEG decoder, i.e., the subject-agnostic scenario, we propose a kernel-based subject shift detection method that identifies underlying subject changes on the fly in a computationally efficient manner. We develop challenging benchmarks of streaming EEG data from sequentially arriving subjects with both balanced and imbalanced data volumes, and perform extensive experiments with a detailed ablation study on the proposed model.
The results show the effectiveness of our proposed approach, enabling the decoder to maintain performance on all previously seen subjects over a long period of sequential decoding. The model demonstrates the potential for real-world applications.
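As an illustration of the idea of a small, balanced memory buffer, the following is a minimal sketch and not the authors' algorithm. It assumes each sample arrives with a subject label and a scalar informativeness score (for example, a training loss); the class name `BalancedBuffer`, the equal per-subject quota, and the "keep the highest scores" rule are all hypothetical choices made for this example. It also assumes the capacity is at least the number of subjects.

```python
from collections import defaultdict


class BalancedBuffer:
    """Fixed-capacity replay buffer with an equal quota per subject (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # subject_id -> list of (score, sample); score is an assumed informativeness value
        self.store = defaultdict(list)

    def add(self, subject_id, sample, score):
        self.store[subject_id].append((score, sample))
        self._rebalance()

    def _rebalance(self):
        # Split the capacity evenly across the subjects seen so far.
        quota = max(1, self.capacity // len(self.store))
        for items in self.store.values():
            # Keep only the highest-scoring samples for each subject.
            items.sort(key=lambda pair: pair[0], reverse=True)
            del items[quota:]

    def sizes(self):
        return {subject: len(items) for subject, items in self.store.items()}


buf = BalancedBuffer(capacity=4)
for i in range(10):
    buf.add("A", f"a{i}", score=float(i))
buf.add("B", "b0", score=0.5)
buf.add("B", "b1", score=0.1)
print(buf.sizes())
```

When a new subject appears, the quota shrinks and older subjects give up their lowest-scoring samples, so a subject with a large data volume cannot crowd out one with little data.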
Affiliation(s)
- Tiehang Duan
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, 32246, United States
- Zhenyi Wang
  - Department of Computer Science, University of Maryland, College Park, MD, 20742, United States
- Fang Li
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, 32246, United States
- Gianfranco Doretto
  - Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, 26506, United States
- Donald A Adjeroh
  - Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, 26506, United States
- Yiyi Yin
  - Meta AI, Seattle, WA, 98005, United States
- Cui Tao
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, 32246, United States
2
Tanveer MA, Skoglund MA, Bernhardsson B, Alickovic E. Deep learning-based auditory attention decoding in listeners with hearing impairment. J Neural Eng 2024; 21:036022. [PMID: 38729132 DOI: 10.1088/1741-2552/ad49d7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 05/10/2024] [Indexed: 05/12/2024]
Abstract
Objective. This study develops a deep learning (DL) method for fast auditory attention decoding (AAD) using electroencephalography (EEG) from listeners with hearing impairment (HI). It addresses three classification tasks: differentiating noise from speech-in-noise, classifying the direction of attended speech (left vs. right) and identifying the activation status of hearing aid noise reduction algorithms (OFF vs. ON). These tasks contribute to our understanding of how hearing technology influences auditory processing in the hearing-impaired population. Approach. Deep convolutional neural network (DCNN) models were designed for each task. Two training strategies were employed to clarify the impact of data splitting on AAD tasks: inter-trial, where the testing set used classification windows from trials that the training set had not seen, and intra-trial, where the testing set used unseen classification windows from trials where other segments were seen during training. The models were evaluated on EEG data from 31 participants with HI, listening to competing talkers amidst background noise. Main results. Using 1 s classification windows, DCNN models achieve accuracy (ACC) of 69.8%, 73.3% and 82.9% and area-under-curve (AUC) of 77.2%, 80.6% and 92.1% for the three tasks respectively on inter-trial strategy. In the intra-trial strategy, they achieved ACC of 87.9%, 80.1% and 97.5%, along with AUC of 94.6%, 89.1%, and 99.8%. Our DCNN models show good performance on short 1 s EEG samples, making them suitable for real-world applications. Conclusion: Our DCNN models successfully addressed three tasks with short 1 s EEG windows from participants with HI, showcasing their potential.
While the inter-trial strategy demonstrated promise for assessing AAD, the intra-trial approach yielded inflated results, underscoring the important role of proper data splitting in EEG-based AAD tasks. Significance. Our findings showcase the promising potential of EEG-based tools for assessing auditory attention in clinical contexts and advancing hearing technology, while also promoting further exploration of alternative DL architectures and their potential constraints.
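The difference between the two splitting strategies can be made concrete with a short sketch. This is an illustration under stated assumptions, not the authors' code: each classification window is assumed to carry the ID of the trial it was cut from, and the function names, the 20% test fraction, and the random seed are hypothetical.

```python
import random


def inter_trial_split(trial_ids, test_fraction=0.2, seed=0):
    """Hold out whole trials: no test window shares a trial with any training window."""
    rng = random.Random(seed)
    trials = sorted(set(trial_ids))
    rng.shuffle(trials)
    n_test = max(1, round(len(trials) * test_fraction))
    test_trials = set(trials[:n_test])
    train = [i for i, t in enumerate(trial_ids) if t not in test_trials]
    test = [i for i, t in enumerate(trial_ids) if t in test_trials]
    return train, test


def intra_trial_split(trial_ids, test_fraction=0.2, seed=0):
    """Hold out windows within each trial: every trial appears in both sets."""
    rng = random.Random(seed)
    by_trial = {}
    for i, t in enumerate(trial_ids):
        by_trial.setdefault(t, []).append(i)
    train, test = [], []
    for t in sorted(by_trial):
        idx = by_trial[t][:]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Five trials of ten windows each (window index -> trial ID).
trial_ids = [t for t in range(5) for _ in range(10)]
inter_train, inter_test = inter_trial_split(trial_ids)
intra_train, intra_test = intra_trial_split(trial_ids)
```

With the intra-trial split, neighbouring windows from the same trial end up on both sides of the split, which is one plausible reason such results can look better than they would on unseen trials.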
Affiliation(s)
- M Asjid Tanveer
  - Department of Automatic Control, Lund University, Lund, Sweden
- Martin A Skoglund
  - Eriksholm Research Centre, Snekkersten, Denmark
  - Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Bo Bernhardsson
  - Department of Automatic Control, Lund University, Lund, Sweden
- Emina Alickovic
  - Eriksholm Research Centre, Snekkersten, Denmark
  - Department of Electrical Engineering, Linköping University, Linköping, Sweden
3
EskandariNasab M, Raeisi Z, Lashaki RA, Najafi H. A GRU-CNN model for auditory attention detection using microstate and recurrence quantification analysis. Sci Rep 2024; 14:8861. [PMID: 38632246 PMCID: PMC11024110 DOI: 10.1038/s41598-024-58886-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Accepted: 04/04/2024] [Indexed: 04/19/2024] Open
Abstract
Attention, as a cognitive ability, plays a crucial role in perception and helps humans concentrate on specific objects in the environment while discarding others. In this paper, auditory attention detection (AAD) is investigated using different dynamic features extracted from multichannel electroencephalography (EEG) signals when listeners attend to a target speaker in the presence of a competing talker. To this aim, microstate and recurrence quantification analysis are utilized to extract different types of features that reflect changes in the brain state during cognitive tasks. Then, an optimized feature set is determined by selecting significant features based on classification performance. The classifier model is developed by hybrid sequential learning that combines Gated Recurrent Units (GRU) and a Convolutional Neural Network (CNN) in a unified framework for accurate attention detection. The proposed AAD method shows that the selected feature set provides the most discriminative features for the classification process. It also yields the best performance compared with state-of-the-art AAD approaches from the literature in terms of various measures. The current study is the first to validate the use of microstate and recurrence quantification parameters to differentiate auditory attention using reinforcement learning without access to stimuli.
Affiliation(s)
- Zahra Raeisi
  - Department of Computer Science, University of Fairleigh Dickinson, Vancouver Campus, Vancouver, Canada
- Reza Ahmadi Lashaki
  - Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Hamidreza Najafi
  - Biomedical Engineering Department, School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
4
Simon A, Bech S, Loquet G, Østergaard J. Cortical linear encoding and decoding of sounds: Similarities and differences between naturalistic speech and music listening. Eur J Neurosci 2024; 59:2059-2074. [PMID: 38303522 DOI: 10.1111/ejn.16265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Revised: 11/02/2023] [Accepted: 01/12/2024] [Indexed: 02/03/2024]
Abstract
Linear models are becoming increasingly popular to investigate brain activity in response to continuous and naturalistic stimuli. In the context of auditory perception, these predictive models can be 'encoding', when stimulus features are used to reconstruct brain activity, or 'decoding' when neural features are used to reconstruct the audio stimuli. These linear models are a central component of some brain-computer interfaces that can be integrated into hearing assistive devices (e.g., hearing aids). Such advanced neurotechnologies have been widely investigated when listening to speech stimuli but rarely when listening to music. Recent attempts at neural tracking of music show that the reconstruction performances are reduced compared with speech decoding. The present study investigates the performance of stimuli reconstruction and electroencephalogram prediction (decoding and encoding models) based on the cortical entrainment of temporal variations of the audio stimuli for both music and speech listening. Three hypotheses that may explain differences between speech and music stimuli reconstruction were tested to assess the importance of the speech-specific acoustic and linguistic factors. While the results obtained with encoding models suggest different underlying cortical processing between speech and music listening, no differences were found in terms of reconstruction of the stimuli or the cortical data. The results suggest that envelope-based linear modelling can be used to study both speech and music listening, despite the differences in the underlying cortical mechanisms.
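A linear 'decoding' model of the kind described above is commonly fitted as a regularized regression from time-lagged neural channels to the stimulus envelope. The sketch below is a generic illustration under stated assumptions, not the study's pipeline: the function names, the number of lags, the ridge parameter, and the synthetic data are all hypothetical.

```python
import numpy as np


def lagged(eeg, n_lags):
    """Stack EEG at lags 0..n_lags-1 after each time point.

    eeg has shape (n_samples, n_channels); the result has shape
    (n_samples, n_channels * n_lags), zero-padded at the end.
    """
    n, c = eeg.shape
    X = np.zeros((n, c * n_lags))
    for k in range(n_lags):
        X[: n - k, k * c:(k + 1) * c] = eeg[k:]
    return X


def fit_decoder(eeg, envelope, n_lags=8, ridge=1.0):
    """Ridge regression from lagged EEG to the stimulus envelope."""
    X = lagged(eeg, n_lags)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ envelope)


def reconstruct(eeg, weights, n_lags=8):
    """Apply a fitted decoder to obtain a reconstructed envelope."""
    return lagged(eeg, n_lags) @ weights


# Synthetic check: channel 0 carries the envelope delayed by two samples.
rng = np.random.default_rng(0)
envelope = rng.standard_normal(2000)
eeg = 0.1 * rng.standard_normal((2000, 4))
eeg[2:, 0] += envelope[:-2]

weights = fit_decoder(eeg, envelope)
r = np.corrcoef(reconstruct(eeg, weights), envelope)[0, 1]
```

Reconstruction quality is usually reported as the correlation `r` between the reconstructed and actual envelope, evaluated on held-out data rather than on the training data as in this toy check. An 'encoding' model reverses the roles, predicting each neural channel from the lagged stimulus.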
Affiliation(s)
- Adèle Simon
  - Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
  - Research Department, Bang & Olufsen A/S, Struer, Denmark
- Søren Bech
  - Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
  - Research Department, Bang & Olufsen A/S, Struer, Denmark
- Gérard Loquet
  - Department of Audiology and Speech Pathology, University of Melbourne, Melbourne, Victoria, Australia
- Jan Østergaard
  - Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
5
Wikman P, Salmela V, Sjöblom E, Leminen M, Laine M, Alho K. Attention to audiovisual speech shapes neural processing through feedback-feedforward loops between different nodes of the speech network. PLoS Biol 2024; 22:e3002534. [PMID: 38466713 PMCID: PMC10957087 DOI: 10.1371/journal.pbio.3002534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 03/21/2024] [Accepted: 01/30/2024] [Indexed: 03/13/2024] Open
Abstract
Selective attention-related top-down modulation plays a significant role in separating relevant speech from irrelevant background speech when vocal attributes separating concurrent speakers are small and continuously evolving. Electrophysiological studies have shown that such top-down modulation enhances neural tracking of attended speech. Yet, the specific cortical regions involved remain unclear due to the limited spatial resolution of most electrophysiological techniques. To overcome such limitations, we collected both electroencephalography (EEG) (high temporal resolution) and functional magnetic resonance imaging (fMRI) (high spatial resolution), while human participants selectively attended to speakers in audiovisual scenes containing overlapping cocktail party speech. To utilise the advantages of the respective techniques, we analysed neural tracking of speech using the EEG data and performed representational dissimilarity-based EEG-fMRI fusion. We observed that attention enhanced neural tracking and modulated EEG correlates throughout the latencies studied. Further, attention-related enhancement of neural tracking fluctuated in predictable temporal profiles. We discuss how such temporal dynamics could arise from a combination of interactions between attention and prediction as well as plastic properties of the auditory cortex. EEG-fMRI fusion revealed attention-related iterative feedforward-feedback loops between hierarchically organised nodes of the ventral auditory object related processing stream. Our findings support models where attention facilitates dynamic neural changes in the auditory cortex, ultimately aiding discrimination of relevant sounds from irrelevant ones while conserving neural resources.
Affiliation(s)
- Patrik Wikman
  - Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
  - Advanced Magnetic Imaging Centre, Aalto NeuroImaging, Aalto University, Espoo, Finland
- Viljami Salmela
  - Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
  - Advanced Magnetic Imaging Centre, Aalto NeuroImaging, Aalto University, Espoo, Finland
- Eetu Sjöblom
  - Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Miika Leminen
  - Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
  - AI and Analytics Unit, Helsinki University Hospital, Helsinki, Finland
- Matti Laine
  - Department of Psychology, Åbo Akademi University, Turku, Finland
- Kimmo Alho
  - Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
  - Advanced Magnetic Imaging Centre, Aalto NeuroImaging, Aalto University, Espoo, Finland
6
Ha J, Baek SC, Lim Y, Chung JH. Validation of cost-efficient EEG experimental setup for neural tracking in an auditory attention task. Sci Rep 2023; 13:22682. [PMID: 38114579 PMCID: PMC10730561 DOI: 10.1038/s41598-023-49990-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 12/14/2023] [Indexed: 12/21/2023] Open
Abstract
When individuals listen to speech, their neural activity phase-locks to the slow temporal rhythm, which is commonly referred to as "neural tracking". The neural tracking mechanism allows for the detection of an attended sound source in a multi-talker situation by decoding neural signals obtained by electroencephalography (EEG), known as auditory attention decoding (AAD). Neural tracking with AAD can be utilized as an objective measurement tool for diverse clinical contexts, and it has potential to be applied to neuro-steered hearing devices. To effectively utilize this technology, it is essential to enhance the accessibility of EEG experimental setup and analysis. The aim of the study was to develop a cost-efficient neural tracking system and validate the feasibility of neural tracking measurement by conducting an AAD task using an offline and real-time decoder model outside the soundproof environment. We devised a neural tracking system capable of conducting AAD experiments using an OpenBCI and Arduino board. Nine participants were recruited to assess the performance of the AAD using the developed system, which involved presenting competing speech signals in an experiment setting without soundproofing. As a result, the offline decoder model demonstrated an average performance of 90%, and real-time decoder model exhibited a performance of 78%. The present study demonstrates the feasibility of implementing neural tracking and AAD using cost-effective devices in a practical environment.
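The decision step in correlation-based AAD can be sketched briefly. A common approach, assumed here rather than confirmed as this study's method, compares an envelope reconstructed from the EEG with the envelope of each competing speech stream and selects the stream with the higher correlation. The function name and the synthetic signals are hypothetical.

```python
import numpy as np


def decode_attention(reconstructed, env_a, env_b):
    """Return the label of the stream whose envelope correlates best
    with the EEG-based reconstruction, plus both correlations."""
    r_a = np.corrcoef(reconstructed, env_a)[0, 1]
    r_b = np.corrcoef(reconstructed, env_b)[0, 1]
    label = "A" if r_a >= r_b else "B"
    return label, r_a, r_b


# Synthetic check: the reconstruction is a noisy copy of stream A.
rng = np.random.default_rng(0)
env_a = rng.standard_normal(1000)
env_b = rng.standard_normal(1000)
reconstructed = env_a + 0.5 * rng.standard_normal(1000)

label, r_a, r_b = decode_attention(reconstructed, env_a, env_b)
```

In a real-time setting the same comparison would be repeated on short sliding windows, where shorter windows generally trade accuracy for faster decisions.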
Affiliation(s)
- Jiyeon Ha
  - Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, 04763, Korea
  - Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
- Seung-Cheol Baek
  - Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
  - Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, 60322, Frankfurt am Main, Germany
- Yoonseob Lim
  - Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, 04763, Korea
  - Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
- Jae Ho Chung
  - Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, 04763, Korea
  - Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
  - Department of Otolaryngology-Head and Neck Surgery, College of Medicine, Hanyang University, Seoul, 04763, Korea
  - Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Hanyang University, 222-Wangshimni-ro, Seongdong-gu, Seoul, 133-792, Korea
7
Wang B, Xu X, Niu Y, Wu C, Wu X, Chen J. EEG-based auditory attention decoding with audiovisual speech for hearing-impaired listeners. Cereb Cortex 2023; 33:10972-10983. [PMID: 37750333 DOI: 10.1093/cercor/bhad325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 08/21/2023] [Accepted: 08/22/2023] [Indexed: 09/27/2023] Open
Abstract
Auditory attention decoding (AAD) was used to determine the attended speaker during an auditory selective attention task. However, the auditory factors modulating AAD remained unclear for hearing-impaired (HI) listeners. In this study, scalp electroencephalogram (EEG) was recorded with an auditory selective attention paradigm, in which HI listeners were instructed to attend one of the two simultaneous speech streams with or without congruent visual input (articulation movements), and at a high or low target-to-masker ratio (TMR). Meanwhile, behavioral hearing tests (i.e. audiogram, speech reception threshold, temporal modulation transfer function) were used to assess listeners' individual auditory abilities. The results showed that both visual input and increasing TMR could significantly enhance the cortical tracking of the attended speech and AAD accuracy. Further analysis revealed that the audiovisual (AV) gain in attended speech cortical tracking was significantly correlated with listeners' auditory amplitude modulation (AM) sensitivity, and the TMR gain in attended speech cortical tracking was significantly correlated with listeners' hearing thresholds. Temporal response function analysis revealed that subjects with higher AM sensitivity demonstrated more AV gain over the right occipitotemporal and bilateral frontocentral scalp electrodes.
Affiliation(s)
- Bo Wang
  - Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Xiran Xu
  - Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Yadong Niu
  - Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Chao Wu
  - School of Nursing, Peking University, Beijing 100191, China
- Xihong Wu
  - Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
  - National Biomedical Imaging Center, College of Future Technology, Beijing 100871, China
- Jing Chen
  - Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
  - National Biomedical Imaging Center, College of Future Technology, Beijing 100871, China
8
Li J, Hong B, Nolte G, Engel AK, Zhang D. EEG-based speaker-listener neural coupling reflects speech-selective attentional mechanisms beyond the speech stimulus. Cereb Cortex 2023; 33:11080-11091. [PMID: 37814353 DOI: 10.1093/cercor/bhad347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 09/01/2023] [Accepted: 09/04/2023] [Indexed: 10/11/2023] Open
Abstract
When we pay attention to someone, do we focus only on the sound they make, the word they use, or do we form a mental space shared with the speaker we want to pay attention to? Some would argue that the human language is no other than a simple signal, but others claim that human beings understand each other because they form a shared mental ground between the speaker and the listener. Our study aimed to explore the neural mechanisms of speech-selective attention by investigating the electroencephalogram-based neural coupling between the speaker and the listener in a cocktail party paradigm. The temporal response function method was employed to reveal how the listener was coupled to the speaker at the neural level. The results showed that the neural coupling between the listener and the attended speaker peaked 5 s before speech onset at the delta band over the left frontal region, and was correlated with speech comprehension performance. In contrast, the attentional processing of speech acoustics and semantics occurred primarily at a later stage after speech onset and was not significantly correlated with comprehension performance. These findings suggest a predictive mechanism to achieve speaker-listener neural coupling for successful speech comprehension.
Affiliation(s)
- Jiawei Li
  - Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China
  - Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
  - Department of Education and Psychology, Freie Universität Berlin, Habelschwerdter Allee, Berlin 14195, Germany
- Bo Hong
  - Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
  - Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Guido Nolte
  - Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, Hamburg 20246, Germany
- Andreas K Engel
  - Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, Hamburg 20246, Germany
- Dan Zhang
  - Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China
  - Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
9
Schüller A, Schilling A, Krauss P, Rampp S, Reichenbach T. Attentional Modulation of the Cortical Contribution to the Frequency-Following Response Evoked by Continuous Speech. J Neurosci 2023; 43:7429-7440. [PMID: 37793908 PMCID: PMC10621774 DOI: 10.1523/jneurosci.1247-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 09/07/2023] [Accepted: 09/21/2023] [Indexed: 10/06/2023] Open
Abstract
Selective attention to one of several competing speakers is required for comprehending a target speaker among other voices and for successful communication with them. It moreover has been found to involve the neural tracking of low-frequency speech rhythms in the auditory cortex. Effects of selective attention have also been found in subcortical neural activities, in particular regarding the frequency-following response related to the fundamental frequency of speech (speech-FFR). Recent investigations have, however, shown that the speech-FFR contains cortical contributions as well. It remains unclear whether these are also modulated by selective attention. Here we used magnetoencephalography to assess the attentional modulation of the cortical contributions to the speech-FFR. We presented both male and female participants with two competing speech signals and analyzed the cortical responses during attentional switching between the two speakers. Our findings revealed robust attentional modulation of the cortical contribution to the speech-FFR: the neural responses were higher when the speaker was attended than when they were ignored. We also found that, regardless of attention, a voice with a lower fundamental frequency elicited a larger cortical contribution to the speech-FFR than a voice with a higher fundamental frequency. Our results show that the attentional modulation of the speech-FFR does not only occur subcortically but extends to the auditory cortex as well. SIGNIFICANCE STATEMENT: Understanding speech in noise requires attention to a target speaker. One of the speech features that a listener can use to identify a target voice among others and attend it is the fundamental frequency, together with its higher harmonics. The fundamental frequency arises from the opening and closing of the vocal folds and is tracked by high-frequency neural activity in the auditory brainstem and in the cortex.
Previous investigations showed that the subcortical neural tracking is modulated by selective attention. Here we show that attention affects the cortical tracking of the fundamental frequency as well: it is stronger when a particular voice is attended than when it is ignored.
Affiliation(s)
- Alina Schüller
  - Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Achim Schilling
  - Neuroscience Laboratory, University Hospital Erlangen, 91058 Erlangen, Germany
- Patrick Krauss
  - Neuroscience Laboratory, University Hospital Erlangen, 91058 Erlangen, Germany
  - Pattern Recognition Lab, Department Computer Science, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Stefan Rampp
  - Department of Neurosurgery, University Hospital Erlangen, 91058 Erlangen, Germany
  - Department of Neurosurgery, University Hospital Halle (Saale), 06120 Halle (Saale), Germany
  - Department of Neuroradiology, University Hospital Erlangen, 91058 Erlangen, Germany
- Tobias Reichenbach
  - Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
10
Kurmanavičiūtė D, Kataja H, Jas M, Välilä A, Parkkonen L. Target of selective auditory attention can be robustly followed with MEG. Sci Rep 2023; 13:10959. [PMID: 37414861 PMCID: PMC10325959 DOI: 10.1038/s41598-023-37959-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 06/30/2023] [Indexed: 07/08/2023] Open
Abstract
Selective auditory attention enables filtering of relevant acoustic information from irrelevant. Specific auditory responses, measurable by magneto- and electroencephalography (MEG/EEG), are known to be modulated by attention to the evoking stimuli. However, such attention effects have typically been studied in unnatural conditions (e.g. during dichotic listening of pure tones) and have been demonstrated mostly in averaged auditory evoked responses. To test how reliably we can detect the attention target from unaveraged brain responses, we recorded MEG data from 15 healthy subjects that were presented with two human speakers uttering continuously the words "Yes" and "No" in an interleaved manner. The subjects were asked to attend to one speaker. To investigate which temporal and spatial aspects of the responses carry the most information about the target of auditory attention, we performed spatially and temporally resolved classification of the unaveraged MEG responses using a support vector machine. Sensor-level decoding of the responses to attended vs. unattended words resulted in a mean accuracy of [Formula: see text] (N = 14) for both stimulus words. The discriminating information was mostly available 200-400 ms after the stimulus onset. Spatially-resolved source-level decoding indicated that the most informative sources were in the auditory cortices, in both the left and right hemisphere. Our result corroborates attention modulation of auditory evoked responses and shows that such modulations are detectable in unaveraged MEG responses at high accuracy, which could be exploited e.g. in an intuitive brain-computer interface.
Affiliation(s)
- Dovilė Kurmanavičiūtė
  - Department of Neuroscience and Biomedical Engineering, Aalto University, P.O. Box 12200, 00076, Aalto, Finland
- Hanna Kataja
  - Department of Neuroscience and Biomedical Engineering, Aalto University, P.O. Box 12200, 00076, Aalto, Finland
- Mainak Jas
  - Department of Neuroscience and Biomedical Engineering, Aalto University, P.O. Box 12200, 00076, Aalto, Finland
  - Athinoula A. Martinos Center for Biomedical Imaging, 149 Thirteenth Street, Charlestown, MA, 02129, USA
- Anne Välilä
  - Department of Neuroscience and Biomedical Engineering, Aalto University, P.O. Box 12200, 00076, Aalto, Finland
- Lauri Parkkonen
  - Department of Neuroscience and Biomedical Engineering, Aalto University, P.O. Box 12200, 00076, Aalto, Finland
  - Aalto NeuroImaging, Aalto University, 00076, Aalto, Finland
11
Cai S, Li J, Yang H, Li H. RGCnet: An Efficient Recursive Gated Convolutional Network for EEG-based Auditory Attention Detection. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-4. [PMID: 38083536 DOI: 10.1109/embc40787.2023.10340432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
Humans are able to listen to one speaker and disregard others in a speaking crowd, referred to as the "cocktail party effect". EEG-based auditory attention detection (AAD) seeks to identify whom a listener is listening to by decoding one's EEG signals. Recent research has demonstrated that the self-attention mechanism is effective for AAD. In this paper, we present the Recursive Gated Convolutional network (RGCnet) for AAD, which implements long-range and high-order interactions as a self-attention mechanism, while maintaining a low computational cost. The RGCnet expands the 2nd order feature interactions to a higher order to model the complex interactions between EEG features. We evaluate RGCnet on two public datasets and compare it with other AAD models. Our results demonstrate that RGCnet outperforms other comparative models under various conditions, thus potentially improving the control of neuro-steered hearing devices.
|
12
|
Rosenkranz M, Cetin T, Uslar VN, Bleichner MG. Investigating the attentional focus to workplace-related soundscapes in a complex audio-visual-motor task using EEG. Front Neuroergon 2023; 3:1062227. [PMID: 38235454 PMCID: PMC10790850 DOI: 10.3389/fnrgo.2022.1062227] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/05/2022] [Accepted: 12/16/2022] [Indexed: 01/19/2024]
Abstract
Introduction: In demanding work situations (e.g., during a surgery), the processing of complex soundscapes varies over time and can be a burden for medical personnel. Here we study, using mobile electroencephalography (EEG), how humans process workplace-related soundscapes while performing a complex audio-visual-motor task (3D Tetris). Specifically, we wanted to know how the attentional focus changes the processing of the soundscape as a whole. Method: Participants played a game of 3D Tetris in which they had to use both hands to control falling blocks. At the same time, participants listened to a complex soundscape, similar to what is found in an operating room (i.e., the sound of machinery, people talking in the background, alarm sounds, and instructions). In this within-subject design, participants had to react to instructions (e.g., "place the next block in the upper left corner") and to sounds depending on the experimental condition, either to a specific alarm sound originating from a fixed location or to a beep sound that originated from varying locations. Attention to the alarm reflected a narrow attentional focus, as it was easy to detect and most of the soundscape could be ignored. Attention to the beep reflected a wide attentional focus, as it required the participants to monitor multiple different sound streams. Results and discussion: Results show the robustness of the N1 and P3 event related potential response during this dynamic task with a complex auditory soundscape. Furthermore, we used temporal response functions to study auditory processing to the whole soundscape. This work is a step toward studying workplace-related sound processing in the operating room using mobile EEG.
Affiliation(s)
- Marc Rosenkranz
- Neurophysiology of Everyday Life Group, Department of Psychology, University of Oldenburg, Oldenburg, Germany
| | - Timur Cetin
- Pius-Hospital Oldenburg, University Hospital for Visceral Surgery, University of Oldenburg, Oldenburg, Germany
| | - Verena N. Uslar
- Pius-Hospital Oldenburg, University Hospital for Visceral Surgery, University of Oldenburg, Oldenburg, Germany
| | - Martin G. Bleichner
- Neurophysiology of Everyday Life Group, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany
| |
|
13
|
Mesik J, Wojtczak M. The effects of data quantity on performance of temporal response function analyses of natural speech processing. Front Neurosci 2023; 16:963629. [PMID: 36711133 PMCID: PMC9878558 DOI: 10.3389/fnins.2022.963629] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/07/2022] [Accepted: 12/26/2022] [Indexed: 01/15/2023] Open
Abstract
In recent years, temporal response function (TRF) analyses of neural activity recordings evoked by continuous naturalistic stimuli have become increasingly popular for characterizing response properties within the auditory hierarchy. However, despite this rise in TRF usage, relatively few educational resources for these tools exist. Here we use a dual-talker continuous speech paradigm to demonstrate how a key parameter of experimental design, the quantity of acquired data, influences TRF analyses fit to either individual data (subject-specific analyses), or group data (generic analyses). We show that although model prediction accuracy increases monotonically with data quantity, the amount of data required to achieve significant prediction accuracies can vary substantially based on whether the fitted model contains densely (e.g., acoustic envelope) or sparsely (e.g., lexical surprisal) spaced features, especially when the goal of the analyses is to capture the aspect of neural responses uniquely explained by specific features. Moreover, we demonstrate that generic models can exhibit high performance on small amounts of test data (2-8 min), if they are trained on a sufficiently large data set. As such, they may be particularly useful for clinical and multi-task study designs with limited recording time. Finally, we show that the regularization procedure used in fitting TRF models can interact with the quantity of data used to fit the models, with larger training quantities resulting in systematically larger TRF amplitudes. Together, demonstrations in this work should aid new users of TRF analyses, and in combination with other tools, such as piloting and power analyses, may serve as a detailed reference for choosing acquisition duration in future studies.
|
14
|
Dolhopiatenko H, Nogueira W. Selective attention decoding in bimodal cochlear implant users. Front Neurosci 2023; 16:1057605. [PMID: 36711138 PMCID: PMC9874229 DOI: 10.3389/fnins.2022.1057605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 12/20/2022] [Indexed: 01/12/2023] Open
Abstract
The growing group of cochlear implant (CI) users includes subjects with preserved acoustic hearing on the opposite side to the CI. The use of both listening sides results in improved speech perception in comparison to listening with one side alone. However, large variability in the measured benefit is observed. It is possible that this variability is associated with the integration of speech across electric and acoustic stimulation modalities. However, there is a lack of established methods to assess speech integration between electric and acoustic stimulation and consequently to adequately program the devices. Moreover, existing methods do not provide information about the underlying physiological mechanisms of this integration or are based on simple stimuli that are difficult to relate to speech integration. Electroencephalography (EEG) to continuous speech is promising as an objective measure of speech perception, however, its application in CIs is challenging because it is influenced by the electrical artifact introduced by these devices. For this reason, the main goal of this work is to investigate a possible electrophysiological measure of speech integration between electric and acoustic stimulation in bimodal CI users. For this purpose, a selective attention decoding paradigm has been designed and validated in bimodal CI users. The current study included behavioral and electrophysiological measures. The behavioral measure consisted of a speech understanding test, where subjects repeated words to a target speaker in the presence of a competing voice listening with the CI side (CIS) only, with the acoustic side (AS) only or with both listening sides (CIS+AS). Electrophysiological measures included cortical auditory evoked potentials (CAEPs) and selective attention decoding through EEG. CAEPs were recorded to broadband stimuli to confirm the feasibility to record cortical responses with CIS only, AS only, and CIS+AS listening modes. 
In the selective attention decoding paradigm a co-located target and a competing speech stream were presented to the subjects using the three listening modes (CIS only, AS only, and CIS+AS). The main hypothesis of the current study is that selective attention can be decoded in CI users despite the presence of CI electrical artifact. If selective attention decoding improves combining electric and acoustic stimulation with respect to electric stimulation alone, the hypothesis can be confirmed. No significant difference in behavioral speech understanding performance when listening with CIS+AS and AS only was found, mainly due to the ceiling effect observed with these two listening modes. The main finding of the current study is the possibility to decode selective attention in CI users even if continuous artifact is present. Moreover, an amplitude reduction of the forward transfer response function (TRF) of selective attention decoding was observed when listening with CIS+AS compared to AS only. Further studies to validate selective attention decoding as an electrophysiological measure of electric acoustic speech integration are required.
|
15
|
Holtze B, Rosenkranz M, Jaeger M, Debener S, Mirkovic B. Ear-EEG Measures of Auditory Attention to Continuous Speech. Front Neurosci 2022; 16:869426. [PMID: 35592265 PMCID: PMC9111016 DOI: 10.3389/fnins.2022.869426] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/04/2022] [Accepted: 03/25/2022] [Indexed: 11/13/2022] Open
Abstract
Auditory attention is an important cognitive function used to separate relevant from irrelevant auditory information. However, most findings on attentional selection have been obtained in highly controlled laboratory settings using bulky recording setups and unnaturalistic stimuli. Recent advances in electroencephalography (EEG) facilitate the measurement of brain activity outside the laboratory, and around-the-ear sensors such as the cEEGrid promise unobtrusive acquisition. In parallel, methods such as speech envelope tracking, intersubject correlations and spectral entropy measures emerged which allow us to study attentional effects in the neural processing of natural, continuous auditory scenes. In the current study, we investigated whether these three attentional measures can be reliably obtained when using around-the-ear EEG. To this end, we analyzed the cEEGrid data of 36 participants who attended to one of two simultaneously presented speech streams. Speech envelope tracking results confirmed a reliable identification of the attended speaker from cEEGrid data. The accuracies in identifying the attended speaker increased when fitting the classification model to the individual. Artifact correction of the cEEGrid data with artifact subspace reconstruction did not increase the classification accuracy. Intersubject correlations were higher for those participants attending to the same speech stream than for those attending to different speech streams, replicating previously obtained results with high-density cap-EEG. We also found that spectral entropy decreased over time, possibly reflecting the decrease in the listener's level of attention. Overall, these results support the idea of using ear-EEG measurements to unobtrusively monitor auditory attention to continuous speech. This knowledge may help to develop assistive devices that support listeners separating relevant from irrelevant information in complex auditory environments.
Affiliation(s)
- Björn Holtze
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
| | - Marc Rosenkranz
- Neurophysiology of Everyday Life Group, Department of Psychology, University of Oldenburg, Oldenburg, Germany
| | - Manuela Jaeger
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Division Hearing, Speech and Audio Technology, Fraunhofer Institute for Digital Media Technology IDMT, Oldenburg, Germany
| | - Stefan Debener
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany
- Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
| | - Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
| |
|
16
|
Nogueira W, Dolhopiatenko H. Predicting speech intelligibility from a selective attention decoding paradigm in cochlear implant users. J Neural Eng 2022; 19. [PMID: 35234663 DOI: 10.1088/1741-2552/ac599f] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/17/2021] [Accepted: 03/01/2022] [Indexed: 11/12/2022]
Abstract
OBJECTIVES: Electroencephalography (EEG) can be used to decode selective attention in cochlear implant (CI) users. This work investigates if selective attention to an attended speech source in the presence of a concurrent speech source can predict speech understanding in CI users. APPROACH: CI users were instructed to attend to one out of two speech streams while EEG was recorded. Both speech streams were presented to the same ear and at different signal to interference ratios (SIRs). Speech envelope reconstruction of the to-be-attended speech from EEG was obtained by training decoders using regularized least squares. The correlation coefficient between the reconstructed and the attended ($\rho_{A_{\mathrm{SIR}}}$) or the unattended ($\rho_{U_{\mathrm{SIR}}}$) speech stream at each SIR was computed. Additionally, we computed the difference correlation coefficient at the same ($\rho_{\mathrm{Diff}} = \rho_{A_{\mathrm{SIR}}} - \rho_{U_{\mathrm{SIR}}}$) and opposite SIR ($\rho_{\mathrm{DiffOpp}} = \rho_{A_{\mathrm{SIR}}} - \rho_{U_{-\mathrm{SIR}}}$). $\rho_{\mathrm{Diff}}$ compares the attended and unattended correlation coefficient to speech sources presented at different presentation levels depending on SIR. In contrast, $\rho_{\mathrm{DiffOpp}}$ compares the attended and unattended correlation coefficients to speech sources presented at the same presentation level irrespective of SIR. MAIN RESULTS: Selective attention decoding in CI users is possible even if both speech streams are presented monaurally. A significant effect of SIR on $\rho_{A_{\mathrm{SIR}}}$, $\rho_{\mathrm{Diff}}$ and $\rho_{\mathrm{DiffOpp}}$, but not on $\rho_{U_{\mathrm{SIR}}}$, was observed. Finally, the results show a significant correlation between speech understanding performance and $\rho_{A_{\mathrm{SIR}}}$ as well as with $\rho_{U_{\mathrm{SIR}}}$ across subjects. Moreover, $\rho_{\mathrm{DiffOpp}}$ which is less affected by the CI artifact, also demonstrated a significant correlation with speech understanding. SIGNIFICANCE: Selective attention decoding in CI users is possible, however care needs to be taken with the CI artifact and the speech material used to train the decoders. These results are important for future development of objective speech understanding measures for CI users.
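The decoding measure described in this abstract (an envelope decoder trained with regularized least squares, followed by correlating the reconstruction with the attended and unattended envelopes) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the synthetic data, the helper names `lagged`, `train_decoder` and `correlations`, the number of lags and the regularization weight are hypothetical and are not taken from the paper.

```python
import numpy as np

def lagged(eeg, n_lags):
    """Stack EEG samples from t .. t+n_lags-1 for every time point t (zero-padded at the end)."""
    n, c = eeg.shape
    X = np.zeros((n, c * n_lags))
    for k in range(n_lags):
        X[: n - k, k * c:(k + 1) * c] = eeg[k:]
    return X

def train_decoder(eeg, envelope, n_lags=8, lam=1.0):
    """Ridge (regularized least squares) decoder mapping lagged EEG to the envelope."""
    X = lagged(eeg, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

def correlations(eeg, w, attended, unattended, n_lags=8):
    """Return (rho_A, rho_U, rho_Diff) for the reconstructed envelope."""
    rec = lagged(eeg, n_lags) @ w
    rho_a = np.corrcoef(rec, attended)[0, 1]
    rho_u = np.corrcoef(rec, unattended)[0, 1]
    return rho_a, rho_u, rho_a - rho_u

# Synthetic stand-in data (hypothetical): the "EEG" follows the attended
# envelope with a 2-sample delay plus noise; the unattended one is independent.
rng = np.random.default_rng(0)
n = 2000
att = rng.standard_normal(n)
unatt = rng.standard_normal(n)
gains = np.array([1.0, 0.8, -0.6, 0.5])
eeg = np.outer(np.roll(att, 2), gains) + 0.5 * rng.standard_normal((n, 4))

# Train on the first half, evaluate on the second half.
w = train_decoder(eeg[:1000], att[:1000])
rho_a, rho_u, rho_diff = correlations(eeg[1000:], w, att[1000:], unatt[1000:])
print(f"rho_A={rho_a:.2f}  rho_U={rho_u:.2f}  rho_Diff={rho_diff:.2f}")
```

In this toy setting the decoder is trained on the attended envelope only, so a positive difference between the two correlations indicates that the reconstruction tracks the attended stream more closely than the unattended one.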
Affiliation(s)
- Waldo Nogueira
- Department of Otolaryngology and Cluster of Excellence "Hearing4all", Hannover Medical School, Karl-Wiechert Allee, 3, Hannover, Niedersachsen, 30625, Germany
| | - Hanna Dolhopiatenko
- Department of Otolaryngology and Cluster of Excellence "Hearing4all", Hannover Medical School, Karl-Wiechert Allee, 3, Hannover, Niedersachsen, 30625, Germany
| |
|
17
|
Huet MP, Micheyl C, Gaudrain E, Parizet E. Vocal and semantic cues for the segregation of long concurrent speech stimuli in diotic and dichotic listening-The Long-SWoRD test. J Acoust Soc Am 2022; 151:1557. [PMID: 35364949 DOI: 10.1121/10.0007225] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/04/2020] [Accepted: 10/25/2021] [Indexed: 06/14/2023]
Abstract
It is not always easy to follow a conversation in a noisy environment. To distinguish between two speakers, a listener must mobilize many perceptual and cognitive processes to maintain attention on a target voice and avoid shifting attention to the background noise. The development of an intelligibility task with long stimuli-the Long-SWoRD test-is introduced. This protocol allows participants to fully benefit from the cognitive resources, such as semantic knowledge, to separate two talkers in a realistic listening environment. Moreover, this task also provides the experimenters with a means to infer fluctuations in auditory selective attention. Two experiments document the performance of normal-hearing listeners in situations where the perceptual separability of the competing voices ranges from easy to hard using a combination of voice and binaural cues. The results show a strong effect of voice differences when the voices are presented diotically. In addition, analyzing the influence of the semantic context on the pattern of responses indicates that the semantic information induces a response bias in situations where the competing voices are distinguishable and indistinguishable from one another.
Affiliation(s)
- Moïra-Phoebé Huet
- Laboratory of Vibration and Acoustics, National Institute of Applied Sciences, University of Lyon, 20 Avenue Albert Einstein, Villeurbanne, 69100, France
| | | | - Etienne Gaudrain
- Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics, Centre National de la Recherche Scientifique UMR5292, Institut National de la Santé et de la Recherche Médicale U1028, Université Claude Bernard Lyon 1, Université de Lyon, Centre Hospitalier Le Vinatier, Neurocampus, 95 boulevard Pinel, Bron Cedex, 69675, France
| | - Etienne Parizet
- Laboratory of Vibration and Acoustics, National Institute of Applied Sciences, University of Lyon, 20 Avenue Albert Einstein, Villeurbanne, 69100, France
| |
|
18
|
Aldag N, Büchner A, Lenarz T, Nogueira W. Towards decoding selective attention through cochlear implant electrodes as sensors in subjects with contralateral acoustic hearing. J Neural Eng 2022; 19. [DOI: 10.1088/1741-2552/ac4de6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/27/2021] [Accepted: 01/21/2022] [Indexed: 11/12/2022]
Abstract
Objectives: Focusing attention on one speaker in a situation with multiple background speakers or noise is referred to as auditory selective attention. Decoding selective attention is an interesting line of research with respect to future brain-guided hearing aids or cochlear implants (CIs) that are designed to adaptively adjust sound processing through cortical feedback loops. This study investigates the feasibility of using the electrodes and backward telemetry of a CI to record electroencephalography (EEG). Approach: The study population included 6 normal-hearing (NH) listeners and 5 CI users with contralateral acoustic hearing. Cortical auditory evoked potentials (CAEP) and selective attention were recorded using a state-of-the-art high-density scalp EEG and, in the case of CI users, also using two CI electrodes as sensors in combination with the backward telemetry system of these devices (iEEG). Main results: In the selective attention paradigm with multi-channel scalp EEG the mean decoding accuracy across subjects was 94.8 % and 94.6 % for NH listeners and CI users, respectively. With single-channel scalp EEG the accuracy dropped but was above chance level in 8 to 9 out of 11 subjects, depending on the electrode montage. With the single-channel iEEG, the selective attention decoding accuracy could only be analyzed in 2 out of 5 CI users due to a loss of data in the other 3 subjects. In these 2 CI users, the selective attention decoding accuracy was above chance level. Significance: This study shows that single-channel EEG is suitable for auditory selective attention decoding, even though it reduces the decoding quality compared to a multi-channel approach. CI-based iEEG can be used for the purpose of recording CAEPs and decoding selective attention. However, the study also points out the need for further technical development for the CI backward telemetry regarding long-term recordings and the optimal sensor positions.
|
19
|
Su E, Cai S, Xie L, Li H, Schultz T. STAnet: A Spatiotemporal Attention Network for Decoding Auditory Spatial Attention from EEG. IEEE Trans Biomed Eng 2022; 69:2233-2242. [PMID: 34982671 DOI: 10.1109/tbme.2022.3140246] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
OBJECTIVE: Humans are able to localize the source of a sound. This enables them to direct attention to a particular speaker in a cocktail party. Psycho-acoustic studies show that the sensory cortices of the human brain respond to the location of sound sources differently, and the auditory attention itself is a dynamic and temporally based brain activity. In this work, we seek to build a computational model which uses both spatial and temporal information manifested in EEG signals for auditory spatial attention detection (ASAD). METHODS: We propose an end-to-end spatiotemporal attention network, denoted as STAnet, to detect auditory spatial attention from EEG. The STAnet is designed to assign differentiated weights dynamically to EEG channels through a spatial attention mechanism, and to temporal patterns in EEG signals through a temporal attention mechanism. RESULTS: We report the ASAD experiments on two publicly available datasets. The STAnet outperforms other competitive models by a large margin under various experimental conditions. Its attention decision for 1-second decision window outperforms that of the state-of-the-art techniques for 10-second decision window. Experimental results also demonstrate that the STAnet achieves competitive performance on EEG signals ranging from 64 to as few as 16 channels. CONCLUSION: This study provides evidence suggesting that efficient low-density EEG online decoding is within reach. SIGNIFICANCE: This study also marks an important step towards the practical implementation of ASAD in real life applications.
|
20
|
Su E, Cai S, Li P, Xie L, Li H. Auditory Attention Detection with EEG Channel Attention. Annu Int Conf IEEE Eng Med Biol Soc 2021; 2021:5804-5807. [PMID: 34892439 DOI: 10.1109/embc46164.2021.9630508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Auditory attention detection (AAD) seeks to detect the attended speech from EEG signals in a multi-talker scenario, i.e. cocktail party. As the EEG channels reflect the activities of different brain areas, a task-oriented channel selection technique improves the performance of brain-computer interface applications. In this study, we propose a soft channel attention mechanism, instead of hard channel selection, that derives an EEG channel mask by optimizing the auditory attention detection task. The neural AAD system consists of a neural channel attention mechanism and a convolutional neural network (CNN) classifier. We evaluate the proposed framework on a publicly available database. We achieve 88.3% and 77.2% for 2-second and 0.1-second decision windows with 64-channel EEG; and 86.1% and 83.9% for 2-second decision windows with 32-channel and 16-channel EEG, respectively. The proposed framework outperforms other competitive models by a large margin across all test cases.
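The contrast between a soft channel mask and hard channel selection mentioned in this abstract can be sketched as follows. This is only an illustrative assumption: the abstract does not say how the mask is computed, so the softmax over per-channel scores, the fixed (rather than learned) scores, and the function names `soft_channel_mask`, `apply_mask` and `hard_select` are all hypothetical. In the paper the mask is derived by optimizing the detection task, which is not reproduced here.

```python
import numpy as np

def soft_channel_mask(scores):
    """Turn per-channel scores into positive weights that sum to 1 (softmax, assumed)."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def apply_mask(eeg, mask):
    """Soft attention: scale every channel by its weight, keeping all channels."""
    return eeg * mask[np.newaxis, :]

def hard_select(eeg, scores, k):
    """Hard selection: keep only the k highest-scoring channels."""
    keep = np.sort(np.argsort(scores)[-k:])
    return eeg[:, keep]

rng = np.random.default_rng(0)
eeg = rng.standard_normal((128, 64))   # samples x channels (synthetic)
scores = rng.standard_normal(64)       # stand-in for learned channel scores

mask = soft_channel_mask(scores)
weighted = apply_mask(eeg, mask)       # shape stays (128, 64)
subset = hard_select(eeg, scores, 16)  # shape becomes (128, 16)
print(weighted.shape, subset.shape)
```

The soft variant keeps the input dimensionality fixed and is differentiable with respect to the scores, which is what would allow such a mask to be trained jointly with a classifier; hard selection discards channels outright.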
|
21
|
Mundanad Narayanan A, Zink R, Bertrand A. EEG miniaturization limits for stimulus decoding with EEG sensor networks. J Neural Eng 2021; 18. [PMID: 34517358 DOI: 10.1088/1741-2552/ac2629] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/21/2021] [Accepted: 09/13/2021] [Indexed: 11/12/2022]
Abstract
Objective. Unobtrusive electroencephalography (EEG) monitoring in everyday life requires the availability of highly miniaturized EEG devices (mini-EEGs), which ideally consist of a wireless node with a small scalp area footprint, in which the electrodes, amplifier and wireless radio are embedded. By attaching a multitude of mini-EEGs at relevant positions on the scalp, a wireless 'EEG sensor network' (WESN) can be formed. However, each mini-EEG in the network only has access to its own local electrodes, thereby recording local scalp potentials with short inter-electrode distances. This is unlike traditional cap-EEG, which by virtue of re-referencing can measure EEG across arbitrarily large distances on the scalp. We evaluate the implications and limitations of such far-driven miniaturization on neural decoding performance. Approach. We collected 255-channel EEG data in an auditory attention decoding (AAD) task. As opposed to previous studies with a lower channel density, this new high-density dataset allows emulation of mini-EEGs with inter-electrode distances down to 1 cm in order to identify and quantify the lower bound on miniaturization for EEG-based stimulus decoding. Main results. We demonstrate that, if the mini-EEG nodes can be placed at optimal scalp locations and orientations selected by a data-driven algorithm, the performance remains reasonably stable for inter-electrode distances down to 3 cm, but decreases quickly for shorter distances. Significance. The results indicate the potential for the use of mini-EEGs in a WESN context for AAD applications and provide guidance on inter-electrode distances while designing such devices for neuro-steered hearing devices.
Affiliation(s)
- Abhijith Mundanad Narayanan
- KU Leuven, Dept. of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics (STADIUS), Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
- Leuven.AI - KU Leuven institute for AI, B-3000 Leuven, Belgium
| | - Rob Zink
- KU Leuven, Dept. of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics (STADIUS), Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
| | - Alexander Bertrand
- KU Leuven, Dept. of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics (STADIUS), Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
- Leuven.AI - KU Leuven institute for AI, B-3000 Leuven, Belgium
| |
|
22
|
Islam MN, Sulaiman N, Farid FA, Uddin J, Alyami SA, Rashid M, P.P. Abdul Majeed A, Moni MA. Diagnosis of hearing deficiency using EEG based AEP signals: CWT and improved-VGG16 pipeline. PeerJ Comput Sci 2021; 7:e638. [PMID: 34712786 PMCID: PMC8507488 DOI: 10.7717/peerj-cs.638] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/01/2021] [Accepted: 06/21/2021] [Indexed: 05/14/2023]
Abstract
Hearing deficiency is the world's most common sensory impairment and impedes human communication and learning. Early and precise hearing diagnosis using electroencephalogram (EEG) is regarded as the optimum strategy to deal with this issue. Among a wide range of EEG control signals, the most relevant modality for hearing loss diagnosis is the auditory evoked potential (AEP), which is produced in the brain's cortex through an auditory stimulus. This study aims to develop a robust intelligent auditory sensation system utilizing a pre-trained deep learning framework by analyzing and evaluating the functional reliability of hearing based on the AEP response. First, the raw AEP data are transformed into time-frequency images through the wavelet transformation. Then, lower-level functionality is eliminated using a pre-trained network. Here, an improved-VGG16 architecture has been designed by removing some convolutional layers and adding new layers in the fully connected block. Subsequently, the higher levels of the neural network architecture are fine-tuned using the labelled time-frequency images. Finally, the proposed method's performance has been validated on a reputed publicly available AEP dataset, recorded from sixteen subjects while they heard specific auditory stimuli in the left or right ear. The proposed method outperforms state-of-the-art studies by improving the classification accuracy to 96.87% (from 57.375%), which indicates that the proposed improved-VGG16 architecture can effectively handle the AEP response in early hearing loss diagnosis.
Affiliation(s)
- Md Nahidul Islam
- Faculty of Electrical and Electronics Engineering Technology, Universiti Malaysia Pahang, Pekan, Pahang, Malaysia
| | - Norizam Sulaiman
- Faculty of Electrical and Electronics Engineering Technology, Universiti Malaysia Pahang, Pekan, Pahang, Malaysia
| | - Fahmid Al Farid
- Faculty of Computing and Informatics, Multimedia University, Malaysia
| | - Jia Uddin
- Technology Studies Department, Endicott College, Woosong University, Daejeon, South Korea
| | - Salem A. Alyami
- Department of Mathematics and Statistics, Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia
| | - Mamunur Rashid
- Faculty of Electrical and Electronics Engineering Technology, Universiti Malaysia Pahang, Pekan, Pahang, Malaysia
| | - Anwar P.P. Abdul Majeed
- Innovative Manufacturing, Mechatronics and Sports Laboratory, Faculty of Manufacturing and Mechatronic Engineering Technology, Universiti Malaysia Pahang, Pekan, Pahang, Malaysia
- Centre for Software Development & Integrated Computing, Universiti Malaysia Pahang, Pekan, Pahang, Malaysia
| | - Mohammad Ali Moni
- School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland St Lucia, Australia
| |
|
23
|
Hausfeld L, Disbergen NR, Valente G, Zatorre RJ, Formisano E. Modulating Cortical Instrument Representations During Auditory Stream Segregation and Integration With Polyphonic Music. Front Neurosci 2021; 15:635937. [PMID: 34630007 PMCID: PMC8498193 DOI: 10.3389/fnins.2021.635937] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/30/2020] [Accepted: 08/24/2021] [Indexed: 11/13/2022] Open
Abstract
Numerous neuroimaging studies have demonstrated that the auditory cortex tracks ongoing speech and that, in multi-speaker environments, tracking of the attended speaker is enhanced compared to the other irrelevant speakers. In contrast to speech, multi-instrument music can be appreciated by attending not only to its individual entities (i.e., segregation) but also to multiple instruments simultaneously (i.e., integration). We investigated the neural correlates of these two modes of music listening using electroencephalography (EEG) and sound envelope tracking. To this end, we presented uniquely composed music pieces played by two instruments, a bassoon and a cello, in combination with a previously validated music auditory scene analysis behavioral paradigm (Disbergen et al., 2018). Similar to results obtained through selective listening tasks for speech, relevant instruments could be reconstructed better than irrelevant ones during the segregation task. A delay-specific analysis showed higher reconstruction for the relevant instrument during a middle-latency window for both the bassoon and cello and during a late window for the bassoon. During the integration task, we did not observe significant attentional modulation when reconstructing the overall music envelope. Subsequent analyses indicated that this null result might be due to the heterogeneous strategies listeners employ during the integration task. Overall, our results suggest that subsequent to a common processing stage, top-down modulations consistently enhance the relevant instrument's representation during an instrument segregation task, whereas such an enhancement is not observed during an instrument integration task. These findings extend previous results from speech tracking to the tracking of multi-instrument music and, furthermore, inform current theories on polyphonic music perception.
Affiliation(s)
- Lars Hausfeld
  - Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
  - Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands
- Niels R Disbergen
  - Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
  - Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands
- Giancarlo Valente
  - Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
  - Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands
- Robert J Zatorre
  - Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
  - International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada
- Elia Formisano
  - Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
  - Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, Netherlands
  - Brightlands Institute for Smart Society (BISS), Maastricht University, Maastricht, Netherlands
|
24
|
Lu Y, Wang M, Yao L, Shen H, Wu W, Zhang Q, Zhang L, Chen M, Liu H, Peng R, Liu M, Chen S. Auditory attention decoding from electroencephalography based on long short-term memory networks. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102966] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
|
25
|
Li J, Hong B, Nolte G, Engel AK, Zhang D. Preparatory delta phase response is correlated with naturalistic speech comprehension performance. Cogn Neurodyn 2021; 16:337-352. [PMID: 35401861 PMCID: PMC8934811 DOI: 10.1007/s11571-021-09711-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/30/2020] [Revised: 07/09/2021] [Accepted: 08/12/2021] [Indexed: 01/07/2023]
Abstract
While human speech comprehension is thought to be an active process that involves top-down predictions, it remains unclear how predictive information is used to prepare for the processing of upcoming speech information. We aimed to identify the neural signatures of the preparatory processing of upcoming speech. Participants selectively attended to one of two competing naturalistic, narrative speech streams, and a temporal response function (TRF) method was applied to derive event-related-like neural responses from electroencephalographic data. The phase responses to the attended speech at the delta band (1-4 Hz) were correlated with the comprehension performance of individual participants, with a latency of -200 to 0 ms relative to the onset of speech amplitude envelope fluctuations over the fronto-central and left-lateralized parietal electrodes. The phase responses to the attended speech at the alpha band also correlated with comprehension performance but with a latency of 650-980 ms post-onset over the fronto-central electrodes. Distinct neural signatures were found for the attentional modulation, taking the form of TRF-based amplitude responses at a latency of 240-320 ms post-onset over the left-lateralized fronto-central and occipital electrodes. Our findings reveal how the brain prepares to process upcoming speech in a continuous, naturalistic speech context.
Affiliation(s)
- Jiawei Li
  - Department of Psychology, School of Social Sciences, Tsinghua University, Room 334, Mingzhai Building, Beijing, China
  - Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing, China
- Bo Hong
  - Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing, China
  - Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing, China
- Guido Nolte
  - Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, Hamburg, Germany
- Andreas K. Engel
  - Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, Hamburg, Germany
- Dan Zhang
  - Department of Psychology, School of Social Sciences, Tsinghua University, Room 334, Mingzhai Building, Beijing, China
  - Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing, China
|
26
|
Kuruvila I, Muncke J, Fischer E, Hoppe U. Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model. Front Physiol 2021; 12:700655. [PMID: 34408661 PMCID: PMC8365753 DOI: 10.3389/fphys.2021.700655] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/26/2021] [Accepted: 07/05/2021] [Indexed: 11/25/2022]
Abstract
The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multispeaker scenario. We can quantitatively evaluate the segregation capability by modeling a relationship between the speech signals present in an auditory scene and the listener's cortical signals measured using electroencephalography (EEG). This has opened up avenues to integrate neuro-feedback into hearing aids, where the device can infer the user's attention and enhance the attended speaker. Commonly used algorithms to infer the auditory attention are based on linear systems theory, where cues such as speech envelopes are mapped onto the EEG signals. Here, we present a joint convolutional neural network (CNN) and long short-term memory (LSTM) model to infer the auditory attention. Our joint CNN-LSTM model takes the EEG signals and the spectrogram of the multiple speakers as inputs and classifies the attention to one of the speakers. We evaluated the reliability of our network using three different datasets comprising 61 subjects, where each subject undertook a dual-speaker experiment. The three datasets analyzed corresponded to speech stimuli presented in three different languages, namely German, Danish, and Dutch. Using the proposed joint CNN-LSTM model, we obtained a median decoding accuracy of 77.2% at a trial duration of 3 s. Furthermore, we evaluated the amount of sparsity that the model can tolerate by means of magnitude pruning and found a tolerance of up to 50% sparsity without substantial loss of decoding accuracy.
Affiliation(s)
- Ivine Kuruvila
  - Department of Audiology, ENT-Clinic, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
- Jan Muncke
  - Department of Audiology, ENT-Clinic, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
- Ulrich Hoppe
  - Department of Audiology, ENT-Clinic, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
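The magnitude pruning mentioned in the Kuruvila et al. abstract above can be illustrated with a minimal sketch. It assumes the simplest global variant, in which the smallest-magnitude fraction of a flat weight list is set to zero; the function name `magnitude_prune` and the example weights are hypothetical, and the abstract does not state how the authors applied pruning to their CNN-LSTM layers.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    `sparsity` is the fraction of weights to remove (0.5 removes half).
    This is a generic illustration, not the authors' pruning schedule.
    """
    n_drop = int(len(weights) * sparsity)
    if n_drop == 0:
        return list(weights)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_drop])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Hypothetical weights: pruning at 50% sparsity keeps the two largest magnitudes.
pruned = magnitude_prune([0.5, -0.1, 0.03, -2.0], sparsity=0.5)
print(pruned)
```

In practice, pruning of this kind is usually followed by re-evaluating (and often fine-tuning) the network to measure how much accuracy is lost at each sparsity level.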
|
27
|
Masood N, Farooq H. EEG electrodes selection for emotion recognition independent of stimulus presentation paradigms. J Intell Fuzzy Syst 2021. [DOI: 10.3233/jifs-201779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Most electroencephalography (EEG)-based emotion recognition systems rely on a single stimulus to evoke emotions. EEG data are mostly recorded with a large number of electrodes, which can lead to data redundancy and longer experimental setup time. The question of whether a configuration with fewer electrodes is common across different stimulus presentation paradigms remains unanswered. There are publicly available datasets for EEG-based recognition of human emotional states. However, since this work focuses on classifying emotions while subjects experience different stimuli, new experiments were needed. With these issues in mind, this work presents a novel experimental study that records EEG data for three different human emotional states evoked with four different stimulus presentation paradigms. A methodology based on an iterative Genetic Algorithm in combination with majority voting was used to obtain a configuration with a reduced number of EEG electrodes while keeping the loss of classification accuracy to a minimum. The results obtained are comparable with recent studies. Stimulus-independent configurations with fewer electrodes lead to low computational complexity as well as reduced setup time for future EEG-based smart systems for emotion recognition.
Affiliation(s)
- Naveen Masood
  - Electrical Engineering Department, Bahria University, Karachi, Pakistan
- Humera Farooq
  - Computer Science Department, Bahria University, Karachi, Pakistan
|
28
|
Geravanchizadeh M, Roushan H. Dynamic selective auditory attention detection using RNN and reinforcement learning. Sci Rep 2021; 11:15497. [PMID: 34326401 PMCID: PMC8322190 DOI: 10.1038/s41598-021-94876-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/20/2021] [Accepted: 07/16/2021] [Indexed: 11/08/2022]
Abstract
The cocktail party phenomenon describes the ability of the human brain to focus auditory attention on a particular stimulus while ignoring other acoustic events. Selective auditory attention detection (SAAD) is an important issue in the development of brain-computer interface systems and cocktail party processors. This paper proposes a new dynamic attention detection system to process the temporal evolution of the input signal. The proposed dynamic SAAD is modeled as a sequential decision-making problem, which is solved by recurrent neural network (RNN) and reinforcement learning methods of Q-learning and deep Q-learning. Among different dynamic learning approaches, the evaluation results show that the deep Q-learning approach with RNN as agent provides the highest classification accuracy (94.2%) with the least detection delay. The proposed SAAD system is advantageous, in the sense that the detection of attention is performed dynamically for the sequential inputs. Also, the system has the potential to be used in scenarios, where the attention of the listener might be switched in time in the presence of various acoustic events.
Affiliation(s)
- Masoud Geravanchizadeh
  - Faculty of Electrical & Computer Engineering, University of Tabriz, 51666-15813, Tabriz, Iran
- Hossein Roushan
  - Faculty of Electrical & Computer Engineering, University of Tabriz, 51666-15813, Tabriz, Iran
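The Geravanchizadeh and Roushan abstract above frames attention detection as a sequential decision problem solved with Q-learning. As a rough illustration of the underlying update rule only, the sketch below applies the standard tabular Q-learning step. The state and action encoding (for example, action 0 meaning "listener attends speaker A"), the reward, and the name `q_update` are assumptions made for this example; the paper's RNN agent and deep Q-learning variant are not reproduced here.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step in place and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state])
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return q[state][action]


# Hypothetical table: 2 states x 2 actions (attend speaker A / speaker B).
q_table = [[0.0, 0.0], [1.0, 0.0]]
# The agent picked action 1 in state 0, was rewarded for a correct detection,
# and moved to state 1.
new_value = q_update(q_table, state=0, action=1, reward=1.0, next_state=1)
print(new_value)
```

A deep Q-learning agent replaces the table with a network that maps the current input (here, presumably features of the EEG and audio) to one value per action, but the temporal-difference target has the same form.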
|
29
|
Tiwari S, Goel S, Bhardwaj A. MIDNN- a classification approach for the EEG based motor imagery tasks using deep neural network. Appl Intell 2021. [DOI: 10.1007/s10489-021-02622-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 10/20/2022]
|
30
|
Cai S, Li P, Su E, Xie L. Auditory Attention Detection via Cross-Modal Attention. Front Neurosci 2021; 15:652058. [PMID: 34366770 PMCID: PMC8333999 DOI: 10.3389/fnins.2021.652058] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/11/2021] [Accepted: 06/24/2021] [Indexed: 11/13/2022]
Abstract
Humans show a remarkable perceptual ability to select the speech stream of interest among multiple competing speakers. Previous studies demonstrated that auditory attention detection (AAD) can infer which speaker is attended by analyzing a listener's electroencephalography (EEG) activities. However, previous AAD approaches perform poorly on short signal segments, and more advanced decoding strategies are needed to realize robust real-time AAD. In this study, we propose a novel approach, i.e., cross-modal attention-based AAD (CMAA), to exploit the discriminative features and the correlation between audio and EEG signals. With this mechanism, we hope to dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features, thereby detecting the auditory attention activities manifested in brain signals. We also validate the CMAA model through data visualization and comprehensive experiments on a publicly available database. Experiments show that the CMAA achieves accuracy values of 82.8, 86.4, and 87.6% for 1-, 2-, and 5-s decision windows under anechoic conditions, respectively; for a 2-s decision window, it achieves an average of 84.1% under real-world reverberant conditions. The proposed CMAA network not only achieves better performance than the conventional linear model, but also outperforms the state-of-the-art non-linear approaches. These results and data visualization suggest that the CMAA model can dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features in order to improve the AAD performance.
Affiliation(s)
- Longhan Xie
  - Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, China
|
31
|
Strypsteen T, Bertrand A. End-to-end learnable EEG channel selection for deep neural networks with Gumbel-softmax. J Neural Eng 2021; 18. [PMID: 34225257 DOI: 10.1088/1741-2552/ac115d] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/25/2021] [Accepted: 07/05/2021] [Indexed: 12/26/2022]
Abstract
Objective. To develop an efficient, embedded electroencephalogram (EEG) channel selection approach for deep neural networks, allowing us to match the channel selection to the target model, while avoiding the large computational burdens of wrapper approaches in conjunction with neural networks. Approach. We employ a concrete selector layer to jointly optimize the EEG channel selection and network parameters. This layer uses a Gumbel-softmax trick to build continuous relaxations of the discrete parameters involved in the selection process, allowing them to be learned in an end-to-end manner with traditional backpropagation. As the selection layer was often observed to include the same channel twice in a certain selection, we propose a regularization function to mitigate this behavior. We validate this method on two different EEG tasks: motor execution and auditory attention decoding. For each task, we compare the performance of the Gumbel-softmax method with a baseline EEG channel selection approach tailored towards this specific task: mutual information and greedy forward selection with the utility metric, respectively. Main results. Our experiments show that the proposed framework is generally applicable, while performing at least as well as (and often better than) these state-of-the-art, task-specific approaches. Significance. The proposed method offers an efficient, task- and model-independent approach to jointly learn the optimal EEG channels along with the neural network weights.
Affiliation(s)
- Thomas Strypsteen
  - KU Leuven, Department of Electrical Engineering (ESAT) STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics with Leuven.AI - KU Leuven institute for AI, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
- Alexander Bertrand
  - KU Leuven, Department of Electrical Engineering (ESAT) STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics with Leuven.AI - KU Leuven institute for AI, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
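The Gumbel-softmax relaxation named in the Strypsteen and Bertrand abstract above can be sketched in a few lines. The example below draws a relaxed one-hot weight vector over channels from a set of selection logits and uses it to form a weighted combination of channel samples. It is an illustration under stated assumptions: the function names, logits, and temperatures are hypothetical, gradients are not computed, and the duplicate-channel regularizer described in the abstract is not included.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed one-hot sample over channels.

    Adds Gumbel(0, 1) noise to each logit, divides by the temperature,
    and applies a softmax. Lower temperatures give sharper selections.
    """
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def select_channel(eeg_sample, weights):
    """Weighted combination of channel values (one selector output)."""
    return sum(w * x for w, x in zip(weights, eeg_sample))


rng = random.Random(0)
logits = [0.1, 2.0, -1.0, 0.5]           # hypothetical learnable parameters
soft = gumbel_softmax(logits, temperature=1.0, rng=rng)
sharp = gumbel_softmax(logits, temperature=0.05, rng=rng)
print(soft, sharp)
```

During training, the temperature is typically annealed so that the weights move from a soft mixture toward a near one-hot choice; at inference, the channel with the largest logit can simply be selected.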
|
32
|
Akbarzadeh S, Lee S, Tan CT. The Spatial Selective Auditory Attention of Cochlear Implant Users in Different Conversational Sound Levels. J Clin Med 2021; 10:3078. [PMID: 34300245 PMCID: PMC8304083 DOI: 10.3390/jcm10143078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2021] [Revised: 07/04/2021] [Accepted: 07/07/2021] [Indexed: 11/16/2022]
Abstract
In multi-speaker environments, cochlear implant (CI) users may attend to a target sound source in a different manner from normal hearing (NH) individuals during a conversation. This study attempted to investigate the effect of conversational sound levels on the mechanisms adopted by CI and NH listeners in selective auditory attention and how it affects their daily conversation. Nine CI users (five bilateral, three unilateral, and one bimodal) and eight NH listeners participated in this study. The behavioral speech recognition scores were collected using a matrix sentences test, and neural tracking to speech envelope was recorded using electroencephalography (EEG). Speech stimuli were presented at three different levels (75, 65, and 55 dB SPL) in the presence of two maskers from three spatially separated speakers. Different combinations of assisted/impaired hearing modes were evaluated for CI users, and the outcomes were analyzed in three categories: electric hearing only, acoustic hearing only, and electric + acoustic hearing. Our results showed that increasing the conversational sound level degraded the selective auditory attention in electrical hearing. On the other hand, increasing the sound level improved the selective auditory attention for the acoustic hearing group. In the NH listeners, however, increasing the sound level did not cause a significant change in the auditory attention. Our result implies that the effect of the sound level on selective auditory attention varies depending on the hearing modes, and the loudness control is necessary for the ease of attending to the conversation by CI users.
Affiliation(s)
- Sara Akbarzadeh
  - Department of Electrical and Computer Engineering, University of Texas at Dallas, Richardson, TX 75080, USA
  - Correspondence: Tel.: +1-469-231-5034
- Sungmin Lee
  - Department of Speech-Language Pathology and Audiology, Tongmyong University, Busan 48520, Korea
- Chin-Tuan Tan
  - Department of Electrical and Computer Engineering, University of Texas at Dallas, Richardson, TX 75080, USA
|
33
|
Lunner T, Alickovic E, Graversen C, Ng EHN, Wendt D, Keidser G. Three New Outcome Measures That Tap Into Cognitive Processes Required for Real-Life Communication. Ear Hear 2021; 41 Suppl 1:39S-47S. [PMID: 33105258 PMCID: PMC7676869 DOI: 10.1097/aud.0000000000000941] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/15/2020] [Accepted: 07/11/2020] [Indexed: 11/29/2022]
Abstract
To increase the ecological validity of outcomes from laboratory evaluations of hearing and hearing devices, it is desirable to introduce more realistic outcome measures in the laboratory. This article presents and discusses three outcome measures that have been designed to go beyond traditional speech-in-noise measures to better reflect realistic everyday challenges. The outcome measures reviewed are: the Sentence-final Word Identification and Recall (SWIR) test, which measures working memory performance while listening to speech in noise at ceiling performance; a neural tracking method that produces a quantitative measure of selective speech attention in noise; and pupillometry, which measures changes in pupil dilation to assess listening effort while listening to speech in noise. According to evaluation data, the SWIR test provides a sensitive measure in situations where speech perception performance might be unaffected. Similarly, pupil dilation has also shown sensitivity in situations where traditional speech-in-noise measures are insensitive. Changes in working memory capacity and effort mobilization were found at positive signal-to-noise ratios (SNR), that is, at SNRs that might reflect everyday situations. Using stimulus reconstruction, it has been demonstrated that neural tracking is a robust method for determining to what degree a listener is attending to a specific talker in a typical cocktail party situation. Using both established and commercially available noise reduction schemes, data have further shown that all three measures are sensitive to variation in SNR. In summary, the new outcome measures seem suitable for testing hearing and hearing devices under more realistic and demanding everyday conditions than traditional speech-in-noise tests.
Affiliation(s)
- Thomas Lunner
  - Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
  - Department of Behavioural Sciences and Learning, Linnaeus Centre HEAD, Linköping University, Linköping, Sweden
  - Department of Electrical Engineering, Division Automatic Control, Linköping University, Linköping, Sweden
  - Department of Health Technology, Hearing Systems, Technical University of Denmark, Lyngby, Denmark
- Emina Alickovic
  - Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
  - Department of Electrical Engineering, Division Automatic Control, Linköping University, Linköping, Sweden
- Elaine Hoi Ning Ng
  - Department of Behavioural Sciences and Learning, Linnaeus Centre HEAD, Linköping University, Linköping, Sweden
  - Oticon A/S, Kongebakken, Denmark
- Dorothea Wendt
  - Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
  - Department of Health Technology, Hearing Systems, Technical University of Denmark, Lyngby, Denmark
- Gitte Keidser
  - Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
  - Department of Behavioural Sciences and Learning, Linnaeus Centre HEAD, Linköping University, Linköping, Sweden
|
34
|
Belo J, Clerc M, Schön D. EEG-Based Auditory Attention Detection and Its Possible Future Applications for Passive BCI. Front Comput Sci 2021. [DOI: 10.3389/fcomp.2021.661178] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Abstract
The ability to discriminate and attend to one specific sound source in a complex auditory environment is a fundamental skill for efficient communication. Indeed, it allows us to follow a family conversation or talk with a friend in a bar. This ability is challenged in hearing-impaired individuals and more precisely in those with a cochlear implant (CI). Indeed, due to the limited spectral resolution of the implant, auditory perception remains quite poor in a noisy environment or in the presence of simultaneous auditory sources. Recent methodological advances now make it possible to detect, on the basis of neural signals, which auditory stream within a set of multiple concurrent streams an individual is attending to. This approach, called EEG-based auditory attention detection (AAD), is based on fundamental research findings demonstrating that, in a multi-speech scenario, cortical tracking of the envelope of the attended speech is enhanced compared to the unattended speech. Following these findings, other studies showed that it is possible to use EEG/MEG (Electroencephalography/Magnetoencephalography) to explore auditory attention during speech listening in a cocktail-party-like scenario. Overall, these findings make it possible to conceive next-generation hearing aids combining customary technology and AAD. Importantly, AAD also has great potential in the context of passive BCI, in the educational context as well as in the context of interactive music performances. In this mini review, we first present the different approaches to AAD and the main limitations of the global concept. We then present its potential applications in the world of non-clinical passive BCI.
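The envelope-tracking idea summarized in the Belo et al. abstract above is often turned into a decision rule by comparing an envelope reconstructed from EEG against each candidate speech envelope and choosing the best match. The sketch below shows only that final comparison step with a Pearson correlation. The reconstructed envelope is taken as given (the decoder that produces it is not shown), and the function names and toy signals are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def decode_attention(reconstructed, envelopes):
    """Return (index of best-matching envelope, list of correlations)."""
    scores = [pearson(reconstructed, env) for env in envelopes]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy decision window: the reconstruction follows speaker 0 more closely.
reconstructed = [0.1, 0.9, 0.2, 0.8, 0.3]
speaker_0 = [0.0, 1.0, 0.1, 0.9, 0.2]
speaker_1 = [1.0, 0.0, 0.9, 0.1, 0.8]
attended, scores = decode_attention(reconstructed, [speaker_0, speaker_1])
print(attended, scores)
```

Shorter decision windows give noisier correlations, which is one reason several of the entries in this list focus on decoding accuracy at window lengths of a few seconds.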
|
35
|
Vandecappelle S, Deckers L, Das N, Ansari AH, Bertrand A, Francart T. EEG-based detection of the locus of auditory attention with convolutional neural networks. eLife 2021; 10:e56481. [PMID: 33929315 PMCID: PMC8143791 DOI: 10.7554/elife.56481] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/28/2020] [Accepted: 04/28/2021] [Indexed: 01/16/2023]
Abstract
In a multi-speaker scenario, the human auditory system is able to attend to one particular speaker of interest and ignore the others. It has been demonstrated that it is possible to use electroencephalography (EEG) signals to infer to which speaker someone is attending by relating the neural activity to the speech signals. However, classifying auditory attention within a short time interval remains the main challenge. We present a convolutional neural network-based approach to extract the locus of auditory attention (left/right) without knowledge of the speech envelopes. Our results show that it is possible to decode the locus of attention within 1-2 s, with a median accuracy of around 81%. These results are promising for neuro-steered noise suppression in hearing aids, in particular in scenarios where per-speaker envelopes are unavailable.
Affiliation(s)
- Servaas Vandecappelle
  - Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
  - Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- Lucas Deckers
  - Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
  - Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- Neetha Das
  - Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
  - Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- Amir Hossein Ansari
  - Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- Alexander Bertrand
  - Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- Tom Francart
  - Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
|
36
|
Vandecappelle S, Deckers L, Das N, Ansari AH, Bertrand A, Francart T. EEG-based detection of the locus of auditory attention with convolutional neural networks. eLife 2021; 10:56481. [PMID: 33929315 DOI: 10.1101/475673] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/28/2020] [Accepted: 04/28/2021] [Indexed: 05/27/2023]
Abstract
In a multi-speaker scenario, the human auditory system is able to attend to one particular speaker of interest and ignore the others. It has been demonstrated that it is possible to use electroencephalography (EEG) signals to infer to which speaker someone is attending by relating the neural activity to the speech signals. However, classifying auditory attention within a short time interval remains the main challenge. We present a convolutional neural network-based approach to extract the locus of auditory attention (left/right) without knowledge of the speech envelopes. Our results show that it is possible to decode the locus of attention within 1-2 s, with a median accuracy of around 81%. These results are promising for neuro-steered noise suppression in hearing aids, in particular in scenarios where per-speaker envelopes are unavailable.
Affiliation(s)
- Servaas Vandecappelle
  - Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
  - Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- Lucas Deckers
  - Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
  - Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- Neetha Das
  - Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
  - Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- Amir Hossein Ansari
  - Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- Alexander Bertrand
  - Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- Tom Francart
  - Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
|
37
|
Kuruvila I, Can Demir K, Fischer E, Hoppe U. Inference of the Selective Auditory Attention Using Sequential LMMSE Estimation. IEEE Trans Biomed Eng 2021; 68:3501-3512. [PMID: 33891545 DOI: 10.1109/tbme.2021.3075337] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
Attentive listening in a multispeaker environment such as a cocktail party requires suppression of the interfering speakers and the surrounding noise. People with normal hearing perform remarkably well in such situations. Analysis of the cortical signals using electroencephalography (EEG) has revealed that the EEG signals track the envelope of the attended speech more strongly than that of the interfering speech. This has enabled the development of algorithms that can decode the selective attention of a listener in controlled experimental settings. However, these algorithms often require longer trial durations and computationally expensive calibration to obtain a reliable inference of attention. In this paper, we present a novel framework to decode the attention of a listener within trial durations of the order of two seconds. It comprises three modules: 1) dynamic estimation of the temporal response functions (TRF) in every trial using a sequential linear minimum mean squared error (LMMSE) estimator, 2) extraction of the N1-P2 peak of the estimated TRF, which serves as a marker related to the attentional state, and 3) computation of a probabilistic measure of the attentional state using a support vector machine followed by a logistic regression. The efficacy of the proposed decoding framework was evaluated using EEG data collected from 27 subjects. The total number of electrodes required to infer the attention was four: one for the signal estimation, one for the noise estimation, and the other two being the reference and the ground electrodes. Our results make further progress towards the realization of neuro-steered hearing aids.
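The first module in the Kuruvila et al. abstract above, sequential LMMSE estimation of a temporal response function, can be illustrated with the textbook sequential LMMSE recursion for a scalar observation y = h·θ + noise. The sketch below is a generic version under stated assumptions: the lag count, prior variance, noise variance, simulated stimulus, and all function names are hypothetical, and the abstract does not specify the authors' exact model or parameters.

```python
import random


def sequential_lmmse_step(theta, M, h, y, noise_var):
    """One sequential LMMSE update for a scalar observation y = h . theta + w.

    theta: current estimate, M: current error covariance (symmetric),
    h: regressor vector, noise_var: variance of the observation noise w.
    """
    n = len(theta)
    Mh = [sum(M[i][j] * h[j] for j in range(n)) for i in range(n)]
    denom = noise_var + sum(h[i] * Mh[i] for i in range(n))
    gain = [v / denom for v in Mh]
    innovation = y - sum(h[i] * theta[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * innovation for i in range(n)]
    # M <- (I - K h^T) M; for symmetric M, the row vector h^T M equals Mh.
    M_new = [[M[i][j] - gain[i] * Mh[j] for j in range(n)] for i in range(n)]
    return theta_new, M_new


def estimate_trf(stimulus, response, n_lags, prior_var, noise_var):
    """Estimate TRF coefficients sample by sample from stimulus and response."""
    theta = [0.0] * n_lags
    M = [[prior_var if i == j else 0.0 for j in range(n_lags)]
         for i in range(n_lags)]
    for t in range(n_lags - 1, len(stimulus)):
        lagged = [stimulus[t - k] for k in range(n_lags)]
        theta, M = sequential_lmmse_step(theta, M, lagged, response[t], noise_var)
    return theta


# Simulated data: a 3-lag response to a white-noise "envelope" plus noise.
rng = random.Random(1)
true_trf = [0.5, -0.3, 0.2]
stimulus = [rng.gauss(0.0, 1.0) for _ in range(500)]
response = [0.0] * len(stimulus)
for t in range(2, len(stimulus)):
    clean = sum(true_trf[k] * stimulus[t - k] for k in range(3))
    response[t] = clean + rng.gauss(0.0, 0.1)

estimate = estimate_trf(stimulus, response, n_lags=3, prior_var=10.0, noise_var=0.01)
print(estimate)
```

Because each update uses only the newest sample, the estimate can be refreshed within a trial without re-solving a full regression, which matches the motivation of short trial durations given in the abstract.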
|
38
|
Holtze B, Jaeger M, Debener S, Adiloğlu K, Mirkovic B. Are They Calling My Name? Attention Capture Is Reflected in the Neural Tracking of Attended and Ignored Speech. Front Neurosci 2021; 15:643705. [PMID: 33828451 PMCID: PMC8019946 DOI: 10.3389/fnins.2021.643705] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/18/2020] [Accepted: 02/19/2021] [Indexed: 11/15/2022]
Abstract
Difficulties in selectively attending to one among several speakers have mainly been associated with the distraction caused by ignored speech. Thus, in the current study, we investigated the neural processing of ignored speech in a two-competing-speaker paradigm. For this, we recorded the participant’s brain activity using electroencephalography (EEG) to track the neural representation of the attended and ignored speech envelope. To provoke distraction, we occasionally embedded the participant’s first name in the ignored speech stream. Retrospective reports as well as the presence of a P3 component in response to the name indicate that participants noticed the occurrence of their name. As predicted, the neural representation of the ignored speech envelope increased after the name was presented therein, suggesting that the name had attracted the participant’s attention. Interestingly, in contrast to our hypothesis, the neural tracking of the attended speech envelope also increased after the name occurrence. On this account, we conclude that the name might not have primarily distracted the participants, or did so at most for a brief duration, but rather alerted them to focus on their actual task. These observations remained robust even when the sound intensity of the ignored speech stream, and thus the sound intensity of the name, was attenuated.
Affiliation(s)
- Björn Holtze
  - Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Manuela Jaeger
  - Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
  - Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, Oldenburg, Germany
- Stefan Debener
  - Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
  - Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany
  - Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
- Kamil Adiloğlu
  - Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
  - HörTech gGmbH, Oldenburg, Germany
- Bojana Mirkovic
  - Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
39
|
Mikkelsen KB, Tabar YR, Christensen CB, Kidmose P. EEGs Vary Less Between Lab and Home Locations Than They Do Between People. Front Comput Neurosci 2021; 15:565244. [PMID: 33679356 PMCID: PMC7928278 DOI: 10.3389/fncom.2021.565244] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/24/2020] [Accepted: 01/13/2021] [Indexed: 11/24/2022]
Abstract
Given the rapid development of lightweight EEG devices over the past decade, it is reasonable to ask to what extent neuroscience could now be taken outside the lab. In this study, we have designed an EEG paradigm well suited for deployment “in the wild.” The paradigm is tested in repeated recordings on 20 subjects, on eight different occasions (4 in the laboratory, 4 in the subject's own home). By calculating the inter-subject, intra-subject, and inter-location variance, we find that the inter-location variation for this paradigm is considerably less than the inter-subject variation. We believe the paradigm is representative of a large group of other relevant paradigms. Given the positive results in this study, we expect limited problems when a research paradigm that would benefit from less controlled environments is performed in them.
Affiliation(s)
- Kaare B Mikkelsen
  - Department of Electrical and Computer Engineering, Aarhus University, Aarhus, Denmark
- Yousef R Tabar
  - Department of Electrical and Computer Engineering, Aarhus University, Aarhus, Denmark
- Preben Kidmose
  - Department of Electrical and Computer Engineering, Aarhus University, Aarhus, Denmark
40
|
Effect of number and placement of EEG electrodes on measurement of neural tracking of speech. PLoS One 2021; 16:e0246769. [PMID: 33571299 PMCID: PMC7877609 DOI: 10.1371/journal.pone.0246769] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/02/2020] [Accepted: 01/25/2021] [Indexed: 11/19/2022]
Abstract
Measurement of neural tracking of natural running speech from the electroencephalogram (EEG) is an increasingly popular method in auditory neuroscience and has applications in audiology. The method involves decoding the envelope of the speech signal from the EEG signal, and calculating the correlation with the envelope of the audio stream that was presented to the subject. Typically, EEG systems with 64 or more electrodes are used. However, in practical applications, set-ups with fewer electrodes are required. Here, we determine the optimal number of electrodes, and the best position to place a limited number of electrodes on the scalp. We propose a channel selection strategy based on a utility metric, which allows a quick quantitative assessment of the influence of a channel (or a group of channels) on the reconstruction error. We consider two use cases: a subject-specific case, where the optimal number and position of the electrodes is determined for each subject individually, and a subject-independent case, where the electrodes are placed at the same positions (in the 10-20 system) for all the subjects. We evaluated our approach using 64-channel EEG data from 90 subjects. In the subject-specific case, we found that the correlation between actual and reconstructed envelope first increased with decreasing number of electrodes, with an optimum at around 20 electrodes, yielding 29% higher correlations using the optimal number of electrodes compared to all electrodes. This means that our strategy of removing electrodes can be used to improve the correlation metric in high-density EEG recordings. In the subject-independent case, we obtained a stable decoding performance when decreasing from 64 to 22 channels. When the number of channels was further decreased, the correlation decreased. For a maximal decrease in correlation of 10%, 32 well-placed electrodes were sufficient in 91% of the subjects.
41
|
Velasco-Álvarez F, Fernández-Rodríguez Á, Medina-Juliá MT, Ron-Angevin R. Speech stream segregation to control an ERP-based auditory BCI. J Neural Eng 2021; 18. [PMID: 33470970 DOI: 10.1088/1741-2552/abdd44] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2020] [Accepted: 01/19/2021] [Indexed: 11/12/2022]
Abstract
OBJECTIVE: The use of natural sounds in auditory Brain-Computer Interfaces (BCI) has been shown to improve classification results and usability. Some auditory BCIs are based on stream segregation, in which the subjects must attend to one audio stream and ignore the other(s); these streams include some kind of stimuli to be detected. In this work we focus on Event-Related Potentials (ERP) and study whether providing intelligible content in each audio stream could help users concentrate better on the desired stream, and so attend better to the target stimuli and ignore the non-target ones. APPROACH: In addition to a control condition, two experimental conditions, based on selective attention and the cocktail party effect, were tested using two simultaneous and spatialized audio streams: i) condition A2 consisted of an overlap of auditory stimuli (single syllables) on a background of natural speech for each stream; ii) in condition A3, brief alterations of the natural flow of each speech stream were used as stimuli. MAIN RESULTS: The two experimental proposals improved on the results of the control condition (single words as stimuli without a speech background) both in a cross-validation analysis of the calibration part and in the online test. The analysis of the ERP responses also showed better discriminability for the two proposals in comparison to the control condition. The results of subjective questionnaires support the better usability of the first experimental condition. SIGNIFICANCE: The use of natural speech as background improves stream segregation in an ERP-based auditory BCI (with significant results in the performance metrics, the ERP waveforms, and the preference parameter in subjective questionnaires). Future work in the field of ERP-based stream segregation should study the use of natural speech in combination with easily perceived but not distracting stimuli.
Affiliation(s)
- Francisco Velasco-Álvarez
  - Department of Electronic Technology, Universidad de Malaga, E.T.S.I. Telecomunicación, Campus de Teatinos s/n, Malaga, 29071, Spain
- Álvaro Fernández-Rodríguez
  - Department of Electronic Technology, University of Málaga, E.T.S.I. Telecomunicación, Campus de Teatinos s/n, Málaga, 29071, Spain
- M Teresa Medina-Juliá
  - Department of Electronic Technology, Universidad de Malaga, E.T.S.I. Telecomunicación, Campus de Teatinos s/n, Malaga, 29071, Spain
- Ricardo Ron-Angevin
  - Department of Electronic Technology, Universidad de Malaga, E.T.S.I. Telecomunicación, Campus de Teatinos s/n, Malaga, 29071, Spain
42
|
Baek SC, Chung JH, Lim Y. Implementation of an Online Auditory Attention Detection Model with Electroencephalography in a Dichotomous Listening Experiment. Sensors 2021; 21:s21020531. [PMID: 33451041 PMCID: PMC7828508 DOI: 10.3390/s21020531] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/18/2020] [Revised: 01/07/2021] [Accepted: 01/09/2021] [Indexed: 11/16/2022]
Abstract
Auditory attention detection (AAD) is the tracking, based on neural signals, of the sound source to which a listener is attending. Despite expectations for the applicability of AAD in real life, most AAD research has been conducted on recorded electroencephalograms (EEGs), which is far from online implementation. In the present study, we propose an online AAD model and implement it on a streaming EEG. The proposed model was devised by introducing a sliding window into the linear decoder model and was simulated using two datasets obtained from separate experiments to evaluate its feasibility. After simulation, the online model was constructed and evaluated based on the streaming EEG of an individual, acquired during a dichotomous listening experiment. Our model was able to detect the transient direction of a participant's attention on the order of one second during the experiment and showed up to 70% average detection accuracy. We expect that the proposed online model could be applied to develop adaptive hearing aids or neurofeedback training for auditory attention and speech perception.
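The sliding-window decision step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic signals, the 64 Hz sampling rate, and the one-second non-overlapping windows are all assumptions made for illustration, and the envelope reconstructed by a linear decoder is simulated here as the attended envelope plus noise.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def sliding_window_decisions(reconstructed, env_a, env_b, win, hop):
    """Label each window 'A' or 'B' by the better-correlated candidate envelope."""
    decisions = []
    for start in range(0, len(reconstructed) - win + 1, hop):
        stop = start + win
        r_a = pearson(reconstructed[start:stop], env_a[start:stop])
        r_b = pearson(reconstructed[start:stop], env_b[start:stop])
        decisions.append("A" if r_a > r_b else "B")
    return decisions

# Synthetic demo (all values hypothetical): the listener attends speaker A
# for the first 20 s and speaker B for the last 20 s.
rng = random.Random(0)
fs = 64                      # assumed sampling rate in Hz
n = 40 * fs                  # 40 s of data
env_a = [rng.gauss(0, 1) for _ in range(n)]
env_b = [rng.gauss(0, 1) for _ in range(n)]
switch = n // 2
# Stand-in for a decoder output: attended envelope plus unit-variance noise.
reconstructed = [
    (env_a[t] if t < switch else env_b[t]) + rng.gauss(0, 1) for t in range(n)
]

# One-second windows; hop == win gives non-overlapping windows.
decisions = sliding_window_decisions(reconstructed, env_a, env_b, win=fs, hop=fs)
truth = ["A"] * 20 + ["B"] * 20
accuracy = sum(d == t for d, t in zip(decisions, truth)) / len(truth)
print(f"{len(decisions)} windows, accuracy = {accuracy:.2f}")
```

Choosing a `hop` smaller than `win` would give overlapping windows and more frequent decisions, at the cost of correlated consecutive outputs. The accuracy on this clean synthetic signal says nothing about the accuracy reported for real EEG.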
Affiliation(s)
- Seung-Cheol Baek
  - Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, Korea
- Jae Ho Chung
  - Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, Korea
  - Department of Otolaryngology-Head and Neck Surgery, College of Medicine, Hanyang University, Seoul 04763, Korea
  - Department of HY-KIST Bio-convergence, Hanyang University, Seoul 04763, Korea
  - Correspondence: (J.H.C.); (Y.L.); Tel.: +82-2-31-560-2298 (J.H.C.); +82-2-958-6641 (Y.L.)
- Yoonseob Lim
  - Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, Korea
  - Department of HY-KIST Bio-convergence, Hanyang University, Seoul 04763, Korea
  - Research Center for Diagnosis, Treatment and Care System of Dementia, Korea Institute of Science and Technology, Seoul 02792, Korea
  - Correspondence: (J.H.C.); (Y.L.); Tel.: +82-2-31-560-2298 (J.H.C.); +82-2-958-6641 (Y.L.)
43
|
Hausfeld L, Shiell M, Formisano E, Riecke L. Cortical processing of distracting speech in noisy auditory scenes depends on perceptual demand. Neuroimage 2020; 228:117670. [PMID: 33359352 DOI: 10.1016/j.neuroimage.2020.117670] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/27/2020] [Revised: 12/13/2020] [Accepted: 12/14/2020] [Indexed: 11/15/2022]
Abstract
Selective attention is essential for the processing of multi-speaker auditory scenes because they require the perceptual segregation of the relevant speech ("target") from irrelevant speech ("distractors"). For simple sounds, it has been suggested that the processing of multiple distractor sounds depends on bottom-up factors affecting task performance. However, it remains unclear whether such dependency applies to naturalistic multi-speaker auditory scenes. In this study, we tested the hypothesis that increased perceptual demand (the processing requirement posed by the scene to separate the target speech) reduces the cortical processing of distractor speech, thus decreasing their perceptual segregation. Human participants were presented with auditory scenes including three speakers and asked to selectively attend to one speaker while their EEG was acquired. The perceptual demand of this selective listening task was varied by introducing an auditory cue (interaural time differences, ITDs) for segregating the target from the distractor speakers, while acoustic differences between the distractors were matched in ITD and loudness. We obtained a quantitative measure of the cortical segregation of distractor speakers by assessing the difference in how accurately speech-envelope following EEG responses could be predicted by models of averaged distractor speech versus models of individual distractor speech. In agreement with our hypothesis, results show that interaural segregation cues led to improved behavioral word-recognition performance and stronger cortical segregation of the distractor speakers. The neural effect was strongest in the δ-band and at early delays (0-200 ms). Our results indicate that during low perceptual demand, the human cortex represents individual distractor speech signals as more segregated. This suggests that, in addition to purely acoustical properties, the cortical processing of distractor speakers depends on factors like perceptual demand.
Affiliation(s)
- Lars Hausfeld
  - Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands
  - Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands
- Martha Shiell
  - Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands
  - Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands
- Elia Formisano
  - Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands
  - Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands
  - Maastricht Centre for Systems Biology, 6200MD Maastricht, The Netherlands
- Lars Riecke
  - Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands
  - Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands
44
|
Narayanan AM, Patrinos P, Bertrand A. Optimal Versus Approximate Channel Selection Methods for EEG Decoding With Application to Topology-Constrained Neuro-Sensor Networks. IEEE Trans Neural Syst Rehabil Eng 2020; 29:92-102. [PMID: 33141674 DOI: 10.1109/tnsre.2020.3035499] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
Abstract
Channel selection or electrode placement for neural decoding is a commonly encountered problem in electroencephalography (EEG). Since evaluating all possible channel combinations is usually infeasible, one usually has to settle for heuristic methods or convex approximations without optimality guarantees. To date, it remains unclear how large the gap is between the selection made by these approximate methods and the truly optimal selection. The goal of this paper is to quantify this optimality gap for several state-of-the-art channel selection methods in the context of least-squares based neural decoding. To this end, we reformulate the channel selection problem as a mixed-integer quadratic program (MIQP), which allows the use of efficient MIQP solvers to find the optimal channel combination in a feasible computation time for up to 100 candidate channels. As this reveals the exact solution to the combinatorial problem, it makes it possible to quantify the performance losses when using state-of-the-art sub-optimal (yet faster) channel selection methods. In the context of auditory attention decoding, we find that a greedy channel selection based on the utility metric does not show a significant optimality gap compared to optimal channel selection, whereas other state-of-the-art greedy or ℓ1-norm penalized methods do show a significant loss in performance. Furthermore, we demonstrate that the MIQP formulation also provides a natural way to incorporate topology constraints in the selection, e.g., for electrode placement in neuro-sensor networks with galvanic separation constraints. Finally, combining this utility-based greedy selection with an MIQP solver makes topology-constrained electrode placement possible even in large-scale problems with more than 100 candidate positions.
45
|
Wang L, Wu EX, Chen F. Robust EEG-Based Decoding of Auditory Attention With High-RMS-Level Speech Segments in Noisy Conditions. Front Hum Neurosci 2020; 14:557534. [PMID: 33132874 PMCID: PMC7576187 DOI: 10.3389/fnhum.2020.557534] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/30/2020] [Accepted: 09/09/2020] [Indexed: 11/25/2022]
Abstract
The attended speech stream can be detected robustly, even in adverse auditory scenarios with auditory attentional modulation, and can be decoded using electroencephalographic (EEG) data. Speech segmentation based on the relative root-mean-square (RMS) intensity can be used to estimate segmental contributions to perception in noisy conditions. High-RMS-level segments contain crucial information for speech perception. Hence, this study aimed to investigate the effect of high-RMS-level speech segments on auditory attention decoding performance under various signal-to-noise ratio (SNR) conditions. Scalp EEG signals were recorded while subjects listened to the attended speech stream in mixed speech narrated concurrently by two Mandarin speakers. The temporal response function was used to identify the attended speech from EEG responses tracking the temporal envelopes of intact speech and of high-RMS-level speech segments alone, respectively. Auditory decoding performance was then analyzed under various SNR conditions by comparing EEG correlations to the attended and ignored speech streams. The accuracy of auditory attention decoding based on the temporal envelope with high-RMS-level speech segments was not inferior to that based on the temporal envelope of intact speech. Cortical activity correlated more strongly with attended than with ignored speech under different SNR conditions. These results suggest that EEG recordings corresponding to high-RMS-level speech segments carry crucial information for the identification and tracking of attended speech in the presence of background noise. This study also showed that with the modulation of auditory attention, attended speech can be decoded more robustly from neural activity than from behavioral measures under a wide range of SNRs.
Affiliation(s)
- Lei Wang
  - Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
  - Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong
- Ed X Wu
  - Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong
- Fei Chen
  - Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
46
|
Greenlaw KM, Puschmann S, Coffey EBJ. Decoding of Envelope vs. Fundamental Frequency During Complex Auditory Stream Segregation. Neurobiology of Language (Cambridge, Mass.) 2020; 1:268-287. [PMID: 37215227 PMCID: PMC10158587 DOI: 10.1162/nol_a_00013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/26/2019] [Accepted: 04/25/2020] [Indexed: 05/24/2023]
Abstract
Hearing-in-noise perception is a challenging task that is critical to human function, but how the brain accomplishes it is not well understood. A candidate mechanism proposes that the neural representation of an attended auditory stream is enhanced relative to background sound via a combination of bottom-up and top-down mechanisms. To date, few studies have compared neural representation and its task-related enhancement across frequency bands that carry different auditory information, such as a sound's amplitude envelope (i.e., syllabic rate or rhythm; 1-9 Hz), and the fundamental frequency of periodic stimuli (i.e., pitch; >40 Hz). Furthermore, hearing-in-noise in the real world is frequently both messier and richer than the majority of tasks used in its study. In the present study, we use continuous sound excerpts that simultaneously offer predictive, visual, and spatial cues to help listeners separate the target from four acoustically similar simultaneously presented sound streams. We show that while both lower and higher frequency information about the entire sound stream is represented in the brain's response, the to-be-attended sound stream is strongly enhanced only in the slower, lower frequency sound representations. These results are consistent with the hypothesis that attended sound representations are strengthened progressively at higher level, later processing stages, and that the interaction of multiple brain systems can aid in this process. Our findings contribute to our understanding of auditory stream separation in difficult, naturalistic listening conditions and demonstrate that pitch and envelope information can be decoded from single-channel EEG data.
Affiliation(s)
- Keelin M. Greenlaw
  - Department of Psychology, Concordia University, Montreal, QC, Canada
  - International Laboratory for Brain, Music and Sound Research (BRAMS)
  - The Centre for Research on Brain, Language and Music (CRBLM)
47
|
Jaeger M, Mirkovic B, Bleichner MG, Debener S. Decoding the Attended Speaker From EEG Using Adaptive Evaluation Intervals Captures Fluctuations in Attentional Listening. Front Neurosci 2020; 14:603. [PMID: 32612507 PMCID: PMC7308709 DOI: 10.3389/fnins.2020.00603] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/06/2019] [Accepted: 05/15/2020] [Indexed: 11/13/2022]
Abstract
Listeners differ in their ability to attend to a speech stream in the presence of a competing sound. Differences in speech intelligibility in noise cannot be fully explained by hearing ability, which suggests the involvement of additional cognitive factors. A better understanding of the temporal fluctuations in the ability to pay selective auditory attention to a desired speech stream may help in explaining this variability. In order to better understand the temporal dynamics of selective auditory attention, we developed an online auditory attention decoding (AAD) processing pipeline based on speech envelope tracking in the electroencephalogram (EEG). Participants had to attend to one audiobook story while a second one had to be ignored. Online AAD was applied to track the attention toward the target speech signal. Individual temporal attention profiles were computed by combining an established AAD method with an adaptive staircase procedure. The individual decoding performance over time was analyzed and linked to behavioral performance as well as subjective ratings of listening effort, motivation, and fatigue. The grand average attended speaker decoding profile derived in the online experiment indicated performance above chance level. Parameters describing the individual AAD performance in each testing block indicated that significant differences in decoding performance over time were closely related to the behavioral performance in the selective listening task. Further, an exploratory analysis indicated that subjects with poor decoding performance reported higher listening effort and fatigue compared to good performers. Taken together, our results show that online EEG-based AAD in a complex listening situation is feasible. Adaptive attended speaker decoding profiles over time could be used as an objective measure of behavioral performance and listening effort. The developed online processing pipeline could also serve as a basis for future EEG-based near real-time auditory neurofeedback systems.
Affiliation(s)
- Manuela Jaeger
  - Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
  - Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, Oldenburg, Germany
- Bojana Mirkovic
  - Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
  - Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
- Martin G Bleichner
  - Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
  - Neurophysiology of Everyday Life Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Stefan Debener
  - Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
  - Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
  - Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany
48
|
Geravanchizadeh M, Bakhshalipour Gavgani S. Selective auditory attention detection based on effective connectivity by single-trial EEG. J Neural Eng 2020; 17:026021. [PMID: 32131059 DOI: 10.1088/1741-2552/ab7c8d] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/11/2022]
Abstract
OBJECTIVE: Focusing attention on one speaker in an environment with many speakers is one of the important abilities of the human auditory system. The temporal dynamics of the attention process and how the brain precisely performs this task are still unknown. This paper proposes a new method for selective auditory attention detection (SAAD) from single-trial EEG signals using brain effective connectivity and complex network analysis for two groups of listeners attending to the left or right ear. APPROACH: The connectivity matrices of all subjects, obtained with the Granger causality method, are used to extract different features. Then, through feature selection and optimization, an optimized feature set is determined for training a classifier. MAIN RESULTS: Among different measures of brain connectivity (i.e. segregation, integration, and centrality), the evaluation results show that the optimized feature set obtained by combining the centrality measures contains the most discriminative features for the classification process. Compared with state-of-the-art attention detection approaches from the literature, the proposed SAAD method yields the best performance in terms of various measures. SIGNIFICANCE: The new SAAD approach is advantageous in that the detection of attention is performed from single-trial EEG signals of each subject, without reconstructing the speech stimuli. This means that the proposed method could be employed for real-time applications such as smart hearing aid devices or brain-computer interface (BCI) systems.
49
|
Vanheusden FJ, Kegler M, Ireland K, Georga C, Simpson DM, Reichenbach T, Bell SL. Hearing Aids Do Not Alter Cortical Entrainment to Speech at Audible Levels in Mild-to-Moderately Hearing-Impaired Subjects. Front Hum Neurosci 2020; 14:109. [PMID: 32317951 PMCID: PMC7147120 DOI: 10.3389/fnhum.2020.00109] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/21/2019] [Accepted: 03/11/2020] [Indexed: 11/13/2022]
Abstract
BACKGROUND: Cortical entrainment to speech correlates with speech intelligibility and attention to a speech stream in noisy environments. However, there is a lack of data on whether cortical entrainment can help in evaluating hearing aid fittings for subjects with mild to moderate hearing loss. One particular problem that may arise is that hearing aids may alter the speech stimulus during (pre-)processing steps, which might alter cortical entrainment to the speech. Here, the effect of hearing aid processing on cortical entrainment to running speech in hearing-impaired subjects was investigated. METHODOLOGY: Seventeen native English-speaking subjects with mild-to-moderate hearing loss participated in the study. Hearing function and hearing aid fitting were evaluated using standard clinical procedures. Participants then listened to a 25-min audiobook under aided and unaided conditions at 70 dBA sound pressure level (SPL) in quiet conditions. EEG data were collected using a 32-channel system. Cortical entrainment to speech was evaluated using decoders reconstructing the speech envelope from the EEG data. Null decoders, obtained from EEG and the time-reversed speech envelope, were used to assess the chance-level reconstructions. Entrainment in the delta (1-4 Hz) and theta (4-8 Hz) bands, as well as in wideband (1-20 Hz) EEG data, was investigated. RESULTS: Significant cortical responses could be detected for all but one subject in all three frequency bands under both aided and unaided conditions. However, no significant differences could be found between the two conditions in the number of responses detected, nor in the strength of cortical entrainment. The results show that the relatively small change in speech input provided by the hearing aid was not sufficient to elicit a detectable change in cortical entrainment. CONCLUSION: For subjects with mild to moderate hearing loss, cortical entrainment to speech in quiet at an audible level is not affected by hearing aids. These results clear the pathway for exploring the potential to use cortical entrainment to running speech for evaluating hearing aid fitting at lower speech intensities (which could be inaudible when unaided), or using speech in noise conditions.
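The idea of comparing a decoder against a null decoder trained on the time-reversed envelope can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's pipeline: the data are synthetic, the decoder uses one weight per channel with no time lags (practical decoders typically use many lags), and the helper names, channel gains, and ridge value are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def train_decoder(eeg, envelope, ridge=1e-3):
    """Ridge-regularized least-squares weights mapping EEG samples to the envelope."""
    n_ch = len(eeg[0])
    xtx = [[sum(row[i] * row[j] for row in eeg) for j in range(n_ch)]
           for i in range(n_ch)]
    for i in range(n_ch):
        xtx[i][i] += ridge
    xty = [sum(row[i] * e for row, e in zip(eeg, envelope)) for i in range(n_ch)]
    return solve(xtx, xty)

def reconstruct(eeg, weights):
    """Apply the decoder: weighted sum over channels at each time sample."""
    return [sum(w * v for w, v in zip(weights, row)) for row in eeg]

# Synthetic data (hypothetical): 4 channels, each a scaled envelope plus noise.
rng = random.Random(1)
gains = [1.0, 0.8, 0.6, 0.4]
envelope = [rng.gauss(0, 1) for _ in range(2000)]
eeg = [[g * e + rng.gauss(0, 1) for g in gains] for e in envelope]
eeg_train, eeg_test = eeg[:1500], eeg[1500:]
env_train, env_test = envelope[:1500], envelope[1500:]

# Actual decoder: trained and evaluated on the true envelope.
w_true = train_decoder(eeg_train, env_train)
r_true = pearson(reconstruct(eeg_test, w_true), env_test)

# Null decoder: same EEG, but paired with the time-reversed envelope.
w_null = train_decoder(eeg_train, env_train[::-1])
r_null = pearson(reconstruct(eeg_test, w_null), env_test[::-1])

print(f"decoder r = {r_true:.2f}, null decoder r = {r_null:.2f}")
```

Time reversal keeps the envelope's amplitude distribution and spectrum but breaks its alignment with the EEG, so the null correlation gives an estimate of what a decoder achieves by chance. In practice the null distribution would be built from many such reconstructions rather than a single value.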
Affiliation(s)
- Frederique J. Vanheusden
  - Department of Engineering, School of Science and Technology, Nottingham Trent University, Nottingham, United Kingdom
  - Institute of Sound and Vibration Research, Faculty of Engineering and Physical Sciences, University of Southampton, Southampton, United Kingdom
- Mikolaj Kegler
  - Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, United Kingdom
- Katie Ireland
  - Audiology Department, Royal Berkshire NHS Foundation Trust, Reading, United Kingdom
- Constantina Georga
  - Audiology Department, Royal Berkshire NHS Foundation Trust, Reading, United Kingdom
- David M. Simpson
  - Institute of Sound and Vibration Research, Faculty of Engineering and Physical Sciences, University of Southampton, Southampton, United Kingdom
- Tobias Reichenbach
  - Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, United Kingdom
- Steven L. Bell
  - Institute of Sound and Vibration Research, Faculty of Engineering and Physical Sciences, University of Southampton, Southampton, United Kingdom
50
|
Müller JA, Kollmeier B, Debener S, Brand T. Influence of auditory attention on sentence recognition captured by the neural phase. Eur J Neurosci 2020. [DOI: 10.1111/ejn.13896] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Affiliation(s)
- Jana Annina Müller
  - Medizinische Physik, Carl von Ossietzky Universität Oldenburg, 26111 Oldenburg, Germany
  - Cluster of Excellence Hearing4all, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Birger Kollmeier
  - Medizinische Physik, Carl von Ossietzky Universität Oldenburg, 26111 Oldenburg, Germany
  - Cluster of Excellence Hearing4all, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Stefan Debener
  - Cluster of Excellence Hearing4all, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
  - Neuropsychology, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Thomas Brand
  - Medizinische Physik, Carl von Ossietzky Universität Oldenburg, 26111 Oldenburg, Germany
  - Cluster of Excellence Hearing4all, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany