1. Ni Y, Yang Y, Chen H, Wang X, Lesica N, Zeng FG, Imani M. Hyperdimensional Brain-Inspired Learning for Phoneme Recognition With Large-Scale Inferior Colliculus Neural Activities. IEEE Trans Biomed Eng 2024; 71:3098-3110. [PMID: 39008389] [DOI: 10.1109/tbme.2024.3408279]
Abstract
OBJECTIVE Develop a novel and highly efficient framework that decodes Inferior Colliculus (IC) neural activities for phoneme recognition. METHODS We propose using Hyperdimensional Computing (HDC) to support an efficient phoneme recognition algorithm, in contrast to the widely applied Deep Neural Networks (DNNs). The high-dimensional representation and operations in HDC are rooted in human brain functionalities and are naturally parallelizable, showing potential for efficient neural activity analysis. Our proposed method includes a spatial- and temporal-aware HDC encoder that effectively captures global and local patterns. As part of our framework, we deploy the lightweight HDC-based algorithm on a highly customizable and flexible hardware platform, i.e., Field Programmable Gate Arrays (FPGAs), for optimal algorithm speedup. To evaluate our method, we record IC neural activity from gerbils while playing the sounds of different phonemes. RESULTS We compare our proposed method with multiple baseline machine learning algorithms in recognition quality and learning efficiency across different hardware platforms. The results show that our method generally achieves better classification quality than the best-performing baseline. Compared to the Deep Residual Neural Network (ResNet), our method shows speedups of up to 74×, 67×, and 210× on CPU, GPU, and FPGA, respectively. We achieve up to 15% (10%) higher accuracy in consonant (vowel) classification than ResNet. CONCLUSION By leveraging brain-inspired HDC for IC neural activity encoding and phoneme classification, we achieve orders-of-magnitude runtime speedup while improving accuracy in various challenging task settings. SIGNIFICANCE Decoding IC neural activities is an important step toward enhancing understanding of the human auditory system. However, these responses from the central auditory system are noisy and show high variance, demanding large-scale datasets and iterative model fine-tuning. The proposed HDC-based framework is more scalable and viable for future real-world deployment thanks to its fast training and overall better quality.
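To make the HDC encoding step concrete, here is a minimal, generic hyperdimensional-computing sketch in Python: random bipolar hypervectors bind feature identities to quantized feature values, class prototypes are built by bundling, and prediction selects the most similar prototype. The dimensionality, feature count, and every name in the snippet are illustrative assumptions; this is not the authors' spatial- and temporal-aware encoder nor their FPGA implementation.

```python
# Minimal hyperdimensional-computing (HDC) sketch: record-based encoding of a
# feature vector into a D-dimensional bipolar hypervector, class prototypes
# built by bundling, and nearest-prototype classification by cosine similarity.
# Generic illustration only; sizes and data are assumptions, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
D = 10_000          # hypervector dimensionality (assumed)
N_FEATURES = 64     # e.g. number of recording channels (assumed)
N_LEVELS = 16       # quantization levels for feature values (assumed)

# Random "identity" hypervector per feature and "level" hypervector per value bin
id_hvs = rng.choice([-1, 1], size=(N_FEATURES, D))
level_hvs = rng.choice([-1, 1], size=(N_LEVELS, D))

def encode(x):
    """Bind each feature identity with its quantized level, then bundle (sum + sign)."""
    levels = np.clip((x * N_LEVELS).astype(int), 0, N_LEVELS - 1)  # x assumed in [0, 1)
    bound = id_hvs * level_hvs[levels]          # element-wise binding
    return np.sign(bound.sum(axis=0))           # bundling into one hypervector

def train(X, y, n_classes):
    protos = np.zeros((n_classes, D))
    for xi, yi in zip(X, y):
        protos[yi] += encode(xi)                 # accumulate class prototypes
    return protos

def predict(X, protos):
    H = np.stack([encode(xi) for xi in X])
    sims = H @ protos.T / (np.linalg.norm(H, axis=1, keepdims=True)
                           * np.linalg.norm(protos, axis=1) + 1e-12)
    return sims.argmax(axis=1)

# Toy usage with random data standing in for per-trial neural features
X = rng.random((100, N_FEATURES)); y = rng.integers(0, 4, 100)
protos = train(X, y, n_classes=4)
print(predict(X[:5], protos))
```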
2. Jalilpour Monesi M, Vanthornhout J, Francart T, Van Hamme H. The role of vowel and consonant onsets in neural tracking of natural speech. J Neural Eng 2024; 21:016002. [PMID: 38205849] [DOI: 10.1088/1741-2552/ad1784]
Abstract
Objective. To investigate how the auditory system processes natural speech, models have been created to relate the electroencephalography (EEG) signal of a person listening to speech to various representations of the speech. Mainly the speech envelope has been used, but also phonetic representations. We investigated to what degree of granularity phonetic representations can be related to the EEG signal. Approach. We used recorded EEG signals from 105 subjects while they listened to fairy tale stories. We utilized speech representations, including onset of any phone, vowel-consonant onsets, broad phonetic class (BPC) onsets, and narrow phonetic class onsets, and related them to EEG using forward modeling and match-mismatch tasks. In forward modeling, we used a linear model to predict EEG from speech representations. In the match-mismatch task, we trained a long short-term memory (LSTM)-based model to determine which of two candidate speech segments matches a given EEG segment. Main results. Our results show that vowel-consonant onsets outperform onsets of any phone in both tasks, which suggests that neural tracking of the vowel-consonant distinction exists in the EEG to some degree. We also observed that vowel (syllable nucleus) onsets exhibit a more consistent representation in EEG compared to syllable onsets. Significance. Our findings suggest that neural tracking previously thought to be associated with BPCs might actually originate from vowel-consonant onsets rather than from the differentiation between different phonetic classes.
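For readers unfamiliar with the forward-modeling step, the sketch below fits a lagged linear (ridge) model that predicts a single EEG channel from an onset feature and scores it by correlation on held-out data. The sampling rate, lag range, regularization strength, and synthetic signals are assumptions for illustration, and the paper's LSTM-based match-mismatch model is not reproduced here.

```python
# Minimal forward-modeling sketch: predict one EEG channel from a lagged speech
# feature (e.g. an onset train) with ridge regression, then score by Pearson
# correlation on held-out data. All settings and data are illustrative only.
import numpy as np

fs = 64                               # feature/EEG sampling rate (assumed), Hz
lags = np.arange(0, int(0.4 * fs))    # 0-400 ms of lags

def lag_matrix(stim, lags):
    """Stack time-lagged copies of the stimulus feature into a design matrix."""
    X = np.zeros((len(stim), len(lags)))
    for j, lag in enumerate(lags):
        X[lag:, j] = stim[:len(stim) - lag]
    return X

def fit_ridge(X, y, alpha=1.0):
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ y)

# Synthetic example: an onset train and an EEG channel that weakly follows it
rng = np.random.default_rng(1)
onsets = (rng.random(fs * 120) < 0.05).astype(float)            # 2 minutes of "onsets"
kernel = np.hanning(int(0.2 * fs))
eeg = np.convolve(onsets, kernel, mode="full")[:len(onsets)] + rng.normal(0, 1, len(onsets))

X = lag_matrix(onsets, lags)
half = len(eeg) // 2
w = fit_ridge(X[:half], eeg[:half])                              # train on first half
pred = X[half:] @ w                                              # predict held-out half
r = np.corrcoef(pred, eeg[half:])[0, 1]
print(f"held-out prediction correlation: {r:.3f}")
```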
Affiliation(s)
- Mohammad Jalilpour Monesi: Department of Electrical Engineering (ESAT), PSI, KU Leuven, Leuven, Belgium; Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Tom Francart: Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Hugo Van Hamme: Department of Electrical Engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
3. Nitta T, Horikawa J, Iribe Y, Taguchi R, Katsurada K, Shinohara S, Kawai G. Linguistic representation of vowels in speech imagery EEG. Front Hum Neurosci 2023; 17:1163578. [PMID: 37275343] [PMCID: PMC10237317] [DOI: 10.3389/fnhum.2023.1163578]
Abstract
Speech imagery recognition from electroencephalograms (EEGs) could potentially become a strong contender among non-invasive brain-computer interfaces (BCIs). In this report, we first extract language representations, defined as differences of line-spectra of phones, by statistically analyzing many EEG signals from Broca's area. We then extract vowels by iterative search over hand-labeled short-syllable data. The iterative search process consists of principal component analysis (PCA), which visualizes the linguistic representation of vowels through eigenvectors φ(m), and a subspace method (SM), which searches for an optimum line-spectrum for redesigning φ(m). The extracted linguistic representation of the Japanese vowels /i/ /e/ /a/ /o/ /u/ shows two distinct spectral peaks (P1, P2) in the upper frequency range, and the five vowels are arranged on the P1-P2 chart. A five-vowel recognition experiment using a data set of 5 subjects and a convolutional neural network (CNN) classifier gave a mean accuracy of 72.6%.
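As a rough illustration of the PCA step only, the snippet below projects per-trial line-spectrum features onto the first two principal components and prints per-vowel centroids; the iterative subspace-method redesign of φ(m) and the CNN classifier are not reproduced, and all array sizes and data are stand-ins rather than the study's recordings.

```python
# Sketch of the PCA visualization step only: project per-trial line-spectrum
# features of imagined-vowel EEG onto two principal components and report the
# centroid of each vowel class. Feature extraction and the paper's iterative
# subspace-method search are not reproduced here.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n_trials, n_bins = 250, 128                   # trials x spectral bins (assumed sizes)
spectra = rng.random((n_trials, n_bins))      # stand-in for EEG line spectra
labels = rng.integers(0, 5, n_trials)         # 5 vowel classes /i e a o u/

pca = PCA(n_components=2)
scores = pca.fit_transform(spectra - spectra.mean(axis=0))
for v in range(5):
    centroid = scores[labels == v].mean(axis=0)
    print(f"vowel {v}: PC1={centroid[0]:+.2f}, PC2={centroid[1]:+.2f}")
```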
Affiliation(s)
- Tsuneo Nitta: Graduate School of Engineering, Toyohashi University of Technology, Toyohashi, Japan
- Junsei Horikawa: Graduate School of Engineering, Toyohashi University of Technology, Toyohashi, Japan
- Yurie Iribe: Graduate School of Information Science and Technology, Aichi Prefectural University, Nagakute, Japan
- Ryo Taguchi: Graduate School of Information, Nagoya Institute of Technology, Nagoya, Japan
- Kouichi Katsurada: Faculty of Science and Technology, Tokyo University of Science, Noda, Japan
- Shuji Shinohara: School of Science and Engineering, Tokyo Denki University, Saitama, Japan
- Goh Kawai: Online Learning Support Team, Tokyo University of Foreign Studies, Tokyo, Japan
4. Losorelli S, Kaneshiro B, Musacchia GA, Blevins NH, Fitzgerald MB. Factors influencing classification of frequency following responses to speech and music stimuli. Hear Res 2020; 398:108101. [PMID: 33142106] [DOI: 10.1016/j.heares.2020.108101]
Abstract
Successful mapping of meaningful labels to sound input requires accurate representation of that sound's acoustic variances in time and spectrum. For some individuals, such as children or those with hearing loss, an objective measure of the integrity of this representation could be useful. Classification is a promising machine learning approach that can be used to objectively predict a stimulus label from the brain response. This approach has previously been used with auditory evoked potentials (AEPs) such as the frequency following response (FFR), but a number of key issues remain unresolved before classification can be translated into clinical practice. Specifically, past efforts at FFR classification have used data from a given subject for both training and testing the classifier. It is also unclear which components of the FFR yield optimal classification accuracy. To address these issues, we recorded FFRs from 13 adults with normal hearing in response to speech and music stimuli. We compared the labeling accuracy of two cross-validation classification approaches using FFR data: (1) a more traditional method combining subject data in both the training and testing sets, and (2) a "leave-one-out" approach, in which a subject's data are classified based on a model built exclusively from the data of other individuals. We also examined classification accuracy on decomposed and time-segmented FFRs. Our results indicate that the accuracy of leave-one-subject-out cross-validation approaches that obtained with the more conventional cross-validation classification, while allowing a subject's results to be analyzed with respect to normative data pooled from a separate population. In addition, we demonstrate that classification accuracy is highest when the entire FFR is used to train the classifier. Taken together, these efforts contribute key steps toward translating classification-based machine learning approaches into clinical practice.
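The two evaluation schemes the authors contrast can be sketched directly with scikit-learn: a conventional cross-validation that mixes subjects across folds versus a leave-one-subject-out split. The classifier, feature dimensions, and stand-in data below are assumptions, not the study's exact pipeline.

```python
# Sketch of mixed-subject vs. leave-one-subject-out cross-validation with a
# linear SVM on stand-in FFR feature vectors. Feature extraction and classifier
# choice are illustrative assumptions only.
import numpy as np
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n_subjects, trials_per_subj, n_features = 13, 40, 200
X = rng.normal(size=(n_subjects * trials_per_subj, n_features))   # FFR features (stand-in)
y = rng.integers(0, 2, len(X))                                     # stimulus label
groups = np.repeat(np.arange(n_subjects), trials_per_subj)         # subject IDs

clf = SVC(kernel="linear")
mixed = cross_val_score(clf, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
loso = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut())
print(f"mixed-subject CV accuracy:      {mixed.mean():.2f}")
print(f"leave-one-subject-out accuracy: {loso.mean():.2f}")
```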
Affiliation(s)
- Steven Losorelli: Department of Otolaryngology Head and Neck Surgery, Stanford University School of Medicine, Palo Alto, CA, USA
- Blair Kaneshiro: Department of Otolaryngology Head and Neck Surgery, Stanford University School of Medicine, Palo Alto, CA, USA
- Gabriella A Musacchia: Department of Otolaryngology Head and Neck Surgery, Stanford University School of Medicine, Palo Alto, CA, USA; Department of Audiology, University of the Pacific, San Francisco, CA, USA
- Nikolas H Blevins: Department of Otolaryngology Head and Neck Surgery, Stanford University School of Medicine, Palo Alto, CA, USA
- Matthew B Fitzgerald: Department of Otolaryngology Head and Neck Surgery, Stanford University School of Medicine, Palo Alto, CA, USA
5. Moinnereau MA, Rouat J, Whittingstall K, Plourde E. A frequency-band coupling model of EEG signals can capture features from an input audio stimulus. Hear Res 2020; 393:107994. [DOI: 10.1016/j.heares.2020.107994]
6. Wang Y, Wang P, Yu Y. Decoding English Alphabet Letters Using EEG Phase Information. Front Neurosci 2018; 12:62. [PMID: 29467615] [PMCID: PMC5808334] [DOI: 10.3389/fnins.2018.00062]
Abstract
Increasing evidence indicates that the phase pattern and power of low-frequency oscillations in the brain's electroencephalogram (EEG) carry significant information during human cognition of sensory signals such as auditory and visual stimuli. Here, we investigate whether and how the letters of the alphabet can be directly decoded from EEG phase and power data. In addition, we investigate how oscillations in different bands contribute to the classification and determine the critical time periods. An English letter recognition task was assigned, and statistical analyses were conducted to decode the EEG signal corresponding to each letter displayed on a computer screen. We applied a support vector machine (SVM) trained with gradient descent to learn the relevant features for classification. We observed that EEG phase signals yield higher decoding accuracy than oscillation power. Phase information from low-frequency theta and alpha oscillations supports higher accuracy than that from other bands. Decoding performance was best for the analysis period from 180 to 380 ms after stimulus presentation, especially in the lateral occipital and posterior temporal scalp regions (PO7 and PO8). These results may provide a new approach for brain-computer interface (BCI) techniques and may deepen our understanding of EEG oscillations in cognition.
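A minimal sketch of the phase-feature idea, assuming a generic pipeline rather than the study's exact parameters: band-pass the EEG to the theta/alpha range, take the instantaneous phase via the Hilbert transform in the 180-380 ms window, and feed sine/cosine of the phase to a linear SVM.

```python
# Sketch of phase-based letter decoding: band-limit each epoch, extract the
# Hilbert phase in a post-stimulus window, and classify with a linear SVM.
# Band edges, window, channel count, and data are assumptions for illustration.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

fs = 250                                              # sampling rate (assumed), Hz
b, a = butter(4, [4, 12], btype="bandpass", fs=fs)    # theta/alpha band (assumed edges)
win = slice(int(0.18 * fs), int(0.38 * fs))           # 180-380 ms post-stimulus window

def phase_features(epochs):
    """epochs: (n_trials, n_channels, n_samples) -> per-trial phase feature matrix."""
    filtered = filtfilt(b, a, epochs, axis=-1)
    phase = np.angle(hilbert(filtered, axis=-1))[..., win]
    feats = np.concatenate([np.cos(phase), np.sin(phase)], axis=-1)
    return feats.reshape(len(epochs), -1)

# Stand-in data: 10 trials per letter for 26 letters, 8 channels, 0.6 s epochs
rng = np.random.default_rng(4)
epochs = rng.normal(size=(260, 8, int(0.6 * fs)))
labels = np.repeat(np.arange(26), 10)
acc = cross_val_score(SVC(kernel="linear"), phase_features(epochs), labels, cv=5)
print(f"mean decoding accuracy: {acc.mean():.2f}")
```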
Affiliation(s)
- YiYan Wang: State Key Laboratory of Medical Neurobiology, School of Life Science and the Collaborative Innovation Center for Brain Science, Center for Computational Systems Biology, Institutes of Brain Science, Fudan University, Shanghai, China; Institute of Modern Physics, Fudan University, Shanghai, China
- Pingxiao Wang: Institute of Modern Physics, Fudan University, Shanghai, China
- Yuguo Yu: State Key Laboratory of Medical Neurobiology, School of Life Science and the Collaborative Innovation Center for Brain Science, Center for Computational Systems Biology, Institutes of Brain Science, Fudan University, Shanghai, China
7. Lee B, Cho KH. Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference. Sci Rep 2016; 6:37647. [PMID: 27876875] [PMCID: PMC5120313] [DOI: 10.1038/srep37647]
Abstract
Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regularly structured speech and maintain high recognition performance under any circumstance? Recent neurophysiological studies have suggested that the phase of neuronal oscillations in the auditory cortex contributes to accurate speech recognition by guiding speech segmentation into smaller units at different timescales. A phase-locked relationship between neuronal oscillations and the speech envelope has recently been observed, which suggests that the speech envelope provides a foundation for multi-timescale speech segmental information. In this study, we quantitatively investigated the role of the speech envelope as a potential temporal reference for segmenting speech using its instantaneous phase information. We evaluated the proposed approach in terms of the achieved information gain and recognition performance in various noisy environments. The results indicate that the proposed segmentation scheme not only extracts more information from speech but also provides greater robustness in a recognition test.
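The envelope-phase idea can be sketched as follows: extract a syllable-rate envelope, take its instantaneous (Hilbert) phase, and place candidate segment boundaries at phase wrap-arounds, i.e., one boundary per envelope cycle. The cut-off frequency and the toy signal are assumptions; the paper's full multi-timescale segmentation scheme is not reproduced.

```python
# Sketch of envelope-phase segmentation: low-pass the rectified signal to get a
# syllable-rate envelope, take its Hilbert phase, and mark boundaries where the
# phase wraps from +pi to -pi. Parameters and the toy signal are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 16_000                                   # audio sampling rate (assumed), Hz
t = np.arange(0, 2.0, 1 / fs)
speech = np.sin(2 * np.pi * 150 * t) * (0.6 + 0.4 * np.sin(2 * np.pi * 4 * t))  # toy "speech"

# Syllable-rate envelope: rectify, then low-pass below ~10 Hz
b, a = butter(2, 10, btype="low", fs=fs)
envelope = filtfilt(b, a, np.abs(speech))

# Instantaneous phase of the envelope; boundaries at phase wrap-arounds
phase = np.angle(hilbert(envelope - envelope.mean()))
boundaries = np.where(np.diff(phase) < -np.pi)[0] / fs
print("segment boundaries (s):", np.round(boundaries, 3))
```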
Affiliation(s)
- Byeongwook Lee: Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Kwang-Hyun Cho: Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
8. Left Superior Temporal Gyrus Is Coupled to Attended Speech in a Cocktail-Party Auditory Scene. J Neurosci 2016; 36:1596-606. [PMID: 26843641] [DOI: 10.1523/jneurosci.1730-15.2016]
Abstract
Using a continuous listening task, we evaluated the coupling between the listener's cortical activity and the temporal envelopes of different sounds in a multitalker auditory scene using magnetoencephalography and corticovocal coherence analysis. Neuromagnetic signals were recorded from 20 right-handed healthy adult humans who listened to five different recorded stories (attended speech streams), one without any multitalker background (No noise) and four mixed with a "cocktail party" multitalker background noise at four signal-to-noise ratios (5, 0, -5, and -10 dB) to produce speech-in-noise mixtures, here referred to as Global scene. Coherence analysis revealed that the modulations of the attended speech stream, presented without multitalker background, were coupled at ∼0.5 Hz to the activity of both superior temporal gyri, whereas the modulations at 4-8 Hz were coupled to the activity of the right supratemporal auditory cortex. In cocktail-party conditions, with the multitalker background noise, the coupling was stronger at both frequencies for the attended speech stream than for the unattended Multitalker background. The coupling strengths decreased as the Multitalker background level increased. During the cocktail-party conditions, the ∼0.5 Hz coupling became left-hemisphere dominant, compared with the bilateral coupling observed without the multitalker background, whereas the 4-8 Hz coupling remained right-hemisphere lateralized in both conditions. Brain activity was not coupled to the Multitalker background or to its individual talkers. The results highlight the key role of the listener's left superior temporal gyri in extracting the slow ∼0.5 Hz modulations, likely reflecting the attended speech stream, within a multitalker auditory scene. SIGNIFICANCE STATEMENT When people listen to one person at a "cocktail party," their auditory cortex mainly follows the attended speech stream rather than the entire auditory scene. However, how the brain extracts the attended speech stream from the whole auditory scene, and how increasing background noise corrupts this process, is still debated. In this magnetoencephalography study, subjects had to attend a speech stream with or without multitalker background noise. The results argue for frequency-dependent cortical tracking mechanisms for the attended speech stream. The left superior temporal gyrus tracked the ∼0.5 Hz modulations of the attended speech stream only when the speech was embedded in a multitalker background, whereas the right supratemporal auditory cortex tracked 4-8 Hz modulations during both noiseless and cocktail-party conditions.
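A minimal sketch of the coherence computation, assuming generic parameters rather than the study's corticovocal pipeline: magnitude-squared coherence between one cortical sensor and the attended speech envelope, read out in the ∼0.5 Hz and 4-8 Hz bands discussed above.

```python
# Sketch of speech-brain coherence: magnitude-squared coherence between one
# MEG/EEG sensor and the speech envelope, averaged in two frequency bands.
# Sampling rate, window length, and the synthetic signals are assumptions.
import numpy as np
from scipy.signal import coherence

fs = 100                                       # common sampling rate (assumed), Hz
rng = np.random.default_rng(5)
n = fs * 300                                   # 5 minutes of data
t = np.arange(n) / fs

speech_env = 0.5 * np.sin(2 * np.pi * 0.5 * t) + 0.5 * np.sin(2 * np.pi * 5 * t)
meg = 0.3 * speech_env + rng.normal(0, 1, n)   # sensor weakly tracking the envelope

f, Cxy = coherence(meg, speech_env, fs=fs, nperseg=fs * 20)   # 20 s windows
for lo, hi, name in [(0.3, 0.7, "~0.5 Hz"), (4, 8, "4-8 Hz")]:
    band = (f >= lo) & (f <= hi)
    print(f"{name} coherence: {Cxy[band].mean():.2f}")
```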
9. Mai G, Minett JW, Wang WSY. Delta, theta, beta, and gamma brain oscillations index levels of auditory sentence processing. Neuroimage 2016; 133:516-528. [DOI: 10.1016/j.neuroimage.2016.02.064]
10. Zhang Q, Hu X, Luo H, Li J, Zhang X, Zhang B. Deciphering phonemes from syllables in blood oxygenation level-dependent signals in human superior temporal gyrus. Eur J Neurosci 2016; 43:773-81. [DOI: 10.1111/ejn.13164]
Affiliation(s)
- Qingtian Zhang: Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Room 4-504, FIT Building, Beijing 100084, China
- Xiaolin Hu: Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Room 4-504, FIT Building, Beijing 100084, China; Center for Brain-Inspired Computing Research (CBICR), Tsinghua University, Beijing, China
- Huan Luo: Department of Psychology, Peking University, Beijing, China; IDG/McGovern Institute for Brain Research, Peking University, Beijing, China
- Jianmin Li: Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Room 4-504, FIT Building, Beijing 100084, China
- Xiaolu Zhang: Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Room 4-504, FIT Building, Beijing 100084, China
- Bo Zhang: Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Room 4-504, FIT Building, Beijing 100084, China; Center for Brain-Inspired Computing Research (CBICR), Tsinghua University, Beijing, China
11. Carvalhaes C, de Barros JA. The surface Laplacian technique in EEG: Theory and methods. Int J Psychophysiol 2015; 97:174-88. [DOI: 10.1016/j.ijpsycho.2015.04.023]
12. Carvalhaes CG, de Barros JA, Perreau-Guimaraes M, Suppes P. The Joint Use of the Tangential Electric Field and Surface Laplacian in EEG Classification. Brain Topogr 2013; 27:84-94. [DOI: 10.1007/s10548-013-0305-y]