1. Carvalho VR, Mendes EMAM, Fallah A, Sejnowski TJ, Comstock L, Lainscsek C. Decoding imagined speech with delay differential analysis. Front Hum Neurosci 2024; 18:1398065. PMID: 38826617; PMCID: PMC11140152; DOI: 10.3389/fnhum.2024.1398065.
Abstract
Speech decoding from non-invasive EEG signals can achieve relatively high accuracy (70-80%) for strictly delimited classification tasks, but for more complex tasks it typically yields 20-50% classification accuracy. However, decoder generalization, or how well algorithms perform objectively across datasets, is complicated by the small size and heterogeneity of existing EEG datasets. Furthermore, the limited availability of open-access code hampers comparison between methods. This study explores the application of a novel non-linear signal processing method, delay differential analysis (DDA), to speech decoding. We provide a systematic evaluation of its performance on two public imagined speech decoding datasets relative to all publicly available deep learning methods. The results support DDA as a compelling alternative or complementary approach to deep learning methods for speech decoding. DDA is a fast and efficient open-source time-domain method that fits data using only a few strong features and does not require extensive preprocessing.
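To make the method concrete, the sketch below shows the general shape of a DDA feature extractor: a sparse delay differential model is fit to each normalized signal window by least squares, and the fitted coefficients plus the fitting error become the features. The model form, the delays, and the normalization here are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def dda_features(x, tau1, tau2, fs):
    """Minimal DDA sketch: fit x'(t) ~ a1*x(t-tau1) + a2*x(t-tau2)
    + a3*x(t-tau1)*x(t-tau2) by least squares (delays in samples).
    The coefficients and the residual fitting error are the features."""
    x = (x - x.mean()) / x.std()            # DDA is typically run on normalized data
    dx = np.gradient(x) * fs                # numerical derivative estimate
    t0 = max(tau1, tau2)
    x1 = x[t0 - tau1 : len(x) - tau1]       # delayed copy x(t - tau1)
    x2 = x[t0 - tau2 : len(x) - tau2]       # delayed copy x(t - tau2)
    A = np.column_stack([x1, x2, x1 * x2])  # sparse monomial basis
    a, *_ = np.linalg.lstsq(A, dx[t0:], rcond=None)
    rho = np.sqrt(np.mean((A @ a - dx[t0:]) ** 2))
    return np.append(a, rho)                # 4 features per channel/window
```

In practice the delays and model terms are chosen by a search on held-out data, and the small per-channel feature vectors go to a simple classifier.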
Affiliation(s)
- Vinícius Rezende Carvalho
  - RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
  - Postgraduate Program in Electrical Engineering, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Aria Fallah
  - Department of Neurosurgery, University of California, Los Angeles, Los Angeles, CA, United States
- Terrence J. Sejnowski
  - Computational Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, United States
  - Institute for Neural Computation, University of California, San Diego, La Jolla, CA, United States
  - Department of Neurobiology, University of California, San Diego, La Jolla, CA, United States
- Lindy Comstock
  - Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, CA, United States
- Claudia Lainscsek
  - Computational Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, United States
  - Institute for Neural Computation, University of California, San Diego, La Jolla, CA, United States
2. Silva AB, Littlejohn KT, Liu JR, Moses DA, Chang EF. The speech neuroprosthesis. Nat Rev Neurosci 2024. PMID: 38745103; DOI: 10.1038/s41583-024-00819-9.
Abstract
Loss of speech after paralysis is devastating, but circumventing motor-pathway injury by directly decoding speech from intact cortical activity has the potential to restore natural communication and self-expression. Recent discoveries have defined how key features of speech production are facilitated by the coordinated activity of vocal-tract articulatory and motor-planning cortical representations. In this Review, we highlight such progress and how it has led to successful speech decoding, first in individuals implanted with intracranial electrodes for clinical epilepsy monitoring and subsequently in individuals with paralysis as part of early feasibility clinical trials to restore speech. We discuss high-spatiotemporal-resolution neural interfaces and the adaptation of state-of-the-art speech computational algorithms that have driven rapid and substantial progress in decoding neural activity into text, audible speech, and facial movements. Although restoring natural speech is a long-term goal, speech neuroprostheses already have performance levels that surpass communication rates offered by current assistive-communication technology. Given this accelerated rate of progress in the field, we propose key evaluation metrics for speed and accuracy, among others, to help standardize evaluation across studies. We finish by highlighting several directions to more fully explore the multidimensional feature space of speech and language, which will continue to accelerate progress towards a clinically viable speech neuroprosthesis.
Affiliation(s)
- Alexander B Silva
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA
- Kaylo T Littlejohn
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Jessie R Liu
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA
- David A Moses
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA
- Edward F Chang
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA
3. Wandelt SK, Bjånes DA, Pejsa K, Lee B, Liu C, Andersen RA. Representation of internal speech by single neurons in human supramarginal gyrus. Nat Hum Behav 2024. PMID: 38740984; DOI: 10.1038/s41562-024-01867-y.
Abstract
Speech brain-machine interfaces (BMIs) translate brain signals into words or audio outputs, enabling communication for people who have lost their speech abilities due to disease or injury. While important advances in decoding vocalized, attempted and mimed speech have been achieved, results for internal speech decoding are sparse and have yet to achieve high functionality. Notably, it is still unclear from which brain areas internal speech can be decoded. Here, two participants with tetraplegia, with implanted microelectrode arrays located in the supramarginal gyrus (SMG) and primary somatosensory cortex (S1), performed internal and vocalized speech of six words and two pseudowords. In both participants, we found significant neural representation of internal and vocalized speech at the single-neuron and population level in the SMG. From recorded population activity in the SMG, the internally spoken and vocalized words were significantly decodable. In an offline analysis, we achieved average decoding accuracies of 55% and 24% for the two participants, respectively (chance level 12.5%), and during an online internal speech BMI task, we averaged 79% and 23% accuracy, respectively. Evidence of shared neural representations between internal speech, word reading and vocalized speech processes was found in participant 1. The SMG represented words as well as pseudowords, providing evidence for phonetic encoding. Furthermore, our decoder achieved high classification accuracy with multiple internal speech strategies (auditory imagination/visual imagination). Activity in S1 was modulated by vocalized but not internal speech in both participants, suggesting that no articulatory movements of the vocal tract occurred during internal speech production. This work represents a proof of concept for a high-performance internal speech BMI.
Affiliation(s)
- Sarah K Wandelt
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
  - T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA, USA
- David A Bjånes
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
  - T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA, USA
  - Rancho Los Amigos National Rehabilitation Center, Downey, CA, USA
- Kelsie Pejsa
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
  - T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA, USA
- Brian Lee
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
  - Department of Neurological Surgery, Keck School of Medicine of USC, Los Angeles, CA, USA
  - USC Neurorestoration Center, Keck School of Medicine of USC, Los Angeles, CA, USA
- Charles Liu
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
  - Rancho Los Amigos National Rehabilitation Center, Downey, CA, USA
  - Department of Neurological Surgery, Keck School of Medicine of USC, Los Angeles, CA, USA
  - USC Neurorestoration Center, Keck School of Medicine of USC, Los Angeles, CA, USA
- Richard A Andersen
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
  - T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA, USA
4. Hossain A, Khan P, Kader MF. Imagined speech classification exploiting EEG power spectrum features. Med Biol Eng Comput 2024. PMID: 38632207; DOI: 10.1007/s11517-024-03083-2.
Abstract
Imagined speech recognition has developed into a significant topic of research in the field of brain-computer interfaces. This innovative technique holds great promise as a communication tool, providing essential help to those with impairments. An imagined speech recognition model is proposed in this paper to identify the ten most frequently used letters of the English alphabet (A, D, E, H, I, N, O, R, S, T) and the numerals 0 to 9. A novel electroencephalogram (EEG) dataset was created by measuring the brain activity of 30 people while they imagined these letters and digits. As part of signal preprocessing, the EEG signals are filtered before extracting delta, theta, alpha, and beta band power features. These features are used as input for classification with support vector machine, k-nearest neighbors, and random forest (RF) classifiers. The RF classifier outperformed the others in terms of classification accuracy, achieving 99.38% at the coarse level and 95.39% at the fine level. Our study also reveals that the beta frequency band and the frontal lobe of the brain played crucial roles in imagined speech recognition. Furthermore, a comparative analysis against state-of-the-art techniques is conducted to demonstrate the efficacy of the proposed model.
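A hedged sketch of this kind of pipeline is shown below: Welch band powers per channel feed a random forest. The sampling rate, Welch window, and forest size are illustrative assumptions; the paper's exact preprocessing is not specified in the abstract.

```python
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import RandomForestClassifier

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_power_features(trial, fs=128):          # fs is an assumption
    """Per-channel band powers from Welch's PSD; trial: (n_channels, n_samples)."""
    freqs, psd = welch(trial, fs=fs, nperseg=fs * 2, axis=-1)
    feats = [psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=-1)
             for lo, hi in BANDS.values()]
    return np.concatenate(feats)                 # 4 bands x n_channels features

# X_raw: list of trials; y: imagined letter/digit labels
# X = np.vstack([band_power_features(t) for t in X_raw])
# clf = RandomForestClassifier(n_estimators=300).fit(X, y)
```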
Affiliation(s)
- Arman Hossain
  - Department of Electrical and Electronic Engineering, University of Chittagong, Chittagong, 4331, Bangladesh
- Protima Khan
  - Department of Electrical and Electronic Engineering, University of Chittagong, Chittagong, 4331, Bangladesh
- Md Fazlul Kader
  - Department of Electrical and Electronic Engineering, University of Chittagong, Chittagong, 4331, Bangladesh
5. LaRocco J, Tahmina Q, Lecian S, Moore J, Helbig C, Gupta S. Evaluation of an English language phoneme-based imagined speech brain computer interface with low-cost electroencephalography. Front Neuroinform 2023; 17:1306277. PMID: 38192730; PMCID: PMC10773705; DOI: 10.3389/fninf.2023.1306277.
Abstract
Introduction: Paralyzed and physically impaired patients face communication difficulties, even when they are mentally coherent and aware. Electroencephalographic (EEG) brain-computer interfaces (BCIs) offer a potential communication method for these people without invasive surgery or physical device controls.
Methods: Although virtual keyboard protocols are well documented in EEG BCI paradigms, these implementations are visually taxing and fatiguing. All English words are formed from combinations of 44 unique phonemes, each corresponding to a unique EEG pattern. In this study, a complete phoneme-based imagined speech EEG BCI was developed and tested on 16 subjects.
Results: Using open-source hardware and software, machine learning models such as k-nearest neighbor (KNN) reliably achieved a mean accuracy of 97 ± 0.001%, a mean F1 of 0.55 ± 0.01, and a mean AUC-ROC of 0.68 ± 0.002 in a modified one-versus-rest configuration, resulting in an information transfer rate of 304.15 bits per minute. In line with prior literature, the distinguishing feature between phonemes was the gamma power on channels F3 and F7.
Discussion: Adjustments to feature selection, trial window length, and classifier algorithms may further improve performance. These are iterative changes to a viable method that is directly deployable in current, commercially available systems and software. The development of an intuitive phoneme-based EEG BCI with open-source hardware and software demonstrates the potential ease with which the technology could be deployed in real-world applications.
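The abstract does not state which information-transfer-rate definition was used; the Wolpaw formula below is the most common convention in the BCI literature and is shown only to illustrate how a bits-per-minute figure is typically derived from accuracy, class count, and selection rate:

```python
import math

def wolpaw_itr(n_classes: int, accuracy: float, selections_per_min: float) -> float:
    """Wolpaw ITR in bits/min: bits per selection times selection rate.
    Assumes 0 < accuracy < 1 and n_classes > 1."""
    n, p = n_classes, accuracy
    bits = (math.log2(n) + p * math.log2(p)
            + (1 - p) * math.log2((1 - p) / (n - 1)))
    return bits * selections_per_min

# e.g. wolpaw_itr(44, 0.97, 80) -- hypothetical numbers, not the paper's settings
```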
Affiliation(s)
- John LaRocco
  - Wexner Medical Center, The Ohio State University, Columbus, OH, United States
- Qudsia Tahmina
  - Department of Electrical Engineering, The Ohio State University, Columbus, OH, United States
- Sam Lecian
  - Department of Electrical Engineering, The Ohio State University, Columbus, OH, United States
- Jason Moore
  - Department of Electrical Engineering, The Ohio State University, Columbus, OH, United States
- Cole Helbig
  - Department of Electrical Engineering, The Ohio State University, Columbus, OH, United States
- Surya Gupta
  - Department of Electrical Engineering, The Ohio State University, Columbus, OH, United States
6. Canny E, Vansteensel MJ, van der Salm SMA, Müller-Putz GR, Berezutskaya J. Boosting brain-computer interfaces with functional electrical stimulation: potential applications in people with locked-in syndrome. J Neuroeng Rehabil 2023; 20:157. PMID: 37980536; PMCID: PMC10656959; DOI: 10.1186/s12984-023-01272-y.
Abstract
Individuals in a locked-in state live with severe whole-body paralysis that limits their ability to communicate with family and loved ones. Recent advances in brain-computer interface (BCI) technology present a potential alternative means of communication for these people: neural activity associated with attempted hand or speech movements is detected, and the decoded intended movements are translated into a control signal for a computer. A technique that could potentially enrich the communication capacity of BCIs is functional electrical stimulation (FES) of the paralyzed limbs and face to restore body and facial movements, making it possible to add body language and facial expression to communication BCI utterances. Here, we review the current state of the art of BCI and FES work in people with paralysis of the body and face, and propose that a combined BCI-FES approach, which has already proved successful in several applications in stroke and spinal cord injury, could provide a promising novel mode of communication for locked-in individuals.
Affiliation(s)
- Evan Canny
  - Department of Neurology and Neurosurgery, Brain Center, University Medical Center Utrecht, Utrecht, The Netherlands
- Mariska J Vansteensel
  - Department of Neurology and Neurosurgery, Brain Center, University Medical Center Utrecht, Utrecht, The Netherlands
- Sandra M A van der Salm
  - Department of Neurology and Neurosurgery, Brain Center, University Medical Center Utrecht, Utrecht, The Netherlands
- Gernot R Müller-Putz
  - Institute of Neural Engineering, Laboratory of Brain-Computer Interfaces, Graz University of Technology, Graz, Austria
- Julia Berezutskaya
  - Department of Neurology and Neurosurgery, Brain Center, University Medical Center Utrecht, Utrecht, The Netherlands
7. Park HJ, Lee B. Multiclass classification of imagined speech EEG using noise-assisted multivariate empirical mode decomposition and multireceptive field convolutional neural network. Front Hum Neurosci 2023; 17:1186594. PMID: 37645689; PMCID: PMC10461632; DOI: 10.3389/fnhum.2023.1186594.
Abstract
Introduction: In this study, we classified imagined speech electroencephalography (EEG) data using signal decomposition and a multireceptive field convolutional neural network. Imagined speech EEG for five vowels, /a/, /e/, /i/, /o/, and /u/, and a mute (rest) condition was obtained from ten study participants.
Materials and methods: First, two different signal decomposition methods were applied for comparison: noise-assisted multivariate empirical mode decomposition and wavelet packet decomposition. Six statistical features were calculated from each of the eight decomposed sub-frequency bands of the EEG. Next, all features obtained from each channel of a trial were vectorized and used as the input vector of the classifiers. Lastly, the EEG was classified using the multireceptive field convolutional neural network and, for comparison, several other classifiers.
Results: We achieved an average classification rate of 73.09%, and up to 80.41%, in a multiclass (six-class) setup (chance: 16.67%). Significant improvements over the various other classifiers were achieved (p < 0.05). The frequency sub-band analysis showed that the high-frequency band regions and the lowest-frequency band region contain more information about imagined vowel EEG data. The misclassification and classification rates of each imagined vowel were analyzed through a confusion matrix.
Discussion: Imagined speech EEG can be classified successfully using the proposed signal decomposition method and a convolutional neural network. The proposed classification method for imagined speech EEG can contribute to developing a practical imagined speech-based brain-computer interface system.
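The abstract does not name the six statistics, so the set below is purely illustrative; it sketches the feature-vectorization step applied to each of the eight decomposed sub-bands of each channel:

```python
import numpy as np
from scipy.stats import kurtosis, skew

def stat_features(band: np.ndarray) -> np.ndarray:
    """Six summary statistics of one sub-band signal (an assumed set)."""
    return np.array([
        band.mean(),
        band.var(),
        skew(band),
        kurtosis(band),
        np.sqrt(np.mean(band ** 2)),      # root mean square
        np.mean(np.abs(np.diff(band))),   # mean absolute first difference
    ])

# trial_bands: (n_channels, 8, n_samples) after NA-MEMD or WPD decomposition
# x = np.concatenate([stat_features(b) for ch in trial_bands for b in ch])
# -> one n_channels * 8 * 6 feature vector per trial, fed to the classifier
```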
Affiliation(s)
- Hyeong-jun Park
  - Department of Biomedical Science and Engineering, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
- Boreom Lee
  - Department of Biomedical Science and Engineering, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
  - AI Graduate School, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
8. Nitta T, Horikawa J, Iribe Y, Taguchi R, Katsurada K, Shinohara S, Kawai G. Linguistic representation of vowels in speech imagery EEG. Front Hum Neurosci 2023; 17:1163578. PMID: 37275343; PMCID: PMC10237317; DOI: 10.3389/fnhum.2023.1163578.
Abstract
Speech imagery recognition from electroencephalograms (EEGs) could potentially become a strong contender among non-invasive brain-computer interfaces (BCIs). In this report, we first extract linguistic representations, as differences in the line spectra of phones, by statistically analyzing many EEG signals from Broca's area. We then extract vowels by iterative search over hand-labeled short-syllable data. The iterative search process consists of principal component analysis (PCA), which visualizes the linguistic representation of vowels through eigenvectors φ(m), and the subspace method (SM), which searches for an optimum line spectrum for redesigning φ(m). The extracted linguistic representations of the Japanese vowels /i/, /e/, /a/, /o/, and /u/ show two distinct spectral peaks (P1, P2) in the upper frequency range, and the five vowels are aligned on the P1-P2 chart. A five-vowel recognition experiment using a dataset of five subjects and a convolutional neural network (CNN) classifier gave a mean accuracy rate of 72.6%.
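A rough sketch of the PCA/subspace-method machinery is given below, under two stated assumptions: the iterative redesign of φ(m) is omitted, and the similarity measure is the common CLAFIC-style projection energy (the paper may use a different variant):

```python
import numpy as np

def fit_subspace(X: np.ndarray, k: int) -> np.ndarray:
    """Top-k PCA eigenvectors of one vowel class; X: (n_samples, n_features)."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[:k]                      # rows play the role of the eigenvectors phi(m)

def subspace_score(x: np.ndarray, V: np.ndarray) -> float:
    """Fraction of the vector's energy lying in the class subspace."""
    p = V @ x
    return float(p @ p) / float(x @ x)

# subspaces = {v: fit_subspace(X_v, k=3) for v, X_v in training_data.items()}
# predicted = max(subspaces, key=lambda v: subspace_score(x_test, subspaces[v]))
```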
Affiliation(s)
- Tsuneo Nitta
  - Graduate School of Engineering, Toyohashi University of Technology, Toyohashi, Japan
- Junsei Horikawa
  - Graduate School of Engineering, Toyohashi University of Technology, Toyohashi, Japan
- Yurie Iribe
  - Graduate School of Information Science and Technology, Aichi Prefectural University, Nagakute, Japan
- Ryo Taguchi
  - Graduate School of Information, Nagoya Institute of Technology, Nagoya, Japan
- Kouichi Katsurada
  - Faculty of Science and Technology, Tokyo University of Science, Noda, Japan
- Shuji Shinohara
  - School of Science and Engineering, Tokyo Denki University, Saitama, Japan
- Goh Kawai
  - Online Learning Support Team, Tokyo University of Foreign Studies, Tokyo, Japan