1. Li X, Dong X, Wang J, Mao H, Tu X, Li W, He J, Li Q, Zhang P. Deep transfer learning-based decoder calibration for intracortical brain-machine interfaces. Comput Biol Med 2025; 192:110231. [PMID: 40262392] [DOI: 10.1016/j.compbiomed.2025.110231]
Abstract
Intracortical brain-machine interfaces (iBMIs) aim to establish a communication path between the brain and external devices. However, in the daily use of iBMIs, the non-stationarity of recorded neural signals necessitates frequent recalibration of the iBMI decoder to maintain decoding performance, which requires collecting and labeling a large amount of new data. To address this challenge and minimize the time needed for decoder recalibration, we proposed an active learning domain adversarial neural network (AL-DANN). This model leveraged a substantial volume of historical data alongside a small amount of current data (four samples per category) to calibrate the decoder. By incorporating domain adversarial and active learning strategies, the model effectively transferred knowledge from historical data to new data, reducing the demand for new samples. We validated the proposed method using neural signals recorded from three monkeys performing different movements in a classification task or a regression task. The results showed that the AL-DANN outperformed existing state-of-the-art methods. Impressively, it required only four new samples per category for decoder recalibration, leading to a recalibration time reduction of over 80 %. To our knowledge, this is the first study to incorporate deep transfer learning into iBMI decoder calibration, highlighting the significant potential of applying deep learning technologies in iBMIs.
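For readers unfamiliar with the domain-adversarial strategy mentioned above, the sketch below illustrates the core idea: a gradient-reversal layer that pushes the shared features to become indistinguishable between historical and new recording sessions while a task head predicts the movement class. It is a minimal, hypothetical PyTorch example; the layer sizes, names, and training details are assumptions and are not taken from the paper.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies gradients by -lambda in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the feature extractor.
        return -ctx.lambd * grad_output, None

class DANNDecoder(nn.Module):
    """Shared feature extractor with a task head and an adversarial domain head."""
    def __init__(self, n_features: int, n_classes: int, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.features = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU())
        self.task_head = nn.Linear(128, n_classes)   # movement class
        self.domain_head = nn.Linear(128, 2)         # historical vs. new session

    def forward(self, x):
        z = self.features(x)
        task_logits = self.task_head(z)
        domain_logits = self.domain_head(GradReverse.apply(z, self.lambd))
        return task_logits, domain_logits

# Toy usage with hypothetical dimensions (96 neural features, 4 movement classes).
model = DANNDecoder(n_features=96, n_classes=4)
task_logits, domain_logits = model(torch.randn(8, 96))
```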
2. Stavisky SD. Restoring Speech Using Brain-Computer Interfaces. Annu Rev Biomed Eng 2025; 27:29-54. [PMID: 39745941] [DOI: 10.1146/annurev-bioeng-110122-012818]
Abstract
People who have lost the ability to speak due to neurological injuries would greatly benefit from assistive technology that provides a fast, intuitive, and naturalistic means of communication. This need can be met with brain-computer interfaces (BCIs): medical devices that bypass injured parts of the nervous system and directly transform neural activity into outputs such as text or sound. BCIs for restoring movement and typing have progressed rapidly in recent clinical trials; speech BCIs are the next frontier. This review covers the clinical need for speech BCIs, surveys foundational studies that point to where and how speech can be decoded in the brain, describes recent progress in both discrete and continuous speech decoding and closed-loop speech BCIs, provides metrics for assessing these systems' performance, and highlights key remaining challenges on the road toward clinically useful speech neuroprostheses.
3. Feng C, Cao L, Wu D, Zhang E, Wang T, Jiang X, Chen J, Wu H, Lin S, Hou Q, Zhu J, Yang J, Sawan M, Zhang Y. Acoustic Inspired Brain-to-Sentence Decoder for Logosyllabic Language. Cyborg Bionic Syst 2025; 6:0257. [PMID: 40302941] [PMCID: PMC12038182] [DOI: 10.34133/cbsystems.0257]
Abstract
Recent advances in brain-computer interfaces (BCIs) have demonstrated the potential to decode language from brain activity into sound or text, but this work has predominantly focused on alphabetic languages such as English. Logosyllabic languages, such as Mandarin Chinese, present marked challenges for establishing decoders that cover all characters, due to their unique syllable structures, extended character sets (e.g., over 50,000 characters for Mandarin Chinese), and complex mappings between characters and syllables, which hinder practical applications. Here, we leverage the acoustic features of Mandarin Chinese syllables, constructing prediction models for syllable components (initials, tones, and finals), and decode speech-related stereoelectroencephalography (sEEG) signals into coherent Chinese sentences. The results demonstrate high sentence-level offline decoding performance, with a median character accuracy of 71.00% over the full spectrum of characters in the best participant. We also verified that incorporating acoustic-related features into the design of prediction models substantially enhances the accuracy of initials, tones, and finals. Moreover, our findings revealed that effective speech decoding also involves subcortical structures, such as the thalamus, in addition to traditional language-related brain regions. Overall, we established a brain-to-sentence decoder for logosyllabic languages over the full character set using a large intracranial electroencephalography dataset.
4. He T, Wei M, Wang R, Wang R, Du S, Cai S, Tao W, Li H. VocalMind: A Stereotactic EEG Dataset for Vocalized, Mimed, and Imagined Speech in Tonal Language. Sci Data 2025; 12:657. [PMID: 40253415] [PMCID: PMC12009324] [DOI: 10.1038/s41597-025-04741-2]
Abstract
Speech BCIs based on implanted electrodes hold significant promise for enhancing spoken communication through high temporal resolution and invasive neural sensing. Despite this potential, acquiring such data is challenging due to its invasive nature, and publicly available datasets, particularly for tonal languages, are limited. In this study, we introduce VocalMind, a stereotactic electroencephalography (sEEG) dataset focused on Mandarin Chinese, a tonal language. This dataset includes sEEG-speech parallel recordings from three distinct speech modes, namely vocalized speech, mimed speech, and imagined speech, at both word and sentence levels, totaling over one hour of intracranial neural recordings related to speech production. This paper also presents a baseline model as a reference for future studies and to verify the integrity of the dataset. The diversity of tasks and the substantial data volume provide a valuable resource for developing advanced algorithms for speech decoding, thereby advancing BCI research for spoken communication.
5. Littlejohn KT, Cho CJ, Liu JR, Silva AB, Yu B, Anderson VR, Kurtz-Miott CM, Brosler S, Kashyap AP, Hallinan IP, Shah A, Tu-Chan A, Ganguly K, Moses DA, Chang EF, Anumanchipalli GK. A streaming brain-to-voice neuroprosthesis to restore naturalistic communication. Nat Neurosci 2025; 28:902-912. [PMID: 40164740] [DOI: 10.1038/s41593-025-01905-6]
Abstract
Natural spoken communication happens instantaneously. Speech delays longer than a few seconds can disrupt the natural flow of conversation. This makes it difficult for individuals with paralysis to participate in meaningful dialogue, potentially leading to feelings of isolation and frustration. Here we used high-density surface recordings of the speech sensorimotor cortex in a clinical trial participant with severe paralysis and anarthria to drive a continuously streaming naturalistic speech synthesizer. We designed and used deep learning recurrent neural network transducer models to achieve online large-vocabulary intelligible fluent speech synthesis personalized to the participant's preinjury voice with neural decoding in 80-ms increments. Offline, the models demonstrated implicit speech detection capabilities and could continuously decode speech indefinitely, enabling uninterrupted use of the decoder and further increasing speed. Our framework also successfully generalized to other silent-speech interfaces, including single-unit recordings and electromyography. Our findings introduce a speech-neuroprosthetic paradigm to restore naturalistic spoken communication to people with paralysis.
6. Khan S, Kallis L, Mee H, El Hadwe S, Barone D, Hutchinson P, Kolias A. Invasive Brain-Computer Interface for Communication: A Scoping Review. Brain Sci 2025; 15:336. [PMID: 40309789] [PMCID: PMC12026362] [DOI: 10.3390/brainsci15040336]
Abstract
BACKGROUND: The rapid expansion of brain-computer interfaces for patients with neurological deficits has garnered significant interest, providing an additional route where conventional rehabilitation has its limits. This has particularly been the case for patients who lose the ability to communicate. Circumventing neural injuries by recording from the intact cortex and subcortex has the potential to allow patients to communicate and restore self-expression. Discoveries over the last 10-15 years have been possible through advancements in technology, neuroscience, and computing. By examining studies involving intracranial brain-computer interfaces that aim to restore communication, we aimed to explore the advances made and where the technology is heading.
METHODS: For this scoping review, we systematically searched PubMed and OVID Embase. After processing the articles, the search yielded 41 articles that we included in this review.
RESULTS: The articles predominantly assessed patients who had suffered from amyotrophic lateral sclerosis (ALS), cervical cord injury, or brainstem stroke, resulting in tetraplegia and, in some cases, difficulty speaking. Of the patients with intracranial implants, ten had ALS, six had brainstem stroke, and thirteen had a spinal cord injury. Stereoelectroencephalography was also used, but the results, whilst promising, are still in their infancy. Studies in which patients moved cursors on a screen improved the speed of movement by optimising the interface and utilising better decoding methods. In recent years, intracortical devices have been successfully used for accurate speech-to-text and speech-to-audio decoding in patients who are unable to speak.
CONCLUSIONS: Here, we summarise the progress made by BCIs used for communication. Speech decoding directly from the cortex can provide a novel therapeutic method to restore full, embodied communication to patients suffering from tetraplegia who otherwise cannot communicate.
7. Verwoert M, Amigó-Vega J, Gao Y, Ottenhoff MC, Kubben PL, Herff C. Whole-brain dynamics of articulatory, acoustic and semantic speech representations. Commun Biol 2025; 8:432. [PMID: 40082683] [PMCID: PMC11906857] [DOI: 10.1038/s42003-025-07862-x]
Abstract
Speech production is a complex process that traverses several representations, from the meaning of spoken words (semantic), through the movement of articulatory muscles (articulatory) and, ultimately, to the produced audio waveform (acoustic). In this study, we identify how these different representations of speech are spatially and temporally distributed throughout the depth of the brain. Intracranial neural data is recorded from 15 participants, across 1647 electrode contacts, while overtly speaking 100 unique words. We find a bilateral spatial distribution for all three representations, with a more widespread and temporally dynamic distribution in the left compared to the right hemisphere. The articulatory and acoustic representations share a similar spatial distribution surrounding the Sylvian fissure, while the semantic representation is more widely distributed across the brain in a mostly distinct network. These results highlight the distributed nature of the speech production neural process and the potential of non-motor representations for speech brain-computer interfaces.
8. Bhadra K, Giraud AL, Marchesotti S. Learning to operate an imagined speech Brain-Computer Interface involves the spatial and frequency tuning of neural activity. Commun Biol 2025; 8:271. [PMID: 39979463] [PMCID: PMC11842755] [DOI: 10.1038/s42003-025-07464-7]
Abstract
Brain-Computer Interfaces (BCI) will revolutionize the way people with severe impairment of speech production can communicate. While current efforts focus on training classifiers on vast amounts of neurophysiological signals to decode imagined speech, much less attention has been given to users' ability to adapt their neural activity to improve BCI-control. To address whether BCI-control improves with training and characterize the underlying neural dynamics, we trained 15 healthy participants to operate a binary BCI system based on electroencephalography (EEG) signals through syllable imagery for five consecutive days. Despite considerable interindividual variability in performance and learning, a significant improvement in BCI-control was globally observed. Using a control experiment, we show that a continuous feedback about the decoded activity is necessary for learning to occur. Performance improvement was associated with a broad EEG power increase in frontal theta activity and focal enhancement in temporal low-gamma activity, showing that learning to operate an imagined-speech BCI involves dynamic changes in neural features at different spectral scales. These findings demonstrate that combining machine and human learning is a successful strategy to enhance BCI controllability.
9. Tankus A, Stern E, Klein G, Kaptzon N, Nash L, Marziano T, Shamia O, Gurevitch G, Bergman L, Goldstein L, Fahoum F, Strauss I. A Speech Neuroprosthesis in the Frontal Lobe and Hippocampus: Decoding High-Frequency Activity into Phonemes. Neurosurgery 2025; 96:356-364. [PMID: 38934637] [DOI: 10.1227/neu.0000000000003068]
Abstract
BACKGROUND AND OBJECTIVES: Loss of speech due to injury or disease is devastating. Here, we report a novel speech neuroprosthesis that artificially articulates building blocks of speech based on high-frequency activity in brain areas never harnessed for a neuroprosthesis before: the anterior cingulate and orbitofrontal cortices, and the hippocampus.
METHODS: A 37-year-old male neurosurgical epilepsy patient with intact speech, implanted with depth electrodes for clinical reasons only, silently controlled the neuroprosthesis almost immediately and in a natural way to voluntarily produce two vowel sounds.
RESULTS: During the first set of trials, the participant made the neuroprosthesis produce the different vowel sounds artificially with 85% accuracy. In the following trials, performance improved consistently, which may be attributed to neuroplasticity. We show that a neuroprosthesis trained on overt speech data may be controlled silently.
CONCLUSION: This may open the way for a novel strategy of neuroprosthesis implantation at earlier disease stages (e.g., amyotrophic lateral sclerosis), while speech is intact, for improved training that still allows silent control at later stages. The results demonstrate the clinical feasibility of directly decoding high-frequency activity, including spiking activity, in the aforementioned areas for silent production of phonemes that may serve as part of a neuroprosthesis for replacing lost speech control pathways.
10. Chen J, Chen X, Wang R, Le C, Khalilian-Gourtani A, Jensen E, Dugan P, Doyle W, Devinsky O, Friedman D, Flinker A, Wang Y. Transformer-based neural speech decoding from surface and depth electrode signals. J Neural Eng 2025; 22:016017. [PMID: 39819752] [PMCID: PMC11773629] [DOI: 10.1088/1741-2552/adab21]
Abstract
Objective. This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior works can only work with electrodes on a 2D grid (i.e., an electrocorticographic (ECoG) array) and data from a single patient. We aim to design a deep-learning model architecture that can accommodate both surface ECoG and depth (stereotactic EEG, or sEEG) electrodes. The architecture should allow training on data from multiple participants with large variability in electrode placements. The model should not have subject-specific layers, and the trained model should perform well on participants unseen during training. Approach. We propose a novel transformer-based model architecture named SwinTW that can work with arbitrarily positioned electrodes by leveraging their 3D locations on the cortex rather than their positions on a 2D grid. We train subject-specific models using data from a single participant and multi-subject models exploiting data from multiple participants. Main results. The subject-specific models using only low-density 8 × 8 ECoG data achieved a high decoding Pearson correlation coefficient with the ground-truth spectrogram (PCC = 0.817) over N = 43 participants, significantly outperforming our prior convolutional ResNet model and the 3D Swin transformer model. Incorporating the additional strip, depth, and grid electrodes available in each participant (N = 39) led to further improvement (PCC = 0.838). For participants with only sEEG electrodes (N = 9), subject-specific models still achieved comparable performance, with an average PCC = 0.798. A single multi-subject model trained on ECoG data from 15 participants yielded comparable results (PCC = 0.837) to 15 models trained individually for these participants (PCC = 0.831). Furthermore, the multi-subject models achieved high performance on unseen participants, with an average PCC = 0.765 in leave-one-out cross-validation. Significance. The proposed SwinTW decoder enables future speech decoding approaches to utilize any electrode placement that is clinically optimal or feasible for a particular participant, including using only depth electrodes, which are more routinely implanted in chronic neurosurgical procedures. The success of the single multi-subject model when tested on participants within the training cohort demonstrates that the model architecture can exploit data from multiple participants with diverse electrode placements. The architecture's flexibility in training with both single-subject and multi-subject data, as well as grid and non-grid electrodes, ensures its broad applicability. Importantly, the generalizability of the multi-subject models in our study population suggests that a model trained using paired acoustic and neural data from multiple patients could be applied to new patients with speech disability for whom acoustic-neural training data are not feasible.
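The Pearson correlation coefficient (PCC) figures quoted above score decoded spectrograms against ground truth. The NumPy sketch below shows one straightforward way such a score can be computed; the array shapes and the flattening step are assumptions for illustration, not the authors' evaluation pipeline.

```python
import numpy as np

def spectrogram_pcc(decoded: np.ndarray, reference: np.ndarray) -> float:
    """Pearson correlation between two (time x frequency) spectrograms, flattened."""
    d = decoded.ravel() - decoded.mean()
    r = reference.ravel() - reference.mean()
    return float(np.dot(d, r) / (np.linalg.norm(d) * np.linalg.norm(r)))

# Toy usage: a decoded spectrogram close to the reference yields a PCC near 1.
rng = np.random.default_rng(0)
reference = rng.random((100, 40))                 # 100 frames x 40 mel bins (hypothetical)
decoded = reference + 0.1 * rng.random((100, 40))
print(round(spectrogram_pcc(decoded, reference), 3))
```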
11. Chen J, Chen X, Wang R, Le C, Khalilian-Gourtani A, Jensen E, Dugan P, Doyle W, Devinsky O, Friedman D, Flinker A, Wang Y. Subject-Agnostic Transformer-Based Neural Speech Decoding from Surface and Depth Electrode Signals. bioRxiv [Preprint] 2024:2024.03.11.584533. [PMID: 38559163] [PMCID: PMC10980022] [DOI: 10.1101/2024.03.11.584533]
Abstract
Objective. This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior works can only work with electrodes on a 2D grid (i.e., an electrocorticographic or ECoG array) and data from a single patient. We aim to design a deep-learning model architecture that can accommodate both surface (ECoG) and depth (stereotactic EEG or sEEG) electrodes. The architecture should allow training on data from multiple participants with large variability in electrode placements, and the trained model should perform well on participants unseen during training. Approach. We propose a novel transformer-based model architecture named SwinTW that can work with arbitrarily positioned electrodes by leveraging their 3D locations on the cortex rather than their positions on a 2D grid. We train subject-specific models using data from a single participant and multi-patient models exploiting data from multiple participants. Main results. The subject-specific models using only low-density 8 × 8 ECoG data achieved a high decoding Pearson correlation coefficient with the ground-truth spectrogram (PCC = 0.817) over N = 43 participants, outperforming our prior convolutional ResNet model and the 3D Swin transformer model. Incorporating additional strip, depth, and grid electrodes available in each participant (N = 39) led to further improvement (PCC = 0.838). For participants with only sEEG electrodes (N = 9), subject-specific models still achieved comparable performance, with an average PCC = 0.798. The multi-subject models achieved high performance on unseen participants, with an average PCC = 0.765 in leave-one-out cross-validation. Significance. The proposed SwinTW decoder enables future speech neuroprostheses to utilize any electrode placement that is clinically optimal or feasible for a particular participant, including using only depth electrodes, which are more routinely implanted in chronic neurosurgical procedures. Importantly, the generalizability of the multi-patient models suggests that such a model can be applied to new patients who do not have paired acoustic and neural data, providing an advance in neuroprostheses for people with speech disability, where acoustic-neural training data are not feasible.
12. de Borman A, Wittevrongel B, Dauwe I, Carrette E, Meurs A, Van Roost D, Boon P, Van Hulle MM. Imagined speech event detection from electrocorticography and its transfer between speech modes and subjects. Commun Biol 2024; 7:818. [PMID: 38969758] [PMCID: PMC11226700] [DOI: 10.1038/s42003-024-06518-6]
Abstract
Speech brain-computer interfaces aim to support communication-impaired patients by translating neural signals into speech. While impressive progress has been achieved in decoding performed, perceived and attempted speech, imagined speech remains elusive, mainly due to the absence of behavioral output. Nevertheless, imagined speech is advantageous since it does not depend on any articulator movements that might become impaired or even lost throughout the stages of a neurodegenerative disease. In this study, we analyzed electrocorticography data recorded from 16 participants in response to 3 speech modes: performed, perceived (listening), and imagined speech. We used a linear model to detect speech events and examined the contributions of each frequency band, from delta to high gamma, given the speech mode and electrode location. For imagined speech detection, we observed a strong contribution of gamma bands in the motor cortex, whereas lower frequencies were more prominent in the temporal lobe, in particular of the left hemisphere. Based on the similarities in frequency patterns, we were able to transfer models between speech modes and participants with similar electrode locations.
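To give a concrete sense of the kind of linear detection model described above, the sketch below trains a logistic-regression classifier on per-band power features to label speech versus non-speech windows. The feature layout, window counts, and the use of scikit-learn are assumptions for illustration only, not the authors' pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical features: 500 windows x (64 electrodes x 6 frequency bands) band powers.
X = rng.random((500, 64 * 6))
y = rng.integers(0, 2, size=500)           # 1 = speech event, 0 = silence (toy labels)

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)  # chance level is ~0.5 on these random labels
print(scores.mean())
```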
13. Ivucic D, Dexheimer M, Soroush PZ, Ries S, Shih J, Krusienski DJ, Schultz T. Speech Sensitivity Analysis of Spatially Distributed Brain Areas Using Stereotactic EEG. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-4. [PMID: 40039855] [DOI: 10.1109/embc53108.2024.10782208]
Abstract
Electrocorticographic (ECoG) activity recorded from the speech cortex has been extensively characterized and used in the development of speech neuroprosthetics. A recent shift in clinical brain monitoring from ECoG to stereotactic electroencephalography (sEEG) provides the opportunity to examine the role of deeper brain structures in speech processing. This study investigates spectro-temporal brain patterns generated during a speech task using sEEG data from five epilepsy patients. The analysis shows significant correlations in left and right temporal and motor regions, consistent with prior research in ECoG. Furthermore, correlation effects in rostral frontal areas are observed. A time lag analysis demonstrates distinct and functionally plausible activation patterns. The results further support the viability of sEEG for studying speech processes and provide insights into the involvement of spatially distributed, deeper brain areas.
14. Wu X, Wellington S, Fu Z, Zhang D. Speech decoding from stereo-electroencephalography (sEEG) signals using advanced deep learning methods. J Neural Eng 2024; 21:036055. [PMID: 38885688] [DOI: 10.1088/1741-2552/ad593a]
Abstract
Objective. Brain-computer interfaces (BCIs) are technologies that bypass damaged or disrupted neural pathways and directly decode brain signals to perform intended actions. BCIs for speech have the potential to restore communication by decoding the intended speech directly. Many studies have demonstrated promising results using invasive micro-electrode arrays and electrocorticography. However, the use of stereo-electroencephalography (sEEG) for speech decoding has not been fully recognized. Approach. In this research, recently released sEEG data were used to decode Dutch words spoken by epileptic participants. We decoded speech waveforms from sEEG data using advanced deep-learning methods. Three methods were implemented: a linear regression method, a recurrent neural network (RNN)-based sequence-to-sequence model, and a transformer model. Main results. Our RNN and transformer models significantly outperformed the linear regression, while no significant difference was found between the two deep-learning methods. Further investigation of individual electrodes showed that the same decoding result can be obtained using only a few of the electrodes. Significance. This study demonstrated that decoding speech from sEEG signals is possible and that the location of the electrodes is critical to the decoding performance.
15. Wandelt SK, Bjånes DA, Pejsa K, Lee B, Liu C, Andersen RA. Representation of internal speech by single neurons in human supramarginal gyrus. Nat Hum Behav 2024; 8:1136-1149. [PMID: 38740984] [PMCID: PMC11199147] [DOI: 10.1038/s41562-024-01867-y]
Abstract
Speech brain-machine interfaces (BMIs) translate brain signals into words or audio outputs, enabling communication for people having lost their speech abilities due to diseases or injury. While important advances in vocalized, attempted and mimed speech decoding have been achieved, results for internal speech decoding are sparse and have yet to achieve high functionality. Notably, it is still unclear from which brain areas internal speech can be decoded. Here two participants with tetraplegia with implanted microelectrode arrays located in the supramarginal gyrus (SMG) and primary somatosensory cortex (S1) performed internal and vocalized speech of six words and two pseudowords. In both participants, we found significant neural representation of internal and vocalized speech, at the single neuron and population level in the SMG. From recorded population activity in the SMG, the internally spoken and vocalized words were significantly decodable. In an offline analysis, we achieved average decoding accuracies of 55% and 24% for each participant, respectively (chance level 12.5%), and during an online internal speech BMI task, we averaged 79% and 23% accuracy, respectively. Evidence of shared neural representations between internal speech, word reading and vocalized speech processes was found in participant 1. SMG represented words as well as pseudowords, providing evidence for phonetic encoding. Furthermore, our decoder achieved high classification with multiple internal speech strategies (auditory imagination/visual imagination). Activity in S1 was modulated by vocalized but not internal speech in both participants, suggesting no articulator movements of the vocal tract occurred during internal speech production. This work represents a proof-of-concept for a high-performance internal speech BMI.
16. Komeiji S, Mitsuhashi T, Iimura Y, Suzuki H, Sugano H, Shinoda K, Tanaka T. Feasibility of decoding covert speech in ECoG with a Transformer trained on overt speech. Sci Rep 2024; 14:11491. [PMID: 38769115] [PMCID: PMC11106343] [DOI: 10.1038/s41598-024-62230-9]
Abstract
Several attempts at speech brain-computer interfacing (BCI) have been made to decode phonemes, sub-words, words, or sentences using invasive measurements, such as the electrocorticogram (ECoG), during auditory speech perception, overt speech, or imagined (covert) speech. Decoding sentences from covert speech is a challenging task. Sixteen epilepsy patients with intracranially implanted electrodes participated in this study, and ECoGs were recorded during overt speech and covert speech of eight Japanese sentences, each consisting of three tokens. In particular, a Transformer neural network model, trained using ECoGs obtained during overt speech, was applied to decode text sentences from covert speech. We first examined the proposed Transformer model using the same task for training and testing, and then evaluated the model's performance when trained on the overt task for decoding covert speech. The Transformer model trained on covert speech achieved an average token error rate (TER) of 46.6% for decoding covert speech, whereas the model trained on overt speech achieved a TER of 46.3% (p > 0.05; d = 0.07). Therefore, the challenge of collecting training data for covert speech can be addressed using overt speech, and the performance of covert speech decoding may improve further by employing additional overt speech data.
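The token error rate (TER) quoted above is an edit-distance metric over token sequences. The sketch below shows one common way to compute it (Levenshtein distance normalized by the reference length); the example tokens are hypothetical and the normalization choice is an assumption, not necessarily the authors' exact definition.

```python
def token_error_rate(reference: list, hypothesis: list) -> float:
    """Levenshtein distance between token sequences, normalized by reference length."""
    n, m = len(reference), len(hypothesis)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # deletions to reach an empty hypothesis
    for j in range(m + 1):
        d[0][j] = j                      # insertions from an empty reference
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[n][m] / max(n, 1)

# One substituted token out of three gives a TER of about 33%.
print(token_error_rate(["watashi", "wa", "hashiru"], ["watashi", "wa", "aruku"]))
```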
17. Naddaf M. Mind-reading devices are revealing the brain's secrets. Nature 2024; 626:706-708. [PMID: 38378830] [DOI: 10.1038/d41586-024-00481-2]
18. Wang R, Chen X, Khalilian-Gourtani A, Yu L, Dugan P, Friedman D, Doyle W, Devinsky O, Wang Y, Flinker A. Distributed feedforward and feedback cortical processing supports human speech production. Proc Natl Acad Sci U S A 2023; 120:e2300255120. [PMID: 37819985] [PMCID: PMC10589651] [DOI: 10.1073/pnas.2300255120]
Abstract
Speech production is a complex human function requiring continuous feedforward commands together with reafferent feedback processing. These processes are carried out by distinct frontal and temporal cortical networks, but the degree and timing of their recruitment and dynamics remain poorly understood. We present a deep learning architecture that translates neural signals recorded directly from the cortex to an interpretable representational space that can reconstruct speech. We leverage learned decoding networks to disentangle feedforward vs. feedback processing. Unlike prevailing models, we find a mixed cortical architecture in which frontal and temporal networks each process both feedforward and feedback information in tandem. We elucidate the timing of feedforward and feedback-related processing by quantifying the derived receptive fields. Our approach provides evidence for a surprisingly mixed cortical architecture of speech circuitry together with decoding advances that have important implications for neural prosthetics.
Collapse
Affiliation(s)
- Ran Wang
- Electrical and Computer Engineering Department, New York University, New York, NY 11201
| | - Xupeng Chen
- Electrical and Computer Engineering Department, New York University, New York, NY 11201
| | | | - Leyao Yu
- Neurology Department, New York University, New York, NY 10016
- Biomedical Engineering Department, New York University, New York, NY 11201
| | - Patricia Dugan
- Neurology Department, New York University, New York, NY 10016
| | - Daniel Friedman
- Neurology Department, New York University, New York, NY 10016
| | - Werner Doyle
- Neurosurgery Department, New York University, New York, NY 10016
| | - Orrin Devinsky
- Neurology Department, New York University, New York, NY 10016
| | - Yao Wang
- Electrical and Computer Engineering Department, New York University, New York, NY 11201
- Biomedical Engineering Department, New York University, New York, NY 11201
| | - Adeen Flinker
- Neurology Department, New York University, New York, NY 10016
- Biomedical Engineering Department, New York University, New York, NY 11201
| |
Collapse
|
19
|
Berezutskaya J, Freudenburg ZV, Vansteensel MJ, Aarnoutse EJ, Ramsey NF, van Gerven MAJ. Direct speech reconstruction from sensorimotor brain activity with optimized deep learning models. J Neural Eng 2023; 20:056010. [PMID: 37467739 PMCID: PMC10510111 DOI: 10.1088/1741-2552/ace8be] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2022] [Revised: 07/12/2023] [Accepted: 07/19/2023] [Indexed: 07/21/2023]
Abstract
Objective. Development of brain-computer interface (BCI) technology is key for enabling communication in individuals who have lost the faculty of speech due to severe motor paralysis. A BCI control strategy that is gaining attention employs speech decoding from neural data. Recent studies have shown that a combination of direct neural recordings and advanced computational models can provide promising results. Understanding which decoding strategies deliver best and directly applicable results is crucial for advancing the field. Approach. In this paper, we optimized and validated a decoding approach based on speech reconstruction directly from high-density electrocorticography recordings from sensorimotor cortex during a speech production task. Main results. We show that (1) dedicated machine learning optimization of reconstruction models is key for achieving the best reconstruction performance; (2) individual word decoding in reconstructed speech achieves 92%-100% accuracy (chance level is 8%); (3) direct reconstruction from sensorimotor brain activity produces intelligible speech. Significance. These results underline the need for model optimization in achieving best speech decoding results and highlight the potential that reconstruction-based speech decoding from sensorimotor cortex can offer for development of next-generation BCI technology for communication.
Collapse
Affiliation(s)
- Julia Berezutskaya
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
- Donders Center for Brain, Cognition and Behaviour, Nijmegen 6525 GD, The Netherlands
| | - Zachary V Freudenburg
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
| | - Mariska J Vansteensel
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
| | - Erik J Aarnoutse
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
| | - Nick F Ramsey
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
| | - Marcel A J van Gerven
- Donders Center for Brain, Cognition and Behaviour, Nijmegen 6525 GD, The Netherlands
| |
Collapse
|
20
|
Chen X, Wang R, Khalilian-Gourtani A, Yu L, Dugan P, Friedman D, Doyle W, Devinsky O, Wang Y, Flinker A. A Neural Speech Decoding Framework Leveraging Deep Learning and Speech Synthesis. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.16.558028. [PMID: 37745380 PMCID: PMC10516019 DOI: 10.1101/2023.09.16.558028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/26/2023]
Abstract
Decoding human speech from neural signals is essential for brain-computer interface (BCI) technologies aimed at restoring speech function in populations with neurological deficits. However, it remains a highly challenging task, compounded by the scarcity of neural recordings with corresponding speech, the complexity and high dimensionality of the data, and the limited availability of public source code. Here, we present a novel deep learning-based neural speech decoding framework that includes an ECoG Decoder, which translates electrocorticographic (ECoG) signals from the cortex into interpretable speech parameters, and a novel differentiable Speech Synthesizer, which maps speech parameters to spectrograms. We develop a companion audio-to-audio auto-encoder consisting of a Speech Encoder and the same Speech Synthesizer to generate reference speech parameters that facilitate ECoG Decoder training. This framework generates natural-sounding speech and is highly reproducible across a cohort of 48 participants. Among three neural network architectures for the ECoG Decoder, the 3D ResNet model has the best decoding performance (PCC = 0.804) in predicting the original speech spectrogram, closely followed by the SWIN model (PCC = 0.796). Our experimental results show that our models can decode speech with high correlation even when limited to only causal operations, which is necessary for adoption by real-time neural prostheses. We successfully decode speech in participants with either left or right hemisphere coverage, which could lead to speech prostheses for patients with speech deficits resulting from left hemisphere damage. Further, we use an occlusion analysis to identify cortical regions contributing to speech decoding across our models. Finally, we provide open-source code for our two-stage training pipeline, along with associated preprocessing and visualization tools, to enable reproducible research and drive research across the speech science and prostheses communities.
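The Pearson correlation coefficient (PCC) values quoted above (e.g. PCC = 0.804) are typically computed between the original and decoded speech spectrograms. A hedged Python illustration, not the authors' code; array shapes and names are assumptions:

    import numpy as np

    def spectrogram_pcc(original, decoded):
        # Pearson correlation between two (time x frequency) spectrograms, flattened.
        x = original.ravel() - original.mean()
        y = decoded.ravel() - decoded.mean()
        return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

    rng = np.random.default_rng(0)
    orig = rng.random((200, 80))                          # 200 frames x 80 mel bins (hypothetical)
    dec = orig + 0.3 * rng.standard_normal(orig.shape)    # a noisy "decoded" spectrogram
    print(spectrogram_pcc(orig, dec))                     # close to 1 for a good decoder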
Collapse
|
21
|
Thomas TM, Singh A, Bullock LP, Liang D, Morse CW, Scherschligt X, Seymour JP, Tandon N. Decoding articulatory and phonetic components of naturalistic continuous speech from the distributed language network. J Neural Eng 2023; 20:046030. [PMID: 37487487 DOI: 10.1088/1741-2552/ace9fb] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2022] [Accepted: 07/24/2023] [Indexed: 07/26/2023]
Abstract
Objective. The speech production network relies on a widely distributed brain network. However, research and development of speech brain-computer interfaces (speech-BCIs) has typically focused on decoding speech only from superficial subregions readily accessible by subdural grid arrays, typically placed over the sensorimotor cortex. Alternatively, the technique of stereo-electroencephalography (sEEG) enables access to distributed brain regions using multiple depth electrodes with lower surgical risks, especially in patients with brain injuries resulting in aphasia and other speech disorders. Approach. To investigate the decoding potential of widespread electrode coverage in multiple cortical sites, we used a naturalistic continuous speech production task. We obtained neural recordings using sEEG from eight participants while they read aloud sentences. We trained linear classifiers to decode distinct speech components (articulatory components and phonemes) solely based on broadband gamma activity and evaluated the decoding performance using nested five-fold cross-validation. Main results. We achieved an average classification accuracy of 18.7% across 9 places of articulation (e.g. bilabials, palatals), 26.5% across 5 manner-of-articulation (MOA) labels (e.g. affricates, fricatives), and 4.81% across 38 phonemes. The highest classification accuracies achieved with a single large dataset were 26.3% for place of articulation, 35.7% for MOA, and 9.88% for phonemes. Electrodes that contributed high decoding power were distributed across multiple sulcal and gyral sites in both dominant and non-dominant hemispheres, including ventral sensorimotor, inferior frontal, superior temporal, and fusiform cortices. Rather than finding a distinct cortical locus for each speech component, we observed neural correlates of both articulatory and phonetic components in multiple hubs of a widespread language production network. Significance. These results reveal the distributed cortical representations whose activity can enable decoding speech components during continuous speech through the use of this minimally invasive recording method, elucidating language neurobiology and neural targets for future speech-BCIs.
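As a rough illustration of the evaluation scheme described above (nested five-fold cross-validation of a linear classifier on broadband gamma features), here is a Python sketch using scikit-learn; the feature matrix, labels, and regularization grid are placeholders rather than the study's data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, cross_val_score

    rng = np.random.default_rng(0)
    X = rng.standard_normal((400, 64))     # trials x broadband-gamma features (hypothetical)
    y = rng.integers(0, 9, size=400)       # nine place-of-articulation labels (hypothetical)

    # Inner loop tunes the regularization strength; outer loop gives the accuracy estimate.
    inner = GridSearchCV(LogisticRegression(max_iter=1000),
                         param_grid={"C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
    outer_scores = cross_val_score(inner, X, y, cv=5)
    print(outer_scores.mean())             # compare against a chance level of 1/9 (about 11%)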
Collapse
Affiliation(s)
- Tessy M Thomas
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
| | - Aditya Singh
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
| | - Latané P Bullock
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
| | - Daniel Liang
- Department of Computer Science, Rice University, Houston, TX 77005, United States of America
| | - Cale W Morse
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
| | - Xavier Scherschligt
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
| | - John P Seymour
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Department of Electrical & Computer Engineering, Rice University, Houston, TX 77005, United States of America
| | - Nitin Tandon
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Memorial Hermann Hospital, Texas Medical Center, Houston, TX 77030, United States of America
| |
Collapse
|
22
|
Natraj N, Seko S, Abiri R, Yan H, Graham Y, Tu-Chan A, Chang EF, Ganguly K. Flexible regulation of representations on a drifting manifold enables long-term stable complex neuroprosthetic control. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.11.551770. [PMID: 37645922 PMCID: PMC10462094 DOI: 10.1101/2023.08.11.551770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 08/31/2023]
Abstract
The nervous system needs to balance the stability of neural representations with plasticity. It is unclear how stable the representations of simple, well-rehearsed actions are in humans, and how they change in new contexts. Using an electrocorticography brain-computer interface (BCI), we found that the mesoscale manifold and relative representational distances for a repertoire of simple imagined movements were remarkably stable. Interestingly, however, the manifold's absolute location demonstrated day-to-day drift. Strikingly, representational statistics, especially variance, could be flexibly regulated to increase discernability during BCI control without somatotopic changes. Discernability strengthened with practice and was specific to the BCI, demonstrating remarkable contextual specificity. Accounting for drift, and leveraging the flexibility of representations, allowed neuroprosthetic control of a robotic arm and hand for over 7 months without recalibration. Our study offers insight into how electrocorticography can both track representational statistics across long periods and allow long-term complex neuroprosthetic control.
Collapse
Affiliation(s)
- Nikhilesh Natraj
- Dept. of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California, USA
- UCSF - Veteran Affairs Medical Center, San Francisco, California, USA
| | - Sarah Seko
- Dept. of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California, USA
- UCSF - Veteran Affairs Medical Center, San Francisco, California, USA
| | - Reza Abiri
- Electrical, Computer and Biomedical Engineering, University of Rhode Island, Rhode Island, USA
| | - Hongyi Yan
- Dept. of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California, USA
- UCSF - Veteran Affairs Medical Center, San Francisco, California, USA
| | - Yasmin Graham
- Dept. of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California, USA
- UCSF - Veteran Affairs Medical Center, San Francisco, California, USA
| | - Adelyn Tu-Chan
- Dept. of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California, USA
- UCSF - Veteran Affairs Medical Center, San Francisco, California, USA
| | - Edward F Chang
- Department of Neurological Surgery, Weill Institute for Neuroscience, University of California-San Francisco, San Francisco, California, USA
| | - Karunesh Ganguly
- Dept. of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California, USA
- UCSF - Veteran Affairs Medical Center, San Francisco, California, USA
| |
Collapse
|
23
|
Meng K, Goodarzy F, Kim E, Park YJ, Kim JS, Cook MJ, Chung CK, Grayden DB. Continuous synthesis of artificial speech sounds from human cortical surface recordings during silent speech production. J Neural Eng 2023; 20:046019. [PMID: 37459853 DOI: 10.1088/1741-2552/ace7f6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2023] [Accepted: 07/17/2023] [Indexed: 07/28/2023]
Abstract
Objective. Brain-computer interfaces can restore various forms of communication in paralyzed patients who have lost their ability to articulate intelligible speech. This study aimed to demonstrate the feasibility of closed-loop synthesis of artificial speech sounds from human cortical surface recordings during silent speech production. Approach. Ten participants with intractable epilepsy were temporarily implanted with intracranial electrode arrays over cortical surfaces. A decoding model that predicted audible outputs directly from patient-specific neural feature inputs was trained during overt word reading and immediately tested with overt, mimed and imagined word reading. Predicted outputs were later assessed objectively against corresponding voice recordings and subjectively through human perceptual judgments. Main results. Artificial speech sounds were successfully synthesized during overt and mimed utterances by two participants with some coverage of the precentral gyrus. About a third of these sounds were correctly identified by naïve listeners in two-alternative forced-choice tasks. A similar outcome could not be achieved during imagined utterances by any of the participants. However, neural feature contribution analyses suggested the presence of exploitable activation patterns during imagined speech in the postcentral gyrus and the superior temporal gyrus. In future work, a more comprehensive coverage of cortical surfaces, including posterior parts of the middle frontal gyrus and the inferior frontal gyrus, could improve synthesis performance during imagined speech. Significance. As the field of speech neuroprostheses is rapidly moving toward clinical trials, this study addressed important considerations about task instructions and brain coverage when conducting research on silent speech with non-target participants.
Collapse
Affiliation(s)
- Kevin Meng
- Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Australia
| | - Farhad Goodarzy
- Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Australia
| | - EuiYoung Kim
- Interdisciplinary Program in Neuroscience, Seoul National University, Seoul, Republic of Korea
| | - Ye Jin Park
- Department of Brain and Cognitive Sciences, Seoul National University, Seoul, Republic of Korea
| | - June Sic Kim
- Research Institute of Basic Sciences, Seoul National University, Seoul, Republic of Korea
| | - Mark J Cook
- Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Australia
| | - Chun Kee Chung
- Department of Brain and Cognitive Sciences, Seoul National University, Seoul, Republic of Korea
- Department of Neurosurgery, Seoul National University Hospital, Seoul, Republic of Korea
| | - David B Grayden
- Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Australia
| |
Collapse
|
24
|
Le Godais G, Roussel P, Bocquelet F, Aubert M, Kahane P, Chabardès S, Yvert B. Overt speech decoding from cortical activity: a comparison of different linear methods. Front Hum Neurosci 2023; 17:1124065. [PMID: 37425292 PMCID: PMC10326283 DOI: 10.3389/fnhum.2023.1124065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2022] [Accepted: 05/30/2023] [Indexed: 07/11/2023] Open
Abstract
Introduction. Speech BCIs aim at reconstructing speech in real time from ongoing cortical activity. Ideal BCIs would need to reconstruct the speech audio signal frame by frame on a millisecond timescale. Such approaches require fast computation. In this respect, linear decoders are good candidates and have been widely used in motor BCIs. Yet they have seldom been studied for speech reconstruction, and never for the reconstruction of articulatory movements from intracranial activity. Here, we compared vanilla linear regression, ridge-regularized linear regression, and partial least squares regression for offline decoding of overt speech from cortical activity. Methods. Two decoding paradigms were investigated: (1) direct decoding of acoustic vocoder features of speech, and (2) indirect decoding of vocoder features through an intermediate articulatory representation chained with a real-time-compatible DNN-based articulatory-to-acoustic synthesizer. The participant's articulatory trajectories were estimated from an electromagnetic-articulography dataset using dynamic time warping. The accuracy of the decoders was evaluated by computing correlations between original and reconstructed features. Results. We found that all linear methods achieved similar performance, well above chance levels, albeit without reaching intelligibility. Direct and indirect methods achieved comparable performance, with an advantage for direct decoding. Discussion. Future work will address the development of an improved neural speech decoder compatible with fast frame-by-frame speech reconstruction from ongoing activity at a millisecond timescale.
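A hedged sketch of the kind of comparison described above, using scikit-learn implementations of ordinary least squares, ridge regression, and partial least squares to map neural features to vocoder features, scored by per-feature correlation; all data shapes and hyperparameters are illustrative assumptions, not the study's settings:

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.standard_normal((2000, 128))                  # time frames x neural features (hypothetical)
    Y = X @ rng.standard_normal((128, 25)) + rng.standard_normal((2000, 25))  # 25 vocoder features
    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

    def mean_corr(y_true, y_pred):
        # Average Pearson correlation across vocoder features.
        return float(np.mean([np.corrcoef(y_true[:, k], y_pred[:, k])[0, 1]
                              for k in range(y_true.shape[1])]))

    for name, model in [("OLS", LinearRegression()),
                        ("Ridge", Ridge(alpha=10.0)),
                        ("PLS", PLSRegression(n_components=20))]:
        model.fit(X_tr, Y_tr)
        print(name, round(mean_corr(Y_te, np.asarray(model.predict(X_te))), 3))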
Collapse
Affiliation(s)
- Gaël Le Godais
- Univ. Grenoble Alpes, INSERM, U1216, Grenoble Institut Neurosciences, Grenoble, France
| | - Philémon Roussel
- Univ. Grenoble Alpes, INSERM, U1216, Grenoble Institut Neurosciences, Grenoble, France
| | - Florent Bocquelet
- Univ. Grenoble Alpes, INSERM, U1216, Grenoble Institut Neurosciences, Grenoble, France
| | - Marc Aubert
- Univ. Grenoble Alpes, INSERM, U1216, Grenoble Institut Neurosciences, Grenoble, France
| | - Philippe Kahane
- Univ. Grenoble Alpes, INSERM, U1216, Grenoble Institut Neurosciences, Grenoble, France
- CHU Grenoble Alpes, Department of Neurology, Grenoble, France
| | - Stéphan Chabardès
- Univ. Grenoble Alpes, INSERM, U1216, Grenoble Institut Neurosciences, Grenoble, France
- Univ. Grenoble Alpes, CHU Grenoble Alpes, Clinatec, Grenoble, France
| | - Blaise Yvert
- Univ. Grenoble Alpes, INSERM, U1216, Grenoble Institut Neurosciences, Grenoble, France
| |
Collapse
|
25
|
Simistira Liwicki F, Gupta V, Saini R, De K, Abid N, Rakesh S, Wellington S, Wilson H, Liwicki M, Eriksson J. Bimodal electroencephalography-functional magnetic resonance imaging dataset for inner-speech recognition. Sci Data 2023; 10:378. [PMID: 37311807 DOI: 10.1038/s41597-023-02286-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2022] [Accepted: 06/01/2023] [Indexed: 06/15/2023] Open
Abstract
The recognition of inner speech, which could give a 'voice' to patients who have no ability to speak or move, is a challenge for brain-computer interfaces (BCIs). A shortcoming of the available datasets is that they do not combine modalities to increase the performance of inner speech recognition. Multimodal datasets of brain data enable the fusion of neuroimaging modalities with complementary properties, such as the high spatial resolution of functional magnetic resonance imaging (fMRI) and the temporal resolution of electroencephalography (EEG), and are therefore promising for decoding inner speech. This paper presents the first publicly available bimodal dataset containing EEG and fMRI data acquired nonsimultaneously during inner-speech production. Data were obtained from four healthy, right-handed participants during an inner-speech task with words in either a social or numerical category. Each of the eight word stimuli was assessed over 40 trials, resulting in 320 trials in each modality for each participant. The aim of this work is to provide a publicly available bimodal dataset on inner speech, contributing towards speech prostheses.
Collapse
Affiliation(s)
- Foteini Simistira Liwicki
- Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Intelligent Systems LAB, Luleå, Sweden.
| | - Vibha Gupta
- Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Intelligent Systems LAB, Luleå, Sweden
| | - Rajkumar Saini
- Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Intelligent Systems LAB, Luleå, Sweden
| | - Kanjar De
- Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Intelligent Systems LAB, Luleå, Sweden
| | - Nosheen Abid
- Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Intelligent Systems LAB, Luleå, Sweden
| | - Sumit Rakesh
- Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Intelligent Systems LAB, Luleå, Sweden
| | | | - Holly Wilson
- University of Bath, Department of Computer Science, Bath, UK
| | - Marcus Liwicki
- Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Intelligent Systems LAB, Luleå, Sweden
| | - Johan Eriksson
- Umeå University, Department of Integrative Medical Biology (IMB) and Umeå Center for Functional Brain Imaging (UFBI), Umeå, Sweden
| |
Collapse
|
26
|
Soroush PZ, Herff C, Ries SK, Shih JJ, Schultz T, Krusienski DJ. The nested hierarchy of overt, mouthed, and imagined speech activity evident in intracranial recordings. Neuroimage 2023; 269:119913. [PMID: 36731812 DOI: 10.1016/j.neuroimage.2023.119913] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2022] [Revised: 01/05/2023] [Accepted: 01/29/2023] [Indexed: 02/01/2023] Open
Abstract
Recent studies have demonstrated that it is possible to decode and synthesize various aspects of acoustic speech directly from intracranial measurements of electrophysiological brain activity. In order to continue progressing toward the development of a practical speech neuroprosthesis for individuals with speech impairments, better understanding and modeling of imagined speech processes are required. The present study uses intracranial brain recordings from participants who performed a speaking task with trials consisting of overt, mouthed, and imagined speech modes, representing varying degrees of decreasing behavioral output. Speech activity detection models are constructed using spatial, spectral, and temporal brain activity features, and the features and model performances are characterized and compared across the three degrees of behavioral output. The results indicate the existence of a hierarchy in which the relevant channels for the lower behavioral output modes form nested subsets of the relevant channels from the higher behavioral output modes. This provides important insights for the elusive goal of developing more effective imagined speech decoding models with respect to the better-established overt speech decoding counterparts.
Collapse
|
27
|
Abrego AM, Khan W, Wright CE, Islam MR, Ghajar MH, Bai X, Tandon N, Seymour JP. Sensing local field potentials with a directional and scalable depth electrode array. J Neural Eng 2023; 20:016041. [PMID: 36630716 DOI: 10.1088/1741-2552/acb230] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2022] [Accepted: 01/11/2023] [Indexed: 01/13/2023]
Abstract
Objective. A variety of electrophysiology tools are available to the neurosurgeon for diagnosis, functional therapy, and neural prosthetics. However, no tool can currently address these three critical needs: (a) access to all cortical regions in a minimally invasive manner; (b) recordings with microscale, mesoscale, and macroscale resolutions simultaneously; and (c) access to spatially distant multiple brain regions that constitute distributed cognitive networks. Approach. We modeled, designed, and demonstrated a novel device for recording local field potentials (LFPs) with the form factor of a stereo-electroencephalographic electrode and combined with radially distributed microelectrodes. Main results. Electro-quasistatic models demonstrate that the lead body amplifies and shields LFP sources based on direction, enabling directional sensitivity and scalability, referred to as the directional and scalable (DISC) array. In vivo, DISC demonstrated significantly improved signal-to-noise ratio, directional sensitivity, and decoding accuracy from rat barrel cortex recordings during whisker stimulation. Critical for future translation, DISC demonstrated a higher signal-to-noise ratio (SNR) than virtual ring electrodes and a noise floor approaching that of large ring electrodes in an unshielded environment after common average referencing. DISC also revealed independent, stereoscopic current source density measures whose direction was verified after histology. Significance. Directional sensitivity of LFPs may significantly improve brain-computer interfaces and many diagnostic procedures, including epilepsy foci detection and deep brain targeting.
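As a side note on the common average referencing step mentioned above, a minimal Python illustration; channel counts and noise levels are assumed purely for demonstration:

    import numpy as np

    def common_average_reference(data):
        # data: channels x samples; subtract the across-channel mean from every channel.
        return data - data.mean(axis=0, keepdims=True)

    rng = np.random.default_rng(0)
    shared_noise = 5.0 * rng.standard_normal((1, 5000))    # common-mode interference
    lfp = rng.standard_normal((32, 5000)) + shared_noise   # 32 channels (hypothetical)
    car = common_average_reference(lfp)
    print(round(float(lfp.std()), 2), round(float(car.std()), 2))  # common-mode noise attenuated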
Collapse
Affiliation(s)
- Amada M Abrego
- Department of Neurosurgery, University of Texas Health Science Center, Houston, TX 77030, United States of America
| | - Wasif Khan
- Department of Neurosurgery, University of Texas Health Science Center, Houston, TX 77030, United States of America
| | - Christopher E Wright
- Department of Neurosurgery, University of Texas Health Science Center, Houston, TX 77030, United States of America
- Department of Bioengineering, Rice University, Houston, TX 77030, United States of America
| | - M Rabiul Islam
- Department of Neurosurgery, University of Texas Health Science Center, Houston, TX 77030, United States of America
| | - Mohammad H Ghajar
- Department of Neurosurgery, University of Texas Health Science Center, Houston, TX 77030, United States of America
| | - Xiaokang Bai
- Department of Neurosurgery, University of Texas Health Science Center, Houston, TX 77030, United States of America
| | - Nitin Tandon
- Department of Neurosurgery, University of Texas Health Science Center, Houston, TX 77030, United States of America
| | - John P Seymour
- Department of Neurosurgery, University of Texas Health Science Center, Houston, TX 77030, United States of America
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77030, United States of America
| |
Collapse
|
28
|
[Artificial intelligence and ethics in healthcare-balancing act or symbiosis?]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2023; 66:176-183. [PMID: 36650296 PMCID: PMC9892090 DOI: 10.1007/s00103-022-03653-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2022] [Accepted: 12/19/2022] [Indexed: 01/19/2023]
Abstract
Artificial intelligence (AI) is becoming increasingly important in healthcare. This development triggers serious concerns that can be summarized by six major "worst-case scenarios". From AI spreading disinformation and propaganda, to a potential new arms race between major powers, to a possible rule of algorithms ("algocracy") based on biased gatekeeper intelligence, the real dangers of an uncontrolled development of AI are by no means to be underestimated, especially in the health sector. However, fear of AI could cause humanity to miss the opportunity to positively shape the development of our society together with an AI that is friendly to us.Use cases in healthcare play a primary role in this discussion, as both the risks and the opportunities of new AI-based systems become particularly clear here. For example, would older people with dementia (PWD) be allowed to entrust aspects of their autonomy to AI-based assistance systems so that they may continue to independently manage other aspects of their daily lives? In this paper, we argue that the classic balancing act between the dangers and opportunities of AI in healthcare can be at least partially overcome by taking a long-term ethical approach toward a symbiotic relationship between humans and AI. We exemplify this approach by showcasing our I‑CARE system, an AI-based recommendation system for tertiary prevention of dementia. This system has been in development since 2015 as the I‑CARE Project at the University of Bremen, where it is still being researched today.
Collapse
|
29
|
Verwoert M, Ottenhoff MC, Goulis S, Colon AJ, Wagner L, Tousseyn S, van Dijk JP, Kubben PL, Herff C. Dataset of Speech Production in Intracranial Electroencephalography. Sci Data 2022; 9:434. [PMID: 35869138 PMCID: PMC9307753 DOI: 10.1038/s41597-022-01542-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2022] [Accepted: 07/08/2022] [Indexed: 11/28/2022] Open
Abstract
Speech production is an intricate process involving a large number of muscles and cognitive processes. The neural processes underlying speech production are not completely understood. As speech is a uniquely human ability, it cannot be investigated in animal models. High-fidelity human data can only be obtained in clinical settings and is therefore not easily available to all researchers. Here, we provide a dataset of 10 participants reading out individual words while we measured intracranial EEG from a total of 1103 electrodes. The data, with its high temporal resolution and coverage of a large variety of cortical and sub-cortical brain regions, can help in understanding the speech production process better. Simultaneously, the data can be used to test speech decoding and synthesis approaches from neural data to develop speech Brain-Computer Interfaces and speech neuroprostheses.
Measurement(s): Brain activity
Technology Type(s): Stereotactic electroencephalography
Sample Characteristic - Organism: Homo sapiens
Sample Characteristic - Environment: Epilepsy monitoring center
Sample Characteristic - Location: The Netherlands
Collapse
|
30
|
Petrosyan A, Voskoboinikov A, Sukhinin D, Makarova A, Skalnaya A, Arkhipova N, Sinkin M, Ossadtchi A. Speech decoding from a small set of spatially segregated minimally invasive intracranial EEG electrodes with a compact and interpretable neural network. J Neural Eng 2022; 19. [PMID: 36356309 DOI: 10.1088/1741-2552/aca1e1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Accepted: 11/10/2022] [Indexed: 11/12/2022]
Abstract
Objective. Speech decoding, one of the most intriguing brain-computer interface applications, opens up plentiful opportunities from rehabilitation of patients to direct and seamless communication between humans. Typical solutions rely on invasive recordings with a large number of distributed electrodes implanted through craniotomy. Here we explored the possibility of creating a speech prosthesis in a minimally invasive setting with a small number of spatially segregated intracranial electrodes. Approach. We collected one hour of data (from two sessions) in two patients implanted with invasive electrodes. We then used only the contacts that pertained to a single stereotactic electroencephalographic (sEEG) shaft or an electrocorticographic (ECoG) strip to decode neural activity into 26 words and one silence class. We employed a compact convolutional network-based architecture whose spatial and temporal filter weights allow for a physiologically plausible interpretation. Main results. We achieved on average 55% accuracy using only six channels of data recorded with a single minimally invasive sEEG electrode in the first patient and 70% accuracy using only eight channels of data recorded for a single ECoG strip in the second patient in classifying 26+1 overtly pronounced words. Our compact architecture did not require the use of pre-engineered features, learned fast and resulted in a stable, interpretable and physiologically meaningful decision rule successfully operating over a contiguous dataset collected during a different time interval than that used for training. Spatial characteristics of the pivotal neuronal populations corroborate with active and passive speech mapping results and exhibit the inverse space-frequency relationship characteristic of neural activity. Compared to other architectures our compact solution performed on par or better than those recently featured in neural speech decoding literature. Significance. We showcase the possibility of building a speech prosthesis with a small number of electrodes and based on a compact, feature-engineering-free decoder derived from a small amount of training data.
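To make the idea of a compact, interpretable convolutional decoder more concrete, here is a hedged PyTorch sketch, not the authors' architecture: a single spatial convolution across channels followed by a single temporal convolution, whose filter weights can be inspected directly. Channel counts, kernel sizes, and class counts are assumptions:

    import torch
    import torch.nn as nn

    class CompactSpeechDecoder(nn.Module):
        def __init__(self, n_channels=8, n_classes=27):
            super().__init__()
            self.spatial = nn.Conv1d(n_channels, 4, kernel_size=1)       # spatial filters
            self.temporal = nn.Conv1d(4, 4, kernel_size=65, padding=32)  # temporal filters
            self.pool = nn.AdaptiveAvgPool1d(16)
            self.classifier = nn.Linear(4 * 16, n_classes)

        def forward(self, x):                  # x: (batch, channels, samples)
            z = torch.relu(self.temporal(self.spatial(x)))
            return self.classifier(self.pool(z).flatten(1))

    model = CompactSpeechDecoder()
    logits = model(torch.randn(2, 8, 512))     # two trials of 8-channel data (hypothetical)
    print(logits.shape)                        # torch.Size([2, 27]) -> 26 words + silence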
Collapse
Affiliation(s)
- Artur Petrosyan
- Center for Bioelectric Interfaces, Higher School of Economics, Moscow, Russia
| | | | - Dmitrii Sukhinin
- Center for Bioelectric Interfaces, Higher School of Economics, Moscow, Russia
| | - Anna Makarova
- Center for Bioelectric Interfaces, Higher School of Economics, Moscow, Russia
| | | | | | - Mikhail Sinkin
- Moscow State University of Medicine and Dentistry; N.V. Sklifosovsky Research Institute of Emergency Medicine, Moscow, Russia
| | - Alexei Ossadtchi
- Center for Bioelectric Interfaces, Higher School of Economics, Moscow, Russia; Artificial Intelligence Research Institute (AIRI), Moscow, Russia
| |
Collapse
|
31
|
Valeriani D, Santoro F, Ienca M. The present and future of neural interfaces. Front Neurorobot 2022; 16:953968. [PMID: 36304780 PMCID: PMC9592849 DOI: 10.3389/fnbot.2022.953968] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2022] [Accepted: 07/13/2022] [Indexed: 11/18/2022] Open
Abstract
The 2020s will likely witness an unprecedented development and deployment of neurotechnologies for human rehabilitation, personalized use, and cognitive or other enhancement. New materials and algorithms are already enabling active brain monitoring and are allowing the development of biohybrid and neuromorphic systems that can adapt to the brain. Novel brain-computer interfaces (BCIs) have been proposed to tackle a variety of enhancement and therapeutic challenges, from improving decision-making to modulating mood disorders. While these BCIs have generally been developed in an open-loop modality to optimize their internal neural decoders, this decade will increasingly witness their validation in closed-loop systems that are able to continuously adapt to the user's mental states. Therefore, a proactive ethical approach is needed to ensure that these new technological developments go hand in hand with the development of a sound ethical framework. In this perspective article, we summarize recent developments in neural interfaces, ranging from neurohybrid synapses to closed-loop BCIs, and thereby identify the most promising macro-trends in BCI research, such as simulating vs. interfacing the brain, brain recording vs. brain stimulation, and hardware vs. software technology. Particular attention is devoted to central nervous system interfaces, especially those with application in healthcare and human enhancement. Finally, we critically assess the possible futures of neural interfacing and analyze the short- and long-term implications of such neurotechnologies.
Collapse
Affiliation(s)
| | - Francesca Santoro
- Institute for Biological Information Processing - Bioelectronics, IBI-3, Forschungszentrum Juelich, Juelich, Germany
- Faculty of Electrical Engineering and Information Technology, RWTH Aachen University, Aachen, Germany
| | - Marcello Ienca
- College of Humanities, Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne, Switzerland
- *Correspondence: Marcello Ienca
| |
Collapse
|
32
|
Simonyan K, Ehrlich SK, Andersen R, Brumberg J, Guenther F, Hallett M, Howard MA, Millán JDR, Reilly RB, Schultz T, Valeriani D. Brain-Computer Interfaces for Treatment of Focal Dystonia. Mov Disord 2022; 37:1798-1802. [PMID: 35947366 PMCID: PMC9474652 DOI: 10.1002/mds.29178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Revised: 06/20/2022] [Accepted: 07/19/2022] [Indexed: 11/12/2022] Open
Abstract
Task-specificity in isolated focal dystonias is a powerful feature that may successfully be targeted with therapeutic brain-computer interfaces. While performing a symptomatic task, the patient actively modulates momentary brain activity (disorder signature) to match activity during an asymptomatic task (target signature), which is expected to translate into symptom reduction.
Collapse
Affiliation(s)
- Kristina Simonyan
- Department of Otolaryngology–Head and Neck Surgery, Massachusetts Eye and Ear and Harvard Medical School, Boston, Massachusetts, USA
- Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA
| | - Stefan K. Ehrlich
- Department of Otolaryngology–Head and Neck Surgery, Massachusetts Eye and Ear and Harvard Medical School, Boston, Massachusetts, USA
| | - Richard Andersen
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, USA
| | - Jonathan Brumberg
- Department of Speech-Language-Hearing: Sciences & Disorders, University of Kansas, Lawrence, Kansas, USA
| | - Frank Guenther
- Department of Speech, Language, & Hearing Sciences, Boston University, Boston, Massachusetts, USA
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Mark Hallett
- Human Motor Control Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
| | - Matthew A. Howard
- Department of Neurosurgery, University of Iowa Carver College of Medicine, Iowa City, Iowa, USA
| | - José del R. Millán
- Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, Texas, USA
- Department of Neurology, University of Texas at Austin, Austin, Texas, USA
| | - Richard B. Reilly
- Center for Biomedical Engineering, Trinity College Institute of Neuroscience, School of Medicine, School of Engineering, Trinity College Dublin and the University of Dublin, Dublin, Ireland
| | - Tanja Schultz
- Faculty 03 Mathematics and Computer Science, University of Bremen, Bremen, Germany
| | - Davide Valeriani
- Department of Otolaryngology–Head and Neck Surgery, Massachusetts Eye and Ear and Harvard Medical School, Boston, Massachusetts, USA
| |
Collapse
|
33
|
Cooney C, Folli R, Coyle D. Opportunities, pitfalls and trade-offs in designing protocols for measuring the neural correlates of speech. Neurosci Biobehav Rev 2022; 140:104783. [PMID: 35907491 DOI: 10.1016/j.neubiorev.2022.104783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2022] [Revised: 07/12/2022] [Accepted: 07/15/2022] [Indexed: 11/25/2022]
Abstract
Decoding speech and speech-related processes directly from the human brain has intensified in studies over recent years as such a decoder has the potential to positively impact people with limited communication capacity due to disease or injury. Additionally, it can present entirely new forms of human-computer interaction and human-machine communication in general and facilitate better neuroscientific understanding of speech processes. Here, we synthesize the literature on neural speech decoding pertaining to how speech decoding experiments have been conducted, coalescing around a necessity for thoughtful experimental design aimed at specific research goals, and robust procedures for evaluating speech decoding paradigms. We examine the use of different modalities for presenting stimuli to participants, methods for construction of paradigms including timings and speech rhythms, and possible linguistic considerations. In addition, novel methods for eliciting naturalistic speech and validating imagined speech task performance in experimental settings are presented based on recent research. We also describe the multitude of terms used to instruct participants on how to produce imagined speech during experiments and propose methods for investigating the effect of these terms on imagined speech decoding. We demonstrate that the range of experimental procedures used in neural speech decoding studies can have unintended consequences which can impact upon the efficacy of the knowledge obtained. The review delineates the strengths and weaknesses of present approaches and poses methodological advances which we anticipate will enhance experimental design, and progress toward the optimal design of movement independent direct speech brain-computer interfaces.
Collapse
Affiliation(s)
- Ciaran Cooney
- Intelligent Systems Research Centre, Ulster University, Derry, UK.
| | - Raffaella Folli
- Institute for Research in Social Sciences, Ulster University, Jordanstown, UK
| | - Damien Coyle
- Intelligent Systems Research Centre, Ulster University, Derry, UK
| |
Collapse
|
34
|
Soroush PZ, Herff C, Ries S, Shih JJ, Schultz T, Krusienski DJ. Contributions of Stereotactic EEG Electrodes in Grey and White Matter to Speech Activity Detection. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2022; 2022:4789-4792. [PMID: 36086071 DOI: 10.1109/embc48229.2022.9871464] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Recent studies have shown it is possible to decode and synthesize speech directly using brain activity recorded from implanted electrodes. While this activity has been extensively examined using electrocorticographic (ECoG) recordings from cortical surface grey matter, stereotactic electroencephalography (sEEG) provides comparatively broader coverage and access to deeper brain structures including both grey and white matter. The present study examines the relative and joint contributions of grey and white matter electrodes for speech activity detection in a brain-computer interface.
Collapse
|
35
|
Simistira Liwicki F, Gupta V, Saini R, De K, Liwicki M. Rethinking the Methods and Algorithms for Inner Speech Decoding and Making Them Reproducible. NEUROSCI 2022; 3:226-244. [PMID: 39483370 PMCID: PMC11523721 DOI: 10.3390/neurosci3020017] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2022] [Accepted: 04/12/2022] [Indexed: 11/03/2024] Open
Abstract
This study focuses on the automatic decoding of inner speech using noninvasive methods, such as electroencephalography (EEG). While inner speech has been a research topic in philosophy and psychology for half a century, recent attempts have been made to decode nonvoiced spoken words using various brain-computer interfaces. The main shortcomings of existing work are reproducibility and the availability of data and code. In this work, we investigate various methods (Convolutional Neural Networks (CNN), Gated Recurrent Units (GRU), and Long Short-Term Memory networks (LSTM)) for the task of decoding five vowels and six words from a publicly available EEG dataset. The main contributions of this work are (1) a comparison of subject-dependent vs. subject-independent approaches, (2) an analysis of the effect of different preprocessing steps (Independent Component Analysis (ICA), down-sampling, and filtering), and (3) word classification (where we achieve state-of-the-art performance on a publicly available dataset). Overall, we achieve classification accuracies of 35.20% and 29.21% for five vowels and six words, respectively, on a publicly available dataset, using our tuned iSpeech-CNN architecture. All of our code and processed data are publicly available to ensure reproducibility. As such, this work contributes to a deeper understanding and reproducibility of experiments in the area of inner speech detection.
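A hedged Python sketch of two of the preprocessing steps examined above, band-pass filtering and down-sampling, applied to a single simulated EEG channel; the sampling rate and cutoff frequencies are illustrative and not the study's settings:

    import numpy as np
    from scipy.signal import butter, filtfilt, decimate

    fs = 1024.0                                              # original sampling rate (assumed)
    t = np.arange(0, 2.0, 1.0 / fs)
    eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.default_rng(0).standard_normal(t.size)

    b, a = butter(4, [1.0, 40.0], btype="bandpass", fs=fs)   # 1-40 Hz band-pass filter
    filtered = filtfilt(b, a, eeg)                           # zero-phase filtering

    downsampled = decimate(filtered, q=4)                    # 1024 Hz -> 256 Hz
    print(filtered.shape, downsampled.shape)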
Collapse
Affiliation(s)
- Foteini Simistira Liwicki
- Embedded Intelligent Systems LAB, Machine Learning, Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, 97187 Luleå, Sweden; (V.G.); (R.S.); (K.D.); (M.L.)
| | - Vibha Gupta
- Embedded Intelligent Systems LAB, Machine Learning, Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, 97187 Luleå, Sweden; (V.G.); (R.S.); (K.D.); (M.L.)
| | - Rajkumar Saini
- Embedded Intelligent Systems LAB, Machine Learning, Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, 97187 Luleå, Sweden; (V.G.); (R.S.); (K.D.); (M.L.)
| | - Kanjar De
- Embedded Intelligent Systems LAB, Machine Learning, Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, 97187 Luleå, Sweden; (V.G.); (R.S.); (K.D.); (M.L.)
| | - Marcus Liwicki
- Embedded Intelligent Systems LAB, Machine Learning, Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, 97187 Luleå, Sweden; (V.G.); (R.S.); (K.D.); (M.L.)
| |
Collapse
|
36
|
Wilson BS, Tucci DL, Moses DA, Chang EF, Young NM, Zeng FG, Lesica NA, Bur AM, Kavookjian H, Mussatto C, Penn J, Goodwin S, Kraft S, Wang G, Cohen JM, Ginsburg GS, Dawson G, Francis HW. Harnessing the Power of Artificial Intelligence in Otolaryngology and the Communication Sciences. J Assoc Res Otolaryngol 2022; 23:319-349. [PMID: 35441936 PMCID: PMC9086071 DOI: 10.1007/s10162-022-00846-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2021] [Accepted: 04/02/2022] [Indexed: 02/01/2023] Open
Abstract
Use of artificial intelligence (AI) is a burgeoning field in otolaryngology and the communication sciences. A virtual symposium on the topic was convened from Duke University on October 26, 2020, and was attended by more than 170 participants worldwide. This review presents summaries of all but one of the talks presented during the symposium; recordings of all the talks, along with the discussions for the talks, are available at https://www.youtube.com/watch?v=ktfewrXvEFg and https://www.youtube.com/watch?v=-gQ5qX2v3rg . Each of the summaries is about 2500 words in length and each summary includes two figures. This level of detail far exceeds the brief summaries presented in traditional reviews and thus provides a more-informed glimpse into the power and diversity of current AI applications in otolaryngology and the communication sciences and how to harness that power for future applications.
Collapse
Affiliation(s)
- Blake S. Wilson
- Department of Head and Neck Surgery & Communication Sciences, Duke University School of Medicine, Durham, NC 27710 USA
- Duke Hearing Center, Duke University School of Medicine, Durham, NC 27710 USA
- Department of Electrical & Computer Engineering, Duke University, Durham, NC 27708 USA
- Department of Biomedical Engineering, Duke University, Durham, NC 27708 USA
- Department of Otolaryngology – Head & Neck Surgery, University of North Carolina, Chapel Hill, Chapel Hill, NC 27599 USA
| | - Debara L. Tucci
- Department of Head and Neck Surgery & Communication Sciences, Duke University School of Medicine, Durham, NC 27710 USA
- National Institute On Deafness and Other Communication Disorders, National Institutes of Health, Bethesda, MD 20892 USA
| | - David A. Moses
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA 94143 USA
- UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94117 USA
| | - Edward F. Chang
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA 94143 USA
- UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94117 USA
| | - Nancy M. Young
- Division of Otolaryngology, Ann and Robert H. Lurie Childrens Hospital of Chicago, Chicago, IL 60611 USA
- Department of Otolaryngology - Head and Neck Surgery, Northwestern University Feinberg School of Medicine, Chicago, IL 60611 USA
- Department of Communication, Knowles Hearing Center, Northwestern University, Evanston, IL 60208 USA
| | - Fan-Gang Zeng
- Center for Hearing Research, University of California, Irvine, Irvine, CA 92697 USA
- Department of Anatomy and Neurobiology, University of California, Irvine, Irvine, CA 92697 USA
- Department of Biomedical Engineering, University of California, Irvine, Irvine, CA 92697 USA
- Department of Cognitive Sciences, University of California, Irvine, Irvine, CA 92697 USA
- Department of Otolaryngology – Head and Neck Surgery, University of California, Irvine, CA 92697 USA
| | | | - Andrés M. Bur
- Department of Otolaryngology - Head and Neck Surgery, Medical Center, University of Kansas, Kansas City, KS 66160 USA
| | - Hannah Kavookjian
- Department of Otolaryngology - Head and Neck Surgery, Medical Center, University of Kansas, Kansas City, KS 66160 USA
| | - Caroline Mussatto
- Department of Otolaryngology - Head and Neck Surgery, Medical Center, University of Kansas, Kansas City, KS 66160 USA
| | - Joseph Penn
- Department of Otolaryngology - Head and Neck Surgery, Medical Center, University of Kansas, Kansas City, KS 66160 USA
| | - Sara Goodwin
- Department of Otolaryngology - Head and Neck Surgery, Medical Center, University of Kansas, Kansas City, KS 66160 USA
| | - Shannon Kraft
- Department of Otolaryngology - Head and Neck Surgery, Medical Center, University of Kansas, Kansas City, KS 66160 USA
| | - Guanghui Wang
- Department of Computer Science, Ryerson University, Toronto, ON M5B 2K3 Canada
| | - Jonathan M. Cohen
- Department of Head and Neck Surgery & Communication Sciences, Duke University School of Medicine, Durham, NC 27710 USA
- ENT Department, Kaplan Medical Center, 7661041 Rehovot, Israel
| | - Geoffrey S. Ginsburg
- Department of Biomedical Engineering, Duke University, Durham, NC 27708 USA
- MEDx (Medicine & Engineering at Duke), Duke University, Durham, NC 27708 USA
- Center for Applied Genomics & Precision Medicine, Duke University School of Medicine, Durham, NC 27710 USA
- Department of Medicine, Duke University School of Medicine, Durham, NC 27710 USA
- Department of Pathology, Duke University School of Medicine, Durham, NC 27710 USA
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710 USA
| | - Geraldine Dawson
- Duke Institute for Brain Sciences, Duke University, Durham, NC 27710 USA
- Duke Center for Autism and Brain Development, Duke University School of Medicine and the Duke Institute for Brain Sciences, NIH Autism Center of Excellence, Durham, NC 27705 USA
- Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, NC 27701 USA
| | - Howard W. Francis
- Department of Head and Neck Surgery & Communication Sciences, Duke University School of Medicine, Durham, NC 27710 USA
| |
Collapse
|
37
|
Chandler JA, Van der Loos KI, Boehnke S, Beaudry JS, Buchman DZ, Illes J. Brain Computer Interfaces and Communication Disabilities: Ethical, Legal, and Social Aspects of Decoding Speech From the Brain. Front Hum Neurosci 2022; 16:841035. [PMID: 35529778 PMCID: PMC9069963 DOI: 10.3389/fnhum.2022.841035] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2021] [Accepted: 03/03/2022] [Indexed: 11/28/2022] Open
Abstract
A brain-computer interface technology that can decode the neural signals associated with attempted but unarticulated speech could offer an efficient future means of communication for people with severe motor impairments. Recent demonstrations have validated this approach. Here we assume that it will become possible to decode imagined (i.e., attempted but unarticulated) speech in people with severe motor impairments, and we consider the characteristics that could maximize the social utility of a BCI for communication. As a social interaction, communication involves the needs and goals of both speaker and listener, particularly in contexts that have significant potential consequences. We explore three high-consequence legal situations in which neurally decoded speech could have implications: Testimony, where decoded speech is used as evidence; Consent and Capacity, where it may be used as a means of agency and participation, such as consenting to medical treatment; and Harm, where such communications may be networked or may cause harm to others. We then illustrate how design choices might impact the social and legal acceptability of these technologies.
Collapse
Affiliation(s)
- Jennifer A. Chandler
- Bertram Loeb Research Chair, Faculty of Law, University of Ottawa, Ottawa, ON, Canada
- *Correspondence: Jennifer A. Chandler
| | | | - Susan Boehnke
- Centre for Neuroscience Studies, Queen’s University, Kingston, ON, Canada
| | - Jonas S. Beaudry
- Institute for Health and Social Policy (IHSP) and Faculty of Law, McGill University, Montreal, QC, Canada
| | - Daniel Z. Buchman
- Centre for Addiction and Mental Health, Dalla Lana School of Public Health, Krembil Research Institute, University of Toronto Joint Centre for Bioethics, Toronto, ON, Canada
| | - Judy Illes
- Division of Neurology, Department of Medicine, University of British Columbia, Vancouver, BC, Canada
| |
Collapse
|
38
|
Wandelt SK, Kellis S, Bjånes DA, Pejsa K, Lee B, Liu C, Andersen RA. Decoding grasp and speech signals from the cortical grasp circuit in a tetraplegic human. Neuron 2022; 110:1777-1787.e3. [PMID: 35364014 DOI: 10.1016/j.neuron.2022.03.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2021] [Revised: 02/01/2022] [Accepted: 03/08/2022] [Indexed: 02/04/2023]
Abstract
The cortical grasp network encodes planning and execution of grasps and processes spoken and written aspects of language. High-level cortical areas within this network are attractive implant sites for brain-machine interfaces (BMIs). While a tetraplegic patient performed grasp motor imagery and vocalized speech, neural activity was recorded from the supramarginal gyrus (SMG), ventral premotor cortex (PMv), and somatosensory cortex (S1). In SMG and PMv, five imagined grasps were well represented by firing rates of neuronal populations during visual cue presentation. During motor imagery, these grasps were significantly decodable from all brain areas. During speech production, SMG encoded both spoken grasp types and the names of five colors. Whereas PMv neurons significantly modulated their activity during grasping, SMG's neural population broadly encoded features of both motor imagery and speech. Together, these results indicate that brain signals from high-level areas of the human cortex could be used for grasping and speech BMI applications.
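The decoding result summarized above, classifying a handful of imagined grasp types from population firing rates, can be illustrated with a generic sketch. The snippet below is not the authors' decoder: the trial counts, unit counts, synthetic Poisson firing rates, and the choice of linear discriminant analysis are all assumptions made purely for demonstration.

```python
# Illustrative sketch only: a generic firing-rate classifier for imagined grasp
# types, NOT the decoder used in the cited study. All counts and the synthetic
# data are assumptions for demonstration.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_trials_per_grasp = 40   # hypothetical trial count per imagined grasp
n_units = 60              # hypothetical number of recorded SMG/PMv units
n_grasps = 5              # five imagined grasp types, as in the abstract

# Synthetic firing-rate features: one mean rate per unit per trial (Hz),
# with a small grasp-specific offset so the classes are separable.
X, y = [], []
for g in range(n_grasps):
    tuning = rng.normal(0.0, 1.0, n_units)  # per-unit "tuning" to grasp g
    rates = rng.poisson(5.0, (n_trials_per_grasp, n_units)) + 2.0 * tuning
    X.append(rates)
    y.append(np.full(n_trials_per_grasp, g))
X = np.vstack(X)
y = np.concatenate(y)

# Cross-validated decoding accuracy of grasp type from population firing rates.
clf = LinearDiscriminantAnalysis()
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean decoding accuracy: {scores.mean():.2f} (chance = {1 / n_grasps:.2f})")
```

Replacing the synthetic rates with binned spike counts from real recordings would give per-trial feature vectors of the same shape.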
Collapse
Affiliation(s)
- Sarah K Wandelt
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA 91125, USA.
| | - Spencer Kellis
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA 91125, USA; Department of Neurological Surgery, Keck School of Medicine of USC, Los Angeles, CA 90033, USA; USC Neurorestoration Center, Keck School of Medicine of USC, Los Angeles, CA 90033, USA
| | - David A Bjånes
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA 91125, USA
| | - Kelsie Pejsa
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA 91125, USA
| | - Brian Lee
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; Department of Neurological Surgery, Keck School of Medicine of USC, Los Angeles, CA 90033, USA; USC Neurorestoration Center, Keck School of Medicine of USC, Los Angeles, CA 90033, USA
| | - Charles Liu
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; Department of Neurological Surgery, Keck School of Medicine of USC, Los Angeles, CA 90033, USA; USC Neurorestoration Center, Keck School of Medicine of USC, Los Angeles, CA 90033, USA; Rancho Los Amigos National Rehabilitation Center, Downey, CA 90242, USA
| | - Richard A Andersen
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA 91125, USA
| |
Collapse
|
39
|
Proix T, Delgado Saa J, Christen A, Martin S, Pasley BN, Knight RT, Tian X, Poeppel D, Doyle WK, Devinsky O, Arnal LH, Mégevand P, Giraud AL. Imagined speech can be decoded from low- and cross-frequency intracranial EEG features. Nat Commun 2022; 13:48. [PMID: 35013268 PMCID: PMC8748882 DOI: 10.1038/s41467-021-27725-3] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2021] [Accepted: 12/03/2021] [Indexed: 01/19/2023] Open
Abstract
Reconstructing intended speech from neural activity using brain-computer interfaces holds great promise for people with severe speech production deficits. While decoding overt speech has progressed, decoding imagined speech has met with limited success, mainly because the associated neural signals are weak and variable compared to overt speech, and hence difficult for learning algorithms to decode. We obtained three electrocorticography datasets from 13 patients, with electrodes implanted for epilepsy evaluation, who performed overt and imagined speech production tasks. Based on recent theories of speech neural processing, we extracted consistent and specific neural features usable for future brain-computer interfaces, and assessed their performance in discriminating speech items in articulatory, phonetic, and vocalic representation spaces. While high-frequency activity provided the best signal for overt speech, both low- and higher-frequency power and local cross-frequency dynamics contributed to imagined speech decoding, in particular in phonetic and vocalic, i.e., perceptual, spaces. These findings show that low-frequency power and cross-frequency dynamics contain key information for imagined speech decoding.
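The feature types named in this abstract, low-frequency power, high-frequency (high-gamma) activity, and local cross-frequency relationships, can be sketched in a few lines. The example below is a minimal illustration on a synthetic single channel, assuming a 1 kHz sampling rate, Butterworth band limits of 1-8 Hz and 70-150 Hz, and a simple envelope correlation as the cross-frequency feature; it is not the authors' pipeline.

```python
# Illustrative sketch only: band-limited power envelopes from a synthetic ECoG
# channel. Sampling rate, band limits, and filter order are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 1000.0                              # sampling rate in Hz (assumed)
t = np.arange(0, 5.0, 1.0 / fs)
# Synthetic signal: a 3 Hz low-frequency component plus 80 Hz "high-gamma"
# activity and broadband noise.
ecog = (np.sin(2 * np.pi * 3 * t)
        + 0.3 * np.sin(2 * np.pi * 80 * t)
        + 0.1 * np.random.randn(t.size))

def band_power_envelope(x, fs, low, high, order=4):
    """Bandpass filter the signal and return its analytic amplitude envelope."""
    sos = butter(order, [low, high], btype="band", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, x)
    return np.abs(hilbert(filtered))

low_freq_env = band_power_envelope(ecog, fs, 1, 8)       # low-frequency power
high_gamma_env = band_power_envelope(ecog, fs, 70, 150)  # high-frequency activity

# A crude cross-frequency feature: correlation between the low-frequency and
# high-gamma amplitude envelopes on this channel (real pipelines use more
# principled coupling measures).
coupling = np.corrcoef(low_freq_env, high_gamma_env)[0, 1]
print(f"low/high-gamma envelope correlation: {coupling:.3f}")
```

A real analysis would compute such band-power features per channel and per time window across all electrodes before passing them to a decoder.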
Collapse
Affiliation(s)
- Timothée Proix
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland.
| | - Jaime Delgado Saa
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland
| | - Andy Christen
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland
| | - Stephanie Martin
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland
| | - Brian N Pasley
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, USA
| | - Robert T Knight
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, USA
- Department of Psychology, University of California, Berkeley, Berkeley, USA
| | - Xing Tian
- Division of Arts and Sciences, New York University Shanghai, Shanghai, China
- Shanghai Key Laboratory of Brain Functional Genomics (Ministry of Education), School of Psychology and Cognitive Science, East China Normal University, Shanghai, China
- NYU-ECNU Institute of Brain and Cognitive Science at NYU Shanghai, Shanghai, China
| | - David Poeppel
- Department of Psychology, New York University, New York, NY, USA
- Ernst Strüngmann Institute for Neuroscience, Frankfurt, Germany
| | - Werner K Doyle
- Department of Neurology, New York University Grossman School of Medicine, New York, NY, USA
| | - Orrin Devinsky
- Department of Neurology, New York University Grossman School of Medicine, New York, NY, USA
| | - Luc H Arnal
- Institut de l'Audition, Institut Pasteur, INSERM, F-75012, Paris, France
| | - Pierre Mégevand
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Division of Neurology, Geneva University Hospitals, Geneva, Switzerland
| | - Anne-Lise Giraud
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland
| |
Collapse
|
40
|
Luo S, Rabbani Q, Crone NE. Brain-Computer Interface: Applications to Speech Decoding and Synthesis to Augment Communication. Neurotherapeutics 2022; 19:263-273. [PMID: 35099768 PMCID: PMC9130409 DOI: 10.1007/s13311-022-01190-2] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/16/2022] [Indexed: 01/03/2023] Open
Abstract
Damage or degeneration of the motor pathways necessary for speech and other movements, as in brainstem strokes or amyotrophic lateral sclerosis (ALS), can interfere with efficient communication without affecting the brain structures responsible for language or cognition. In the worst-case scenario, this can result in locked-in syndrome (LIS), a condition in which individuals cannot initiate communication and can only express themselves by answering yes/no questions with eye blinks or other rudimentary movements. Existing augmentative and alternative communication (AAC) devices that rely on eye tracking can improve the quality of life for people with this condition, but brain-computer interfaces (BCIs) are also increasingly being investigated as AAC devices, particularly when eye tracking is too slow or unreliable. Moreover, with recent and ongoing advances in machine learning and neural recording technologies, BCIs may offer the only means to go beyond cursor control and text generation on a computer and allow real-time synthesis of speech, which would arguably offer the most efficient and expressive channel for communication. The potential for BCI speech synthesis has only recently been realized because of seminal studies of the neuroanatomical and neurophysiological underpinnings of speech production using intracranial electrocorticographic (ECoG) recordings in patients undergoing epilepsy surgery. These studies have shown that the cortical areas responsible for vocalization and articulation are distributed over a large area of ventral sensorimotor cortex, and that it is possible to decode speech and reconstruct its acoustics from ECoG if these areas are recorded with sufficiently dense and comprehensive electrode arrays. In this article, we review these advances, including the latest neural decoding strategies, which range from deep learning models to the direct concatenation of speech units. We also discuss state-of-the-art vocoders that are integral to constructing natural-sounding audio waveforms for speech BCIs. Finally, this review outlines some of the challenges ahead in directly synthesizing speech for patients with LIS.
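The decode-then-vocode architecture the review describes, mapping recorded neural activity to acoustic features that a vocoder then renders as audio, can be outlined with a toy model. The sketch below assumes a GRU decoder, 128 neural channels, and 80-band mel-spectrogram targets purely for illustration; it is not any specific published speech BCI, and the final vocoder stage is omitted.

```python
# Illustrative sketch only: a minimal "neural features -> acoustic features"
# decoder of the kind the review discusses, NOT any specific published model.
# Shapes, layer sizes, and the choice of a GRU are assumptions.
import torch
import torch.nn as nn

class SpeechDecoder(nn.Module):
    def __init__(self, n_channels=128, hidden=256, n_mels=80):
        super().__init__()
        self.rnn = nn.GRU(n_channels, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_mels)

    def forward(self, x):            # x: (batch, time, channels)
        h, _ = self.rnn(x)
        return self.out(h)           # (batch, time, n_mels) mel frames

# Toy training step on random data, just to show the intended data flow.
model = SpeechDecoder()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
neural = torch.randn(8, 200, 128)    # 8 trials, 200 time bins, 128 channels
mel_target = torch.randn(8, 200, 80) # aligned target mel-spectrogram frames

optim.zero_grad()
pred = model(neural)
loss = nn.functional.mse_loss(pred, mel_target)
loss.backward()
optim.step()
print("training loss:", loss.item())
```

In a working system, the predicted mel frames would then be passed to a pretrained neural vocoder (e.g., a WaveNet- or HiFi-GAN-style model) to produce the audio waveform, the stage the review's discussion of vocoders addresses.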
Collapse
Affiliation(s)
- Shiyu Luo
- Department of Biomedical Engineering, The Johns Hopkins University School of Medicine, Baltimore, MD, USA.
| | - Qinwan Rabbani
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA
| | - Nathan E Crone
- Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
| |
Collapse
|