1
Silva AB, Liu JR, Metzger SL, Bhaya-Grossman I, Dougherty ME, Seaton MP, Littlejohn KT, Tu-Chan A, Ganguly K, Moses DA, Chang EF. A bilingual speech neuroprosthesis driven by cortical articulatory representations shared between languages. Nat Biomed Eng 2024:10.1038/s41551-024-01207-5. [PMID: 38769157 DOI: 10.1038/s41551-024-01207-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 04/01/2024] [Indexed: 05/22/2024]
Abstract
Advancements in decoding speech from brain activity have focused on decoding a single language. Hence, the extent to which bilingual speech production relies on unique or shared cortical activity across languages has remained unclear. Here, we leveraged electrocorticography, along with deep-learning and statistical natural-language models of English and Spanish, to record and decode activity from speech-motor cortex of a Spanish-English bilingual with vocal-tract and limb paralysis into sentences in either language. This was achieved without requiring the participant to manually specify the target language. Decoding models relied on shared vocal-tract articulatory representations across languages, which allowed us to build a syllable classifier that generalized across a shared set of English and Spanish syllables. Transfer learning expedited training of the bilingual decoder by enabling neural data recorded in one language to improve decoding in the other language. Overall, our findings suggest shared cortical articulatory representations that persist after paralysis and enable the decoding of multiple languages without the need to train separate language-specific decoders.
Affiliation(s)
- Alexander B Silva
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA
  - University of California, Berkeley - University of California, San Francisco Graduate Program in Bioengineering, Berkeley, CA, USA
- Jessie R Liu
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA
  - University of California, Berkeley - University of California, San Francisco Graduate Program in Bioengineering, Berkeley, CA, USA
- Sean L Metzger
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA
  - University of California, Berkeley - University of California, San Francisco Graduate Program in Bioengineering, Berkeley, CA, USA
- Ilina Bhaya-Grossman
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA
  - University of California, Berkeley - University of California, San Francisco Graduate Program in Bioengineering, Berkeley, CA, USA
- Maximilian E Dougherty
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- Margaret P Seaton
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- Kaylo T Littlejohn
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Adelyn Tu-Chan
  - Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Karunesh Ganguly
  - Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA
  - Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- David A Moses
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA
- Edward F Chang
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA
  - University of California, Berkeley - University of California, San Francisco Graduate Program in Bioengineering, Berkeley, CA, USA
2
Wandelt SK, Bjånes DA, Pejsa K, Lee B, Liu C, Andersen RA. Representation of internal speech by single neurons in human supramarginal gyrus. Nat Hum Behav 2024:10.1038/s41562-024-01867-y. [PMID: 38740984 DOI: 10.1038/s41562-024-01867-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/15/2023] [Accepted: 03/16/2024] [Indexed: 05/16/2024]
Abstract
Speech brain-machine interfaces (BMIs) translate brain signals into words or audio outputs, enabling communication for people having lost their speech abilities due to diseases or injury. While important advances in vocalized, attempted and mimed speech decoding have been achieved, results for internal speech decoding are sparse and have yet to achieve high functionality. Notably, it is still unclear from which brain areas internal speech can be decoded. Here two participants with tetraplegia with implanted microelectrode arrays located in the supramarginal gyrus (SMG) and primary somatosensory cortex (S1) performed internal and vocalized speech of six words and two pseudowords. In both participants, we found significant neural representation of internal and vocalized speech, at the single neuron and population level in the SMG. From recorded population activity in the SMG, the internally spoken and vocalized words were significantly decodable. In an offline analysis, we achieved average decoding accuracies of 55% and 24% for each participant, respectively (chance level 12.5%), and during an online internal speech BMI task, we averaged 79% and 23% accuracy, respectively. Evidence of shared neural representations between internal speech, word reading and vocalized speech processes was found in participant 1. SMG represented words as well as pseudowords, providing evidence for phonetic encoding. Furthermore, our decoder achieved high classification with multiple internal speech strategies (auditory imagination/visual imagination). Activity in S1 was modulated by vocalized but not internal speech in both participants, suggesting no articulator movements of the vocal tract occurred during internal speech production. This work represents a proof-of-concept for a high-performance internal speech BMI.
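The 12.5% chance level quoted in this abstract follows from the eight-item vocabulary (six words plus two pseudowords). The short sketch below reproduces that arithmetic and expresses the reported accuracies as multiples of chance. It assumes balanced classes and uniform guessing, which the abstract does not state explicitly; the function and variable names are illustrative only.

```python
def chance_level(n_classes: int) -> float:
    """Accuracy expected from uniform random guessing over balanced classes."""
    if n_classes <= 0:
        raise ValueError("n_classes must be positive")
    return 1.0 / n_classes


# Six words plus two pseudowords, as described in the abstract.
n_classes = 6 + 2
chance = chance_level(n_classes)

# Accuracies reported in the abstract (participant 1, participant 2).
reported = {
    "offline": (0.55, 0.24),
    "online": (0.79, 0.23),
}

# Express each reported accuracy as a multiple of the chance level.
ratios = {
    condition: tuple(acc / chance for acc in accs)
    for condition, accs in reported.items()
}

print(f"chance level: {chance:.1%}")
for condition, (p1, p2) in ratios.items():
    print(f"{condition}: participant 1 = {p1:.1f}x chance, participant 2 = {p2:.1f}x chance")
```

A ratio close to 1 would mean decoding at roughly chance; whether a given ratio is statistically significant depends on the number of trials, which the abstract does not give.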
Affiliation(s)
- Sarah K Wandelt
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
  - T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA, USA
- David A Bjånes
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
  - T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA, USA
  - Rancho Los Amigos National Rehabilitation Center, Downey, CA, USA
- Kelsie Pejsa
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
  - T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA, USA
- Brian Lee
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
  - Department of Neurological Surgery, Keck School of Medicine of USC, Los Angeles, CA, USA
  - USC Neurorestoration Center, Keck School of Medicine of USC, Los Angeles, CA, USA
- Charles Liu
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
  - Rancho Los Amigos National Rehabilitation Center, Downey, CA, USA
  - Department of Neurological Surgery, Keck School of Medicine of USC, Los Angeles, CA, USA
  - USC Neurorestoration Center, Keck School of Medicine of USC, Los Angeles, CA, USA
- Richard A Andersen
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
  - T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA, USA
3
Wu H, Cai C, Ming W, Chen W, Zhu Z, Feng C, Jiang H, Zheng Z, Sawan M, Wang T, Zhu J. Speech decoding using cortical and subcortical electrophysiological signals. Front Neurosci 2024; 18:1345308. [PMID: 38486966 PMCID: PMC10937352 DOI: 10.3389/fnins.2024.1345308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 02/12/2024] [Indexed: 03/17/2024]
Abstract
Introduction: Language impairments often result from severe neurological disorders, driving the development of neural prosthetics utilizing electrophysiological signals to restore comprehensible language. Previous decoding efforts primarily focused on signals from the cerebral cortex, neglecting subcortical brain structures' potential contributions to speech decoding in brain-computer interfaces. Methods: In this study, stereotactic electroencephalography (sEEG) was employed to investigate subcortical structures' role in speech decoding. Two native Mandarin Chinese speakers, undergoing sEEG implantation for epilepsy treatment, participated. Participants read Chinese text, with 1-30, 30-70, and 70-150 Hz frequency band powers of sEEG signals extracted as key features. A deep learning model based on long short-term memory assessed the contribution of different brain structures to speech decoding, predicting consonant articulatory place, manner, and tone within single syllable. Results: Cortical signals excelled in articulatory place prediction (86.5% accuracy), while cortical and subcortical signals performed similarly for articulatory manner (51.5% vs. 51.7% accuracy). Subcortical signals provided superior tone prediction (58.3% accuracy). The superior temporal gyrus was consistently relevant in speech decoding for consonants and tone. Combining cortical and subcortical inputs yielded the highest prediction accuracy, especially for tone. Discussion: This study underscores the essential roles of both cortical and subcortical structures in different aspects of speech decoding.
Affiliation(s)
- Hemmings Wu
  - Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Clinical Research Center for Neurological Disease of Zhejiang Province, Hangzhou, China
- Chengwei Cai
  - Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Wenjie Ming
  - Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Department of Neurology, Epilepsy Center, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Wangyu Chen
  - Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zhoule Zhu
  - Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Chen Feng
  - Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Hongjie Jiang
  - Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zhe Zheng
  - Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Mohamad Sawan
  - CenBRAIN Lab, School of Engineering, Westlake University, Hangzhou, China
- Ting Wang
  - School of Foreign Languages, Tongji University, Shanghai, China
  - Center for Speech and Language Processing, Tongji University, Shanghai, China
- Junming Zhu
  - Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
4
Vitória MA, Fernandes FG, van den Boom M, Ramsey N, Raemaekers M. Decoding Single and Paired Phonemes Using 7T Functional MRI. Brain Topogr 2024:10.1007/s10548-024-01034-6. [PMID: 38261272 DOI: 10.1007/s10548-024-01034-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 01/12/2024] [Indexed: 01/24/2024]
Abstract
Several studies have shown that mouth movements related to the pronunciation of individual phonemes are represented in the sensorimotor cortex. This would theoretically allow for brain computer interfaces that are capable of decoding continuous speech by training classifiers based on the activity in the sensorimotor cortex related to the production of individual phonemes. To address this, we investigated the decodability of trials with individual and paired phonemes (pronounced consecutively with one second interval) using activity in the sensorimotor cortex. Fifteen participants pronounced 3 different phonemes and 3 combinations of two of the same phonemes in a 7T functional MRI experiment. We confirmed that support vector machine (SVM) classification of single and paired phonemes was possible. Importantly, by combining classifiers trained on single phonemes, we were able to classify paired phonemes with an accuracy of 53% (33% chance level), demonstrating that activity of isolated phonemes is present and distinguishable in combined phonemes. A SVM searchlight analysis showed that the phoneme representations are widely distributed in the ventral sensorimotor cortex. These findings provide insights about the neural representations of single and paired phonemes. Furthermore, it supports the notion that speech BCI may be feasible based on machine learning algorithms trained on individual phonemes using intracranial electrode grids.
Affiliation(s)
- Maria Araújo Vitória
  - Brain Center Rudolf Magnus, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
- Francisco Guerreiro Fernandes
  - Brain Center Rudolf Magnus, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
- Max van den Boom
  - Brain Center Rudolf Magnus, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
  - Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA
- Nick Ramsey
  - Brain Center Rudolf Magnus, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
- Mathijs Raemaekers
  - Brain Center Rudolf Magnus, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
5
Chen X, Wang R, Khalilian-Gourtani A, Yu L, Dugan P, Friedman D, Doyle W, Devinsky O, Wang Y, Flinker A. A Neural Speech Decoding Framework Leveraging Deep Learning and Speech Synthesis. bioRxiv 2023:2023.09.16.558028. [PMID: 37745380 PMCID: PMC10516019 DOI: 10.1101/2023.09.16.558028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Decoding human speech from neural signals is essential for brain-computer interface (BCI) technologies restoring speech function in populations with neurological deficits. However, it remains a highly challenging task, compounded by the scarce availability of neural signals with corresponding speech, data complexity, and high dimensionality, and the limited publicly available source code. Here, we present a novel deep learning-based neural speech decoding framework that includes an ECoG Decoder that translates electrocorticographic (ECoG) signals from the cortex into interpretable speech parameters and a novel differentiable Speech Synthesizer that maps speech parameters to spectrograms. We develop a companion audio-to-audio auto-encoder consisting of a Speech Encoder and the same Speech Synthesizer to generate reference speech parameters to facilitate the ECoG Decoder training. This framework generates natural-sounding speech and is highly reproducible across a cohort of 48 participants. Among three neural network architectures for the ECoG Decoder, the 3D ResNet model has the best decoding performance (PCC=0.804) in predicting the original speech spectrogram, closely followed by the SWIN model (PCC=0.796). Our experimental results show that our models can decode speech with high correlation even when limited to only causal operations, which is necessary for adoption by real-time neural prostheses. We successfully decode speech in participants with either left or right hemisphere coverage, which could lead to speech prostheses in patients with speech deficits resulting from left hemisphere damage. Further, we use an occlusion analysis to identify cortical regions contributing to speech decoding across our models. Finally, we provide open-source code for our two-stage training pipeline along with associated preprocessing and visualization tools to enable reproducible research and drive research across the speech science and prostheses communities.
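The abstract reports decoding quality as a Pearson correlation coefficient (PCC) between the decoded and original spectrograms. As a rough illustration of that metric, the sketch below computes a PCC over two flattened spectrogram arrays with NumPy. It is not the authors' evaluation code: the function name, the flatten-everything convention, and the synthetic arrays are assumptions made here for illustration, and the paper may instead compute or average the correlation per time frame, per frequency bin, or per trial.

```python
import numpy as np


def spectrogram_pcc(decoded, reference) -> float:
    """Pearson correlation between two equal-shape spectrograms, flattened.

    Assumption: correlation is taken over all time-frequency bins at once.
    """
    decoded = np.asarray(decoded, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if decoded.shape != reference.shape:
        raise ValueError("spectrograms must have the same shape")

    # Center both signals, then take the normalized inner product.
    d = decoded.ravel() - decoded.mean()
    r = reference.ravel() - reference.mean()
    denom = np.sqrt(np.sum(d ** 2) * np.sum(r ** 2))
    if denom == 0.0:
        return float("nan")  # correlation is undefined for a constant input
    return float(np.sum(d * r) / denom)


# Synthetic stand-ins for a reference spectrogram and a noisy decoded one
# (hypothetical data: 80 frequency bins by 100 time frames).
rng = np.random.default_rng(0)
reference = rng.random((80, 100))
decoded = reference + 0.5 * rng.standard_normal(reference.shape)

pcc_decoded = spectrogram_pcc(decoded, reference)
print(f"PCC between decoded and reference: {pcc_decoded:.3f}")
```

A value of 1 indicates a perfect linear match and 0 indicates no linear relationship, so the reported 0.804 and 0.796 can be read on that scale under whatever averaging convention the authors used.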
6
Meng K, Goodarzy F, Kim E, Park YJ, Kim JS, Cook MJ, Chung CK, Grayden DB. Continuous synthesis of artificial speech sounds from human cortical surface recordings during silent speech production. J Neural Eng 2023; 20:046019. [PMID: 37459853 DOI: 10.1088/1741-2552/ace7f6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 07/17/2023] [Indexed: 07/28/2023]
Abstract
Objective. Brain-computer interfaces can restore various forms of communication in paralyzed patients who have lost their ability to articulate intelligible speech. This study aimed to demonstrate the feasibility of closed-loop synthesis of artificial speech sounds from human cortical surface recordings during silent speech production. Approach. Ten participants with intractable epilepsy were temporarily implanted with intracranial electrode arrays over cortical surfaces. A decoding model that predicted audible outputs directly from patient-specific neural feature inputs was trained during overt word reading and immediately tested with overt, mimed and imagined word reading. Predicted outputs were later assessed objectively against corresponding voice recordings and subjectively through human perceptual judgments. Main results. Artificial speech sounds were successfully synthesized during overt and mimed utterances by two participants with some coverage of the precentral gyrus. About a third of these sounds were correctly identified by naïve listeners in two-alternative forced-choice tasks. A similar outcome could not be achieved during imagined utterances by any of the participants. However, neural feature contribution analyses suggested the presence of exploitable activation patterns during imagined speech in the postcentral gyrus and the superior temporal gyrus. In future work, a more comprehensive coverage of cortical surfaces, including posterior parts of the middle frontal gyrus and the inferior frontal gyrus, could improve synthesis performance during imagined speech. Significance. As the field of speech neuroprostheses is rapidly moving toward clinical trials, this study addressed important considerations about task instructions and brain coverage when conducting research on silent speech with non-target participants.
Affiliation(s)
- Kevin Meng
  - Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia
  - Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Farhad Goodarzy
  - Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Australia
- EuiYoung Kim
  - Interdisciplinary Program in Neuroscience, Seoul National University, Seoul, Republic of Korea
- Ye Jin Park
  - Department of Brain and Cognitive Sciences, Seoul National University, Seoul, Republic of Korea
- June Sic Kim
  - Research Institute of Basic Sciences, Seoul National University, Seoul, Republic of Korea
- Mark J Cook
  - Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia
  - Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Australia
  - Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Australia
- Chun Kee Chung
  - Department of Brain and Cognitive Sciences, Seoul National University, Seoul, Republic of Korea
  - Department of Neurosurgery, Seoul National University Hospital, Seoul, Republic of Korea
- David B Grayden
  - Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia
  - Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Australia
  - Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Australia