1. Angrick M, Luo S, Rabbani Q, Candrea DN, Shah S, Milsap GW, Anderson WS, Gordon CR, Rosenblatt KR, Clawson L, Tippett DC, Maragakis N, Tenore FV, Fifer MS, Hermansky H, Ramsey NF, Crone NE. Online speech synthesis using a chronically implanted brain-computer interface in an individual with ALS. Sci Rep 2024; 14:9617. [PMID: 38671062; PMCID: PMC11053081; DOI: 10.1038/s41598-024-60277-2]
Abstract
Brain-computer interfaces (BCIs) that reconstruct and synthesize speech using brain activity recorded with intracranial electrodes may pave the way toward novel communication interfaces for people who have lost their ability to speak, or who are at high risk of losing this ability, due to neurological disorders. Here, we report online synthesis of intelligible words using a chronically implanted brain-computer interface (BCI) in a man with impaired articulation due to ALS, participating in a clinical trial (ClinicalTrials.gov, NCT03567213) exploring different strategies for BCI communication. The 3-stage approach reported here relies on recurrent neural networks to identify, decode and synthesize speech from electrocorticographic (ECoG) signals acquired across motor, premotor and somatosensory cortices. We demonstrate a reliable BCI that synthesizes commands freely chosen and spoken by the participant from a vocabulary of 6 keywords previously used for decoding commands to control a communication board. Evaluation of the intelligibility of the synthesized speech indicates that 80% of the words can be correctly recognized by human listeners. Our results show that a speech-impaired individual with ALS can use a chronically implanted BCI to reliably produce synthesized words while preserving the participant's voice profile, and provide further evidence for the stability of ECoG for speech-based BCIs.
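To make the 3-stage design concrete, below is a minimal PyTorch-style sketch of how an identify-decode-synthesize cascade over ECoG features might be organized. The module names, feature dimensions, mel-spectrogram target, and the external vocoder callable are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code) of a 3-stage recurrent pipeline:
# stage 1 identifies speech frames, stage 2 decodes acoustic features,
# stage 3 (an external vocoder) synthesizes the waveform.
import torch
import torch.nn as nn

class SpeechDetector(nn.Module):
    """Stage 1: flag ECoG frames that contain speech activity."""
    def __init__(self, n_channels=128, hidden=64):  # assumed dimensions
        super().__init__()
        self.rnn = nn.GRU(n_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, time, channels)
        h, _ = self.rnn(x)
        return torch.sigmoid(self.head(h))      # per-frame speech probability

class AcousticDecoder(nn.Module):
    """Stage 2: map ECoG features to acoustic frames (assumed mel target)."""
    def __init__(self, n_channels=128, hidden=128, n_acoustic=80):
        super().__init__()
        self.rnn = nn.GRU(n_channels, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_acoustic)

    def forward(self, x):
        h, _ = self.rnn(x)
        return self.head(h)                     # (batch, time, n_acoustic)

def synthesize(ecog, detector, decoder, vocoder, threshold=0.5):
    """Chain the stages: detect speech, decode acoustics, vocode audio."""
    mask = detector(ecog).squeeze(-1) > threshold   # which frames are speech
    acoustics = decoder(ecog)
    return vocoder(acoustics[mask])             # vocoder: acoustic frames -> waveform
```

Separating detection from decoding mirrors the staging described in the abstract: synthesis is attempted only on frames the detector flags as speech.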
Affiliation(s)
- Miguel Angrick: Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Shiyu Luo: Department of Biomedical Engineering, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Qinwan Rabbani: Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA
- Daniel N Candrea: Department of Biomedical Engineering, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Samyak Shah: Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Griffin W Milsap: Research and Exploratory Development Department, Johns Hopkins Applied Physics Laboratory, Laurel, MD, USA
- William S Anderson: Department of Neurosurgery, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Chad R Gordon: Department of Neurosurgery; Section of Neuroplastic and Reconstructive Surgery, Department of Plastic Surgery, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kathryn R Rosenblatt: Department of Neurology; Department of Anesthesiology & Critical Care Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Lora Clawson: Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Donna C Tippett: Department of Neurology; Department of Otolaryngology-Head and Neck Surgery; Department of Physical Medicine and Rehabilitation, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Nicholas Maragakis: Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Francesco V Tenore: Research and Exploratory Development Department, Johns Hopkins Applied Physics Laboratory, Laurel, MD, USA
- Matthew S Fifer: Research and Exploratory Development Department, Johns Hopkins Applied Physics Laboratory, Laurel, MD, USA
- Hynek Hermansky: Center for Language and Speech Processing; Human Language Technology Center of Excellence, The Johns Hopkins University, Baltimore, MD, USA
- Nick F Ramsey: UMC Utrecht Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
- Nathan E Crone: Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
2. Angrick M, Luo S, Rabbani Q, Candrea DN, Shah S, Milsap GW, Anderson WS, Gordon CR, Rosenblatt KR, Clawson L, Maragakis N, Tenore FV, Fifer MS, Hermansky H, Ramsey NF, Crone NE. Online speech synthesis using a chronically implanted brain-computer interface in an individual with ALS. medRxiv [Preprint] 2023:2023.06.30.23291352. [PMID: 37425721; PMCID: PMC10327279; DOI: 10.1101/2023.06.30.23291352]
Abstract
Recent studies have shown that speech can be reconstructed and synthesized using only brain activity recorded with intracranial electrodes, but until now this has only been done using retrospective analyses of recordings from able-bodied patients temporarily implanted with electrodes for epilepsy surgery. Here, we report online synthesis of intelligible words using a chronically implanted brain-computer interface (BCI) in a clinical trial participant (ClinicalTrials.gov, NCT03567213) with dysarthria due to amyotrophic lateral sclerosis (ALS). We demonstrate a reliable BCI that synthesizes commands freely chosen and spoken by the user from a vocabulary of 6 keywords originally designed to allow intuitive selection of items on a communication board. Our results show for the first time that a speech-impaired individual with ALS can use a chronically implanted BCI to reliably produce synthesized words that are intelligible to human listeners while preserving the participant's voice profile.
Affiliation(s)
- Miguel Angrick: Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Shiyu Luo: Department of Biomedical Engineering, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Qinwan Rabbani: Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA
- Daniel N Candrea: Department of Biomedical Engineering, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Samyak Shah: Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Griffin W Milsap: Research and Exploratory Development Department, Johns Hopkins Applied Physics Laboratory, Laurel, MD, USA
- William S Anderson: Department of Neurosurgery, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Chad R Gordon: Department of Neurosurgery; Section of Neuroplastic and Reconstructive Surgery, Department of Plastic Surgery, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kathryn R Rosenblatt: Department of Neurology; Department of Anesthesiology & Critical Care Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Lora Clawson: Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Nicholas Maragakis: Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Francesco V Tenore: Research and Exploratory Development Department, Johns Hopkins Applied Physics Laboratory, Laurel, MD, USA
- Matthew S Fifer: Research and Exploratory Development Department, Johns Hopkins Applied Physics Laboratory, Laurel, MD, USA
- Hynek Hermansky: Center for Language and Speech Processing; Human Language Technology Center of Excellence, The Johns Hopkins University, Baltimore, MD, USA
- Nick F Ramsey: UMC Utrecht Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
- Nathan E Crone: Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
3. Luo S, Rabbani Q, Crone NE. Brain-Computer Interface: Applications to Speech Decoding and Synthesis to Augment Communication. Neurotherapeutics 2022; 19:263-273. [PMID: 35099768; PMCID: PMC9130409; DOI: 10.1007/s13311-022-01190-2]
Abstract
Damage or degeneration of motor pathways necessary for speech and other movements, as in brainstem strokes or amyotrophic lateral sclerosis (ALS), can interfere with efficient communication without affecting brain structures responsible for language or cognition. In the worst-case scenario, this can result in locked-in syndrome (LIS), a condition in which individuals cannot initiate communication and can only express themselves by answering yes/no questions with eye blinks or other rudimentary movements. Existing augmentative and alternative communication (AAC) devices that rely on eye tracking can improve the quality of life for people with this condition, but brain-computer interfaces (BCIs) are also increasingly being investigated as AAC devices, particularly when eye tracking is too slow or unreliable. Moreover, with recent and ongoing advances in machine learning and neural recording technologies, BCIs may offer the only means to go beyond cursor control and text generation on a computer, to allow real-time synthesis of speech, which would arguably offer the most efficient and expressive channel for communication. The potential for BCI speech synthesis has only recently been realized because of seminal studies of the neuroanatomical and neurophysiological underpinnings of speech production using intracranial electrocorticographic (ECoG) recordings in patients undergoing epilepsy surgery. These studies have shown that cortical areas responsible for vocalization and articulation are distributed over a large area of ventral sensorimotor cortex, and that it is possible to decode speech and reconstruct its acoustics from ECoG if these areas are recorded with sufficiently dense and comprehensive electrode arrays. In this article, we review these advances, including the latest neural decoding strategies that range from deep learning models to the direct concatenation of speech units. We also discuss state-of-the-art vocoders that are integral in constructing natural-sounding audio waveforms for speech BCIs. Finally, this review outlines some of the challenges ahead in directly synthesizing speech for patients with LIS.
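As a concrete illustration of the vocoding step this review surveys, here is a minimal sketch that inverts a decoded mel spectrogram to a waveform with Griffin-Lim via librosa. The neural vocoders discussed in the review would replace this step; the sample rate, FFT parameters, and placeholder input are assumptions.

```python
# Minimal sketch of vocoding: decoded acoustic features -> audible waveform.
# Griffin-Lim is a simple stand-in for the neural vocoders the review covers.
import numpy as np
import librosa
import soundfile as sf

def mel_to_waveform(mel_spectrogram: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Invert a (n_mels, frames) power mel spectrogram to audio via Griffin-Lim."""
    return librosa.feature.inverse.mel_to_audio(
        mel_spectrogram, sr=sr, n_fft=1024, hop_length=256, n_iter=60
    )

# Usage: mel frames (here random placeholders standing in for BCI decoder
# output) become a playable audio file.
decoded_mel = np.abs(np.random.randn(80, 200))
sf.write("synthesized.wav", mel_to_waveform(decoded_mel), 16000)
```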
Affiliation(s)
- Shiyu Luo: Department of Biomedical Engineering, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Qinwan Rabbani: Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA
- Nathan E Crone: Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
4. Milsap G, Collard M, Coogan C, Rabbani Q, Wang Y, Crone NE. Keyword Spotting Using Human Electrocorticographic Recordings. Front Neurosci 2019; 13:60. [PMID: 30837823; PMCID: PMC6389788; DOI: 10.3389/fnins.2019.00060]
Abstract
Neural keyword spotting could form the basis of a speech brain-computer interface for menu navigation if it can be done with low latency and high specificity, comparable to the “wake-word” functionality of modern voice-activated AI assistant technologies. This study investigated neural keyword spotting using motor representations of speech via invasively recorded electrocorticographic signals as a proof of concept. Neural matched filters were created from monosyllabic consonant-vowel utterances: one keyword utterance and 11 similar non-keyword utterances. These filters were used in an analog of the acoustic keyword spotting problem, applied for the first time to neural data. The filter templates were cross-correlated with the neural signal, capturing temporal dynamics of neural activation across cortical sites. Neural vocal activity detection (VAD) was used to identify utterance times, and a discriminative classifier was used to determine whether these utterances were keyword or non-keyword speech. Model performance appeared to be highly related to electrode placement and spatial density. Vowel height (/a/ vs /i/) was poorly discriminated in recordings from sensorimotor cortex, but was highly discriminable using neural features from superior temporal gyrus during self-monitoring. The best-performing neural keyword detection (5 keyword detections with 2 false positives across 60 utterances) and neural VAD (100% sensitivity, ~1 false detection per 10 utterances) came from high-density (2 mm electrode diameter, 5 mm pitch) recordings from ventral sensorimotor cortex, suggesting that the spatial fidelity and extent of high-density ECoG arrays may be sufficient for speech brain-computer interfaces.
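A minimal sketch of the matched-filter front end described above, assuming multichannel high-gamma feature traces as input: the keyword template is cross-correlated with each channel, the per-channel correlations are summed, and threshold crossings mark candidate keyword utterances. Array shapes and the detection threshold are assumptions, not the study's exact pipeline.

```python
# Illustrative sketch (not the authors' code) of neural keyword spotting
# with a spatiotemporal matched filter.
import numpy as np
from scipy.signal import correlate

def matched_filter_score(signal: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Sum per-channel cross-correlations of a template against the signal.

    signal:   (channels, n_samples) high-gamma feature traces
    template: (channels, n_template) keyword-aligned average activity
    """
    scores = np.zeros(signal.shape[1] - template.shape[1] + 1)
    for ch in range(signal.shape[0]):
        scores += correlate(signal[ch], template[ch], mode="valid")
    return scores

def detect_keyword(signal, template, threshold):
    """Return sample indices where the matched-filter output crosses threshold."""
    return np.flatnonzero(matched_filter_score(signal, template) > threshold)
```

In the study's framing, a separate discriminative classifier would then decide whether each detected utterance was the keyword or a non-keyword; the sketch covers only the template-correlation stage.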
Affiliation(s)
- Griffin Milsap: Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Maxwell Collard: Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Christopher Coogan: Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Qinwan Rabbani: Department of Electrical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Yujing Wang: Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Fischell Department of Bioengineering, University of Maryland College Park, College Park, MD, USA
- Nathan E Crone: Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
5. Rabbani Q, Milsap G, Crone NE. The Potential for a Speech Brain-Computer Interface Using Chronic Electrocorticography. Neurotherapeutics 2019; 16:144-165. [PMID: 30617653; PMCID: PMC6361062; DOI: 10.1007/s13311-018-00692-2]
Abstract
A brain-computer interface (BCI) is a technology that uses neural features to restore or augment the capabilities of its user. A BCI for speech would enable communication in real time via neural correlates of attempted or imagined speech. Such a technology would potentially restore communication and improve quality of life for locked-in patients and other patients with severe communication disorders. There have been many recent developments in neural decoders, neural feature extraction, and brain recording modalities, both for BCI control of prosthetics and for automatic speech recognition (ASR). Indeed, ASR and related fields have developed significantly over the past years and lend many insights into the requirements, goals, and strategies for speech BCI. Neural speech decoding is a comparatively new field but has shown much promise, with recent studies demonstrating semantic, auditory, and articulatory decoding using electrocorticography (ECoG) and other neural recording modalities. Because the neural representations for speech and language are widely distributed over cortical regions spanning the frontal, parietal, and temporal lobes, the mesoscopic scale of population activity captured by ECoG surface electrode arrays may have distinct advantages for speech BCI, in contrast to the advantages of microelectrode arrays for upper-limb BCI. Nevertheless, many challenges remain for the translation of speech BCIs to clinical populations. This review discusses and outlines the current state of the art for speech BCI and explores what a speech BCI using chronic ECoG might entail.
Affiliation(s)
- Qinwan Rabbani: Department of Electrical Engineering, The Johns Hopkins University Whiting School of Engineering, Baltimore, MD, USA
- Griffin Milsap: Department of Biomedical Engineering, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Nathan E Crone: Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
6. Decoding spoken phonemes from sensorimotor cortex with high-density ECoG grids. Neuroimage 2017; 180:301-311. [DOI: 10.1016/j.neuroimage.2017.10.011]
Abstract
For people who cannot communicate due to severe paralysis or involuntary movements, technology that decodes intended speech from the brain may offer an alternative means of communication. If decoding proves to be feasible, intracranial brain-computer interface systems can be developed that translate decoded speech into computer-generated speech or into instructions for controlling assistive devices. Recent advances suggest that such decoding may be feasible from sensorimotor cortex, but it is not clear how best to approach this challenge. One approach is to identify and discriminate elements of spoken language, such as phonemes. We investigated the feasibility of decoding four spoken phonemes from the sensorimotor face area, using electrocorticographic signals obtained with high-density electrode grids. Several decoding algorithms, including spatiotemporal matched filters, spatial matched filters, and support vector machines, were compared. Phonemes could be classified correctly at a level of over 75% with spatiotemporal matched filters. Support vector machine analysis reached a similar level, but spatial matched filters yielded significantly lower scores. The most informative electrodes were clustered along the central sulcus. The highest scores were achieved from time windows centered around voice onset time, but a 500 ms window before onset time could also be classified significantly above chance. The results suggest that phoneme production involves a sequence of robust and reproducible activity patterns on the cortical surface. Importantly, decoding requires the inclusion of temporal information to capture the rapid shifts of robust patterns associated with articulator muscle group contraction during production of a phoneme. The high classification scores are likely enabled by the use of high-density grids and by the use of discrete phonemes. Implications for use in brain-computer interfaces are discussed.
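As a sketch of the classification comparison described above, the snippet below trains a linear support vector machine on flattened spatiotemporal ECoG windows (as if centered on voice onset) and reports cross-validated accuracy against the 25% chance level for four phonemes. The data shapes, placeholder data, and preprocessing are assumptions, not the study's exact pipeline.

```python
# Illustrative sketch: four-phoneme classification from fixed-length
# spatiotemporal ECoG windows with a linear SVM.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: (n_trials, channels, samples) high-density grid windows; y: phoneme labels.
rng = np.random.default_rng(0)                  # placeholder data, not real ECoG
X = rng.standard_normal((200, 64, 100))
y = rng.integers(0, 4, size=200)                # 4 phoneme classes

# Flatten each trial into one spatiotemporal feature vector, then classify;
# keeping the temporal samples in the feature vector is what makes this a
# "spatiotemporal" rather than purely spatial decoder.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
acc = cross_val_score(clf, X.reshape(len(X), -1), y, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f} (chance = 0.25)")
```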