1. Wang R, Chen ZS. Large-scale foundation models and generative AI for BigData neuroscience. Neurosci Res 2025; 215:3-14. [PMID: 38897235; PMCID: PMC11649861; DOI: 10.1016/j.neures.2024.06.003]
Abstract
Recent advances in machine learning have led to revolutionary breakthroughs in computer games, image and natural language understanding, and scientific discovery. Foundation models and large language models (LLMs) have recently achieved human-like intelligence thanks to BigData. With the help of self-supervised learning (SSL) and transfer learning, these models may potentially reshape the landscape of neuroscience research and make a significant impact on the future. Here we present a mini-review of recent advances in foundation models and generative AI models as well as their applications in neuroscience, including natural language and speech, semantic memory, brain-machine interfaces (BMIs), and data augmentation. We argue that this paradigm-shifting framework will open new avenues for many neuroscience research directions and discuss the accompanying challenges and opportunities.
Affiliation(s)
- Ran Wang
- Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
- Zhe Sage Chen
- Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA; Department of Neuroscience and Physiology, Neuroscience Institute, New York University Grossman School of Medicine, New York, NY 10016, USA; Department of Biomedical Engineering, New York University Tandon School of Engineering, Brooklyn, NY 11201, USA.
2. Tsunada J, Eliades SJ. Frontal-auditory cortical interactions and sensory prediction during vocal production in marmoset monkeys. Curr Biol 2025; 35:2307-2322.e3. [PMID: 40250436; DOI: 10.1016/j.cub.2025.03.077]
Abstract
The control of speech and vocal production involves the calculation of error between the intended vocal output and the resulting auditory feedback. This model has been supported by evidence that the auditory cortex (AC) is suppressed immediately before and during vocal production yet remains sensitive to differences between vocal output and altered auditory feedback. This suppression has been suggested to be the result of top-down signals about the intended vocal output, potentially originating from frontal cortical (FC) areas. However, whether FC is the source of suppressive and predictive signaling to AC during vocalization remains unknown. Here, we simultaneously recorded neural activity from both AC and FC of marmoset monkeys during self-initiated vocalizations. We found increases in neural activity in both brain areas from 1 to 0.5 s before vocal production (early pre-vocal period), specifically changes in both multi-unit activity and theta-band power. Connectivity analysis using Granger causality demonstrated that FC sends directed signaling to AC during this early pre-vocal period. Importantly, early pre-vocal activity correlated with both vocalization-induced suppression in AC as well as the structure and acoustics of subsequent calls, such as fundamental frequency. Furthermore, bidirectional auditory-frontal interactions emerged during experimentally altered vocal feedback and predicted subsequent compensatory vocal behavior. These results suggest that FC communicates with AC during vocal production, with frontal-to-auditory signals that may reflect the transmission of sensory prediction information before vocalization and bidirectional signaling during vocalization suggestive of error detection that could drive feedback-dependent vocal control.
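The directed-connectivity analysis described above can be illustrated with a minimal sketch: a signal X is said to Granger-cause a signal Y if past values of X improve the prediction of Y beyond what Y's own history provides. The sketch below uses synthetic traces in place of the recorded frontal (FC) and auditory (AC) cortical activity; the lag order and test statistic are illustrative assumptions, not the authors' pipeline.

```python
# Minimal Granger-causality sketch (synthetic data, not the authors' pipeline).
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 2000
fc = rng.standard_normal(n)          # stand-in for frontal cortex activity
ac = np.zeros(n)                     # stand-in for auditory cortex activity
for t in range(2, n):                # AC is partly driven by past FC
    ac[t] = 0.5 * ac[t - 1] + 0.4 * fc[t - 2] + rng.standard_normal()

# Column order matters: the test asks whether column 2 Granger-causes column 1.
results = grangercausalitytests(np.column_stack([ac, fc]), maxlag=5, verbose=False)
for lag, (tests, _) in results.items():
    f_stat, p_val = tests["ssr_ftest"][:2]
    print(f"lag {lag}: F = {f_stat:.1f}, p = {p_val:.3g}")
```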
Affiliation(s)
- Joji Tsunada
- Beijing Institute for Brain Research, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 102206, China; Chinese Institute for Brain Research, Beijing 102206, China; Department of Veterinary Medicine, Faculty of Agriculture, Iwate University, Morioka 0208550, Iwate, Japan.
- Steven J Eliades
- Department of Head and Neck Surgery & Communication Sciences, Duke University School of Medicine, Durham, NC 27710, USA
3. Khan S, Kallis L, Mee H, El Hadwe S, Barone D, Hutchinson P, Kolias A. Invasive Brain-Computer Interface for Communication: A Scoping Review. Brain Sci 2025; 15:336. [PMID: 40309789; PMCID: PMC12026362; DOI: 10.3390/brainsci15040336]
Abstract
BACKGROUND The rapid expansion of brain-computer interfaces (BCIs) for patients with neurological deficits has garnered significant interest, as they provide an additional route where conventional rehabilitation has its limits. This has particularly been the case for patients who lose the ability to communicate. Circumventing neural injuries by recording from the intact cortex and subcortex has the potential to allow patients to communicate and restore self-expression. Discoveries over the last 10-15 years have been made possible through advancements in technology, neuroscience, and computing. By examining studies involving intracranial brain-computer interfaces that aim to restore communication, we sought to assess the advances made and explore where the technology is heading. METHODS For this scoping review, we systematically searched PubMed and OVID Embase. After processing the articles, the search yielded 41 articles that we included in this review. RESULTS The articles predominantly assessed patients who had suffered from amyotrophic lateral sclerosis (ALS), cervical cord injury, or brainstem stroke, resulting in tetraplegia and, in some cases, difficulty speaking. Of the patients with intracranial implants, ten had ALS, six had brainstem stroke, and thirteen had a spinal cord injury. Stereoelectroencephalography was also used, but the results, whilst promising, are still in their infancy. Studies involving patients who were moving cursors on a screen could improve the speed of movement by optimising the interface and utilising better decoding methods. In recent years, intracortical devices have been successfully used for accurate speech-to-text and speech-to-audio decoding in patients who are unable to speak. CONCLUSIONS Here, we summarise the progress made by BCIs used for communication. Speech decoding directly from the cortex can provide a novel therapeutic method to restore full, embodied communication to patients suffering from tetraplegia who otherwise cannot communicate.
Affiliation(s)
- Shujhat Khan
- Department of Clinical Neuroscience, University of Cambridge, Cambridge CB2 1TN, UK
- Leonie Kallis
- Department of Medicine, University of Cambridge, Trinity Ln, Cambridge CB2 1TN, UK
- Harry Mee
- Department of Clinical Neuroscience, University of Cambridge, Cambridge CB2 1TN, UK
- Department of Rehabilitation, Addenbrookes Hospital, Hills Rd., Cambridge CB2 0QQ, UK
- Salim El Hadwe
- Department of Clinical Neuroscience, University of Cambridge, Cambridge CB2 1TN, UK
- Bioelectronics Laboratory, Department of Electrical Engineering, University of Cambridge, Cambridge CB2 1PZ, UK
- Damiano Barone
- Department of Clinical Neuroscience, University of Cambridge, Cambridge CB2 1TN, UK
- Department of Neurosurgery, Houston Methodist, Houston, TX 77079, USA
- Peter Hutchinson
- Department of Clinical Neuroscience, University of Cambridge, Cambridge CB2 1TN, UK
- Department of Neurosurgery, Addenbrookes Hospital, Hills Rd., Cambridge CB2 0QQ, UK
- Angelos Kolias
- Department of Clinical Neuroscience, University of Cambridge, Cambridge CB2 1TN, UK
- Department of Neurosurgery, Addenbrookes Hospital, Hills Rd., Cambridge CB2 0QQ, UK
4. Georgieva-Bozhkova K, Konstantinova D, Nenova-Nogalcheva A, Nedelchev D. The impact of diastema on the articulation of speech sounds. Folia Med (Plovdiv) 2025; 67. [PMID: 40270173; DOI: 10.3897/folmed.67.e144621]
Abstract
INTRODUCTION The faculty of speech is a distinguishing trait that sets humans apart from all other biological species. Speech production is typically achieved through the processes of phonation and articulation. The acoustic method of studying speech is an individual auditory approach that relies on hearing as a biological analyzer.
5. Prakash PR, Lei T, Flint RD, Hsieh JK, Fitzgerald Z, Mugler E, Templer J, Goldrick MA, Tate MC, Rosenow J, Glaser J, Slutzky MW. Decoding speech intent from non-frontal cortical areas. J Neural Eng 2025; 22:016024. [PMID: 39808939; PMCID: PMC11822885; DOI: 10.1088/1741-2552/adaa20]
Abstract
Objective. Brain machine interfaces (BMIs) that can restore speech have predominantly focused on decoding speech signals from the speech motor cortices. A few studies have shown some information outside the speech motor cortices, such as in parietal and temporal lobes, that also may be useful for BMIs. The ability to use information from outside the frontal lobe could be useful not only for people with locked-in syndrome, but also for people with frontal lobe damage, which can cause nonfluent aphasia or apraxia of speech. However, temporal and parietal lobes are predominantly involved in perceptive speech processing and comprehension. Therefore, to be able to use signals from these areas in a speech BMI, it is important to ascertain that they are related to production. Here, using intracranial recordings, we sought evidence for whether, when and where neural information related to speech intent could be found in the temporal and parietal cortices. Approach. Using intracranial recordings, we examined neural activity across temporal and parietal cortices to identify signals associated with speech intent. We employed causal information to distinguish speech intent from resting states and other language-related processes, such as comprehension and working memory. Neural signals were analyzed for their spatial distribution and temporal dynamics to determine their relevance to speech production. Main results. Causal information enabled us to distinguish speech intent from resting state and other processes involved in language processing or working memory. Information related to speech intent was distributed widely across the temporal and parietal lobes, including superior temporal, medial temporal, angular, and supramarginal gyri. Significance. Loss of communication due to neurological diseases can be devastating. While speech BMIs have made strides in decoding speech from frontal lobe signals, our study reveals that the temporal and parietal cortices contain information about speech production intent that can be causally decoded prior to the onset of voice. This information is distributed across a large network and can be used to improve current speech BMIs and potentially expand the patient population for speech BMIs to include people with frontal lobe damage from stroke or traumatic brain injury.
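As a toy illustration of decoding speech intent from activity recorded strictly before voice onset, the sketch below trains a linear classifier to separate pre-vocal trials from rest trials using synthetic band-power features. The channel count, effect size, and classifier are assumptions for illustration, not the parameters of the study.

```python
# Toy pre-onset speech-intent decoder (synthetic features, illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_trials, n_channels = 200, 32
y = rng.integers(0, 2, n_trials)                  # 1 = upcoming speech, 0 = rest
X = rng.standard_normal((n_trials, n_channels))   # pre-onset band-power features
X[y == 1, :8] += 0.8                              # intent-related shift in 8 channels

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)         # chance level is 0.5
print(f"pre-onset decoding accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```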
Affiliation(s)
- Prashanth Ravi Prakash
- Department of Biomedical Engineering, Northwestern University, Chicago, IL 60611, United States of America
- Tianhao Lei
- Department of Neurology, Northwestern University, Chicago, IL 60611, United States of America
- Robert D Flint
- Department of Neurology, Northwestern University, Chicago, IL 60611, United States of America
- Jason K Hsieh
- Department of Neurosurgery, Neurological Institute, Cleveland Clinic Foundation, Cleveland, OH, United States of America
- Zachary Fitzgerald
- Department of Neurology, Northwestern University, Chicago, IL 60611, United States of America
- Emily Mugler
- Department of Neurology, Northwestern University, Chicago, IL 60611, United States of America
- Jessica Templer
- Department of Neurology, Northwestern University, Chicago, IL 60611, United States of America
- Matthew A Goldrick
- Department of Linguistics, Northwestern University, Chicago, IL 60611, United States of America
- Matthew C Tate
- Department of Neurosurgery, Northwestern University, Chicago, IL 60611, United States of America
- Joshua Rosenow
- Department of Neurosurgery, Northwestern University, Chicago, IL 60611, United States of America
- Joshua Glaser
- Department of Neurology, Northwestern University, Chicago, IL 60611, United States of America
- Marc W Slutzky
- Department of Biomedical Engineering, Northwestern University, Chicago, IL 60611, United States of America
- Department of Neurology, Northwestern University, Chicago, IL 60611, United States of America
- Department of Neuroscience, Northwestern University, Chicago, IL 60611, United States of America
- Department of Physical Medicine & Rehabilitation, Northwestern University, Chicago, IL 60611, United States of America
6. Chen K, Zhou S, Lu S, Qin Y, Li X, Li Y, Liu T, Zhang M, Xu K, Shi H, Lv X, Yuan K, Shi H, Qin D. A systematic review of the efficacy of repetitive transcranial magnetic stimulation in treating dysarthria in patients with Parkinson's disease. Front Aging Neurosci 2025; 17:1501640. [PMID: 39980794; PMCID: PMC11841439; DOI: 10.3389/fnagi.2025.1501640]
Abstract
Objective To analyze the literature on the efficacy of repetitive transcranial magnetic stimulation (rTMS) in treating dysarthria in patients with Parkinson's disease (PD) and provide a reference for targeted clinical treatment of dysarthria in PD patients. Methods A systematic search was conducted in English and Chinese databases, including Embase, Cochrane, Medline, PubMed, CNKI, Wanfang, Chinese Biomedical Literature Database, and VIP Database, for relevant literature on rTMS treatment for dysarthria in PD patients. The search timeframe was from the inception of each database to October 2023. Literature was screened according to inclusion and exclusion criteria. Two researchers extracted information on study subjects, age, intervention methods, intervention duration, intervention frequency, evaluation indicators, and intervention results from the included literature. The modified Jadad scale was used to evaluate the quality of the literature. Results A total of seven studies were included, mainly focusing on the frequency, duration, and stimulation site of rTMS for dysarthria in PD patients. Six studies indicated that rTMS treatment improved dysarthria in PD patients. Conclusion Repetitive transcranial magnetic stimulation has a positive effect on improving dysarthria in PD patients, but further research is needed to determine its efficacy.
Affiliation(s)
- Kerong Chen
- Department of Rehabilitation Medicine, The Third People's Hospital of Yunnan Province, Kunming, Yunnan, China
- Sitong Zhou
- Key Laboratory of Traditional Chinese Medicine for Prevention and Treatment of Neuropsychiatric Diseases, Yunnan University of Chinese Medicine, Kunming, Yunnan, China
- Shiyu Lu
- The People's Hospital of Mengzi, The Affiliated Hospital of Yunnan University of Chinese Medicine, Mengzi, Honghe, China
- Yuliang Qin
- Key Laboratory of Traditional Chinese Medicine for Prevention and Treatment of Neuropsychiatric Diseases, Yunnan University of Chinese Medicine, Kunming, Yunnan, China
- Xinyao Li
- Key Laboratory of Traditional Chinese Medicine for Prevention and Treatment of Neuropsychiatric Diseases, Yunnan University of Chinese Medicine, Kunming, Yunnan, China
- Yi Li
- Department of Rehabilitation Medicine, The Third People's Hospital of Yunnan Province, Kunming, Yunnan, China
- Tianyun Liu
- Department of Rehabilitation Medicine, The Third People's Hospital of Yunnan Province, Kunming, Yunnan, China
- Mei Zhang
- Department of Rehabilitation Medicine, The Third People's Hospital of Yunnan Province, Kunming, Yunnan, China
- Kun Xu
- Department of Rehabilitation Medicine, The Third People's Hospital of Yunnan Province, Kunming, Yunnan, China
- Hongping Shi
- Department of Rehabilitation Medicine, The Third People's Hospital of Yunnan Province, Kunming, Yunnan, China
- Xiaoman Lv
- Key Laboratory of Traditional Chinese Medicine for Prevention and Treatment of Neuropsychiatric Diseases, Yunnan University of Chinese Medicine, Kunming, Yunnan, China
- Kai Yuan
- Second Clinical Medical College, Yunnan University of Chinese Medicine, Kunming, Yunnan, China
- Hongling Shi
- Department of Rehabilitation Medicine, The Third People's Hospital of Yunnan Province, Kunming, Yunnan, China
- Dongdong Qin
- Key Laboratory of Traditional Chinese Medicine for Prevention and Treatment of Neuropsychiatric Diseases, Yunnan University of Chinese Medicine, Kunming, Yunnan, China
7. Chen J, Chen X, Wang R, Le C, Khalilian-Gourtani A, Jensen E, Dugan P, Doyle W, Devinsky O, Friedman D, Flinker A, Wang Y. Transformer-based neural speech decoding from surface and depth electrode signals. J Neural Eng 2025; 22:016017. [PMID: 39819752; PMCID: PMC11773629; DOI: 10.1088/1741-2552/adab21]
Abstract
Objective. This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior works can only work with electrodes on a 2D grid (i.e. an electrocorticographic (ECoG) array) and data from a single patient. We aim to design a deep-learning model architecture that can accommodate both surface ECoG and depth (stereotactic EEG or sEEG) electrodes. The architecture should allow training on data from multiple participants with large variability in electrode placements. The model should not have subject-specific layers and the trained model should perform well on participants unseen during training. Approach. We propose a novel transformer-based model architecture named SwinTW that can work with arbitrarily positioned electrodes by leveraging their 3D locations on the cortex rather than their positions on a 2D grid. We train subject-specific models using data from a single participant and multi-subject models exploiting data from multiple participants. Main results. The subject-specific models using only low-density 8 × 8 ECoG data achieved a high decoding Pearson correlation coefficient with the ground-truth spectrogram (PCC = 0.817) over N = 43 participants, significantly outperforming our prior convolutional ResNet model and the 3D Swin transformer model. Incorporating additional strip, depth, and grid electrodes available in each participant (N = 39) led to further improvement (PCC = 0.838). For participants with only sEEG electrodes (N = 9), subject-specific models still enjoy comparable performance, with an average PCC = 0.798. A single multi-subject model trained on ECoG data from 15 participants yielded comparable results (PCC = 0.837) to 15 models trained individually for these participants (PCC = 0.831). Furthermore, the multi-subject models achieved high performance on unseen participants, with an average PCC = 0.765 in leave-one-out cross-validation. Significance. The proposed SwinTW decoder enables future speech decoding approaches to utilize any electrode placement that is clinically optimal or feasible for a particular participant, including using only depth electrodes, which are more routinely implanted in chronic neurosurgical procedures. The success of the single multi-subject model when tested on participants within the training cohort demonstrates that the model architecture is capable of exploiting data from multiple participants with diverse electrode placements. The architecture's flexibility in training with both single-subject and multi-subject data, as well as grid and non-grid electrodes, ensures its broad applicability. Importantly, the generalizability of the multi-subject models in our study population suggests that a model trained using paired acoustic and neural data from multiple patients can potentially be applied to new patients with speech disability, where acoustic-neural training data is not feasible.
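The evaluation metric reported above, the Pearson correlation coefficient (PCC) between the decoded spectrogram and the ground-truth spectrogram, reduces to a short computation. The sketch below is a generic implementation on arrays of illustrative shape; it is not code from the paper.

```python
# PCC between a decoded and a ground-truth (time x frequency) spectrogram.
import numpy as np

def spectrogram_pcc(decoded: np.ndarray, target: np.ndarray) -> float:
    """Pearson correlation between two spectrograms, flattened over bins."""
    d = decoded.ravel() - decoded.mean()
    t = target.ravel() - target.mean()
    return float(d @ t / (np.linalg.norm(d) * np.linalg.norm(t)))

rng = np.random.default_rng(0)
target = rng.random((100, 128))                  # 100 frames x 128 mel bins
decoded = target + 0.3 * rng.standard_normal(target.shape)  # imperfect decode
print(f"PCC = {spectrogram_pcc(decoded, target):.3f}")
```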
Affiliation(s)
- Junbo Chen
- Electrical and Computer Engineering Department, New York University, 370 Jay Street, Brooklyn, NY 11201, United States of America
- Xupeng Chen
- Electrical and Computer Engineering Department, New York University, 370 Jay Street, Brooklyn, NY 11201, United States of America
- Ran Wang
- Electrical and Computer Engineering Department, New York University, 370 Jay Street, Brooklyn, NY 11201, United States of America
- Chenqian Le
- Electrical and Computer Engineering Department, New York University, 370 Jay Street, Brooklyn, NY 11201, United States of America
- Erika Jensen
- Neurology Department, New York University, 223 East 34th Street, Manhattan, NY 10016, United States of America
- Patricia Dugan
- Neurology Department, New York University, 223 East 34th Street, Manhattan, NY 10016, United States of America
- Werner Doyle
- Neurosurgery Department, New York University, 550 1st Avenue, Manhattan, NY 10016, United States of America
- Orrin Devinsky
- Neurology Department, New York University, 223 East 34th Street, Manhattan, NY 10016, United States of America
- Daniel Friedman
- Neurology Department, New York University, 223 East 34th Street, Manhattan, NY 10016, United States of America
- Adeen Flinker
- Neurology Department, New York University, 223 East 34th Street, Manhattan, NY 10016, United States of America
- Biomedical Engineering Department, New York University, 370 Jay Street, Brooklyn, NY 11201, United States of America
- Yao Wang
- Electrical and Computer Engineering Department, New York University, 370 Jay Street, Brooklyn, NY 11201, United States of America
- Biomedical Engineering Department, New York University, 370 Jay Street, Brooklyn, NY 11201, United States of America
8. He Q, Yang Y, Ge P, Li S, Chai X, Luo Z, Zhao J. The brain nebula: minimally invasive brain-computer interface by endovascular neural recording and stimulation. J Neurointerv Surg 2024; 16:1237-1243. [PMID: 38388478; PMCID: PMC11671944; DOI: 10.1136/jnis-2023-021296]
Abstract
A brain-computer interface (BCI) serves as a direct communication channel between brain activity and external devices, typically a computer or robotic limb. Advances in technology have led to the increasing use of intracranial electrical recording or stimulation in the treatment of conditions such as epilepsy, depression, and movement disorders. This indicates that BCIs can offer clinical neurological rehabilitation for patients with disabilities and functional impairments. They also provide a means to restore consciousness and functionality for patients with sequelae from major brain diseases. Whether invasive or non-invasive, the collected cortical or deep signals can be decoded and translated for communication. This review aims to provide an overview of the advantages of endovascular BCIs compared with conventional BCIs, along with insights into the specific anatomical regions under study. Given the rapid progress, we also provide updates on ongoing clinical trials and the prospects for current research involving endovascular electrodes.
Affiliation(s)
- Qiheng He
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Brain Computer Interface Transitional Research Center, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Yi Yang
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Brain Computer Interface Transitional Research Center, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Center for Neurological Disorders, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing, China
- National Research Center for Rehabilitation Technical Aids, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
- Beijing Institute of Brain Disorders, Beijing, China
- Peicong Ge
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Sining Li
- Tianjin Key Laboratory of Brain Science and Intelligent Rehabilitation, College of Artificial Intelligence, Nankai University, Tianjin, China
- Xiaoke Chai
- Brain Computer Interface Transitional Research Center, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Zhongqiu Luo
- Department of Neurosurgery, Shenzhen Qianhai Shekou Free Trade Zone Hospital, Shenzhen, China
- Jizong Zhao
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Center for Neurological Disorders, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing, China
9. Chen J, Chen X, Wang R, Le C, Khalilian-Gourtani A, Jensen E, Dugan P, Doyle W, Devinsky O, Friedman D, Flinker A, Wang Y. Subject-Agnostic Transformer-Based Neural Speech Decoding from Surface and Depth Electrode Signals. bioRxiv [Preprint] 2024:2024.03.11.584533. [PMID: 38559163; PMCID: PMC10980022; DOI: 10.1101/2024.03.11.584533]
Abstract
Objective This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior works can only work with electrodes on a 2D grid (i.e., Electrocorticographic or ECoG array) and data from a single patient. We aim to design a deep-learning model architecture that can accommodate both surface (ECoG) and depth (stereotactic EEG or sEEG) electrodes. The architecture should allow training on data from multiple participants with large variability in electrode placements and the trained model should perform well on participants unseen during training. Approach We propose a novel transformer-based model architecture named SwinTW that can work with arbitrarily positioned electrodes by leveraging their 3D locations on the cortex rather than their positions on a 2D grid. We train subject-specific models using data from a single participant and multi-patient models exploiting data from multiple participants. Main Results The subject-specific models using only low-density 8x8 ECoG data achieved high decoding Pearson Correlation Coefficient with ground truth spectrogram (PCC=0.817), over N=43 participants, outperforming our prior convolutional ResNet model and the 3D Swin transformer model. Incorporating additional strip, depth, and grid electrodes available in each participant (N=39) led to further improvement (PCC=0.838). For participants with only sEEG electrodes (N=9), subject-specific models still enjoy comparable performance with an average PCC=0.798. The multi-subject models achieved high performance on unseen participants, with an average PCC=0.765 in leave-one-out cross-validation. Significance The proposed SwinTW decoder enables future speech neuroprostheses to utilize any electrode placement that is clinically optimal or feasible for a particular participant, including using only depth electrodes, which are more routinely implanted in chronic neurosurgical procedures. Importantly, the generalizability of the multi-patient models suggests that such a model can be applied to new patients that do not have paired acoustic and neural data, providing an advance in neuroprostheses for people with speech disability, where acoustic-neural training data is not feasible.
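The leave-one-out evaluation described in both versions of this work amounts to leave-one-participant-out cross-validation: a shared model is trained on data pooled from all but one participant and tested on the held-out one. The sketch below shows that grouping logic, with a ridge regression on synthetic features standing in for the SwinTW decoder.

```python
# Leave-one-participant-out evaluation sketch (ridge stands in for SwinTW).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n_subjects, trials_per_subject, n_features = 15, 50, 64
X = rng.standard_normal((n_subjects * trials_per_subject, n_features))
y = X @ rng.standard_normal(n_features) + 0.5 * rng.standard_normal(len(X))
groups = np.repeat(np.arange(n_subjects), trials_per_subject)  # subject IDs

pccs = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    pred = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx]).predict(X[test_idx])
    pccs.append(np.corrcoef(pred, y[test_idx])[0, 1])  # held-out-subject PCC
print(f"mean held-out-participant PCC: {np.mean(pccs):.3f}")
```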
Affiliation(s)
- Junbo Chen
- Electrical and Computer Engineering Department, New York University, 370 Jay Street, Brooklyn, 11201, NY, USA
- Xupeng Chen
- Electrical and Computer Engineering Department, New York University, 370 Jay Street, Brooklyn, 11201, NY, USA
- Ran Wang
- Electrical and Computer Engineering Department, New York University, 370 Jay Street, Brooklyn, 11201, NY, USA
- Chenqian Le
- Electrical and Computer Engineering Department, New York University, 370 Jay Street, Brooklyn, 11201, NY, USA
- Erika Jensen
- Neurology Department, New York University, 223 East 34th Street, Manhattan, 10016, NY, USA
- Patricia Dugan
- Neurology Department, New York University, 223 East 34th Street, Manhattan, 10016, NY, USA
- Werner Doyle
- Neurosurgery Department, New York University, 550 1st Avenue, Manhattan, 10016, NY, USA
- Orrin Devinsky
- Neurology Department, New York University, 223 East 34th Street, Manhattan, 10016, NY, USA
- Daniel Friedman
- Neurology Department, New York University, 223 East 34th Street, Manhattan, 10016, NY, USA
- Adeen Flinker
- Neurology Department, New York University, 223 East 34th Street, Manhattan, 10016, NY, USA
- Biomedical Engineering Department, New York University, 370 Jay Street, Brooklyn, 11201, NY, USA
- Yao Wang
- Electrical and Computer Engineering Department, New York University, 370 Jay Street, Brooklyn, 11201, NY, USA
- Biomedical Engineering Department, New York University, 370 Jay Street, Brooklyn, 11201, NY, USA
10. Maskeliūnas R, Damaševičius R, Kulikajevas A, Pribuišis K, Uloza V. Alaryngeal Speech Enhancement for Noisy Environments Using a Pareto Denoising Gated LSTM. J Voice 2024:S0892-1997(24)00228-5. [PMID: 39107213; DOI: 10.1016/j.jvoice.2024.07.016]
Abstract
Loss of the larynx significantly alters natural voice production, requiring alternative communication modalities and rehabilitation methods to restore speech intelligibility and improve the quality of life of affected individuals. This paper explores advances in alaryngeal speech enhancement to improve signal quality and reduce background noise, focusing on individuals who have undergone laryngectomy. In this study, speech samples were obtained from 23 Lithuanian males who had undergone laryngectomy with secondary implantation of a tracheoesophageal prosthesis (TEP). A Pareto-optimized gated long short-term memory (LSTM) network was trained on tracheoesophageal speech data to recognize complex temporal connections and contextual information in speech signals. The system was able to distinguish between actual speech and various forms of noise and artifacts, resulting in a 25% drop in the mean signal-to-noise ratio compared to other approaches. According to acoustic analysis, the system significantly decreased the proportion of unvoiced frames from 40% to 10% while maintaining a stable proportion of voiced speech frames and average voicing evidence in voiced frames, indicating the accuracy of the approach in selectively attenuating noise and undesired speech artifacts while preserving important speech information.
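A common formulation for this kind of recurrent speech enhancement, sketched below under stated assumptions, is an LSTM that predicts a soft mask over spectrogram bins, attenuating noise-dominated bins while passing speech. This is a generic sketch, not the authors' Pareto-optimized architecture; the bin count and hidden size are illustrative.

```python
# Generic LSTM masking denoiser sketch (PyTorch); not the paper's architecture.
import torch
import torch.nn as nn

class LSTMDenoiser(nn.Module):
    def __init__(self, n_bins: int = 129, hidden: int = 256):
        super().__init__()
        self.lstm = nn.LSTM(n_bins, hidden, num_layers=2, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_bins), nn.Sigmoid())

    def forward(self, noisy_spec: torch.Tensor) -> torch.Tensor:
        # noisy_spec: (batch, time, freq) magnitude spectrogram
        h, _ = self.lstm(noisy_spec)
        return self.mask(h) * noisy_spec     # soft mask suppresses noise bins

model = LSTMDenoiser()
noisy = torch.rand(4, 100, 129)              # 4 utterances, 100 frames each
clean = torch.rand(4, 100, 129)              # placeholder clean targets
loss = nn.functional.mse_loss(model(noisy), clean)  # typical training objective
print(model(noisy).shape)
```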
Affiliation(s)
- Rytis Maskeliūnas
- Centre of Real Time Computer Systems, Kaunas University of Technology, Kaunas, Lithuania
- Robertas Damaševičius
- Centre of Real Time Computer Systems, Kaunas University of Technology, Kaunas, Lithuania
- Audrius Kulikajevas
- Centre of Real Time Computer Systems, Kaunas University of Technology, Kaunas, Lithuania
- Kipras Pribuišis
- Department of Otolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
- Virgilijus Uloza
- Department of Otolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania
11. Wu H, Cai C, Ming W, Chen W, Zhu Z, Feng C, Jiang H, Zheng Z, Sawan M, Wang T, Zhu J. Speech decoding using cortical and subcortical electrophysiological signals. Front Neurosci 2024; 18:1345308. [PMID: 38486966; PMCID: PMC10937352; DOI: 10.3389/fnins.2024.1345308]
Abstract
Introduction Language impairments often result from severe neurological disorders, driving the development of neural prosthetics that utilize electrophysiological signals to restore comprehensible language. Previous decoding efforts primarily focused on signals from the cerebral cortex, neglecting the potential contributions of subcortical brain structures to speech decoding in brain-computer interfaces. Methods In this study, stereotactic electroencephalography (sEEG) was employed to investigate the role of subcortical structures in speech decoding. Two native Mandarin Chinese speakers, undergoing sEEG implantation for epilepsy treatment, participated. Participants read Chinese text, with the 1-30, 30-70, and 70-150 Hz frequency band powers of sEEG signals extracted as key features. A deep learning model based on long short-term memory assessed the contribution of different brain structures to speech decoding, predicting consonant articulatory place, articulatory manner, and tone within single syllables. Results Cortical signals excelled in articulatory place prediction (86.5% accuracy), while cortical and subcortical signals performed similarly for articulatory manner (51.5% vs. 51.7% accuracy). Subcortical signals provided superior tone prediction (58.3% accuracy). The superior temporal gyrus was consistently relevant in speech decoding for consonants and tone. Combining cortical and subcortical inputs yielded the highest prediction accuracy, especially for tone. Discussion This study underscores the essential roles of both cortical and subcortical structures in different aspects of speech decoding.
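The feature extraction step described in the Methods (band power in the 1-30, 30-70, and 70-150 Hz ranges) can be sketched as bandpass filtering followed by an analytic-signal power envelope. The sampling rate, filter order, and use of the Hilbert envelope below are assumptions for illustration; the paper's exact preprocessing may differ.

```python
# Band-power feature extraction sketch for one sEEG channel (scipy).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000                                    # assumed sampling rate (Hz)
bands = [(1, 30), (30, 70), (70, 150)]       # bands used in the study

def band_power(x, lo, hi, fs):
    """Bandpass the signal, then return its instantaneous power envelope."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return np.abs(hilbert(filtfilt(b, a, x))) ** 2

rng = np.random.default_rng(0)
seeg = rng.standard_normal(5 * fs)           # 5 s of one synthetic sEEG channel
features = np.stack([band_power(seeg, lo, hi, fs) for lo, hi in bands])
print(features.shape)                        # (3 bands, n_samples)
```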
Affiliation(s)
- Hemmings Wu
- Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Clinical Research Center for Neurological Disease of Zhejiang Province, Hangzhou, China
- Chengwei Cai
- Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Wenjie Ming
- Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Department of Neurology, Epilepsy Center, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Wangyu Chen
- Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zhoule Zhu
- Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Chen Feng
- Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Hongjie Jiang
- Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zhe Zheng
- Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Mohamad Sawan
- CenBRAIN Lab, School of Engineering, Westlake University, Hangzhou, China
- Ting Wang
- School of Foreign Languages, Tongji University, Shanghai, China
- Center for Speech and Language Processing, Tongji University, Shanghai, China
- Junming Zhu
- Department of Neurosurgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
12. Khanna AR, Muñoz W, Kim YJ, Kfir Y, Paulk AC, Jamali M, Cai J, Mustroph ML, Caprara I, Hardstone R, Mejdell M, Meszéna D, Zuckerman A, Schweitzer J, Cash S, Williams ZM. Single-neuronal elements of speech production in humans. Nature 2024; 626:603-610. [PMID: 38297120; PMCID: PMC10866697; DOI: 10.1038/s41586-023-06982-w]
Abstract
Humans are capable of generating extraordinarily diverse articulatory movement combinations to produce meaningful speech. This ability to orchestrate specific phonetic sequences, and their syllabification and inflection over subsecond timescales, allows us to produce thousands of word sounds and is a core component of language. The fundamental cellular units and constructs by which we plan and produce words during speech, however, remain largely unknown. Here, using acute ultrahigh-density Neuropixels recordings capable of sampling across the cortical column in humans, we discover neurons in the language-dominant prefrontal cortex that encoded detailed information about the phonetic arrangement and composition of planned words during the production of natural speech. These neurons represented the specific order and structure of articulatory events before utterance and reflected the segmentation of phonetic sequences into distinct syllables. They also accurately predicted the phonetic, syllabic and morphological components of upcoming words and showed a temporally ordered dynamic. Collectively, we show how these mixtures of cells are broadly organized along the cortical column and how their activity patterns transition from articulation planning to production. We also demonstrate how these cells reliably track the detailed composition of consonant and vowel sounds during perception and how they distinguish processes specifically related to speaking from those related to listening. Together, these findings reveal a remarkably structured organization and encoding cascade of phonetic representations by prefrontal neurons in humans and demonstrate a cellular process that can support the production of speech.
Affiliation(s)
- Arjun R Khanna
- Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- William Muñoz
- Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Yoav Kfir
- Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Angelique C Paulk
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Mohsen Jamali
- Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Jing Cai
- Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Martina L Mustroph
- Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Irene Caprara
- Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Richard Hardstone
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Mackenna Mejdell
- Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Domokos Meszéna
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Jeffrey Schweitzer
- Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Sydney Cash
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Ziv M Williams
- Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Harvard-MIT Division of Health Sciences and Technology, Boston, MA, USA
- Harvard Medical School, Program in Neuroscience, Boston, MA, USA
13. Tsunada J, Eliades SJ. Frontal-Auditory Cortical Interactions and Sensory Prediction During Vocal Production in Marmoset Monkeys. bioRxiv [Preprint] 2024:2024.01.28.577656. [PMID: 38352422; PMCID: PMC10862695; DOI: 10.1101/2024.01.28.577656]
Abstract
The control of speech and vocal production involves the calculation of error between the intended vocal output and the resulting auditory feedback. Consistent with this model, recent evidence has demonstrated that the auditory cortex is suppressed immediately before and during vocal production, yet is still sensitive to differences between vocal output and altered auditory feedback. This suppression has been suggested to be the result of top-down signals containing information about the intended vocal output, potentially originating from motor or other frontal cortical areas. However, whether such frontal areas are the source of suppressive and predictive signaling to the auditory cortex during vocalization is unknown. Here, we simultaneously recorded neural activity from both the auditory and frontal cortices of marmoset monkeys while they produced self-initiated vocalizations. We found increases in neural activity in both brain areas preceding the onset of vocal production, notably changes in both multi-unit activity and local field potential theta-band power. Connectivity analysis using Granger causality demonstrated that frontal cortex sends directed signaling to the auditory cortex during this pre-vocal period. Importantly, this pre-vocal activity predicted both vocalization-induced suppression of the auditory cortex as well as the acoustics of subsequent vocalizations. These results suggest that frontal cortical areas communicate with the auditory cortex preceding vocal production, with frontal-auditory signals that may reflect the transmission of sensory prediction information. This interaction between frontal and auditory cortices may contribute to mechanisms that calculate errors between intended and actual vocal outputs during vocal communication.
Affiliation(s)
- Joji Tsunada
- Chinese Institute for Brain Research, Beijing, China
- Department of Veterinary Medicine, Faculty of Agriculture, Iwate University, Morioka, Iwate, Japan
- Steven J. Eliades
- Department of Head and Neck Surgery & Communication Sciences, Duke University School of Medicine, Durham, NC 27710, USA