1. Fuhrer J, Glette K, Ivanovic J, Larsson PG, Bekinschtein T, Kochen S, Knight RT, Tørresen J, Solbakk AK, Endestad T, Blenkmann A. Direct brain recordings reveal implicit encoding of structure in random auditory streams. Sci Rep 2025; 15:14725. [PMID: 40289162] [PMCID: PMC12034823] [DOI: 10.1038/s41598-025-98865-5]
Abstract
The brain excels at processing sensory input, even in rich or chaotic environments. Mounting evidence attributes this to sophisticated internal models of the environment that draw on statistical structures in the unfolding sensory input. Understanding how and where such modeling proceeds is a core question in statistical learning and predictive processing. In this context, we address the role of transitional probabilities as an implicit structure supporting the encoding of the temporal structure of a random auditory stream. Leveraging information-theoretical principles and the high spatiotemporal resolution of intracranial electroencephalography, we analyzed the trial-by-trial high-frequency activity representation of transitional probabilities. This unique approach enabled us to demonstrate how the brain automatically and continuously encodes structure in random stimuli and revealed the involvement of a network outside of the auditory system, including hippocampal, frontal, and temporal regions. Our work provides a comprehensive picture of the neural correlates of automatic encoding of implicit structure that can be the crucial substrate for the swift detection of patterns and unexpected events in the environment.
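The quantity at the heart of this study, the transitional probability between successive sounds and the surprisal derived from it, can be estimated incrementally from a symbol stream. A minimal sketch of that logic, not the authors' pipeline (the tone labels, add-alpha smoothing, and first-order model are illustrative assumptions):

```python
import numpy as np
from collections import defaultdict

def surprisal_stream(stream, alphabet_size, alpha=1.0):
    """Incrementally estimate first-order transitional probabilities
    P(x_t | x_{t-1}) with add-alpha smoothing, returning the surprisal
    -log2 P of each transition as it is observed."""
    counts = defaultdict(lambda: np.full(alphabet_size, alpha))
    surprisals = []
    for prev, cur in zip(stream, stream[1:]):
        p = counts[prev] / counts[prev].sum()
        surprisals.append(-np.log2(p[cur]))
        counts[prev][cur] += 1  # update the model after each observation
    return np.array(surprisals)

# Even a fully random stream carries momentary statistical structure
# that an incremental learner will track.
rng = np.random.default_rng(0)
tones = rng.integers(0, 4, size=500)
print(surprisal_stream(tones, alphabet_size=4)[:10].round(2))
```

Regressing trial-by-trial high-frequency activity against a surprisal series of this kind is the general logic of the encoding analysis described above.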
Affiliation(s)
- Julian Fuhrer
  - RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
  - Department of Informatics, University of Oslo, Oslo, Norway
  - Centre for Precision Psychiatry, Division of Mental Health and Addiction, University of Oslo and Oslo University Hospital, Oslo, Norway
- Kyrre Glette
  - RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
  - Department of Informatics, University of Oslo, Oslo, Norway
- Jugoslav Ivanovic
  - Department of Neurosurgery, Oslo University Hospital, Rikshospitalet, Oslo, Norway
- Pål Gunnar Larsson
  - Department of Neurosurgery, Oslo University Hospital, Rikshospitalet, Oslo, Norway
- Tristan Bekinschtein
  - Cambridge Consciousness and Cognition Lab, Department of Psychology, University of Cambridge, Cambridge, UK
- Silvia Kochen
  - ENyS-CONICET-Univ Jauretche, Buenos Aires, Argentina
- Robert T Knight
  - Helen Wills Neuroscience Institute and Department of Psychology, University of California, Berkeley, USA
- Jim Tørresen
  - RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
  - Department of Informatics, University of Oslo, Oslo, Norway
- Anne-Kristin Solbakk
  - RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
  - Department of Psychology, University of Oslo, Oslo, Norway
  - Department of Neuropsychology, Helgeland Hospital, Mosjøen, Norway
- Tor Endestad
  - RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
  - Cambridge Consciousness and Cognition Lab, Department of Psychology, University of Cambridge, Cambridge, UK
  - Department of Psychology, University of Oslo, Oslo, Norway
  - Department of Neuropsychology, Helgeland Hospital, Mosjøen, Norway
- Alejandro Blenkmann
  - RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
  - Department of Psychology, University of Oslo, Oslo, Norway
2. Keshishian M, Mischler G, Thomas S, Kingsbury B, Bickel S, Mehta AD, Mesgarani N. Parallel hierarchical encoding of linguistic representations in the human auditory cortex and recurrent automatic speech recognition systems. bioRxiv 2025:2025.01.30.635775. [PMID: 39975377] [PMCID: PMC11838305] [DOI: 10.1101/2025.01.30.635775]
Abstract
The human brain's ability to transform acoustic speech signals into rich linguistic representations has inspired advancements in automatic speech recognition (ASR) systems. While ASR systems now achieve human-level performance under controlled conditions, prior research on their parallels with the brain has been limited by the use of biologically implausible models, narrow feature sets, and comparisons that primarily emphasize predictability of brain activity without fully exploring shared underlying representations. Additionally, studies comparing the brain to text-based language models overlook the acoustic stages of speech processing, an essential part of transforming sound into meaning. Leveraging high-resolution intracranial recordings and a recurrent ASR model, this study bridges these gaps by uncovering a striking correspondence in the hierarchical encoding of linguistic features, from low-level acoustic signals to high-level semantic processing. Specifically, we demonstrate that neural activity in distinct regions of the auditory cortex aligns with representations in corresponding layers of the ASR model and, crucially, that both systems encode similar features at each stage of processing, from acoustic to phonetic, lexical, and semantic information. These findings suggest that both systems, despite their distinct architectures, converge on similar strategies for language processing, providing insight into the optimal computational principles underlying linguistic representation and the shared constraints shaping human and artificial speech processing.
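The layer-to-region correspondence reported in such work is typically quantified with encoding models: activations from each model layer are used to linearly predict each neural channel, and the best-predicting layer is noted per channel. A toy sketch on synthetic data (ridge regression and cross-validated R^2 scoring are common choices, not necessarily this paper's exact procedure):

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

def layer_alignment(layer_reps, neural, cv=5):
    """Cross-validated R^2 for predicting one neural channel (time,)
    from each model layer's representation (time x units)."""
    return np.array([
        cross_val_score(RidgeCV(alphas=np.logspace(-2, 4, 13)),
                        X, neural, cv=cv).mean()
        for X in layer_reps])

# The synthetic channel is built from "layer 2", so alignment peaks there.
rng = np.random.default_rng(1)
layers = [rng.standard_normal((400, 64)) for _ in range(4)]
y = layers[2] @ rng.standard_normal(64) + 0.5 * rng.standard_normal(400)
print(layer_alignment(layers, y).round(3))
```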
Affiliation(s)
- Menoua Keshishian
  - Department of Electrical Engineering, Columbia University, New York, NY, USA
  - Zuckerman Institute, Columbia University, New York, NY, USA
- Gavin Mischler
  - Department of Electrical Engineering, Columbia University, New York, NY, USA
  - Zuckerman Institute, Columbia University, New York, NY, USA
- Stephan Bickel
  - The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
  - Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA
- Ashesh D. Mehta
  - The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
  - Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA
- Nima Mesgarani
  - Department of Electrical Engineering, Columbia University, New York, NY, USA
  - Zuckerman Institute, Columbia University, New York, NY, USA
3. Fan T, Decker W, Schneider J. The Domain-Specific Neural Basis of Auditory Statistical Learning in 5-7-Year-Old Children. Neurobiol Lang (Camb) 2024; 5:981-1007. [PMID: 39483699] [PMCID: PMC11527419] [DOI: 10.1162/nol_a_00156]
Abstract
Statistical learning (SL) is the ability to rapidly track statistical regularities and learn patterns in the environment. Recent studies show that SL is constrained by domain-specific features, rather than being a uniform learning mechanism across domains and modalities. This domain-specificity has been reflected at the neural level, as SL occurs in regions primarily involved in processing of specific modalities or domains of input. However, our understanding of how SL is constrained by domain-specific features in the developing brain is severely lacking. The present study aims to identify the functional neural profiles of auditory SL of linguistic and nonlinguistic regularities among children. Thirty children between 5 and 7 years old completed an auditory fMRI SL task containing interwoven structured and random syllable/tone sequences. In both traditional group univariate analyses and a group-constrained subject-specific analysis, frontal and temporal cortices showed significant activation when processing structured versus random sequences across both linguistic and nonlinguistic domains. However, conjunction analyses failed to identify overlapping neural indices across domains. These findings are the first to compare brain regions supporting SL of linguistic and nonlinguistic regularities in the developing brain and indicate that auditory SL among developing children may be constrained by domain-specific features.
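In this family of paradigms, a "structured" stream is built by concatenating fixed triplet words, so transitional probability is high within a word and drops at word boundaries, while a "random" stream equates the syllables but removes that structure. A toy generator (the syllable labels and triplet inventory are hypothetical):

```python
import numpy as np

def structured_stream(triplets, n_words, rng):
    """Concatenate randomly ordered triplet 'words' with no immediate
    repeats: transitional probability is 1.0 inside a word and ~1/3
    at word boundaries."""
    seq, prev = [], None
    for _ in range(n_words):
        choices = [t for t in triplets if t is not prev]
        prev = choices[rng.integers(len(choices))]
        seq.extend(prev)
    return seq

rng = np.random.default_rng(2)
syllables = list("ABCDEFGHIJKL")  # stand-ins for syllables or tones
triplets = [tuple(syllables[i:i + 3]) for i in range(0, 12, 3)]
print("structured:", "".join(structured_stream(triplets, 10, rng)))
print("random:    ", "".join(rng.choice(syllables, size=30)))
```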
Affiliation(s)
- Tengwen Fan
  - Department of Communications Sciences and Disorders, Louisiana State University, Baton Rouge, LA, USA
- Will Decker
  - Department of Communications Sciences and Disorders, Louisiana State University, Baton Rouge, LA, USA
  - Department of Psychology, Georgia Tech University, Atlanta, GA, USA
- Julie Schneider
  - Department of Communications Sciences and Disorders, Louisiana State University, Baton Rouge, LA, USA
  - School of Education and Information Studies, University of California, Los Angeles, Los Angeles, CA, USA
4. Norman-Haignere SV, Keshishian MK, Devinsky O, Doyle W, McKhann GM, Schevon CA, Flinker A, Mesgarani N. Temporal integration in human auditory cortex is predominantly yoked to absolute time, not structure duration. bioRxiv 2024:2024.09.23.614358. [PMID: 39386565] [PMCID: PMC11463558] [DOI: 10.1101/2024.09.23.614358]
Abstract
Sound structures such as phonemes and words have highly variable durations. Thus, there is a fundamental difference between integrating across absolute time (e.g., 100 ms) vs. sound structure (e.g., phonemes). Auditory and cognitive models have traditionally cast neural integration in terms of time and structure, respectively, but the extent to which cortical computations reflect time or structure remains unknown. To answer this question, we rescaled the duration of all speech structures using time stretching/compression and measured integration windows in the human auditory cortex using a new experimental/computational method applied to spatiotemporally precise intracranial recordings. We observed significantly longer integration windows for stretched speech, but this lengthening was very small (~5%) relative to the change in structure durations, even in non-primary regions strongly implicated in speech-specific processing. These findings demonstrate that time-yoked computations dominate throughout the human auditory cortex, placing important constraints on neurocomputational models of structure processing.
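The time-versus-structure distinction can be made concrete in a toy simulation: a system that integrates over a fixed absolute window keeps the same window when its input is stretched, whereas a structure-yoked system would scale with the content. A rough sketch (boxcar integration and an autocorrelation-based window estimate are illustrative stand-ins for the paper's method):

```python
import numpy as np

sr = 1000  # Hz

def response(stim, win_ms):
    """Time-yoked model: average the input over a fixed absolute window."""
    k = np.ones(int(sr * win_ms / 1000))
    return np.convolve(stim, k / k.size, mode="same")

def acorr_width_ms(x):
    """Crude window estimate: lag where autocorrelation falls below 0.5."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[x.size - 1:]
    return np.argmax(ac / ac[0] < 0.5) / sr * 1000

rng = np.random.default_rng(3)
stim = rng.standard_normal(2 * sr)
stretched = np.repeat(stim, 2)  # 2x time stretching of the same content

print(f"original:  ~{acorr_width_ms(response(stim, 100)):.0f} ms")
print(f"stretched: ~{acorr_width_ms(response(stretched, 100)):.0f} ms")
# The time-yoked window barely moves under stretching; a structure-yoked
# system would roughly double it, which is what the ~5% result argues against.
```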
Affiliation(s)
- Sam V Norman-Haignere
  - University of Rochester Medical Center, Department of Biostatistics and Computational Biology
  - University of Rochester Medical Center, Department of Neuroscience
  - University of Rochester, Department of Brain and Cognitive Sciences
  - University of Rochester, Department of Biomedical Engineering
  - Zuckerman Institute for Mind Brain and Behavior, Columbia University
- Menoua K. Keshishian
  - Zuckerman Institute for Mind Brain and Behavior, Columbia University
  - Department of Electrical Engineering, Columbia University
- Orrin Devinsky
  - Department of Neurology, NYU Langone Medical Center
  - Comprehensive Epilepsy Center, NYU Langone Medical Center
- Werner Doyle
  - Comprehensive Epilepsy Center, NYU Langone Medical Center
  - Department of Neurosurgery, NYU Langone Medical Center
- Guy M. McKhann
  - Department of Neurological Surgery, Columbia University Irving Medical Center
- Adeen Flinker
  - Department of Neurology, NYU Langone Medical Center
  - Comprehensive Epilepsy Center, NYU Langone Medical Center
  - Department of Biomedical Engineering, NYU Tandon School of Engineering
- Nima Mesgarani
  - Zuckerman Institute for Mind Brain and Behavior, Columbia University
  - Department of Electrical Engineering, Columbia University
5. Clarke A, Tyler LK, Marslen-Wilson W. Hearing what is being said: the distributed neural substrate for early speech interpretation. Lang Cogn Neurosci 2024; 39:1097-1116. [PMID: 39439863] [PMCID: PMC11493057] [DOI: 10.1080/23273798.2024.2345308]
Abstract
Speech comprehension is remarkable for the immediacy with which the listener hears what is being said. Here, we focus on the neural underpinnings of this process in isolated spoken words. We analysed source-localised MEG data for nouns using Representational Similarity Analysis to probe the spatiotemporal coordinates of phonology, lexical form, and the semantics of emerging word candidates. Phonological model fit was detectable within 40-50 ms, engaging a bilateral network including superior and middle temporal cortex and extending into anterior temporal and inferior parietal regions. Lexical form emerged within 60-70 ms, and model fit to semantics from 100-110 ms. Strikingly, the majority of vertices in a central core showed model fit to all three dimensions, consistent with a distributed neural substrate for early speech analysis. The early interpretation of speech seems to be conducted in a unified integrative representational space, in conflict with conventional views of a linguistically stratified representational hierarchy.
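Representational Similarity Analysis quantifies model fit by correlating the pattern of pairwise dissimilarities between items in the neural data with the dissimilarities predicted by a model (phonological, lexical, or semantic). A compact sketch on synthetic data; in source-localised MEG this is computed per vertex and time window rather than once:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa(neural_patterns, model_patterns):
    """Spearman correlation between the neural and model
    representational dissimilarity matrices (inputs: items x features)."""
    neural_rdm = pdist(neural_patterns, metric="correlation")
    model_rdm = pdist(model_patterns, metric="correlation")
    return spearmanr(neural_rdm, model_rdm).correlation

rng = np.random.default_rng(4)
phon = rng.standard_normal((20, 30))           # model vectors for 20 words
neural = phon @ rng.standard_normal((30, 50))  # neural patterns driven by them
neural += 2.0 * rng.standard_normal(neural.shape)
print(f"model fit (Spearman rho): {rsa(neural, phon):.2f}")
```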
Affiliation(s)
- Alex Clarke
  - Department of Psychology, University of Cambridge, Cambridge, UK
6. Crinnion AM, Luthra S, Gaston P, Magnuson JS. Resolving competing predictions in speech: How qualitatively different cues and cue reliability contribute to phoneme identification. Atten Percept Psychophys 2024; 86:942-961. [PMID: 38383914] [PMCID: PMC11233028] [DOI: 10.3758/s13414-024-02849-y]
Abstract
Listeners have many sources of information available in interpreting speech. Numerous theoretical frameworks and paradigms have established that various constraints impact the processing of speech sounds, but it remains unclear how listeners might simultaneously consider multiple cues, especially those that differ qualitatively (i.e., with respect to timing and/or modality) or quantitatively (i.e., with respect to cue reliability). Here, we establish that cross-modal identity priming can influence the interpretation of ambiguous phonemes (Exp. 1, N = 40) and show that two qualitatively distinct cues - namely, cross-modal identity priming and auditory co-articulatory context - have additive effects on phoneme identification (Exp. 2, N = 40). However, we find no effect of quantitative variation in a cue - specifically, changes in the reliability of the priming cue did not influence phoneme identification (Exp. 3a, N = 40; Exp. 3b, N = 40). Overall, we find that qualitatively distinct cues can additively influence phoneme identification. While many existing theoretical frameworks address constraint integration to some degree, our results provide a step towards understanding how information that differs in both timing and modality is integrated in online speech perception.
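Additive effects of qualitatively distinct cues are naturally expressed by a model in which each cue contributes independently on the log-odds scale of the identification function. A sketch of that logic only; the coefficients below are purely illustrative, not fitted values from these experiments:

```python
import numpy as np

def p_identify(step, prime, context, b=(-0.5, -4.0, 1.2, 0.9)):
    """P(response = target category) for an ambiguous phoneme, with
    continuum step, cross-modal prime, and coarticulatory context
    contributing additively in log-odds (illustrative coefficients)."""
    b0, b_step, b_prime, b_ctx = b
    logit = b0 + b_step * step + b_prime * prime + b_ctx * context
    return 1 / (1 + np.exp(-logit))

steps = np.linspace(0, 1, 5)  # normalized acoustic continuum
for prime in (0, 1):
    for context in (0, 1):
        print(prime, context, p_identify(steps, prime, context).round(2))
```

On this scale, the prime and context effects shift the identification curve by fixed, independent amounts, which is the additivity pattern reported in Experiment 2.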
Affiliation(s)
- James S Magnuson
  - University of Connecticut, Storrs, CT, USA
  - BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
  - Ikerbasque, Basque Foundation for Science, Bilbao, Spain
7. Regev TI, Kim HS, Chen X, Affourtit J, Schipper AE, Bergen L, Mahowald K, Fedorenko E. High-level language brain regions process sublexical regularities. Cereb Cortex 2024; 34:bhae077. [PMID: 38494886] [PMCID: PMC11486690] [DOI: 10.1093/cercor/bhae077]
Abstract
A network of left frontal and temporal brain regions supports language processing. This "core" language network stores our knowledge of words and constructions as well as constraints on how those combine to form sentences. However, our linguistic knowledge additionally includes information about phonemes and how they combine to form phonemic clusters, syllables, and words. Are phoneme combinatorics also represented in these language regions? Across five functional magnetic resonance imaging experiments, we investigated the sensitivity of high-level language processing brain regions to sublexical linguistic regularities by examining responses to diverse nonwords: sequences of phonemes that do not constitute real words (e.g., punes, silory, flope). We establish robust responses in the language network to visually (experiment 1a, n = 605) and auditorily (experiments 1b, n = 12, and 1c, n = 13) presented nonwords. In experiment 2 (n = 16), we find stronger responses to nonwords that are more well-formed, i.e., obey the phoneme-combinatorial constraints of English. Finally, in experiment 3 (n = 14), we provide suggestive evidence that the responses in experiments 1 and 2 are not due to the activation of real words that share some phonology with the nonwords. The results suggest that sublexical regularities are stored and processed within the same fronto-temporal network that supports lexical and syntactic processes.
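The well-formedness manipulated in experiment 2 is commonly scored as phonotactic probability, for example the average log-probability of a form's phoneme bigrams estimated from a lexicon. A toy version (orthographic strings and a seven-word lexicon stand in for phonemic transcriptions and a real corpus):

```python
import numpy as np
from collections import Counter

def bigram_logprob(word, lexicon, alpha=0.1):
    """Mean log-probability of a form's character bigrams under
    add-alpha-smoothed counts from a toy lexicon ('#' marks edges)."""
    pad = lambda w: f"#{w}#"
    bigrams = Counter(b for w in lexicon for b in zip(pad(w), pad(w)[1:]))
    unigrams = Counter(c for w in lexicon for c in pad(w))
    V = len(unigrams)
    lp = [np.log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * V))
          for a, b in zip(pad(word), pad(word)[1:])]
    return np.mean(lp)

lexicon = ["pun", "pane", "silly", "story", "slope", "flip", "hope"]
for nonword in ["punes", "flope", "srlpk"]:  # well-formed vs. ill-formed
    print(nonword, round(bigram_logprob(nonword, lexicon), 2))
```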
Affiliation(s)
- Tamar I Regev
  - Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
  - McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Hee So Kim
  - Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
  - McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Xuanyi Chen
  - Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
  - McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
  - Department of Cognitive Sciences, Rice University, Houston, TX 77005, United States
- Josef Affourtit
  - Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
  - McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Abigail E Schipper
  - Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- Leon Bergen
  - Department of Linguistics, University of California San Diego, San Diego, CA 92093, United States
- Kyle Mahowald
  - Department of Linguistics, University of Texas at Austin, Austin, TX 78712, United States
- Evelina Fedorenko
  - Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
  - McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
  - The Harvard Program in Speech and Hearing Bioscience and Technology, Boston, MA 02115, United States
8. Tolkacheva V, Brownsett SLE, McMahon KL, de Zubicaray GI. Perceiving and misperceiving speech: lexical and sublexical processing in the superior temporal lobes. Cereb Cortex 2024; 34:bhae087. [PMID: 38494418] [PMCID: PMC10944697] [DOI: 10.1093/cercor/bhae087]
Abstract
Listeners can use prior knowledge to predict the content of noisy speech signals, enhancing perception. However, this process can also elicit misperceptions. For the first time, we employed a prime-probe paradigm and transcranial magnetic stimulation to investigate causal roles for the left and right posterior superior temporal gyri (pSTG) in the perception and misperception of degraded speech. Listeners were presented with spectrotemporally degraded probe sentences preceded by a clear prime. To produce misperceptions, we created partially mismatched pseudo-sentence probes via homophonic nonword transformations (e.g. "The little girl was excited to lose her first tooth" → "Tha fittle girmn wam expited du roos har derst cooth"). Compared to a control site (vertex), inhibitory stimulation of the left pSTG selectively disrupted priming of real but not pseudo-sentences. Conversely, inhibitory stimulation of the right pSTG enhanced priming of misperceptions with pseudo-sentences, but did not influence perception of real sentences. These results indicate qualitatively different causal roles for the left and right pSTG in perceiving degraded speech, supporting bilateral models that propose engagement of the right pSTG in sublexical processing.
Affiliation(s)
- Valeriya Tolkacheva
  - Queensland University of Technology, School of Psychology and Counselling, O Block, Kelvin Grove, Queensland, 4059, Australia
- Sonia L E Brownsett
  - Queensland Aphasia Research Centre, School of Health and Rehabilitation Sciences, University of Queensland, Surgical Treatment and Rehabilitation Services, Herston, Queensland, 4006, Australia
  - Centre of Research Excellence in Aphasia Recovery and Rehabilitation, La Trobe University, Melbourne, Health Sciences Building 1, 1 Kingsbury Drive, Bundoora, Victoria, 3086, Australia
- Katie L McMahon
  - Herston Imaging Research Facility, Royal Brisbane & Women’s Hospital, Building 71/918, Herston, Queensland, 4006, Australia
  - Queensland University of Technology, School of Clinical Sciences and Centre for Biomedical Technologies, 60 Musk Avenue, Kelvin Grove, Queensland, 4059, Australia
- Greig I de Zubicaray
  - Queensland University of Technology, School of Psychology and Counselling, O Block, Kelvin Grove, Queensland, 4059, Australia
9. Sankaran N, Leonard MK, Theunissen F, Chang EF. Encoding of melody in the human auditory cortex. Sci Adv 2024; 10:eadk0010. [PMID: 38363839] [PMCID: PMC10871532] [DOI: 10.1126/sciadv.adk0010]
Abstract
Melody is a core component of music in which discrete pitches are serially arranged to convey emotion and meaning. Perception varies along several pitch-based dimensions: (i) the absolute pitch of notes, (ii) the difference in pitch between successive notes, and (iii) the statistical expectation of each note given prior context. How the brain represents these dimensions and whether their encoding is specialized for music remains unknown. We recorded high-density neurophysiological activity directly from the human auditory cortex while participants listened to Western musical phrases. Pitch, pitch-change, and expectation were selectively encoded at different cortical sites, indicating a spatial map for representing distinct melodic dimensions. The same participants listened to spoken English, and we compared responses to music and speech. Cortical sites selective for music encoded expectation, while sites that encoded pitch and pitch-change in music used the same neural code to represent equivalent properties of speech. Findings reveal how the perception of melody recruits both music-specific and general-purpose sound representations.
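The three melodic dimensions can be computed note by note: absolute pitch is the note itself, pitch change is the interval from the preceding note, and expectation reflects how (im)probable that note is given prior context. A toy sketch in which a smoothed first-order interval model stands in for the corpus-trained statistical model used in studies like this one:

```python
import numpy as np
from collections import Counter

def melodic_features(notes, corpus, alpha=1.0):
    """Per-note features for a melody (MIDI numbers): absolute pitch,
    pitch change, and surprisal of each interval under a smoothed
    first-order interval model trained on a corpus of melodies."""
    intervals = Counter(b - a for mel in corpus for a, b in zip(mel, mel[1:]))
    total, vocab = sum(intervals.values()), len(intervals) + 1
    pitch = np.array(notes, dtype=float)
    change = np.diff(pitch, prepend=pitch[0])
    surprisal = np.array([0.0] + [
        -np.log2((intervals[int(d)] + alpha) / (total + alpha * vocab))
        for d in change[1:]])
    return pitch, change, surprisal

corpus = [[60, 62, 64, 65, 67], [67, 65, 64, 62, 60], [60, 64, 67, 72]]
p, c, s = melodic_features([60, 62, 64, 72], corpus)
print(c, s.round(2))  # the unexpected leap to 72 gets the highest surprisal
```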
Affiliation(s)
- Narayan Sankaran
  - Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
- Matthew K. Leonard
  - Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
- Frederic Theunissen
  - Department of Psychology, University of California, Berkeley, 2121 Berkeley Way, Berkeley, CA 94720, USA
- Edward F. Chang
  - Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
10. Leonard MK, Gwilliams L, Sellers KK, Chung JE, Xu D, Mischler G, Mesgarani N, Welkenhuysen M, Dutta B, Chang EF. Large-scale single-neuron speech sound encoding across the depth of human cortex. Nature 2024; 626:593-602. [PMID: 38093008] [PMCID: PMC10866713] [DOI: 10.1038/s41586-023-06839-2]
Abstract
Understanding the neural basis of speech perception requires that we study the human brain both at the scale of the fundamental computational unit of neurons and in their organization across the depth of cortex. Here we used high-density Neuropixels arrays to record from 685 neurons across cortical layers at nine sites in a high-level auditory region that is critical for speech, the superior temporal gyrus, while participants listened to spoken sentences. Single neurons encoded a wide range of speech sound cues, including features of consonants and vowels, relative vocal pitch, onsets, amplitude envelope and sequence statistics. Each cross-laminar recording exhibited dominant tuning to a primary speech feature, while also containing a substantial proportion of neurons that encoded other features, contributing to heterogeneous selectivity. Spatially, neurons at similar cortical depths tended to encode similar speech features. Activity across all cortical layers was predictive of high-frequency field potentials (electrocorticography), providing a neuronal origin for macroelectrode recordings from the cortical surface. Together, these results establish single-neuron tuning across the cortical laminae as an important dimension of speech encoding in human superior temporal gyrus.
Affiliation(s)
- Matthew K Leonard
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Laura Gwilliams
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Kristin K Sellers
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Jason E Chung
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Duo Xu
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Gavin Mischler
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
  - Department of Electrical Engineering, Columbia University, New York, NY, USA
- Nima Mesgarani
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
  - Department of Electrical Engineering, Columbia University, New York, NY, USA
- Edward F Chang
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
11. Li Y, Anumanchipalli GK, Mohamed A, Chen P, Carney LH, Lu J, Wu J, Chang EF. Dissecting neural computations in the human auditory pathway using deep neural networks for speech. Nat Neurosci 2023; 26:2213-2225. [PMID: 37904043] [PMCID: PMC10689246] [DOI: 10.1038/s41593-023-01468-4]
Abstract
The human auditory system extracts rich linguistic abstractions from speech signals. Traditional approaches to understanding this complex process have used linear feature-encoding models, with limited success. Artificial neural networks excel in speech recognition tasks and offer promising computational models of speech processing. We used speech representations in state-of-the-art deep neural network (DNN) models to investigate neural coding from the auditory nerve to the speech cortex. Representations in hierarchical layers of the DNN correlated well with the neural activity throughout the ascending auditory system. Unsupervised speech models performed at least as well as other purely supervised or fine-tuned models. Deeper DNN layers were better correlated with the neural activity in the higher-order auditory cortex, with computations aligned with phonemic and syllabic structures in speech. Accordingly, DNN models trained on either English or Mandarin predicted cortical responses in native speakers of each language. These results reveal convergence between DNN model representations and the biological auditory pathway, offering new approaches for modeling neural coding in the auditory cortex.
Affiliation(s)
- Yuanning Li
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
- Gopala K Anumanchipalli
  - Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
  - Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA
- Peili Chen
  - School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
- Laurel H Carney
  - Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Junfeng Lu
  - Neurologic Surgery Department, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, China
  - Brain Function Laboratory, Neurosurgical Institute, Fudan University, Shanghai, China
- Jinsong Wu
  - Neurologic Surgery Department, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, China
  - Brain Function Laboratory, Neurosurgical Institute, Fudan University, Shanghai, China
- Edward F Chang
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
12. Sankaran N, Leonard MK, Theunissen F, Chang EF. Encoding of melody in the human auditory cortex. bioRxiv 2023:2023.10.17.562771. [PMID: 37905047] [PMCID: PMC10614915] [DOI: 10.1101/2023.10.17.562771]
Abstract
Melody is a core component of music in which discrete pitches are serially arranged to convey emotion and meaning. Perception of melody varies along several pitch-based dimensions: (1) the absolute pitch of notes, (2) the difference in pitch between successive notes, and (3) the higher-order statistical expectation of each note conditioned on its prior context. While humans readily perceive melody, how these dimensions are collectively represented in the brain and whether their encoding is specialized for music remains unknown. Here, we recorded high-density neurophysiological activity directly from the surface of human auditory cortex while Western participants listened to Western musical phrases. Pitch, pitch-change, and expectation were selectively encoded at different cortical sites, indicating a spatial code for representing distinct dimensions of melody. The same participants listened to spoken English, and we compared evoked responses to music and speech. Cortical sites selective for music were systematically driven by the encoding of expectation. In contrast, sites that encoded pitch and pitch-change used the same neural code to represent equivalent properties of speech. These findings reveal the multidimensional nature of melody encoding, consisting of both music-specific and domain-general sound representations in auditory cortex. Teaser: The human brain contains both general-purpose and music-specific neural populations for processing distinct attributes of melody.
13. Wang J, Wang X, Zou J, Duan J, Shen Z, Xu N, Chen Y, Zhang J, He H, Bi Y, Ding N. Neural substrate underlying the learning of a passage with unfamiliar vocabulary and syntax. Cereb Cortex 2023; 33:10036-10046. [PMID: 37491998] [DOI: 10.1093/cercor/bhad263]
Abstract
Speech comprehension is a complex process involving multiple stages, such as decoding phonetic units, recognizing words, and understanding sentences and passages. In this study, we identify cortical networks beyond basic phonetic processing using a novel passage learning paradigm. Participants learn to comprehend a story composed of syllables of their native language, but containing unfamiliar vocabulary and syntax. Three learning methods are employed, each resulting in some degree of learning within a 12-min learning session. Functional magnetic resonance imaging results reveal that, when participants listen to the same story, the classic temporal-frontal language network is significantly enhanced by learning. Critically, activation of the left anterior and posterior temporal lobe correlates with the learning outcome, as assessed behaviorally through, e.g., word recognition and passage comprehension tests. This study demonstrates that a brief learning session is sufficient to induce neural plasticity in the left temporal lobe, which underlies the transformation from phonetic units to units of meaning, such as words and sentences.
Affiliation(s)
- Jing Wang
  - Key Laboratory for Biomedical Engineering of Ministry of Education, Center for Brain Imaging Science and Technology, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
- Xiaosha Wang
  - State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Jiajie Zou
  - Key Laboratory for Biomedical Engineering of Ministry of Education, Center for Brain Imaging Science and Technology, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
- Jipeng Duan
  - Key Laboratory for Biomedical Engineering of Ministry of Education, Center for Brain Imaging Science and Technology, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
- Zhuowen Shen
  - Key Laboratory for Biomedical Engineering of Ministry of Education, Center for Brain Imaging Science and Technology, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
- Nannan Xu
  - School of Linguistic Sciences and Arts, Jiangsu Normal University, Xuzhou 221009, China
- Yan Chen
  - Key Laboratory for Biomedical Engineering of Ministry of Education, Center for Brain Imaging Science and Technology, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
- Jianfeng Zhang
  - Key Laboratory for Biomedical Engineering of Ministry of Education, Center for Brain Imaging Science and Technology, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
- Hongjian He
  - Key Laboratory for Biomedical Engineering of Ministry of Education, Center for Brain Imaging Science and Technology, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
- Yanchao Bi
  - State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Nai Ding
  - Key Laboratory for Biomedical Engineering of Ministry of Education, Center for Brain Imaging Science and Technology, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
  - MOE Frontier Science Center for Brain Science & Brain-machine Integration, Zhejiang University, Hangzhou 310027, China
14. Gong XL, Huth AG, Deniz F, Johnson K, Gallant JL, Theunissen FE. Phonemic segmentation of narrative speech in human cerebral cortex. Nat Commun 2023; 14:4309. [PMID: 37463907] [PMCID: PMC10354060] [DOI: 10.1038/s41467-023-39872-w]
Abstract
Speech processing requires extracting meaning from acoustic patterns using a set of intermediate representations based on a dynamic segmentation of the speech stream. Using whole-brain mapping obtained with fMRI, we investigate the locus of cortical phonemic processing not only for single phonemes but also for short combinations made of diphones and triphones. We find that phonemic processing areas are much larger than previously described: they include not only the classical areas in the dorsal superior temporal gyrus but also a larger region in the lateral temporal cortex where diphone features are best represented. These identified phonemic regions overlap with the lexical retrieval region, but we show that short word retrieval is not sufficient to explain the observed responses to diphones. Behavioral studies have shown that phonemic processing and lexical retrieval are intertwined. Here, we have also identified candidate regions within the speech cortical network where this joint processing occurs.
Affiliation(s)
- Xue L Gong
  - Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA
- Alexander G Huth
  - Departments of Neuroscience and Computer Science, University of Texas, Austin, Austin, TX 78712, USA
- Fatma Deniz
  - Faculty of Electrical Engineering and Computer Science, Technische Universität Berlin, 10587 Berlin, Germany
- Keith Johnson
  - Department of Linguistics, University of California, Berkeley, Berkeley, CA 94720, USA
- Jack L Gallant
  - Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA
  - Department of Psychology, University of California, Berkeley, Berkeley, CA 94720, USA
- Frédéric E Theunissen
  - Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA
  - Department of Psychology, University of California, Berkeley, Berkeley, CA 94720, USA
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, USA
15. Raghavan VS, O’Sullivan J, Bickel S, Mehta AD, Mesgarani N. Distinct neural encoding of glimpsed and masked speech in multitalker situations. PLoS Biol 2023; 21:e3002128. [PMID: 37279203] [PMCID: PMC10243639] [DOI: 10.1371/journal.pbio.3002128]
Abstract
Humans can easily tune in to one talker in a multitalker environment while still picking up bits of background speech; however, it remains unclear how we perceive speech that is masked and to what degree non-target speech is processed. Some models suggest that perception can be achieved through glimpses, which are spectrotemporal regions where a talker has more energy than the background. Other models, however, require the recovery of the masked regions. To clarify this issue, we directly recorded from primary and non-primary auditory cortex (AC) in neurosurgical patients as they attended to one talker in multitalker speech and trained temporal response function models to predict high-gamma neural activity from glimpsed and masked stimulus features. We found that glimpsed speech is encoded at the level of phonetic features for target and non-target talkers, with enhanced encoding of target speech in non-primary AC. In contrast, encoding of masked phonetic features was found only for the target, with a greater response latency and distinct anatomical organization compared to glimpsed phonetic features. These findings suggest separate mechanisms for encoding glimpsed and masked speech and provide neural evidence for the glimpsing model of speech perception.
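The temporal response function (TRF) models used here predict a neural time series from time-lagged stimulus features via regularized linear regression. A self-contained sketch on synthetic data (real analyses typically rely on dedicated toolboxes and cross-validated regularization; the shapes and lag count below are arbitrary choices):

```python
import numpy as np
from sklearn.linear_model import Ridge

def lagged(X, n_lags):
    """Design matrix of time-lagged copies of stimulus features
    X (time x features), lags 0..n_lags-1 samples."""
    T, F = X.shape
    out = np.zeros((T, F * n_lags))
    for lag in range(n_lags):
        out[lag:, lag * F:(lag + 1) * F] = X[:T - lag]
    return out

rng = np.random.default_rng(5)
stim = rng.standard_normal((2000, 3))   # e.g. phonetic-feature time series
kernels = rng.standard_normal((3, 20))  # per-feature response kernels
y = sum(np.convolve(stim[:, f], kernels[f], mode="full")[:2000]
        for f in range(3)) + rng.standard_normal(2000)

X = lagged(stim, 20)
trf = Ridge(alpha=10.0).fit(X[:1500], y[:1500])
print("held-out R^2:", round(trf.score(X[1500:], y[1500:]), 2))
```

Fitting separate models from glimpsed versus masked feature sets, as described above, then amounts to swapping the columns of the stimulus matrix.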
Affiliation(s)
- Vinay S Raghavan
  - Department of Electrical Engineering, Columbia University, New York, New York, United States of America
  - Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
- James O’Sullivan
  - Department of Electrical Engineering, Columbia University, New York, New York, United States of America
  - Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
- Stephan Bickel
  - The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, New York, United States of America
  - Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
  - Department of Neurology, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
- Ashesh D. Mehta
  - The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, New York, United States of America
  - Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
- Nima Mesgarani
  - Department of Electrical Engineering, Columbia University, New York, New York, United States of America
  - Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
16. Keshishian M, Akkol S, Herrero J, Bickel S, Mehta AD, Mesgarani N. Joint, distributed and hierarchically organized encoding of linguistic features in the human auditory cortex. Nat Hum Behav 2023; 7:740-753. [PMID: 36864134] [PMCID: PMC10417567] [DOI: 10.1038/s41562-023-01520-0]
Abstract
The precise role of the human auditory cortex in representing speech sounds and transforming them to meaning is not yet fully understood. Here we used intracranial recordings from the auditory cortex of neurosurgical patients as they listened to natural speech. We found an explicit, temporally ordered and anatomically distributed neural encoding of multiple linguistic features, including phonetic features, prelexical phonotactics, word frequency, and lexical-phonological and lexical-semantic information. Grouping neural sites on the basis of their encoded linguistic features revealed a hierarchical pattern, with distinct representations of prelexical and postlexical features distributed across various auditory areas. While sites with longer response latencies and greater distance from the primary auditory cortex encoded higher-level linguistic features, the encoding of lower-level features was preserved and not discarded. Our study reveals a cumulative mapping of sound to meaning and provides empirical evidence for validating neurolinguistic and psycholinguistic models of spoken word recognition that preserve the acoustic variations in speech.
Affiliation(s)
- Menoua Keshishian
  - Department of Electrical Engineering, Columbia University, New York, NY, USA
  - Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Serdar Akkol
  - Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Jose Herrero
  - Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
  - Department of Neurosurgery, Hofstra-Northwell School of Medicine, Manhasset, NY, USA
- Stephan Bickel
  - Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
  - Department of Neurosurgery, Hofstra-Northwell School of Medicine, Manhasset, NY, USA
- Ashesh D Mehta
  - Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
  - Department of Neurosurgery, Hofstra-Northwell School of Medicine, Manhasset, NY, USA
- Nima Mesgarani
  - Department of Electrical Engineering, Columbia University, New York, NY, USA
  - Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
17. Asabuki T, Kokate P, Fukai T. Neural circuit mechanisms of hierarchical sequence learning tested on large-scale recording data. PLoS Comput Biol 2022; 18:e1010214. [PMID: 35727828] [PMCID: PMC9249189] [DOI: 10.1371/journal.pcbi.1010214]
Abstract
The brain performs various cognitive functions by learning the spatiotemporal salient features of the environment. This learning requires unsupervised segmentation of hierarchically organized spike sequences, but the underlying neural mechanism is only poorly understood. Here, we show that a recurrent gated network of neurons with dendrites can efficiently solve difficult segmentation tasks. In this model, multiplicative recurrent connections learn a context-dependent gating of dendro-somatic information transfers to minimize error in the prediction of somatic responses by the dendrites. Consequently, these connections filter out redundant input features that are represented by the dendrites but unnecessary in the given context. The model was tested on both synthetic and real neural data. In particular, it successfully segmented multiple repeating cell assemblies in large-scale calcium imaging data containing thousands of cortical neurons. Our results suggest that recurrent gating of dendro-somatic signal transfers is crucial for cortical learning of context-dependent segmentation tasks.
Affiliation(s)
- Toshitake Asabuki
  - Neural Coding and Brain Computing Unit, Okinawa Institute of Science and Technology, Onna-son, Okinawa, Japan
- Prajakta Kokate
  - Neural Coding and Brain Computing Unit, Okinawa Institute of Science and Technology, Onna-son, Okinawa, Japan
- Tomoki Fukai
  - Neural Coding and Brain Computing Unit, Okinawa Institute of Science and Technology, Onna-son, Okinawa, Japan
18. Norman-Haignere SV, Long LK, Devinsky O, Doyle W, Irobunda I, Merricks EM, Feldstein NA, McKhann GM, Schevon CA, Flinker A, Mesgarani N. Multiscale temporal integration organizes hierarchical computation in human auditory cortex. Nat Hum Behav 2022; 6:455-469. [PMID: 35145280] [PMCID: PMC8957490] [DOI: 10.1038/s41562-021-01261-y]
Abstract
To derive meaning from sound, the brain must integrate information across many timescales. What computations underlie multiscale integration in human auditory cortex? Evidence suggests that auditory cortex analyses sound using both generic acoustic representations (for example, spectrotemporal modulation tuning) and category-specific computations, but the timescales over which these putatively distinct computations integrate remain unclear. To answer this question, we developed a general method to estimate sensory integration windows (the time window within which stimuli alter the neural response) and applied our method to intracranial recordings from neurosurgical patients. We show that human auditory cortex integrates hierarchically across diverse timescales spanning from ~50 to 400 ms. Moreover, we find that neural populations with short and long integration windows exhibit distinct functional properties: short-integration electrodes (less than ~200 ms) show prominent spectrotemporal modulation selectivity, while long-integration electrodes (greater than ~200 ms) show prominent category selectivity. These findings reveal how multiscale integration organizes auditory computation in the human brain.
Affiliation(s)
- Sam V Norman-Haignere
  - Zuckerman Mind, Brain, Behavior Institute, Columbia University
  - HHMI Postdoctoral Fellow of the Life Sciences Research Foundation
- Laura K. Long
  - Zuckerman Mind, Brain, Behavior Institute, Columbia University
  - Doctoral Program in Neurobiology and Behavior, Columbia University
- Orrin Devinsky
  - Department of Neurology, NYU Langone Medical Center
  - Comprehensive Epilepsy Center, NYU Langone Medical Center
- Werner Doyle
  - Comprehensive Epilepsy Center, NYU Langone Medical Center
  - Department of Neurosurgery, NYU Langone Medical Center
- Ifeoma Irobunda
  - Department of Neurology, Columbia University Irving Medical Center
- Neil A. Feldstein
  - Department of Neurological Surgery, Columbia University Irving Medical Center
- Guy M. McKhann
  - Department of Neurological Surgery, Columbia University Irving Medical Center
- Adeen Flinker
  - Department of Neurology, NYU Langone Medical Center
  - Comprehensive Epilepsy Center, NYU Langone Medical Center
  - Department of Biomedical Engineering, NYU Tandon School of Engineering
- Nima Mesgarani
  - Zuckerman Mind, Brain, Behavior Institute, Columbia University
  - Doctoral Program in Neurobiology and Behavior, Columbia University
  - Department of Electrical Engineering, Columbia University
19. Kaestner E, Wu X, Friedman D, Dugan P, Devinsky O, Carlson C, Doyle W, Thesen T, Halgren E. The Precentral Gyrus Contributions to the Early Time-Course of Grapheme-to-Phoneme Conversion. Neurobiol Lang (Camb) 2022; 3:18-45. [PMID: 37215328] [PMCID: PMC10158576] [DOI: 10.1162/nol_a_00047]
Abstract
In models of silent reading, visual orthographic information is transduced into an auditory phonological code in a process of grapheme-to-phoneme conversion (GPC). This process is often identified with lateral temporal-parietal regions associated with auditory phoneme encoding. However, the role of articulatory phonemic representations and the precentral gyrus in GPC is ambiguous. Though the precentral gyrus is implicated in many functional MRI studies of reading, it is not clear if the time course of activity in this region is consistent with a role in GPC. We recorded cortical electrophysiology during a bimodal match/mismatch task from eight patients with perisylvian subdural electrodes to examine the time course of neural activity during a task that necessitated GPC. Patients made a match/mismatch decision between a 3-letter string and the following auditory bi-phoneme. We characterized the distribution and timing of evoked broadband high gamma (70-170 Hz) as well as phase-locking between electrodes. The precentral gyrus showed a high concentration of broadband high gamma responses to visual and auditory language, as well as mismatch effects. The pars opercularis, supramarginal gyrus, and superior temporal gyrus were also involved. The precentral gyrus showed strong phase-locking with the caudal fusiform gyrus during letter-string presentation and with surrounding perisylvian cortex during the bimodal visual-auditory comparison period. These findings hint at a role for precentral cortex in transducing visual into auditory codes during silent reading.
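The two measures used in this analysis, broadband high-gamma amplitude and inter-electrode phase-locking, are standard signal-processing quantities. A minimal sketch of both on synthetic channels (the 70-170 Hz band follows the abstract; the filter design and test signal are illustrative):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

sr = 1000.0  # Hz

def bandpass(x, lo, hi):
    b, a = butter(4, [lo / (sr / 2), hi / (sr / 2)], btype="band")
    return filtfilt(b, a, x)

def high_gamma_amplitude(x):
    """Broadband high-gamma (70-170 Hz) analytic amplitude."""
    return np.abs(hilbert(bandpass(x, 70, 170)))

def plv(x, y, lo, hi):
    """Phase-locking value between two channels within a band."""
    dphi = (np.angle(hilbert(bandpass(x, lo, hi)))
            - np.angle(hilbert(bandpass(y, lo, hi))))
    return np.abs(np.mean(np.exp(1j * dphi)))

rng = np.random.default_rng(6)
t = np.arange(0, 2, 1 / sr)
shared = np.sin(2 * np.pi * 8 * t)  # shared oscillatory component
x = shared + 0.5 * rng.standard_normal(t.size)
y = shared + 0.5 * rng.standard_normal(t.size)
print("8 Hz PLV:", round(plv(x, y, 4, 12), 2))
print("mean high-gamma amplitude:", round(high_gamma_amplitude(x).mean(), 3))
```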
Affiliation(s)
- Erik Kaestner
  - Center for Multimodal Imaging and Genetics, University of California, San Diego, USA
- Xiaojing Wu
  - Department of Neurology, NYU Langone School of Medicine, New York, USA
- Daniel Friedman
  - Department of Neurology, NYU Langone School of Medicine, New York, USA
- Patricia Dugan
  - Department of Neurology, NYU Langone School of Medicine, New York, USA
- Orrin Devinsky
  - Department of Neurology, NYU Langone School of Medicine, New York, USA
- Chad Carlson
  - Department of Neurology, Medical College of Wisconsin, Milwaukee, USA
- Werner Doyle
  - Department of Neurology, NYU Langone School of Medicine, New York, USA
  - Department of Neurosurgery, NYU Langone School of Medicine, New York, USA
- Thomas Thesen
  - Department of Neurology, NYU Langone School of Medicine, New York, USA
- Eric Halgren
  - Department of Neurosciences, University of California at San Diego, La Jolla, USA
  - Department of Radiology, University of California at San Diego, La Jolla, USA
20. Bhaya-Grossman I, Chang EF. Speech Computations of the Human Superior Temporal Gyrus. Annu Rev Linguist 2022; 8:21-45.
Abstract
Human speech perception results from neural computations that transform external acoustic speech signals into internal representations of words. The superior temporal gyrus (STG) contains the nonprimary auditory cortex and is a critical locus for phonological processing. Here, we describe how speech sound representation in the STG relies on fundamentally nonlinear and dynamical processes, such as categorization, normalization, contextual restoration, and the extraction of temporal structure. A spatial mosaic of local cortical sites on the STG exhibits complex auditory encoding for distinct acoustic-phonetic and prosodic features. We propose that as a population ensemble, these distributed patterns of neural activity give rise to abstract, higher-order phonemic and syllabic representations that support speech perception. This review presents a multi-scale, recurrent model of phonological processing in the STG, highlighting the critical interface between auditory and language systems.
Affiliation(s)
- Ilina Bhaya-Grossman
  - Department of Neurological Surgery, University of California, San Francisco, California 94143, USA
  - Joint Graduate Program in Bioengineering, University of California, Berkeley and San Francisco, California 94720, USA
- Edward F Chang
  - Department of Neurological Surgery, University of California, San Francisco, California 94143, USA
21. Landemard A, Bimbard C, Demené C, Shamma S, Norman-Haignere S, Boubenec Y. Distinct higher-order representations of natural sounds in human and ferret auditory cortex. eLife 2021; 10:e65566. [PMID: 34792467] [PMCID: PMC8601661] [DOI: 10.7554/eLife.65566]
Abstract
Little is known about how neural representations of natural sounds differ across species. For example, speech and music play a unique role in human hearing, yet it is unclear how auditory representations of speech and music differ between humans and other animals. Using functional ultrasound imaging, we measured responses in ferrets to a set of natural and spectrotemporally matched synthetic sounds previously tested in humans. Ferrets showed similar lower-level frequency and modulation tuning to that observed in humans. But while humans showed substantially larger responses to natural vs. synthetic speech and music in non-primary regions, ferret responses to natural and synthetic sounds were closely matched throughout primary and non-primary auditory cortex, even when tested with ferret vocalizations. This finding reveals that auditory representations in humans and ferrets diverge sharply at late stages of cortical processing, potentially driven by higher-order processing demands in speech and music.
Affiliation(s)
- Agnès Landemard
- Laboratoire des Systèmes Perceptifs, Département d’Études Cognitives, École Normale Supérieure, PSL Research University, CNRS, Paris, France
| | - Célian Bimbard
- Laboratoire des Systèmes Perceptifs, Département d’Études Cognitives, École Normale Supérieure, PSL Research University, CNRS, Paris, France
- University College London, London, United Kingdom
| | - Charlie Demené
- Physics for Medicine Paris, Inserm, ESPCI Paris, PSL Research University, CNRS, Paris, France
| | - Shihab Shamma
- Laboratoire des Systèmes Perceptifs, Département d’Études Cognitives, École Normale Supérieure, PSL Research University, CNRS, Paris, France
- Institute for Systems Research, Department of Electrical and Computer Engineering, University of Maryland, College Park, United States
| | - Sam Norman-Haignere
- Laboratoire des Systèmes Perceptifs, Département d’Études Cognitives, École Normale Supérieure, PSL Research University, CNRS, Paris, France
- HHMI Postdoctoral Fellow of the Life Sciences Research Foundation, Baltimore, United States
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
| | - Yves Boubenec
- Laboratoire des Systèmes Perceptifs, Département d’Études Cognitives, École Normale Supérieure, PSL Research University, CNRS, Paris, France
| |
|
22
|
Yang Y, Ahmadipour P, Shanechi MM. Adaptive latent state modeling of brain network dynamics with real-time learning rate optimization. J Neural Eng 2021; 18. [PMID: 33254159 DOI: 10.1088/1741-2552/abcefd] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2020] [Accepted: 11/30/2020] [Indexed: 12/29/2022]
Abstract
Objective. Dynamic latent state models are widely used to characterize the dynamics of brain network activity for various neural signal types. To date, dynamic latent state models have largely been developed for stationary brain network dynamics. However, brain network dynamics can be non-stationary, for example due to learning, plasticity, or recording instability. To enable modeling these non-stationarities, two problems need to be resolved. First, novel methods should be developed that can adaptively update the parameters of latent state models, which is difficult due to the state being latent. Second, new methods are needed to optimize the adaptation learning rate, which specifies how fast new neural observations update the model parameters and can significantly influence adaptation accuracy. Approach. We develop a Rate Optimized-adaptive Linear State-Space Modeling (RO-adaptive LSSM) algorithm that solves these two problems. First, to enable adaptation, we derive a computation- and memory-efficient adaptive LSSM fitting algorithm that updates the LSSM parameters recursively and in real time in the presence of the latent state. Second, we develop a real-time learning rate optimization algorithm. We use comprehensive simulations of a broad range of non-stationary brain network dynamics to validate both algorithms, which together constitute the RO-adaptive LSSM. Main results. We show that the adaptive LSSM fitting algorithm can accurately track the broad simulated non-stationary brain network dynamics. We also find that the learning rate significantly affects the LSSM fitting accuracy. Finally, we show that the real-time learning rate optimization algorithm can run in parallel with the adaptive LSSM fitting algorithm. Doing so, the combined RO-adaptive LSSM algorithm rapidly converges to the optimal learning rate and accurately tracks non-stationarities. Significance. These algorithms can be used to study time-varying neural dynamics underlying various brain functions and enhance future neurotechnologies such as brain-machine interfaces and closed-loop brain stimulation systems.
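For intuition only, here is a toy sketch of adaptive parameter tracking in which a forgetting factor stands in for the learning rate. This is not the authors' RO-adaptive LSSM: in their setting the state is latent and estimated jointly, whereas here the state is treated as observed so that the recursion fits in a few lines:

```python
import numpy as np

# Toy adaptive tracking for a time-varying linear state-space model
#   x_t = A x_{t-1} + w_t,   y_t = C_t x_t + v_t,
# where the observation parameter C_t drifts slowly. A recursive
# least-squares (RLS) update with forgetting factor lam tracks C_t:
# smaller lam forgets old data faster (quicker adaptation, more variance).

rng = np.random.default_rng(0)
A = np.array([[0.95]])
C_true = 1.0
x = np.zeros(1)
C_hat, P = 0.0, 1.0   # RLS parameter estimate and its precision proxy
lam = 0.99            # hypothetical forgetting factor ("learning rate")

for t in range(2000):
    C_true += 0.001                                  # slow non-stationarity
    x = A @ x + 0.1 * rng.standard_normal(1)         # state update
    y = C_true * x[0] + 0.1 * rng.standard_normal()  # observation
    k = P * x[0] / (lam + x[0] * P * x[0])           # RLS gain
    C_hat += k * (y - C_hat * x[0])                  # parameter update
    P = (P - k * x[0] * P) / lam                     # precision update
print(f"true C = {C_true:.2f}, tracked C = {C_hat:.2f}")
```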
Affiliation(s)
- Yuxiao Yang
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, United States of America; These authors contributed equally to this work
| | - Parima Ahmadipour
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, United States of America; These authors contributed equally to this work
| | - Maryam M Shanechi
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, United States of America; Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, United States of America
| |
|
23
|
Adaptation to mis-pronounced speech: evidence for a prefrontal-cortex repair mechanism. Sci Rep 2021; 11:97. [PMID: 33420193 PMCID: PMC7794353 DOI: 10.1038/s41598-020-79640-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2020] [Accepted: 11/23/2020] [Indexed: 11/30/2022] Open
Abstract
Speech is a complex and ambiguous acoustic signal that varies significantly within and across speakers. Despite the processing challenge that such variability poses, humans adapt to systematic variations in pronunciation rapidly. The goal of this study is to uncover the neurobiological bases of the attunement process that enables such fluent comprehension. Twenty-four native English participants listened to words spoken by a “canonical” American speaker and two non-canonical speakers, and performed a word-picture matching task, while magnetoencephalography was recorded. Non-canonical speech was created by including systematic phonological substitutions within the word (e.g. [s] → [sh]). Activity in the auditory cortex (superior temporal gyrus) was greater in response to substituted phonemes, and, critically, this was not attenuated by exposure. By contrast, prefrontal regions showed an interaction between the presence of a substitution and the amount of exposure: activity decreased for canonical speech over time, whereas responses to non-canonical speech remained consistently elevated. Granger causality analyses further revealed that prefrontal responses serve to modulate activity in auditory regions, suggesting the recruitment of top-down processing to decode non-canonical pronunciations. In sum, our results suggest that the behavioural deficit in processing mispronounced phonemes may be due to a disruption to the typical exchange of information between the prefrontal and auditory cortices as observed for canonical speech.
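The directed-influence logic behind a Granger causality analysis can be sketched in a few lines: signal y "Granger-causes" x if y's past improves the prediction of x beyond x's own past. This time-domain toy (lag order and simulated signals are arbitrary choices) is far simpler than the source-space analysis used in such studies:

```python
import numpy as np

def lagged(z, p):
    # Columns z[t-1], ..., z[t-p] for t = p .. n-1.
    return np.column_stack([z[p - k: len(z) - k] for k in range(1, p + 1)])

def granger_stat(x, y, p=5):
    # Log ratio of residual variances: restricted (own past) vs. full model.
    tgt = x[p:]
    X1 = lagged(x, p)
    X2 = np.hstack([lagged(x, p), lagged(y, p)])
    r1 = tgt - X1 @ np.linalg.lstsq(X1, tgt, rcond=None)[0]
    r2 = tgt - X2 @ np.linalg.lstsq(X2, tgt, rcond=None)[0]
    return np.log(r1.var() / r2.var())  # > 0 suggests y helps predict x

rng = np.random.default_rng(0)
y = rng.standard_normal(2000)
x = np.zeros(2000)
x[1:] = 0.8 * y[:-1]                    # y drives x with a one-sample lag
x += 0.3 * rng.standard_normal(2000)
print(granger_stat(x, y), granger_stat(y, x))  # the first should be larger
```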
|
24
|
Koskinen M, Kurimo M, Gross J, Hyvärinen A, Hari R. Brain activity reflects the predictability of word sequences in listened continuous speech. Neuroimage 2020; 219:116936. [PMID: 32474080 DOI: 10.1016/j.neuroimage.2020.116936] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2019] [Revised: 04/24/2020] [Accepted: 05/07/2020] [Indexed: 11/17/2022] Open
Abstract
Natural speech builds on contextual relations that can prompt predictions of upcoming utterances. To study the neural underpinnings of such predictive processing we asked 10 healthy adults to listen to a 1-h-long audiobook while their magnetoencephalographic (MEG) brain activity was recorded. We correlated the MEG signals with acoustic speech envelope, as well as with estimates of Bayesian word probability with and without the contextual word sequence (N-gram and Unigram, respectively), with a focus on time-lags. The MEG signals of auditory and sensorimotor cortices were strongly coupled to the speech envelope at the rates of syllables (4-8 Hz) and of prosody and intonation (0.5-2 Hz). The probability structure of word sequences, independently of the acoustical features, affected the ≤ 2-Hz signals extensively in auditory and rolandic regions, in precuneus, occipital cortices, and lateral and medial frontal regions. Fine-grained temporal progression patterns occurred across brain regions 100-1000 ms after word onsets. Although the acoustic effects were observed in both hemispheres, the contextual influences were statistically significantly lateralized to the left hemisphere. These results serve as a brain signature of the predictability of word sequences in listened continuous speech, confirming and extending previous results to demonstrate that deeply-learned knowledge and recent contextual information are employed dynamically and in a left-hemisphere-dominant manner in predicting the forthcoming words in natural speech.
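The contrast between contextual (N-gram) and context-free (Unigram) word probability can be made concrete with a toy bigram model; the miniature corpus and add-one smoothing below are illustrative stand-ins for the study's large-scale language models:

```python
import math
from collections import Counter

tokens = "the cat sat on the mat and the cat slept".split()  # stand-in corpus
uni = Counter(tokens)
bi = Counter(zip(tokens, tokens[1:]))
V, N = len(uni), len(tokens)

def unigram_surprisal(w):
    # Context-free predictability: -log2 P(w).
    return -math.log2(uni[w] / N)

def bigram_surprisal(prev, w):
    # Contextual predictability: -log2 P(w | prev), add-one smoothed.
    return -math.log2((bi[(prev, w)] + 1) / (uni[prev] + V))

print(bigram_surprisal("the", "cat"), unigram_surprisal("cat"))
```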
Affiliation(s)
- Miika Koskinen
- Medicum, Faculty of Medicine, P.O. Box 63, FI-00014, University of Helsinki, Finland; Department of Neuroscience and Biomedical Engineering, P.O. Box 12200, FI-00076, Aalto University, Finland; Institute of Neuroscience and Psychology, University of Glasgow, 58 Hillhead Street, Glasgow, G12 8QB, UK; MEG Core, Aalto NeuroImaging, FI-00076, Aalto University, Finland.
| | - Mikko Kurimo
- Department of Signal Processing and Acoustics, P.O. Box 13000, FI-00076, Aalto University, Finland
| | - Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, 58 Hillhead Street, Glasgow, G12 8QB, UK; Institute for Biomagnetism and Biosignalanalysis, University of Muenster, 48149, Muenster, Germany
| | - Aapo Hyvärinen
- Department of Computer Science, P.O. Box 68, FI-00014, University of Helsinki, Finland
| | - Riitta Hari
- Department of Neuroscience and Biomedical Engineering, P.O. Box 12200, FI-00076, Aalto University, Finland; Department of Art, P.O. Box 31000, FI-00076, Aalto University, Finland
| |
|
25
|
Statistical learning for vocal sequence acquisition in a songbird. Sci Rep 2020; 10:2248. [PMID: 32041978 PMCID: PMC7010765 DOI: 10.1038/s41598-020-58983-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2018] [Accepted: 01/17/2020] [Indexed: 01/31/2023] Open
Abstract
Birdsong is a learned communicative behavior that consists of discrete acoustic elements (“syllables”) that are sequenced in a controlled manner. While the learning of the acoustic structure of syllables has been extensively studied, relatively little is known about sequence learning in songbirds. Statistical learning could contribute to the acquisition of vocal sequences, and we investigated the nature and extent of sequence learning at various levels of song organization in the Bengalese finch, Lonchura striata var. domestica. We found that, under semi-natural conditions, pupils (sons) significantly reproduced the sequence statistics of their tutor’s (father’s) songs at multiple levels of organization (e.g., syllable repertoire, prevalence, and transitions). For example, the probabilities of syllable transitions at “branch points” (relatively complex sequences that are followed by multiple types of transitions) were significantly correlated between the songs of tutors and pupils. We confirmed the contribution of learning to sequence similarities between fathers and sons by experimentally tutoring juvenile Bengalese finches with the songs of unrelated tutors. We also discovered that the extent and fidelity of sequence similarities between tutors and pupils were significantly predicted by the prevalence of sequences in the tutor’s song and that distinct types of sequence modifications (e.g., syllable additions or deletions) followed distinct patterns. Taken together, these data provide compelling support for the role of statistical learning in vocal production learning and identify factors that could modulate the extent of vocal sequence learning.
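The core quantity here, a syllable transition-probability matrix, and the tutor-pupil comparison can be sketched as follows; the syllable labels and songs are toy stand-ins:

```python
import numpy as np

def transition_matrix(seq, syllables):
    # Rows give P(next syllable | current syllable), estimated from counts.
    idx = {s: i for i, s in enumerate(syllables)}
    M = np.zeros((len(syllables), len(syllables)))
    for a, b in zip(seq, seq[1:]):
        M[idx[a], idx[b]] += 1
    M /= np.maximum(M.sum(axis=1, keepdims=True), 1)
    return M

syls = list("abcd")
tutor = list("abcabdabcabd")   # stand-in tutor (father) song
pupil = list("abcabdabdabc")   # stand-in pupil (son) song
T, P = transition_matrix(tutor, syls), transition_matrix(pupil, syls)
r = np.corrcoef(T.ravel(), P.ravel())[0, 1]  # tutor-pupil sequence similarity
print(f"transition-probability correlation: {r:.2f}")
```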
|
26
|
Neophytou D, Oviedo HV. Using Neural Circuit Interrogation in Rodents to Unravel Human Speech Decoding. Front Neural Circuits 2020; 14:2. [PMID: 32116569 PMCID: PMC7009302 DOI: 10.3389/fncir.2020.00002] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2019] [Accepted: 01/09/2020] [Indexed: 01/21/2023] Open
Abstract
The neural circuits responsible for social communication are among the least understood in the brain. Human studies have made great progress in advancing our understanding of the global computations required for processing speech, and animal models offer the opportunity to discover evolutionarily conserved mechanisms for decoding these signals. In this review article, we describe some of the most well-established speech decoding computations from human studies and describe animal research designed to reveal potential circuit mechanisms underlying these processes. Human and animal brains must perform the challenging tasks of rapidly recognizing, categorizing, and assigning communicative importance to sounds in a noisy environment. The instructions to these functions are found in the precise connections neurons make with one another. Therefore, identifying circuit-motifs in the auditory cortices and linking them to communicative functions is pivotal. We review recent advances in human recordings that have revealed that the most basic unit of speech decoded by neurons is the phoneme, and consider circuit-mapping studies in rodents that have shown potential connectivity schemes to achieve this. Finally, we discuss other potentially important processing features in humans like lateralization, sensitivity to fine temporal features, and hierarchical processing. The goal is for animal studies to investigate neurophysiological and anatomical pathways responsible for establishing behavioral phenotypes that are shared between humans and animals. This can be accomplished by establishing cell types, connectivity patterns, genetic pathways and critical periods that are relevant in the development and function of social communication.
Affiliation(s)
- Demetrios Neophytou
- Biology Department, The City College of New York, New York, NY, United States
| | - Hysell V Oviedo
- Biology Department, The City College of New York, New York, NY, United States; CUNY Graduate Center, New York, NY, United States
| |
|
27
|
O'Sullivan J, Herrero J, Smith E, Schevon C, McKhann GM, Sheth SA, Mehta AD, Mesgarani N. Hierarchical Encoding of Attended Auditory Objects in Multi-talker Speech Perception. Neuron 2019; 104:1195-1209.e3. [PMID: 31648900 DOI: 10.1016/j.neuron.2019.09.007] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2019] [Revised: 07/11/2019] [Accepted: 09/06/2019] [Indexed: 11/15/2022]
Abstract
Humans can easily focus on one speaker in a multi-talker acoustic environment, but how different areas of the human auditory cortex (AC) represent the acoustic components of mixed speech is unknown. We obtained invasive recordings from the primary and nonprimary AC in neurosurgical patients as they listened to multi-talker speech. We found that neural sites in the primary AC responded to individual speakers in the mixture and were relatively unchanged by attention. In contrast, neural sites in the nonprimary AC were less discerning of individual speakers but selectively represented the attended speaker. Moreover, the encoding of the attended speaker in the nonprimary AC was invariant to the degree of acoustic overlap with the unattended speaker. Finally, this emergent representation of attended speech in the nonprimary AC was linearly predictable from the primary AC responses. Our results reveal the neural computations underlying the hierarchical formation of auditory objects in human AC during multi-talker speech perception.
Affiliation(s)
- James O'Sullivan
- Department of Electrical Engineering, Columbia University, New York, NY, USA
| | - Jose Herrero
- Department of Neurosurgery, Hofstra-Northwell School of Medicine and Feinstein Institute for Medical Research, Manhasset, New York, NY, USA
| | - Elliot Smith
- Department of Neurological Surgery, The Neurological Institute, New York, NY, USA; Department of Neurosurgery, University of Utah, Salt Lake City, UT, USA
| | - Catherine Schevon
- Department of Neurological Surgery, The Neurological Institute, New York, NY, USA
| | - Guy M McKhann
- Department of Neurological Surgery, The Neurological Institute, New York, NY, USA
| | - Sameer A Sheth
- Department of Neurological Surgery, The Neurological Institute, New York, NY, USA; Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
| | - Ashesh D Mehta
- Department of Neurosurgery, Hofstra-Northwell School of Medicine and Feinstein Institute for Medical Research, Manhasset, New York, NY, USA
| | - Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, NY, USA.
| |
|
28
|
Yi HG, Leonard MK, Chang EF. The Encoding of Speech Sounds in the Superior Temporal Gyrus. Neuron 2019; 102:1096-1110. [PMID: 31220442 PMCID: PMC6602075 DOI: 10.1016/j.neuron.2019.04.023] [Citation(s) in RCA: 223] [Impact Index Per Article: 37.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2019] [Revised: 04/08/2019] [Accepted: 04/16/2019] [Indexed: 01/02/2023]
Abstract
The human superior temporal gyrus (STG) is critical for extracting meaningful linguistic features from speech input. Local neural populations are tuned to acoustic-phonetic features of all consonants and vowels and to dynamic cues for intonational pitch. These populations are embedded throughout broader functional zones that are sensitive to amplitude-based temporal cues. Beyond speech features, STG representations are strongly modulated by learned knowledge and perceptual goals. Currently, a major challenge is to understand how these features are integrated across space and time in the brain during natural speech comprehension. We present a theory that temporally recurrent connections within STG generate context-dependent phonological representations, spanning longer temporal sequences relevant for coherent percepts of syllables, words, and phrases.
Affiliation(s)
- Han Gyol Yi
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
| | - Matthew K Leonard
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
| | - Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA.
| |
|
29
|
Martin S, Millán JDR, Knight RT, Pasley BN. The use of intracranial recordings to decode human language: Challenges and opportunities. Brain Lang 2019; 193:73-83. [PMID: 27377299 PMCID: PMC5203979 DOI: 10.1016/j.bandl.2016.06.003] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/01/2015] [Revised: 06/16/2016] [Accepted: 06/16/2016] [Indexed: 06/06/2023]
Abstract
Decoding speech from intracranial recordings serves two main purposes: understanding the neural correlates of speech processing and decoding speech features for targeting speech neuroprosthetic devices. Intracranial recordings have high spatial and temporal resolution, and thus offer a unique opportunity to investigate and decode the electrophysiological dynamics underlying speech processing. In this review article, we describe current approaches to decoding different features of speech perception and production - such as spectrotemporal, phonetic, phonotactic, semantic, and articulatory components - using intracranial recordings. A specific section is devoted to the decoding of imagined speech, and potential applications to speech prosthetic devices. We outline the challenges in decoding human language, as well as the opportunities in scientific and neuroengineering applications.
Affiliation(s)
- Stephanie Martin
- Defitech Chair in Brain Machine Interface, Center for Neuroprosthetics, Ecole Polytechnique Fédérale de Lausanne, Switzerland; Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
| | - José Del R Millán
- Defitech Chair in Brain Machine Interface, Center for Neuroprosthetics, Ecole Polytechnique Fédérale de Lausanne, Switzerland
| | - Robert T Knight
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA; Department of Psychology, University of California, Berkeley, CA, USA
| | - Brian N Pasley
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA.
| |
|
30
|
Leonard MK, Cai R, Babiak MC, Ren A, Chang EF. The peri-Sylvian cortical network underlying single word repetition revealed by electrocortical stimulation and direct neural recordings. Brain Lang 2019; 193:58-72. [PMID: 27450996 PMCID: PMC5790638 DOI: 10.1016/j.bandl.2016.06.001] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/18/2015] [Revised: 03/23/2016] [Accepted: 06/15/2016] [Indexed: 06/02/2023]
Abstract
Verbal repetition requires the coordination of auditory, memory, linguistic, and motor systems. To date, the basic dynamics of neural information processing in this deceptively simple behavior are largely unknown. Here, we examined the neural processes underlying verbal repetition using focal interruption (electrocortical stimulation) in 58 patients undergoing awake craniotomies, and neurophysiological recordings (electrocorticography) in 8 patients while they performed a single word repetition task. Electrocortical stimulation revealed that sub-components of the left peri-Sylvian network involved in single word repetition could be differentially interrupted, producing transient perceptual deficits, paraphasic errors, or speech arrest. Electrocorticography revealed the detailed spatio-temporal dynamics of cortical activation, involving a highly-ordered, but overlapping temporal progression of cortical high gamma (75-150 Hz) activity throughout the peri-Sylvian cortex. We observed functionally distinct serial and parallel cortical processing corresponding to successive stages of general auditory processing (posterior superior temporal gyrus), speech-specific auditory processing (middle and posterior superior temporal gyrus), working memory (inferior frontal cortex), and motor articulation (sensorimotor cortex). Together, these methods reveal the dynamics of coordinated activity across peri-Sylvian cortex during verbal repetition.
Affiliation(s)
- Matthew K Leonard
- Department of Neurological Surgery, University of California, San Francisco, United States; Center for Integrative Neuroscience, University of California, San Francisco, United States
| | - Ruofan Cai
- Department of Neurological Surgery, University of California, San Francisco, United States; Center for Integrative Neuroscience, University of California, San Francisco, United States
| | - Miranda C Babiak
- Department of Neurological Surgery, University of California, San Francisco, United States; Center for Integrative Neuroscience, University of California, San Francisco, United States
| | - Angela Ren
- Department of Neurological Surgery, University of California, San Francisco, United States; Center for Integrative Neuroscience, University of California, San Francisco, United States
| | - Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, United States; Center for Integrative Neuroscience, University of California, San Francisco, United States; Department of Physiology, University of California, San Francisco, United States.
| |
|
31
|
Di Liberto GM, Wong D, Melnik GA, de Cheveigné A. Low-frequency cortical responses to natural speech reflect probabilistic phonotactics. Neuroimage 2019; 196:237-247. [PMID: 30991126 DOI: 10.1016/j.neuroimage.2019.04.037] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2018] [Revised: 03/18/2019] [Accepted: 04/11/2019] [Indexed: 11/19/2022] Open
Abstract
Humans comprehend speech despite various challenges, such as mispronunciation and noisy environments. Our auditory system is robust to these challenges thanks to the integration of the sensory input with prior knowledge and expectations built on language-specific regularities. One such regularity regards the permissible phoneme sequences, which determine the likelihood that a word belongs to a given language (phonotactic probability; "blick" is more likely to be an English word than "bnick"). Previous research demonstrated that violations of these rules modulate brain-evoked responses. However, several fundamental questions remain unresolved, especially regarding the neural encoding and integration strategy of phonotactics in naturalistic conditions, when there are no (or few) violations. Here, we used linear modelling to assess the influence of phonotactic probabilities on the brain responses to narrative speech measured with non-invasive EEG. We found that the relationship between continuous speech and EEG responses is best described when the stimulus descriptor includes phonotactic probabilities. This indicates that low-frequency cortical signals (<9 Hz) reflect the integration of phonotactic information during natural speech perception, providing us with a measure of phonotactic processing at the individual subject-level. Furthermore, phonotactics-related signals showed the strongest speech-EEG interactions at latencies of 100-500 ms, supporting a pre-lexical role of phonotactic information.
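The linear modelling approach referenced above is essentially a temporal response function (TRF) fit by ridge regression, with phonotactic probability included as one stimulus feature. A minimal sketch under assumed dimensions; the features, lags, and data below are placeholders:

```python
import numpy as np

def trf_ridge(stim, eeg, lags, lam=1.0):
    # Estimate a temporal response function mapping lagged stimulus
    # features to one EEG channel. stim: (T, F) features; eeg: (T,).
    T, F = stim.shape
    X = np.zeros((T, F * len(lags)))
    for j, L in enumerate(lags):
        X[L:, j * F:(j + 1) * F] = stim[:T - L]   # shift features by lag L
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ eeg)
    return w.reshape(len(lags), F)                # kernel: lags x features

T = 5000
stim = np.column_stack([np.random.rand(T),        # acoustic envelope (stand-in)
                        np.random.rand(T)])       # phonotactic probability (stand-in)
eeg = np.random.randn(T)
w = trf_ridge(stim, eeg, lags=range(0, 64, 8))    # ~0-500 ms at 128 Hz (assumed)
```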
Affiliation(s)
- Giovanni M Di Liberto
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, France; Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, France.
| | - Daniel Wong
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, France; Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, France
| | - Gerda Ana Melnik
- Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, France; Laboratoire de Sciences Cognitives et Psycholinguistique, ENS, EHESS, CNRS, France
| | - Alain de Cheveigné
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, France; Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, France; UCL Ear Institute, London, United Kingdom
| |
|
32
|
Maheu M, Dehaene S, Meyniel F. Brain signatures of a multiscale process of sequence learning in humans. eLife 2019; 8:e41541. [PMID: 30714904 PMCID: PMC6361584 DOI: 10.7554/elife.41541] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2018] [Accepted: 01/18/2019] [Indexed: 01/08/2023] Open
Abstract
Extracting the temporal structure of sequences of events is crucial for perception, decision-making, and language processing. Here, we investigate the mechanisms by which the brain acquires knowledge of sequences and the possibility that successive brain responses reflect the progressive extraction of sequence statistics at different timescales. We measured brain activity using magnetoencephalography in humans exposed to auditory sequences with various statistical regularities, and we modeled this activity as theoretical surprise levels using several learning models. Successive brain waves were related to different types of statistical inference. Early post-stimulus brain waves denoted a sensitivity to a simple statistic, the frequency of items estimated over a long timescale (habituation). Mid-latency and late brain waves conformed qualitatively and quantitatively to the computational properties of a more complex inference: the learning of recent transition probabilities. Our findings thus support the existence of multiple computational systems for sequence processing involving statistical inferences at multiple scales.
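The learning models compared here share one ingredient: item-by-item Shannon surprise under an estimator of recent transition probabilities, with the timescale set by how quickly old observations are forgotten. The following is a simplified stand-in for such models, not the paper's exact implementation:

```python
import numpy as np

def transition_surprise(seq, omega):
    # Shannon surprise of each item under leaky transition-probability
    # counts; omega is the leak time constant in items (the timescale).
    n_states = int(seq.max()) + 1
    counts = np.ones((n_states, n_states))   # Laplace prior over transitions
    decay = np.exp(-1.0 / omega)
    surprise = np.zeros(len(seq))
    for t in range(1, len(seq)):
        prev, cur = seq[t - 1], seq[t]
        p = counts[prev, cur] / counts[prev].sum()
        surprise[t] = -np.log2(p)
        counts *= decay                        # forget old evidence
        counts[prev, cur] += 1                 # count the observed transition
    return surprise

rng = np.random.default_rng(1)
seq = rng.integers(0, 2, size=500)             # random binary tone stream
s_long = transition_surprise(seq, omega=1000)  # long integration timescale
s_short = transition_surprise(seq, omega=10)   # short integration timescale
```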
Affiliation(s)
- Maxime Maheu
- Cognitive Neuroimaging Unit, CEA DRF/JOLIOT, INSERM, Université Paris-Sud, Université Paris-Saclay, NeuroSpin center, Gif-sur-Yvette, France; Université Paris Descartes, Sorbonne Paris Cité, Paris, France
| | - Stanislas Dehaene
- Cognitive Neuroimaging Unit, CEA DRF/JOLIOT, INSERM, Université Paris-Sud, Université Paris-Saclay, NeuroSpin center, Gif-sur-Yvette, France; Collège de France, Paris, France
| | - Florent Meyniel
- Cognitive Neuroimaging Unit, CEA DRF/JOLIOT, INSERM, Université Paris-Sud, Université Paris-Saclay, NeuroSpin center, Gif-sur-Yvette, France
| |
|
33
|
McCloy DR, Lee AKC. Investigating the fit between phonological feature systems and brain responses to speech using EEG. Lang Cogn Neurosci 2019; 34:662-676. [PMID: 32984429 PMCID: PMC7518517 DOI: 10.1080/23273798.2019.1569246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/14/2018] [Accepted: 01/03/2019] [Indexed: 06/11/2023]
Abstract
This paper describes a technique to assess the correspondence between patterns of similarity in the brain's response to speech sounds and the patterns of similarity encoded in phonological feature systems, by quantifying the recoverability of phonological features from the neural data using supervised learning. The technique is applied to EEG recordings collected during passive listening to consonant-vowel syllables. Three published phonological feature systems are compared, and are shown to differ in their ability to recover certain speech sound contrasts from the neural data. For the phonological feature system that best reflects patterns of similarity in the neural data, a leave-one-out analysis indicates some consistency across subjects in which features have greatest impact on the fit, but considerable across-subject heterogeneity remains in the rank ordering of features in this regard.
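The paper's central move, quantifying feature recoverability with supervised learning, can be sketched with scikit-learn; the trial matrix, the planted effect, and the single [voiced] feature below are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy version of the logic: score a feature system by how well each binary
# phonological feature (e.g., +/-voiced) is decodable from trialwise EEG
# patterns with a supervised linear classifier.
rng = np.random.default_rng(0)
n_trials, n_dims = 200, 64             # trials x (channels*times), stand-ins
X = rng.standard_normal((n_trials, n_dims))
voiced = rng.integers(0, 2, n_trials)  # hypothetical feature labels per syllable
X[voiced == 1, :8] += 0.5              # plant a weak signal for the demo

acc = cross_val_score(LogisticRegression(max_iter=1000), X, voiced, cv=5).mean()
print(f"decoding accuracy for [voiced]: {acc:.2f}")  # chance level = 0.5
```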
Affiliation(s)
- Daniel R McCloy
- University of Washington, Institute for Learning and Brain Sciences, Seattle, WA, United States
| | - Adrian K C Lee
- University of Washington, Institute for Learning and Brain Sciences, Seattle, WA, United States
| |
|
34
|
Khoshkhoo S, Leonard MK, Mesgarani N, Chang EF. Neural correlates of sine-wave speech intelligibility in human frontal and temporal cortex. Brain Lang 2018; 187:83-91. [PMID: 29397190 PMCID: PMC6067983 DOI: 10.1016/j.bandl.2018.01.007] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/21/2017] [Revised: 12/06/2017] [Accepted: 01/20/2018] [Indexed: 05/09/2023]
Abstract
Auditory speech comprehension is the result of neural computations that occur in a broad network that includes the temporal lobe auditory cortex and the left inferior frontal cortex. It remains unclear how representations in this network differentially contribute to speech comprehension. Here, we recorded high-density direct cortical activity during a sine-wave speech (SWS) listening task to examine detailed neural speech representations when the exact same acoustic input is comprehended versus not comprehended. Listeners heard SWS sentences (pre-exposure), followed by clear versions of the same sentences, which revealed the content of the sounds (exposure), and then the same SWS sentences again (post-exposure). Across all three task phases, high-gamma neural activity in the superior temporal gyrus was similar, distinguishing different words based on bottom-up acoustic features. In contrast, frontal regions showed a more pronounced and sudden increase in activity only when the input was comprehended, which corresponded with stronger representational separability among spatiotemporal activity patterns evoked by different words. We observed this effect only in participants who were not able to comprehend the stimuli during the pre-exposure phase, indicating a relationship between frontal high-gamma activity and speech understanding. Together, these results demonstrate that both frontal and temporal cortical networks are involved in spoken language understanding, and that under certain listening conditions, frontal regions are involved in discriminating speech sounds.
Affiliation(s)
- Sattar Khoshkhoo
- School of Medicine, University of California, San Francisco, 505 Parnassus Ave., San Francisco, CA 94143, United States
| | - Matthew K Leonard
- Department of Neurological Surgery, University of California, San Francisco, 505 Parnassus Ave., San Francisco, CA 94143, United States; Center for Integrative Neuroscience, University of California, San Francisco, 675 Nelson Rising Ln., Room 535, San Francisco, CA 94158, United States; Weill Institute for Neurosciences, University of California, San Francisco, 675 Nelson Rising Ln., Room 535, San Francisco, CA 94158, United States
| | - Nima Mesgarani
- Department of Electrical Engineering, Columbia University, Mudd Building, Room 1339, 500 W 120th St., New York, NY 10027, United States
| | - Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 505 Parnassus Ave., San Francisco, CA 94143, United States; Center for Integrative Neuroscience, University of California, San Francisco, 675 Nelson Rising Ln., Room 535, San Francisco, CA 94158, United States; Weill Institute for Neurosciences, University of California, San Francisco, 675 Nelson Rising Ln., Room 535, San Francisco, CA 94158, United States.
| |
|
35
|
The effects of periodic interruptions on cortical entrainment to speech. Neuropsychologia 2018; 121:58-68. [DOI: 10.1016/j.neuropsychologia.2018.10.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2018] [Revised: 09/19/2018] [Accepted: 10/24/2018] [Indexed: 11/21/2022]
|
36
|
Nourski KV, Steinschneider M, Rhone AE, Kawasaki H, Howard MA, Banks MI. Processing of auditory novelty across the cortical hierarchy: An intracranial electrophysiology study. Neuroimage 2018; 183:412-424. [PMID: 30114466 DOI: 10.1016/j.neuroimage.2018.08.027] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2018] [Revised: 08/02/2018] [Accepted: 08/12/2018] [Indexed: 11/15/2022] Open
Abstract
Under the predictive coding hypothesis, specific spatiotemporal patterns of cortical activation are postulated to occur during sensory processing as expectations generate feedback predictions and prediction errors generate feedforward signals. Establishing experimental evidence for this information flow within cortical hierarchy has been difficult, especially in humans, due to spatial and temporal limitations of non-invasive measures of cortical activity. This study investigated cortical responses to auditory novelty using the local/global deviant paradigm, which engages the hierarchical network underlying auditory predictive coding over short ('local deviance'; LD) and long ('global deviance'; GD) time scales. Electrocorticographic responses to auditory stimuli were obtained in neurosurgical patients from regions of interest (ROIs) including auditory, auditory-related and prefrontal cortex. LD and GD effects were assayed in averaged evoked potential (AEP) and high gamma (70-150 Hz) signals, the former likely dominated by local synaptic currents and the latter largely reflecting local spiking activity. AEP LD effects were distributed across all ROIs, with greatest percentage of significant sites in core and non-core auditory cortex. High gamma LD effects were localized primarily to auditory cortex in the superior temporal plane and on the lateral surface of the superior temporal gyrus (STG). LD effects exhibited progressively longer latencies in core, non-core, auditory-related and prefrontal cortices, consistent with feedforward signaling. The spatial distribution of AEP GD effects overlapped that of LD effects, but high gamma GD effects were more restricted to non-core areas. High gamma GD effects had shortest latencies in STG and preceded AEP GD effects in most ROIs. This latency profile, along with the paucity of high gamma GD effects in the superior temporal plane, suggest that the STG plays a prominent role in initiating novelty detection signals over long time scales. Thus, the data demonstrate distinct patterns of information flow in human cortex associated with auditory novelty detection over multiple time scales.
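For readers unfamiliar with the local/global deviant paradigm, a toy generator clarifies the design: within a block, the frequent five-tone pattern defines the global standard, so a trial can be locally deviant (its final tone differs) yet globally standard. The token names and probability below are illustrative, not the study's stimulus parameters:

```python
import random

def local_global_trial(block_rule, p_global_deviant=0.2):
    # One five-tone trial. block_rule is the frequent pattern in this
    # block: 'xxxxx' or 'xxxxY'. A trial violating the block rule is a
    # global deviant; a trial ending in 'Y' is a local deviant.
    standard = block_rule
    deviant = "xxxxY" if block_rule == "xxxxx" else "xxxxx"
    trial = deviant if random.random() < p_global_deviant else standard
    return trial, trial != standard, trial.endswith("Y")

random.seed(0)
trials = [local_global_trial("xxxxY") for _ in range(10)]
# In an 'xxxxY' block the frequent trial is locally deviant but globally
# standard, dissociating short- and long-timescale predictions.
```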
Affiliation(s)
- Kirill V Nourski
- Department of Neurosurgery, The University of Iowa, Iowa City, IA 52242, USA; Iowa Neuroscience Institute, The University of Iowa, Iowa City, IA 52242, USA.
| | - Mitchell Steinschneider
- Departments of Neurology and Neuroscience, Albert Einstein College of Medicine, Bronx, NY 10461, USA
| | - Ariane E Rhone
- Department of Neurosurgery, The University of Iowa, Iowa City, IA 52242, USA
| | - Hiroto Kawasaki
- Department of Neurosurgery, The University of Iowa, Iowa City, IA 52242, USA
| | - Matthew A Howard
- Department of Neurosurgery, The University of Iowa, Iowa City, IA 52242, USA; Iowa Neuroscience Institute, The University of Iowa, Iowa City, IA 52242, USA; Pappajohn Biomedical Institute, The University of Iowa, Iowa City, IA 52242, USA
| | - Matthew I Banks
- Department of Anesthesiology and Neuroscience, University of Wisconsin - Madison, Madison, WI 53705, USA
| |
|
37
|
Hasson U, Egidi G, Marelli M, Willems RM. Grounding the neurobiology of language in first principles: The necessity of non-language-centric explanations for language comprehension. Cognition 2018; 180:135-157. [PMID: 30053570 PMCID: PMC6145924 DOI: 10.1016/j.cognition.2018.06.018] [Citation(s) in RCA: 79] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2017] [Revised: 06/05/2018] [Accepted: 06/24/2018] [Indexed: 12/26/2022]
Abstract
Recent decades have ushered in tremendous progress in understanding the neural basis of language. Most of our current knowledge on language and the brain, however, is derived from lab-based experiments that are far removed from everyday language use, and that are inspired by questions originating in linguistic and psycholinguistic contexts. In this paper we argue that in order to make progress, the field needs to shift its focus to understanding the neurobiology of naturalistic language comprehension. We present here a new conceptual framework for understanding the neurobiological organization of language comprehension. This framework is non-language-centered in the computational/neurobiological constructs it identifies, and focuses strongly on context. Our core arguments address three general issues: (i) the difficulty in extending language-centric explanations to discourse; (ii) the necessity of taking context as a serious topic of study, modeling it formally and acknowledging the limitations on external validity when studying language comprehension outside context; and (iii) the tenuous status of the language network as an explanatory construct. We argue that adopting this framework means that neurobiological studies of language will be less focused on identifying correlations between brain activity patterns and mechanisms postulated by psycholinguistic theories. Instead, they will be less self-referential and increasingly more inclined towards integration of language with other cognitive systems, ultimately doing more justice to the neurobiological organization of language and how it supports language as it is used in everyday life.
Affiliation(s)
- Uri Hasson
- Center for Mind/Brain Sciences, The University of Trento, Trento, Italy; Center for Practical Wisdom, The University of Chicago, Chicago, IL, United States.
| | - Giovanna Egidi
- Center for Mind/Brain Sciences, The University of Trento, Trento, Italy
| | - Marco Marelli
- Department of Psychology, University of Milano-Bicocca, Milano, Italy; NeuroMI - Milan Center for Neuroscience, Milano, Italy
| | - Roel M Willems
- Centre for Language Studies & Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands; Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
| |
|
38
|
Kaestner E, Morgan AM, Snider J, Zhan M, Jiang X, Levy R, Ferreira VS, Thesen T, Halgren E. Toward a Database of Intracranial Electrophysiology during Natural Language Presentation. Lang Cogn Neurosci 2018; 35:729-738. [PMID: 35528322 PMCID: PMC9074941 DOI: 10.1080/23273798.2018.1500262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/02/2018] [Accepted: 07/05/2018] [Indexed: 06/14/2023]
Abstract
Intracranial electrophysiology (iEEG) studies using cognitive tasks contribute to the understanding of the neural basis of language. However, although iEEG is recorded continuously during clinical treatment, task time is limited by patient considerations. To increase the usefulness of iEEG recordings for language study, we provided patients with a tablet pre-loaded with media filled with natural language, wirelessly synchronized to clinical iEEG. This iEEG data, collected and time-locked to natural language presentation, is particularly applicable for studying the neural basis of combining words into larger contexts. We validate this approach with pilot analyses involving words heard during a movie, tagging syntactic properties and verb contextual probabilities. Event-related averages of high-frequency power (70-170 Hz) identified bilateral perisylvian electrodes with differential responses to syntactic class, and a linear regression identified activity associated with contextual probabilities, demonstrating the usefulness of aligning media to iEEG. We imagine future multi-site collaborations building an 'intracranial neurolinguistic corpus'.
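The pilot analysis described, event-related high-gamma averages around media-aligned word onsets followed by a regression on contextual probability, reduces to a few lines once onsets are synchronized; all signals, onset times, and probabilities below are simulated placeholders:

```python
import numpy as np

fs = 1000                                    # assumed iEEG sampling rate (Hz)
hg = np.random.randn(600 * fs)               # high-gamma envelope, one electrode
word_onsets_s = np.sort(np.random.uniform(1, 590, 800))  # from media sync
word_logprob = np.random.randn(800)          # contextual log-probability per word

# Mean high-gamma in a 0-600 ms window after each word onset, then a
# regression of the evoked response on contextual probability.
win = int(0.6 * fs)
resp = np.array([hg[int(t * fs): int(t * fs) + win].mean()
                 for t in word_onsets_s])
beta = np.polyfit(word_logprob, resp, 1)[0]  # slope: probability -> response
```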
Affiliation(s)
- Erik Kaestner
- Department of Neurosciences, University of California at San Diego, La Jolla, California
| | - Adam Milton Morgan
- Department of Psychology, University of California at San Diego, La Jolla, California
| | - Joseph Snider
- Institute for Neural Computation, University of California at San Diego, La Jolla, California
| | - Meilin Zhan
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts
| | - Xi Jiang
- Department of Neurosciences, University of California at San Diego, La Jolla, California
| | - Roger Levy
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts
| | - Victor S Ferreira
- Department of Psychology, University of California at San Diego, La Jolla, California
| | - Thomas Thesen
- Department of Neurology, New York University Comprehensive Epilepsy Center, New York, New York
| | - Eric Halgren
- Department of Neurosciences, University of California at San Diego, La Jolla, California
- Department of Radiology, University of California at San Diego, La Jolla, California
| |
|
39
|
Kikuchi Y, Sedley W, Griffiths TD, Petkov CI. Evolutionarily conserved neural signatures involved in sequencing predictions and their relevance for language. Curr Opin Behav Sci 2018; 21:145-153. [PMID: 30057937 PMCID: PMC6058086 DOI: 10.1016/j.cobeha.2018.05.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
Abstract
Predicting the occurrence of future events from prior ones is vital for animal perception and cognition. Although how such sequence learning (a form of relational knowledge) relates to particular operations in language remains controversial, recent evidence shows that sequence learning is disrupted in frontal lobe damage associated with aphasia. Also, neural sequencing predictions at different temporal scales resemble those involved in language operations occurring at similar scales. Furthermore, comparative work in humans and monkeys highlights evolutionarily conserved frontal substrates and predictive oscillatory signatures in the temporal lobe processing learned sequences of speech signals. Altogether this evidence supports a relational knowledge hypothesis of language evolution, proposing that language processes in humans are functionally integrated with an ancestral neural system for predictive sequence learning.
Affiliation(s)
- Yukiko Kikuchi
- Institute of Neuroscience, Newcastle University Medical School, Newcastle Upon Tyne, UK
- Centre for Behaviour and Evolution, Newcastle University, Newcastle Upon Tyne, UK
| | - William Sedley
- Institute of Neuroscience, Newcastle University Medical School, Newcastle Upon Tyne, UK
| | - Timothy D Griffiths
- Institute of Neuroscience, Newcastle University Medical School, Newcastle Upon Tyne, UK
- Wellcome Trust Centre for Neuroimaging, University College London, UK
- Department of Neurosurgery, University of Iowa, Iowa City, USA
| | - Christopher I Petkov
- Institute of Neuroscience, Newcastle University Medical School, Newcastle Upon Tyne, UK
- Centre for Behaviour and Evolution, Newcastle University, Newcastle Upon Tyne, UK
| |
|
40
|
Himberger KD, Chien HY, Honey CJ. Principles of Temporal Processing Across the Cortical Hierarchy. Neuroscience 2018; 389:161-174. [PMID: 29729293 DOI: 10.1016/j.neuroscience.2018.04.030] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2017] [Revised: 04/17/2018] [Accepted: 04/19/2018] [Indexed: 12/20/2022]
Abstract
The world is richly structured on multiple spatiotemporal scales. In order to represent spatial structure, many machine-learning models repeat a set of basic operations at each layer of a hierarchical architecture. These iterated spatial operations - including pooling, normalization and pattern completion - enable these systems to recognize and predict spatial structure, while robust to changes in the spatial scale, contrast and noisiness of the input signal. Because our brains also process temporal information that is rich and occurs across multiple time scales, might the brain employ an analogous set of operations for temporal information processing? Here we define a candidate set of temporal operations, and we review evidence that they are implemented in the mammalian cerebral cortex in a hierarchical manner. We conclude that multiple consecutive stages of cortical processing can be understood to perform temporal pooling, temporal normalization and temporal pattern completion.
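Two of the candidate operations named here, temporal pooling and temporal normalization, have compact definitions. A sketch using moving-average pooling and local z-scoring, stacked to mimic a two-stage hierarchy; the window sizes are arbitrary choices, not values from the review:

```python
import numpy as np

def temporal_pool(x, w):
    # Temporal pooling: summarize the recent past (here, a moving average).
    kernel = np.ones(w) / w
    return np.convolve(x, kernel, mode="same")

def temporal_normalize(x, w, eps=1e-8):
    # Temporal normalization: rescale by recent local signal statistics.
    mu = temporal_pool(x, w)
    sd = np.sqrt(temporal_pool((x - mu) ** 2, w)) + eps
    return (x - mu) / sd

x = np.cumsum(np.random.randn(1000))                  # stand-in input series
stage1 = temporal_normalize(temporal_pool(x, 5), 50)  # fast, local stage
stage2 = temporal_normalize(temporal_pool(stage1, 50), 500)  # slower stage
```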
Affiliation(s)
- Kevin D Himberger
- Department of Psychological & Brain Sciences, Johns Hopkins University, Baltimore, MD, United States
| | - Hsiang-Yun Chien
- Department of Psychological & Brain Sciences, Johns Hopkins University, Baltimore, MD, United States
| | - Christopher J Honey
- Department of Psychological & Brain Sciences, Johns Hopkins University, Baltimore, MD, United States.
| |
|
41
|
Holdgraf CR, Rieger JW, Micheli C, Martin S, Knight RT, Theunissen FE. Encoding and Decoding Models in Cognitive Electrophysiology. Front Syst Neurosci 2017; 11:61. [PMID: 29018336 PMCID: PMC5623038 DOI: 10.3389/fnsys.2017.00061] [Citation(s) in RCA: 76] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2017] [Accepted: 08/07/2017] [Indexed: 11/13/2022] Open
Abstract
Cognitive neuroscience has seen rapid growth in the size and complexity of data recorded from the human brain as well as in the computational tools available to analyze this data. This data explosion has resulted in an increased use of multivariate, model-based methods for asking neuroscience questions, allowing scientists to investigate multiple hypotheses with a single dataset, to use complex, time-varying stimuli, and to study the human brain under more naturalistic conditions. These tools come in the form of "Encoding" models, in which stimulus features are used to model brain activity, and "Decoding" models, in which neural features are used to generate a stimulus output. Here we review the current state of encoding and decoding models in cognitive electrophysiology and provide a practical guide toward conducting experiments and analyses in this emerging field. Our examples focus on using linear models in the study of human language and audition. We show how to calculate auditory receptive fields from natural sounds as well as how to decode neural recordings to predict speech. The paper aims to be a useful tutorial on these approaches, and a practical introduction to using machine learning and applied statistics to build models of neural activity. The data analytic approaches we discuss may also be applied to other sensory modalities, motor systems, and cognitive systems, and we cover some examples in these areas. In addition, a collection of Jupyter notebooks is publicly available as a complement to the material covered in this paper, providing code examples and tutorials for predictive modeling in python. The aim is to provide a practical understanding of predictive modeling of human brain data and to propose best-practices in conducting these analyses.
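The encoding/decoding distinction reviewed here maps onto the same linear machinery run in opposite directions. A compact scikit-learn sketch with simulated spectrogram features and electrode data (the paper's own worked examples live in its accompanying Jupyter notebooks):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Encoding: stimulus features -> one electrode's activity.
# Decoding: many electrodes -> one stimulus feature.
rng = np.random.default_rng(0)
T = 2000
spec = rng.standard_normal((T, 32))            # stand-in spectrogram features
ecog = spec @ rng.standard_normal((32, 16)) + rng.standard_normal((T, 16))

Xe_tr, Xe_te, ye_tr, ye_te = train_test_split(spec, ecog[:, 0], random_state=0)
encoder = Ridge(alpha=10.0).fit(Xe_tr, ye_tr)  # features -> electrode
print("encoding R^2:", encoder.score(Xe_te, ye_te))

Xd_tr, Xd_te, yd_tr, yd_te = train_test_split(ecog, spec[:, 0], random_state=0)
decoder = Ridge(alpha=10.0).fit(Xd_tr, yd_tr)  # electrodes -> feature
print("decoding R^2:", decoder.score(Xd_te, yd_te))
```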
Affiliation(s)
- Christopher R. Holdgraf
- Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, United States
- Office of the Vice Chancellor for Research, Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, CA, United States
| | - Jochem W. Rieger
- Department of Psychology, Carl-von-Ossietzky University, Oldenburg, Germany
| | - Cristiano Micheli
- Department of Psychology, Carl-von-Ossietzky University, Oldenburg, Germany
- Institut des Sciences Cognitives Marc Jeannerod, Lyon, France
| | - Stephanie Martin
- Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, United States
- Defitech Chair in Brain-Machine Interface, Center for Neuroprosthetics, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Robert T. Knight
- Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, United States
| | - Frederic E. Theunissen
- Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, United States
- Department of Psychology, University of California, Berkeley, Berkeley, CA, United States
| |
|
42
|
The cortical dynamics of speaking: Lexical and phonological knowledge simultaneously recruit the frontal and temporal cortex within 200 ms. Neuroimage 2017; 163:206-219. [PMID: 28943413 DOI: 10.1016/j.neuroimage.2017.09.041] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2017] [Revised: 08/20/2017] [Accepted: 09/20/2017] [Indexed: 11/27/2022] Open
Abstract
Language production models typically assume that retrieving a word for articulation is a sequential process with substantial functional delays between conceptual, lexical, phonological and motor processing, respectively. Nevertheless, explicit evidence contrasting the spatiotemporal dynamics between different word production components is scarce. Here, using anatomically constrained magnetoencephalography during overt meaningful speech production, we explore the speed with which lexico-semantic versus acoustic-articulatory information of a to-be-uttered word become first neurophysiologically manifest in the cerebral cortex. We demonstrate early modulations of brain activity by the lexical frequency of a word in the temporal cortex and the left inferior frontal gyrus, simultaneously with activity in the motor and the posterior superior temporal cortex reflecting articulatory-acoustic phonological features (+LABIAL vs. +CORONAL) of the word-initial speech sounds (e.g., Monkey vs. Donkey). The specific nature of the spatiotemporal pattern correlating with a word's frequency and initial phoneme demonstrates that, in the course of speech planning, lexico-semantic and phonological-articulatory processes emerge together rapidly, drawing in parallel on temporal and frontal cortex. This novel finding calls for revisions of current brain language theories of word production.
|
43
|
Bocquelet F, Hueber T, Girin L, Chabardès S, Yvert B. Key considerations in designing a speech brain-computer interface. J Physiol Paris 2017; 110:392-401. [PMID: 28756027 DOI: 10.1016/j.jphysparis.2017.07.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2017] [Revised: 06/21/2017] [Accepted: 07/19/2017] [Indexed: 01/08/2023]
Abstract
Restoring communication in cases of aphasia is a key challenge for neurotechnologies. To this end, brain-computer strategies can be envisioned to allow artificial speech synthesis from the continuous decoding of neural signals underlying speech imagination. Such speech brain-computer interfaces do not exist yet and their design should consider three key choices that need to be made: the choice of appropriate brain regions to record neural activity from, the choice of an appropriate recording technique, and the choice of a neural decoding scheme in association with an appropriate speech synthesis method. These key considerations are discussed here in light of (1) the current understanding of the functional neuroanatomy of cortical areas underlying overt and covert speech production, (2) the available literature making use of a variety of brain recording techniques to better characterize and address the challenge of decoding cortical speech signals, and (3) the different speech synthesis approaches that can be considered depending on the level of speech representation (phonetic, acoustic or articulatory) envisioned to be decoded at the core of a speech BCI paradigm.
Affiliation(s)
- Florent Bocquelet
- INSERM, BrainTech Laboratory U1205, F-38000 Grenoble, France; Univ. Grenoble Alpes, BrainTech Laboratory U1205, F-38000 Grenoble, France
| | - Thomas Hueber
- Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France
| | - Laurent Girin
- Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France
| | | | - Blaise Yvert
- INSERM, BrainTech Laboratory U1205, F-38000 Grenoble, France; Univ. Grenoble Alpes, BrainTech Laboratory U1205, F-38000 Grenoble, France.
| |
Collapse
|
44
|
Andric M, Davis B, Hasson U. Visual cortex signals a mismatch between regularity of auditory and visual streams. Neuroimage 2017; 157:648-659. [DOI: 10.1016/j.neuroimage.2017.05.028] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2017] [Revised: 04/14/2017] [Accepted: 05/15/2017] [Indexed: 10/19/2022] Open
|
45
|
Rao VR, Leonard MK, Kleen JK, Lucas BA, Mirro EA, Chang EF. Chronic ambulatory electrocorticography from human speech cortex. Neuroimage 2017; 153:273-282. [PMID: 28396294 DOI: 10.1016/j.neuroimage.2017.04.008] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2017] [Revised: 03/15/2017] [Accepted: 04/04/2017] [Indexed: 01/07/2023] Open
Abstract
Direct intracranial recording of human brain activity is an important approach for deciphering neural mechanisms of cognition. Such recordings, usually made in patients with epilepsy undergoing inpatient monitoring for seizure localization, are limited in duration and depend on patients' tolerance for the challenges associated with recovering from brain surgery. Thus, typical intracranial recordings, like most non-invasive approaches in humans, provide snapshots of brain activity in acute, highly constrained settings, limiting opportunities to understand long-timescale and natural, real-world phenomena. A new device for treating some forms of drug-resistant epilepsy, the NeuroPace RNS® System, includes a cranially implanted neurostimulator and intracranial electrodes that continuously monitor brain activity and respond to incipient seizures with electrical counterstimulation. The RNS System can record epileptic brain activity over years, but whether it can record meaningful, behavior-related physiological responses has not been demonstrated. Here, in a human subject with electrodes implanted over high-level speech-auditory cortex (Wernicke's area; posterior superior temporal gyrus), we report that cortical evoked responses to spoken sentences are robust, selective to phonetic features, and stable over nearly 1.5 years. In a second subject with RNS System electrodes implanted over frontal cortex (Broca's area; posterior inferior frontal gyrus), we found that word production during a naming task reliably evokes cortical responses preceding speech onset. The spatiotemporal resolution, high signal-to-noise ratio, and wireless nature of this system's intracranial recordings make it a powerful new approach for investigating the neural correlates of human cognition over long timescales in natural ambulatory settings.
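The stability claim (robust responses over nearly 1.5 years) can be quantified by correlating trial-averaged evoked responses across recording sessions. A minimal sketch under assumed array shapes, not the authors' analysis code:

    # Sketch: cross-session stability of evoked responses from a chronic
    # implant. `sessions` is a list of (trials x channels x timepoints)
    # arrays, one per session -- shapes are illustrative assumptions.
    import numpy as np

    def cross_session_stability(sessions):
        means = [s.mean(axis=0).ravel() for s in sessions]  # trial averages
        n = len(means)
        r = np.ones((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                r[i, j] = r[j, i] = np.corrcoef(means[i], means[j])[0, 1]
        return r  # off-diagonal values near 1 indicate stable responses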
Collapse
Affiliation(s)
- Vikram R Rao
- University of California, San Francisco, Department of Neurology, San Francisco, CA 94143, United States.
| | - Matthew K Leonard
- University of California, San Francisco, Department of Neurosurgery, San Francisco, CA 94143, United States; Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, United States
| | - Jonathan K Kleen
- University of California, San Francisco, Department of Neurology, San Francisco, CA 94143, United States
| | - Ben A Lucas
- University of California, San Francisco, Department of Neurosurgery, San Francisco, CA 94143, United States; Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, United States
| | - Emily A Mirro
- NeuroPace, Inc., Mountain View, CA 94043, United States
| | - Edward F Chang
- University of California, San Francisco, Department of Neurosurgery, San Francisco, CA 94143, United States; Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, United States
| |
Collapse
|
46
|
Leonard MK, Baud MO, Sjerps MJ, Chang EF. Perceptual restoration of masked speech in human cortex. Nat Commun 2016; 7:13619. [PMID: 27996973 PMCID: PMC5187421 DOI: 10.1038/ncomms13619] [Citation(s) in RCA: 91] [Impact Index Per Article: 10.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2016] [Accepted: 10/19/2016] [Indexed: 02/02/2023] Open
Abstract
Humans are adept at understanding speech despite the fact that our natural listening environment is often filled with interference. An example of this capacity is phoneme restoration, in which part of a word is completely replaced by noise, yet listeners report hearing the whole word. The neurological basis for this unconscious fill-in phenomenon is unknown, despite being a fundamental characteristic of human hearing. Here, using direct cortical recordings in humans, we demonstrate that missing speech is restored at the acoustic-phonetic level in bilateral auditory cortex, in real time. This restoration is preceded by specific neural activity patterns in a separate language area, left frontal cortex, which predict the word that participants later report hearing. These results demonstrate that during speech perception, missing acoustic content is synthesized online from the integration of incoming sensory cues and the internal neural dynamics that bias word-level expectation and prediction. We can often 'fill in' missing or occluded sounds from a speech signal, an effect known as phoneme restoration. Leonard et al. found real-time restoration of the missing sounds in the superior temporal auditory cortex in humans. Interestingly, neural activity in frontal regions prior to the stimulus can predict the word that the participant will later hear.
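Computationally, the prediction result, in which frontal activity before the noise-replaced segment forecasts the word the listener will report, is a single-trial classification problem. A minimal sketch with scikit-learn; the features and labels are assumptions for illustration rather than the authors' pipeline:

    # Sketch: decode the word a listener will report from pre-stimulus
    # frontal high-gamma features. `X` (trials x features) and `y` (reported
    # word per trial) are illustrative assumptions.
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def decode_reported_word(X, y, folds=5):
        clf = LogisticRegression(max_iter=1000)
        scores = cross_val_score(clf, X, y, cv=folds)  # accuracy per fold
        return scores.mean()  # compare against chance (1 / number of words)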
Collapse
Affiliation(s)
- Matthew K Leonard
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, California 94158, USA; Center for Integrative Neuroscience, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, California 94158, USA
| | - Maxime O Baud
- Department of Neurology, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, California 94158, USA
| | - Matthias J Sjerps
- Department of Linguistics, University of California, Berkeley, 1203 Dwinelle Hall #2650, Berkeley, California 94720-2650, USA; Neurobiology of Language Department, Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Kapittelweg 29, Nijmegen 6525 EN, The Netherlands
| | - Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, California 94158, USA; Center for Integrative Neuroscience, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, California 94158, USA; Department of Physiology, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, California 94158, USA
| |
Collapse
|
47
|
Nozaradan S, Mouraux A, Jonas J, Colnat-Coulbois S, Rossion B, Maillard L. Intracerebral evidence of rhythm transform in the human auditory cortex. Brain Struct Funct 2016; 222:2389-2404. [PMID: 27990557 DOI: 10.1007/s00429-016-1348-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2016] [Accepted: 12/06/2016] [Indexed: 01/23/2023]
Abstract
Musical entrainment is shared by all human cultures, and the perception of a periodic beat is a cornerstone of this entrainment behavior. Here, we investigated whether beat perception might have its roots in the earliest stages of auditory cortical processing. Local field potentials were recorded from 8 patients implanted with depth electrodes in Heschl's gyrus and the planum temporale (55 recording sites in total), usually considered the human primary and secondary auditory cortices. Using a frequency-tagging approach, we show that both low-frequency (<30 Hz) and high-frequency (>30 Hz) neural activities in these structures faithfully track auditory rhythms through frequency-locking to the rhythm envelope. A selective gain in amplitude of the response frequency-locked to the beat frequency was observed for the low-frequency activities but not for the high-frequency activities, and was sharper in the planum temporale, especially for the more challenging syncopated rhythm. Hence, this gain process is not systematic across all activities produced in these areas and depends on the complexity of the rhythmic input. Moreover, this gain was disrupted when the rhythm was presented at fast speed, revealing low-pass response properties that could account for the propensity to perceive a beat only within the musical tempo range. Together, these observations show that, even though some of these neural transforms of rhythm may already take place at subcortical stages of auditory processing, the earliest auditory cortical processes shape the neural representation of rhythmic inputs in favor of the emergence of a periodic beat.
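The frequency-tagging measure at the core of this study, a selective amplitude gain at the beat frequency over and above the broadband spectrum, can be illustrated with a noise-corrected FFT amplitude. A sketch under assumed inputs, not the authors' code:

    # Sketch: frequency-tagged response amplitude with neighbor-bin noise
    # correction. `lfp` is a 1-D field-potential trace; all parameters are
    # illustrative assumptions.
    import numpy as np

    def tagged_amplitude(lfp, sfreq, target_hz, n_neighbors=10, skip=2):
        amps = np.abs(np.fft.rfft(lfp)) / len(lfp)
        freqs = np.fft.rfftfreq(len(lfp), d=1.0 / sfreq)
        k = int(np.argmin(np.abs(freqs - target_hz)))  # bin nearest the beat
        # Noise estimate: mean of surrounding bins, skipping the immediate
        # neighbors to avoid spectral leakage from the target bin.
        lo = amps[max(k - skip - n_neighbors, 0):k - skip]
        hi = amps[k + skip + 1:k + skip + 1 + n_neighbors]
        noise = np.mean(np.concatenate([lo, hi]))
        return amps[k] - noise  # > 0 indicates frequency-locking at target_hz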
Collapse
Affiliation(s)
- Sylvie Nozaradan
- Institute of Neuroscience (IoNS), Université catholique de Louvain (UCL), 53, Avenue Mounier, UCL 53.75, 1200, Brussels, Belgium; The MARCS Institute, Western Sydney University, Sydney, NSW, 2214, Australia; International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, H3C 3J7, Canada.
| | - André Mouraux
- Institute of Neuroscience (IoNS), Université catholique de Louvain (UCL), 53, Avenue Mounier, UCL 53.75, 1200, Brussels, Belgium
| | - Jacques Jonas
- Institute of Neuroscience (IoNS), Université catholique de Louvain (UCL), 53, Avenue Mounier, UCL 53.75, 1200, Brussels, Belgium; Service de Neurologie, Centre Hospitalier Universitaire de Nancy, 54035, Nancy, France; CRAN UMR 7039, CNRS, Université de Lorraine, 54035, Nancy, France
| | - Sophie Colnat-Coulbois
- Neurosurgery Department, Centre Hospitalier Universitaire de Nancy, 54035, Nancy, France
| | - Bruno Rossion
- Institute of Neuroscience (IoNS), Université catholique de Louvain (UCL), 53, Avenue Mounier, UCL 53.75, 1200, Brussels, Belgium; Service de Neurologie, Centre Hospitalier Universitaire de Nancy, 54035, Nancy, France; Psychological Sciences Research Institute, Université catholique de Louvain (UCL), 1348, Louvain-la-Neuve, Belgium
| | - Louis Maillard
- Service de Neurologie, Centre Hospitalier Universitaire de Nancy, 54035, Nancy, France; CRAN UMR 7039, CNRS, Université de Lorraine, 54035, Nancy, France
| |
Collapse
|
48
|
SyllabO+: A new tool to study sublexical phenomena in spoken Quebec French. Behav Res Methods 2016; 49:1852-1863. [DOI: 10.3758/s13428-016-0829-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
49
|
Abstract
The neural processes that underlie your ability to read and understand this sentence are unknown. Sentence comprehension occurs very rapidly, and can only be understood at a mechanistic level by discovering the precise sequence of underlying computational and neural events. However, we have no continuous and online neural measure of sentence processing with high spatial and temporal resolution. Here we report just such a measure: intracranial recordings from the surface of the human brain show that neural activity, indexed by γ-power, increases monotonically over the course of a sentence as people read it. This steady increase in activity is absent when people read and remember nonword-lists, despite the higher cognitive demand entailed, ruling out accounts in terms of generic attention, working memory, and cognitive load. Response increases are lower for sentence structure without meaning ("Jabberwocky" sentences) and word meaning without sentence structure (word-lists), showing that this effect is not explained by responses to syntax or word meaning alone. Instead, the full effect is found only for sentences, implicating compositional processes of sentence understanding, a striking and unique feature of human language not shared with animal communication systems. This work opens up new avenues for investigating the sequence of neural events that underlie the construction of linguistic meaning.
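The signature effect, activity climbing steadily across word positions for sentences but not for the control conditions, invites a simple trend test per electrode. A minimal sketch under assumed shapes; the Spearman test is an illustrative choice, not the published analysis:

    # Sketch: test for a monotonic build-up of gamma power across word
    # positions. `power` (trials x word positions) is an assumed shape.
    import numpy as np
    from scipy import stats

    def buildup_effect(power):
        positions = np.arange(power.shape[1])
        mean_power = power.mean(axis=0)  # trial-averaged power per position
        rho, p = stats.spearmanr(positions, mean_power)
        return rho, p  # rho near +1 indicates a steady increase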
Collapse
|
50
|
Moses DA, Mesgarani N, Leonard MK, Chang EF. Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity. J Neural Eng 2016; 13:056004. [PMID: 27484713 DOI: 10.1088/1741-2560/13/5/056004] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
OBJECTIVE The superior temporal gyrus (STG) and neighboring brain regions play a key role in human language processing. Previous studies have attempted to reconstruct speech information from brain activity in the STG, but few of them incorporate the probabilistic framework and engineering methodology used in modern speech recognition systems. In this work, we describe initial efforts toward the design of a neural speech recognition (NSR) system that performs continuous phoneme recognition on English stimuli with arbitrary vocabulary sizes, using the high gamma band power of local field potentials in the STG and neighboring cortical areas obtained via electrocorticography. APPROACH The system implements a Viterbi decoder that incorporates phoneme likelihood estimates from a linear discriminant analysis model and transition probabilities from an n-gram phonemic language model. Grid searches were used to determine optimal parameterizations of the feature vectors and the Viterbi decoder. MAIN RESULTS The performance of the system was significantly improved by using spatiotemporal rather than purely spatial representations of the neural activity and by including language modeling and Viterbi decoding in the NSR system. SIGNIFICANCE These results emphasize the importance of modeling the temporal dynamics of neural responses when analyzing their variation across stimuli, and they demonstrate that speech recognition techniques can be successfully leveraged when decoding speech from neural signals. Guided by the results detailed in this work, further development of the NSR system could have applications in the fields of automatic speech recognition and neural prosthetics.
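The decoding scheme described here, a Viterbi search combining per-frame phoneme likelihoods from LDA with n-gram transition probabilities, is classical dynamic programming. A compact sketch of a bigram Viterbi decoder in log space, with input shapes assumed for illustration (not the published NSR code):

    # Sketch: bigram Viterbi phoneme decoding from neural likelihoods.
    # `log_emit` (timesteps x phonemes), `log_trans` (phonemes x phonemes),
    # and `log_prior` (initial log-probabilities) are assumed inputs.
    import numpy as np

    def viterbi(log_emit, log_trans, log_prior):
        T, P = log_emit.shape
        delta = log_prior + log_emit[0]      # best score ending in each phoneme
        back = np.zeros((T, P), dtype=int)   # backpointers for path recovery
        for t in range(1, T):
            scores = delta[:, None] + log_trans      # previous x current
            back[t] = np.argmax(scores, axis=0)
            delta = scores[back[t], np.arange(P)] + log_emit[t]
        path = [int(np.argmax(delta))]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]  # most likely phoneme sequence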
Collapse
Affiliation(s)
- David A Moses
- Department of Neurological Surgery, UC San Francisco, CA, USA; Center for Integrative Neuroscience, UC San Francisco, CA, USA; Graduate Program in Bioengineering, UC Berkeley-UC San Francisco, CA, USA
| |
Collapse
|