1
Rupp KM, Hect JL, Harford EE, Holt LL, Ghuman AS, Abel TJ. A hierarchy of processing complexity and timescales for natural sounds in the human auditory cortex. Proc Natl Acad Sci U S A 2025;122:e2412243122. PMID: 40294254. DOI: 10.1073/pnas.2412243122.
Abstract
Efficient behavior is supported by humans' ability to rapidly recognize acoustically distinct sounds as members of a common category. Within the auditory cortex, critical unanswered questions remain regarding the organization and dynamics of sound categorization. We performed intracerebral recordings during epilepsy surgery evaluation as 20 patient-participants listened to natural sounds. We then built encoding models to predict neural responses using sound representations extracted from different layers within a deep neural network (DNN) pretrained to categorize sounds from acoustics. This approach yielded accurate models of neural responses throughout the auditory cortex. The complexity of a cortical site's representation (measured by the depth of the DNN layer that produced the best model) was closely related to its anatomical location, with shallow, middle, and deep layers associated with core (primary auditory cortex), lateral belt, and parabelt regions, respectively. Smoothly varying gradients of representational complexity existed within these regions, with complexity increasing along a posteromedial-to-anterolateral direction in core and lateral belt and along posterior-to-anterior and dorsal-to-ventral dimensions in parabelt. We then characterized the time (relative to sound onset) when feature representations emerged; this measure of temporal dynamics increased across the auditory hierarchy. Finally, we found separable effects of region and temporal dynamics on representational complexity: sites that took longer to begin encoding stimulus features had higher representational complexity independent of region, and downstream regions encoded more complex features independent of temporal dynamics. These findings suggest that hierarchies of timescales and complexity represent a functional organizational principle of the auditory stream underlying our ability to rapidly categorize sounds.
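The layer-wise encoding-model logic described above can be sketched in a few lines. The sketch below is illustrative rather than the authors' code; it assumes precomputed, time-averaged activations from each DNN layer for every sound and a single site's response vector, and it identifies the layer whose features best predict that site.

```python
# Illustrative sketch (not the authors' code) of layer-wise encoding models: for one recording
# site, fit a ridge model on each DNN layer's features and keep the layer that predicts best.
# Assumes layer_features[k] is an (n_sounds, n_features) array of time-averaged activations
# from layer k, and y is an (n_sounds,) response vector (e.g., mean high-gamma power).
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold, cross_val_predict

def best_layer_for_site(layer_features, y, alphas=np.logspace(-2, 5, 15)):
    """Return (index of best-predicting layer, per-layer prediction accuracy)."""
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = []
    for X in layer_features:
        y_hat = cross_val_predict(RidgeCV(alphas=alphas), X, y, cv=cv)
        scores.append(np.corrcoef(y, y_hat)[0, 1])  # cross-validated Pearson r
    scores = np.asarray(scores)
    return int(np.argmax(scores)), scores  # deeper best layer = higher representational complexity
```

Under this scheme, sites whose responses are best predicted by deeper layers would be assigned higher representational complexity, which the study relates to anatomical position along the core, belt, and parabelt hierarchy.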
Affiliation(s)
- Kyle M Rupp
- Department of Neurological Surgery, University of Pittsburgh, PA 15213
- Jasmine L Hect
- Department of Neurological Surgery, University of Pittsburgh, PA 15213
- Emily E Harford
- Department of Neurological Surgery, University of Pittsburgh, PA 15213
- Lori L Holt
- Department of Psychology, The University of Texas at Austin, TX 78712
- Taylor J Abel
- Department of Neurological Surgery, University of Pittsburgh, PA 15213
- Department of Bioengineering, University of Pittsburgh, PA 15261
2
Cusinato R, Seiler A, Schindler K, Tzovara A. Sleep Modulates Neural Timescales and Spatiotemporal Integration in the Human Cortex. J Neurosci 2025;45:e1845242025. PMID: 39965931; PMCID: PMC11984084. DOI: 10.1523/jneurosci.1845-24.2025.
Abstract
Spontaneous neural dynamics manifest across multiple temporal and spatial scales, which are thought to be intrinsic to brain areas and exhibit hierarchical organization across the cortex. In wake, a hierarchy of timescales is thought to naturally emerge from microstructural properties, gene expression, and recurrent connections. A fundamental question is how timescales are organized and how they change during sleep, when physiological needs differ. Here, we describe two measures of neural timescales, obtained from broadband activity and gamma power, which display complementary properties. We leveraged intracranial electroencephalography in 106 human epilepsy patients (48 females) to characterize timescale changes from wake to sleep across the cortical hierarchy. We show that both broadband and gamma timescales are globally longer in sleep than in wake. While broadband timescales increase along the sensorimotor-association axis, gamma ones decrease. During sleep, slow waves can explain the increase of broadband and gamma timescales, but only broadband ones show a positive association with slow-wave density across the cortex. Finally, we characterize spatial correlations and their relationship with timescales as a proxy for spatiotemporal integration, finding high integration at long distances in wake for broadband and at short distances in sleep for gamma timescales. Our results suggest that mesoscopic neural populations possess different timescales that are shaped by anatomy and are modulated by the sleep/wake cycle.
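A common way to operationalize a neural timescale, consistent with the broadband and gamma-power measures described above, is the decay constant of a signal's autocorrelation function. The sketch below is a generic illustration of that idea, not the authors' pipeline; the maximum lag and the single-exponential model are assumptions.

```python
# Illustrative estimate of a neural timescale as the exponential decay constant of the
# autocorrelation function (ACF) of a signal such as broadband iEEG activity or gamma power.
# The maximum lag and the single-exponential fit are assumptions, not the authors' settings.
import numpy as np
from scipy.optimize import curve_fit

def estimate_timescale(x, fs, max_lag_s=1.0):
    """Fit ACF(lag) ~ exp(-lag / tau) and return tau in seconds."""
    x = x - x.mean()
    n_lags = int(max_lag_s * fs)
    lags = np.arange(1, n_lags) / fs
    acf = np.array([np.corrcoef(x[:-k], x[k:])[0, 1] for k in range(1, n_lags)])
    popt, _ = curve_fit(lambda t, tau: np.exp(-t / tau), lags, acf,
                        p0=[0.05], bounds=(1e-4, max_lag_s))
    return popt[0]  # longer tau = longer intrinsic timescale
```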
Affiliation(s)
- Riccardo Cusinato
- Institute of Computer Science, University of Bern, Bern 3012, Switzerland
- Center for Experimental Neurology - Sleep Wake Epilepsy Center - NeuroTec, Department of Neurology, Inselspital Bern, University Hospital, University of Bern, Bern 3010, Switzerland
- Andrea Seiler
- Sleep-Wake-Epilepsy Center, Department of Neurology, Inselspital Bern, University Hospital, University of Bern, Bern 3010, Switzerland
- Kaspar Schindler
- Sleep-Wake-Epilepsy Center, Department of Neurology, Inselspital Bern, University Hospital, University of Bern, Bern 3010, Switzerland
- Athina Tzovara
- Institute of Computer Science, University of Bern, Bern 3012, Switzerland
- Center for Experimental Neurology - Sleep Wake Epilepsy Center - NeuroTec, Department of Neurology, Inselspital Bern, University Hospital, University of Bern, Bern 3010, Switzerland
3
Oderbolz C, Poeppel D, Meyer M. Asymmetric Sampling in Time: Evidence and perspectives. Neurosci Biobehav Rev 2025;171:106082. PMID: 40010659. DOI: 10.1016/j.neubiorev.2025.106082.
Abstract
Auditory and speech signals are undisputedly processed in both left and right hemispheres, but this bilateral allocation is likely unequal. The Asymmetric Sampling in Time (AST) hypothesis proposed a division of labor that has its neuroanatomical basis in the distribution of neuronal ensembles with differing temporal integration constants: left auditory areas house a larger proportion of ensembles with shorter temporal integration windows (tens of milliseconds), suited to process rapidly changing signals; right auditory areas host a larger proportion with longer time constants (∼150-300 ms), ideal for slowly changing signals. Here we evaluate the large body of findings that clarifies this relationship between auditory temporal structure and functional lateralization. In this reappraisal, we unpack whether this relationship is influenced by stimulus type (speech/nonspeech), stimulus temporal extent (long/short), task engagement (high/low), or (imaging) modality (hemodynamic/electrophysiology/behavior). We find that the right hemisphere displays a clear preference for slowly changing signals whereas the left-hemispheric preference for rapidly changing signals is highly dependent on the experimental design. We consider neuroanatomical properties potentially linked to functional lateralization, contextualize the results in an evolutionary perspective, and highlight future directions.
Affiliation(s)
- Chantal Oderbolz
- Institute for the Interdisciplinary Study of Language Evolution, University of Zurich, Zurich, Switzerland; Department of Neuroscience, Georgetown University Medical Center, Washington D.C., USA
- David Poeppel
- Department of Psychology, New York University, New York, NY, USA
- Martin Meyer
- Institute for the Interdisciplinary Study of Language Evolution, University of Zurich, Zurich, Switzerland
4
Kanwisher N. Animal models of the human brain: Successes, limitations, and alternatives. Curr Opin Neurobiol 2025;90:102969. PMID: 39914250. DOI: 10.1016/j.conb.2024.102969.
Abstract
The last three decades of research in human cognitive neuroscience have given us an initial "parts list" for the human mind in the form of a set of cortical regions with distinct and often very specific functions. But current neuroscientific methods in humans have limited ability to reveal exactly what these regions represent and compute, the causal role of each in behavior, and the interactions among regions that produce real-world cognition. Animal models can help to answer these questions when homologues exist in other species, like the face system in macaques. When homologues do not exist in animals, for example for speech and music perception, and understanding of language or other people's thoughts, intracranial recordings in humans play a central role, along with a new alternative to animal models: artificial neural networks.
Affiliation(s)
- Nancy Kanwisher
- Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, United States
5
Bagur S, Bourg J, Kempf A, Tarpin T, Bergaoui K, Guo Y, Ceballo S, Schwenkgrub J, Verdier A, Puel JL, Bourien J, Bathellier B. A spatial code for temporal information is necessary for efficient sensory learning. Sci Adv 2025;11:eadr6214. PMID: 39772691; PMCID: PMC11708902. DOI: 10.1126/sciadv.adr6214.
Abstract
The temporal structure of sensory inputs contains essential information for their interpretation. Sensory cortex represents these temporal cues through two codes: the temporal sequences of neuronal activity and the spatial patterns of neuronal firing rate. However, it is unknown which of these coexisting codes causally drives sensory decisions. To separate their contributions, we generated in the mouse auditory cortex optogenetically driven activity patterns differing exclusively along their temporal or spatial dimensions. Mice could rapidly learn to behaviorally discriminate spatial but not temporal patterns. Moreover, large-scale neuronal recordings across the auditory system revealed that the auditory cortex is the first region in which spatial patterns efficiently represent temporal cues on the timescale of several hundred milliseconds. This feature is shared by the deep layers of neural networks categorizing time-varying sounds. Therefore, the emergence of a spatial code for temporal sensory cues is a necessary condition to efficiently associate temporally structured stimuli with decisions.
Affiliation(s)
- Sophie Bagur
- Université Paris Cité, Institut Pasteur, AP-HP, Inserm, Fondation Pour l’Audition, Institut de l’Audition, IHU reConnect, F-75012 Paris, France
- Jacques Bourg
- Université Paris Cité, Institut Pasteur, AP-HP, Inserm, Fondation Pour l’Audition, Institut de l’Audition, IHU reConnect, F-75012 Paris, France
- Alexandre Kempf
- Université Paris Cité, Institut Pasteur, AP-HP, Inserm, Fondation Pour l’Audition, Institut de l’Audition, IHU reConnect, F-75012 Paris, France
- Thibault Tarpin
- Université Paris Cité, Institut Pasteur, AP-HP, Inserm, Fondation Pour l’Audition, Institut de l’Audition, IHU reConnect, F-75012 Paris, France
- Khalil Bergaoui
- Université Paris Cité, Institut Pasteur, AP-HP, Inserm, Fondation Pour l’Audition, Institut de l’Audition, IHU reConnect, F-75012 Paris, France
- Yin Guo
- Université Paris Cité, Institut Pasteur, AP-HP, Inserm, Fondation Pour l’Audition, Institut de l’Audition, IHU reConnect, F-75012 Paris, France
- Sebastian Ceballo
- Université Paris Cité, Institut Pasteur, AP-HP, Inserm, Fondation Pour l’Audition, Institut de l’Audition, IHU reConnect, F-75012 Paris, France
- Joanna Schwenkgrub
- Université Paris Cité, Institut Pasteur, AP-HP, Inserm, Fondation Pour l’Audition, Institut de l’Audition, IHU reConnect, F-75012 Paris, France
- Antonin Verdier
- Université Paris Cité, Institut Pasteur, AP-HP, Inserm, Fondation Pour l’Audition, Institut de l’Audition, IHU reConnect, F-75012 Paris, France
- Jean Luc Puel
- Institut des Neurosciences de Montpellier, Université de Montpellier, INSERM, Montpellier, France
- Jérôme Bourien
- Institut des Neurosciences de Montpellier, Université de Montpellier, INSERM, Montpellier, France
- Brice Bathellier
- Université Paris Cité, Institut Pasteur, AP-HP, Inserm, Fondation Pour l’Audition, Institut de l’Audition, IHU reConnect, F-75012 Paris, France
6
Mukherjee S, Babadi B, Shamma S. Sparse high-dimensional decomposition of non-primary auditory cortical receptive fields. PLoS Comput Biol 2025;21:e1012721. PMID: 39746112; PMCID: PMC11774495. DOI: 10.1371/journal.pcbi.1012721.
Abstract
Characterizing neuronal responses to natural stimuli remains a central goal in sensory neuroscience. In auditory cortical neurons, the stimulus selectivity of elicited spiking activity is summarized by a spectrotemporal receptive field (STRF) that relates neuronal responses to the stimulus spectrogram. Though effective in characterizing primary auditory cortical responses, STRFs of non-primary auditory neurons can be quite intricate, reflecting their mixed selectivity. The complexity of non-primary STRFs hence impedes understanding how acoustic stimulus representations are transformed along the auditory pathway. Here, we focus on the relationship between ferret primary auditory cortex (A1) and a secondary region, dorsal posterior ectosylvian gyrus (PEG). We propose estimating receptive fields in PEG with respect to a well-established high-dimensional computational model of primary-cortical stimulus representations. These "cortical receptive fields" (CortRF) are estimated greedily to identify the salient primary-cortical features that modulate spiking responses, which are in turn related to the corresponding spectrotemporal features. Hence, they provide biologically plausible hierarchical decompositions of STRFs in PEG. Such CortRF analysis was applied to PEG neuronal responses to speech and temporally orthogonal ripple combination (TORC) stimuli and, for comparison, to A1 neuronal responses. CortRFs of PEG neurons captured their selectivity to more complex spectrotemporal features than A1 neurons; moreover, CortRF models were more predictive of PEG (but not A1) responses to speech. Our results thus suggest that secondary-cortical stimulus representations can be computed as sparse combinations of primary-cortical features that facilitate encoding natural stimuli. Thus, by adding the primary-cortical representation, we can account for PEG single-unit responses to natural sounds better than by bypassing it and using the auditory spectrogram as the input. These results confirm with explicit details the presumed hierarchical organization of the auditory cortex.
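The core idea of greedily building a sparse receptive field from primary-cortical model features can be illustrated with a generic sparse solver. The sketch below uses orthogonal matching pursuit as a stand-in for the paper's greedy CortRF estimator; the variable names and the choice of solver are assumptions, not the authors' implementation.

```python
# Rough sketch of greedily selecting a sparse set of primary-cortical model features that
# predict a PEG neuron's firing, using orthogonal matching pursuit as a stand-in for the
# paper's greedy CortRF estimator (variable names and solver are illustrative assumptions).
from sklearn.linear_model import OrthogonalMatchingPursuit

def sparse_cortical_rf(cortical_features, spike_rate, n_features=20):
    """cortical_features: (n_timebins, n_model_channels) outputs of a primary-cortical model.
    spike_rate: (n_timebins,) response of one neuron. Returns sparse weights over channels."""
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_features)
    omp.fit(cortical_features, spike_rate)
    return omp.coef_  # nonzero entries mark the salient primary-cortical features
```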
Affiliation(s)
- Shoutik Mukherjee
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Behtash Babadi
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Shihab Shamma
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Paris, France
7
Lenc T, Lenoir C, Keller PE, Polak R, Mulders D, Nozaradan S. Measuring self-similarity in empirical signals to understand musical beat perception. Eur J Neurosci 2025;61:e16637. PMID: 39853878; PMCID: PMC11760665. DOI: 10.1111/ejn.16637.
Abstract
Experiencing music often entails the perception of a periodic beat. Despite being a widespread phenomenon across cultures, the nature and neural underpinnings of beat perception remain largely unknown. In the last decade, there has been a growing interest in developing methods to probe these processes, particularly to measure the extent to which beat-related information is contained in behavioral and neural responses. Here, we propose a theoretical framework and practical implementation of an analytic approach to capture beat-related periodicity in empirical signals using frequency-tagging. We highlight its sensitivity in measuring the extent to which the periodicity of a perceived beat is represented in a range of continuous time-varying signals with minimal assumptions. We also discuss a limitation of this approach with respect to its specificity when restricted to measuring beat-related periodicity only from the magnitude spectrum of a signal and introduce a novel extension of the approach based on autocorrelation to overcome this issue. We test the new autocorrelation-based method using simulated signals and by re-analyzing previously published data and show how it can be used to process measurements of brain activity as captured with surface EEG in adults and infants in response to rhythmic inputs. Taken together, the theoretical framework and related methodological advances confirm and elaborate the frequency-tagging approach as a promising window into the processes underlying beat perception and, more generally, temporally coordinated behaviors.
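The frequency-tagging logic described above amounts to asking how much of a response's energy falls at beat-related frequencies relative to the rest of the spectrum. The toy function below illustrates that contrast on the magnitude spectrum; the frequency cutoff and z-scoring scheme are illustrative assumptions, and the autocorrelation-based extension the authors introduce instead evaluates self-similarity at beat-period lags to address the specificity issue noted above.

```python
# Toy illustration of frequency-tagging: compare spectral magnitude at beat-related
# frequencies (e.g., the beat frequency and its harmonics) against all other frequencies.
# The frequency cutoff and z-scoring scheme are illustrative assumptions.
import numpy as np

def beat_zscore(signal, fs, beat_freqs, fmax=30.0):
    """Z-score the mean magnitude at beat-related frequencies against the remaining bins."""
    mag = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    keep = (freqs > 0) & (freqs <= fmax)
    mag, freqs = mag[keep], freqs[keep]
    beat_idx = np.unique([np.argmin(np.abs(freqs - f)) for f in beat_freqs])
    other_idx = np.setdiff1d(np.arange(len(freqs)), beat_idx)
    return (mag[beat_idx].mean() - mag[other_idx].mean()) / mag[other_idx].std()
```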
Affiliation(s)
- Tomas Lenc
- Institute of Neuroscience (IONS), UCLouvain, Brussels, Belgium
- Basque Center on Cognition, Brain and Language (BCBL), Donostia-San Sebastian, Spain
- Cédric Lenoir
- Institute of Neuroscience (IONS), UCLouvain, Brussels, Belgium
- Peter E. Keller
- MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, Australia
- Center for Music in the Brain & Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Rainer Polak
- RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
- Department of Musicology, University of Oslo, Oslo, Norway
- Dounia Mulders
- Institute of Neuroscience (IONS), UCLouvain, Brussels, Belgium
- Computational and Biological Learning Unit, Department of Engineering, University of Cambridge, Cambridge, UK
- Institute for Information and Communication Technologies, Electronics and Applied Mathematics, UCLouvain, Louvain-la-Neuve, Belgium
- Department of Brain and Cognitive Sciences and McGovern Institute, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA
- Sylvie Nozaradan
- Institute of Neuroscience (IONS), UCLouvain, Brussels, Belgium
- International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, Canada
8
Sohoglu E, Beckers L, Davis MH. Convergent neural signatures of speech prediction error are a biological marker for spoken word recognition. Nat Commun 2024;15:9984. PMID: 39557848; PMCID: PMC11574182. DOI: 10.1038/s41467-024-53782-5.
Abstract
We use MEG and fMRI to determine how predictions are combined with speech input in superior temporal cortex. We compare neural responses to words in which first syllables strongly or weakly predict second syllables (e.g., "bingo", "snigger" versus "tango", "meagre"). We further compare neural responses to the same second syllables when predictions mismatch with input during pseudoword perception (e.g., "snigo" and "meago"). Neural representations of second syllables are suppressed by strong predictions when predictions match sensory input but show the opposite effect when predictions mismatch. Computational simulations show that this interaction is consistent with prediction error but not alternative (sharpened signal) computations. Neural signatures of prediction error are observed 200 ms after second syllable onset and in early auditory regions (bilateral Heschl's gyrus and STG). These findings demonstrate prediction error computations during the identification of familiar spoken words and perception of unfamiliar pseudowords.
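The computational contrast tested above (prediction error versus sharpened signal) can be illustrated with a toy simulation: a prediction-error response collapses when the prediction matches the input, whereas a sharpened response does not. The sketch below is purely illustrative and is not the authors' simulation; the feature representation, gain term, and random "syllables" are assumptions.

```python
# Toy contrast between prediction-error and sharpened-signal computations (illustrative only,
# not the authors' simulations). Features, gain, and the random "syllables" are assumptions.
import numpy as np

def prediction_error_response(sensory, prediction):
    """Population response = unexplained input; collapses when the prediction matches."""
    return np.abs(sensory - prediction).sum()

def sharpened_response(sensory, prediction, gain=1.0):
    """Population response = input multiplicatively enhanced toward predicted features."""
    return (sensory * (1.0 + gain * prediction)).sum()

rng = np.random.default_rng(0)
syllable = rng.random(10)  # the heard second syllable, coded over 10 feature channels
for name, pred in [("matching", syllable), ("mismatching", rng.random(10))]:
    print(name,
          round(prediction_error_response(syllable, pred), 2),
          round(sharpened_response(syllable, pred), 2))
```

With a matching prediction the prediction-error response goes to zero while the sharpened response does not, which is the qualitative dissociation that the reported simulations exploit to distinguish the two schemes.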
Affiliation(s)
- Ediz Sohoglu
- School of Psychology, University of Sussex, Brighton, UK
- Loes Beckers
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- Cochlear Ltd., Mechelen, Belgium
- Matthew H Davis
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
9
Hayashi M, Kida T, Inui K. Segmentation window of speech information processing in the human auditory cortex. Sci Rep 2024;14:25044. PMID: 39448758; PMCID: PMC11502806. DOI: 10.1038/s41598-024-76137-y.
Abstract
Humans perceive continuous speech signals as discrete sequences. To clarify the temporal segmentation window of speech information processing in the human auditory cortex, the relationship between speech perception and cortical responses was investigated using auditory evoked magnetic fields (AEFs). AEFs were measured while participants heard the synthetic Japanese word /atataka/, presented at eight different speech rates so that word durations ranged from 75 to 600 ms. The results revealed a clear correspondence between the AEFs and syllables. Specifically, when word durations were between 375 and 600 ms, the superior temporal area exhibited four clear M100 responses, corresponding not only to the onset of speech but also to each consonant/vowel syllable unit. The number of evoked M100 responses was correlated with the duration of the stimulus as well as with the number of perceived syllables. The limit of the temporal segmentation window of speech perception was estimated to lie between approximately 75 and 94 ms. This finding may contribute to optimizing the temporal performance of high-speed synthesized speech generation systems.
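Counting discrete evoked responses within a word, as done above for M100 deflections, can be approximated by simple peak detection on a source waveform. The helper below is an illustrative sketch, not the authors' analysis; the prominence threshold and the roughly 75 ms minimum peak separation are assumptions taken loosely from the reported segmentation limit.

```python
# Illustrative peak counting for evoked responses (e.g., M100-like deflections) in a source
# waveform; the threshold and ~75 ms minimum separation are assumptions, not study settings.
import numpy as np
from scipy.signal import find_peaks

def count_evoked_peaks(waveform, fs, min_interval_s=0.075, prominence=None):
    """Count prominent peaks separated by at least min_interval_s; return count and latencies (s)."""
    if prominence is None:
        prominence = 2 * np.std(waveform)
    peaks, _ = find_peaks(waveform, distance=max(1, int(min_interval_s * fs)),
                          prominence=prominence)
    return len(peaks), peaks / fs
```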
Affiliation(s)
- Minoru Hayashi
- Department of Interdisciplinary Science and Engineering, School of Science and Engineering, Meisei University, Tokyo, 191-8506, Japan
- Tetsuo Kida
- Department of Functioning and Disability, Institute for Developmental Research, Aichi Developmental Disability Center, Kasugai, Japan
- Section of Brain Function Information, National Institute for Physiological Sciences, Okazaki, Japan
- Koji Inui
- Department of Functioning and Disability, Institute for Developmental Research, Aichi Developmental Disability Center, Kasugai, Japan
- Section of Brain Function Information, National Institute for Physiological Sciences, Okazaki, Japan
10
Regev TI, Casto C, Hosseini EA, Adamek M, Ritaccio AL, Willie JT, Brunner P, Fedorenko E. Neural populations in the language network differ in the size of their temporal receptive windows. Nat Hum Behav 2024;8:1924-1942. PMID: 39187713. DOI: 10.1038/s41562-024-01944-2.
Abstract
Although we have long known which brain areas support language comprehension, our knowledge of the neural computations that these frontal and temporal regions implement remains limited. One important unresolved question concerns functional differences among the neural populations that comprise the language network. Here we leveraged the high spatiotemporal resolution of human intracranial recordings (n = 22) to examine responses to sentences and linguistically degraded conditions. We discovered three response profiles that differ in their temporal dynamics. These profiles appear to reflect different temporal receptive windows, with average windows of about 1, 4 and 6 words, respectively. Neural populations exhibiting these profiles are interleaved across the language network, which suggests that all language regions have direct access to distinct, multiscale representations of linguistic input, a property that may be critical for the efficiency and robustness of language processing.
Affiliation(s)
- Tamar I Regev
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Colton Casto
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology (SHBT), Harvard University, Boston, MA, USA
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA, USA
- Eghbal A Hosseini
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Markus Adamek
- National Center for Adaptive Neurotechnologies, Albany, NY, USA
- Department of Neurosurgery, Washington University School of Medicine, St Louis, MO, USA
- Jon T Willie
- National Center for Adaptive Neurotechnologies, Albany, NY, USA
- Department of Neurosurgery, Washington University School of Medicine, St Louis, MO, USA
- Peter Brunner
- National Center for Adaptive Neurotechnologies, Albany, NY, USA
- Department of Neurosurgery, Washington University School of Medicine, St Louis, MO, USA
- Department of Neurology, Albany Medical College, Albany, NY, USA
- Evelina Fedorenko
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology (SHBT), Harvard University, Boston, MA, USA
11
Norman-Haignere SV, Keshishian MK, Devinsky O, Doyle W, McKhann GM, Schevon CA, Flinker A, Mesgarani N. Temporal integration in human auditory cortex is predominantly yoked to absolute time, not structure duration. bioRxiv [Preprint] 2024:2024.09.23.614358. PMID: 39386565; PMCID: PMC11463558. DOI: 10.1101/2024.09.23.614358.
Abstract
Sound structures such as phonemes and words have highly variable durations. Thus, there is a fundamental difference between integrating across absolute time (e.g., 100 ms) vs. sound structure (e.g., phonemes). Auditory and cognitive models have traditionally cast neural integration in terms of time and structure, respectively, but the extent to which cortical computations reflect time or structure remains unknown. To answer this question, we rescaled the duration of all speech structures using time stretching/compression and measured integration windows in the human auditory cortex using a new experimental/computational method applied to spatiotemporally precise intracranial recordings. We observed significantly longer integration windows for stretched speech, but this lengthening was very small (~5%) relative to the change in structure durations, even in non-primary regions strongly implicated in speech-specific processing. These findings demonstrate that time-yoked computations dominate throughout the human auditory cortex, placing important constraints on neurocomputational models of structure processing.
Affiliation(s)
- Sam V Norman-Haignere
- University of Rochester Medical Center, Department of Biostatistics and Computational Biology
- University of Rochester Medical Center, Department of Neuroscience
- University of Rochester, Department of Brain and Cognitive Sciences
- University of Rochester, Department of Biomedical Engineering
- Zuckerman Institute for Mind Brain and Behavior, Columbia University
- Menoua K. Keshishian
- Zuckerman Institute for Mind Brain and Behavior, Columbia University
- Department of Electrical Engineering, Columbia University
- Orrin Devinsky
- Department of Neurology, NYU Langone Medical Center
- Comprehensive Epilepsy Center, NYU Langone Medical Center
- Werner Doyle
- Comprehensive Epilepsy Center, NYU Langone Medical Center
- Department of Neurosurgery, NYU Langone Medical Center
- Guy M. McKhann
- Department of Neurological Surgery, Columbia University Irving Medical Center
- Adeen Flinker
- Department of Neurology, NYU Langone Medical Center
- Comprehensive Epilepsy Center, NYU Langone Medical Center
- Department of Biomedical Engineering, NYU Tandon School of Engineering
- Nima Mesgarani
- Zuckerman Institute for Mind Brain and Behavior, Columbia University
- Department of Electrical Engineering, Columbia University
12
Chandra NK, Sitek KR, Chandrasekaran B, Sarkar A. Functional connectivity across the human subcortical auditory system using an autoregressive matrix-Gaussian copula graphical model approach with partial correlations. Imaging Neurosci (Camb) 2024;2. PMID: 39421593; PMCID: PMC11485223. DOI: 10.1162/imag_a_00258.
Abstract
The auditory system comprises multiple subcortical brain structures that process and refine incoming acoustic signals along the primary auditory pathway. Due to technical limitations of imaging small structures deep inside the brain, most of our knowledge of the subcortical auditory system is based on research in animal models using invasive methodologies. Advances in ultrahigh-field functional magnetic resonance imaging (fMRI) acquisition have enabled novel noninvasive investigations of the human auditory subcortex, including fundamental features of auditory representation such as tonotopy and periodotopy. However, functional connectivity across subcortical networks is still underexplored in humans, with ongoing development of related methods. Traditionally, functional connectivity is estimated from fMRI data with full correlation matrices. However, partial correlations reveal the relationship between two regions after removing the effects of all other regions, reflecting more direct connectivity. Partial correlation analysis is particularly promising in the ascending auditory system, where sensory information is passed in an obligatory manner, from nucleus to nucleus up the primary auditory pathway, providing redundant but also increasingly abstract representations of auditory stimuli. While most existing methods for learning conditional dependency structures based on partial correlations assume independent and identically distributed Gaussian data, fMRI data exhibit significant deviations from Gaussianity as well as high temporal autocorrelation. In this paper, we developed an autoregressive matrix-Gaussian copula graphical model (ARMGCGM) approach to estimate the partial correlations and thereby infer the functional connectivity patterns within the auditory system while appropriately accounting for autocorrelations between successive fMRI scans. Our results show strong positive partial correlations between successive structures in the primary auditory pathway on each side (left and right), including between auditory midbrain and thalamus, and between primary and associative auditory cortex. These results are highly stable when splitting the data in halves according to the acquisition schemes and computing partial correlations separately for each half of the data, as well as across cross-validation folds. In contrast, full correlation-based analysis identified a rich network of interconnectivity that was not specific to adjacent nodes along the pathway. Overall, our results demonstrate that unique functional connectivity patterns along the auditory pathway are recoverable using novel connectivity approaches and that our connectivity methods are reliable across multiple acquisitions.
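The basic contrast drawn above between full and partial correlations can be shown without the paper's copula and autoregressive machinery: partial correlations follow from the inverse covariance (precision) matrix, which removes the linear influence of all other regions. The sketch below covers only this core idea and omits the ARMGCGM model entirely.

```python
# Core contrast between full and partial correlations (the copula and autoregressive components
# of the ARMGCGM are omitted): partial correlations are read off the precision matrix.
import numpy as np

def full_and_partial_correlations(ts):
    """ts: (n_timepoints, n_regions) array of region time series."""
    full = np.corrcoef(ts, rowvar=False)
    precision = np.linalg.inv(np.cov(ts, rowvar=False))
    d = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(d, d)   # partial_ij = -p_ij / sqrt(p_ii * p_jj)
    np.fill_diagonal(partial, 1.0)
    return full, partial
```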
Affiliation(s)
- Noirrit Kiran Chandra
- The University of Texas at Dallas, Department of Mathematical Sciences, Richardson, TX 76010, USA
- Kevin R. Sitek
- Northwestern University, Roxelyn and Richard Pepper Department of Communication Sciences and Disorders, Evanston, IL 60208, USA
- Bharath Chandrasekaran
- Northwestern University, Roxelyn and Richard Pepper Department of Communication Sciences and Disorders, Evanston, IL 60208, USA
- Abhra Sarkar
- The University of Texas at Austin, Department of Statistics and Data Sciences, Austin, TX 78712, USA
13
Chalas N, Meyer L, Lo CW, Park H, Kluger DS, Abbasi O, Kayser C, Nitsch R, Gross J. Dissociating prosodic from syntactic delta activity during natural speech comprehension. Curr Biol 2024;34:3537-3549.e5. PMID: 39047734. DOI: 10.1016/j.cub.2024.06.072.
Abstract
Decoding human speech requires the brain to segment the incoming acoustic signal into meaningful linguistic units, ranging from syllables and words to phrases. Integrating these linguistic constituents into a coherent percept sets the root of compositional meaning and hence understanding. Important cues for segmentation in natural speech are prosodic cues, such as pauses, but their interplay with higher-level linguistic processing is still unknown. Here, we dissociate the neural tracking of prosodic pauses from the segmentation of multi-word chunks using magnetoencephalography (MEG). We find that manipulating the regularity of pauses disrupts slow speech-brain tracking bilaterally in auditory areas (below 2 Hz) and in turn increases left-lateralized coherence of higher-frequency auditory activity at speech onsets (around 25-45 Hz). Critically, we also find that multi-word chunks (defined as short, coherent bundles of inter-word dependencies) are processed through the rhythmic fluctuations of low-frequency activity (below 2 Hz) bilaterally and independently of prosodic cues. Importantly, low-frequency alignment at chunk onsets increases the accuracy of an encoding model in bilateral auditory and frontal areas while controlling for the effect of acoustics. Our findings provide novel insights into the neural basis of speech perception, demonstrating that both acoustic features (prosodic cues) and abstract linguistic processing at the multi-word timescale are underpinned independently by low-frequency electrophysiological brain activity in the delta frequency range.
Affiliation(s)
- Nikos Chalas
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany; Institute for Translational Neuroscience, University of Münster, Münster, Germany
- Lars Meyer
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Chia-Wen Lo
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Hyojin Park
- Centre for Human Brain Health (CHBH), School of Psychology, University of Birmingham, Birmingham, UK
- Daniel S Kluger
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Omid Abbasi
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
- Robert Nitsch
- Institute for Translational Neuroscience, University of Münster, Münster, Germany
- Joachim Gross
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
14
Rupp KM, Hect JL, Harford EE, Holt LL, Ghuman AS, Abel TJ. A hierarchy of processing complexity and timescales for natural sounds in human auditory cortex. bioRxiv [Preprint] 2024:2024.05.24.595822. PMID: 38826304; PMCID: PMC11142240. DOI: 10.1101/2024.05.24.595822.
Abstract
Efficient behavior is supported by humans' ability to rapidly recognize acoustically distinct sounds as members of a common category. Within auditory cortex, there are critical unanswered questions regarding the organization and dynamics of sound categorization. Here, we performed intracerebral recordings in the context of epilepsy surgery as 20 patient-participants listened to natural sounds. We built encoding models to predict neural responses using features of these sounds extracted from different layers within a sound-categorization deep neural network (DNN). This approach yielded highly accurate models of neural responses throughout auditory cortex. The complexity of a cortical site's representation (measured by the depth of the DNN layer that produced the best model) was closely related to its anatomical location, with shallow, middle, and deep layers of the DNN associated with core (primary auditory cortex), lateral belt, and parabelt regions, respectively. Smoothly varying gradients of representational complexity also existed within these regions, with complexity increasing along a posteromedial-to-anterolateral direction in core and lateral belt, and along posterior-to-anterior and dorsal-to-ventral dimensions in parabelt. When we estimated the time window over which each recording site integrates information, we found shorter integration windows in core relative to lateral belt and parabelt. Lastly, we found a relationship between the length of the integration window and the complexity of information processing within core (but not lateral belt or parabelt). These findings suggest hierarchies of timescales and processing complexity, and their interrelationship, represent a functional organizational principle of the auditory stream that underlies our perception of complex, abstract auditory information.
Affiliation(s)
- Kyle M. Rupp
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Jasmine L. Hect
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Emily E. Harford
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Lori L. Holt
- Department of Psychology, The University of Texas at Austin, Austin, Texas, United States of America
- Avniel Singh Ghuman
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Taylor J. Abel
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
15
Hakonen M, Dahmani L, Lankinen K, Ren J, Barbaro J, Blazejewska A, Cui W, Kotlarz P, Li M, Polimeni JR, Turpin T, Uluç I, Wang D, Liu H, Ahveninen J. Individual connectivity-based parcellations reflect functional properties of human auditory cortex. bioRxiv [Preprint] 2024:2024.01.20.576475. PMID: 38293021; PMCID: PMC10827228. DOI: 10.1101/2024.01.20.576475.
Abstract
Neuroimaging studies of the functional organization of human auditory cortex have focused on group-level analyses to identify tendencies that represent the typical brain. Here, we mapped auditory areas of the human superior temporal cortex (STC) in 30 participants by combining functional network analysis and 1-mm isotropic resolution 7T functional magnetic resonance imaging (fMRI). Two resting-state fMRI sessions, and one or two auditory and audiovisual speech localizer sessions, were collected on 3-4 separate days. We generated a set of functional network-based parcellations from these data. Solutions with 4, 6, and 11 networks were selected for closer examination based on local maxima of Dice and Silhouette values. The resulting parcellation of auditory cortices showed high intraindividual reproducibility both between resting state sessions (Dice coefficient: 69-78%) and between resting state and task sessions (Dice coefficient: 62-73%). This demonstrates that auditory areas in STC can be reliably segmented into functional subareas. The interindividual variability was significantly larger than intraindividual variability (Dice coefficient: 57%-68%, p<0.001), indicating that the parcellations also captured meaningful interindividual variability. The individual-specific parcellations yielded the highest alignment with task response topographies, suggesting that individual variability in parcellations reflects individual variability in auditory function. Connectional homogeneity within networks was also highest for the individual-specific parcellations. Furthermore, the similarity in the functional parcellations was not explainable by the similarity of macroanatomical properties of auditory cortex. Our findings suggest that individual-level parcellations capture meaningful idiosyncrasies in auditory cortex organization.
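The Dice coefficient used above to quantify parcellation reproducibility has a compact definition: twice the overlap of two label sets divided by the sum of their sizes, averaged across parcels. The sketch below assumes two integer label maps defined on the same vertices with matched label identities; it is an illustration, not the authors' code.

```python
# Illustrative Dice overlap between two parcellations of the same vertices (label identities
# assumed to be matched across parcellations; 0 treated as "unassigned").
import numpy as np

def mean_dice(labels_a, labels_b):
    """Mean Dice coefficient across parcel labels: 2 * overlap / (size_A + size_B)."""
    dices = []
    for k in np.union1d(np.unique(labels_a), np.unique(labels_b)):
        if k == 0:
            continue
        a, b = labels_a == k, labels_b == k
        denom = a.sum() + b.sum()
        if denom > 0:
            dices.append(2.0 * np.logical_and(a, b).sum() / denom)
    return float(np.mean(dices))
```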
Affiliation(s)
- M Hakonen
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA
- Department of Radiology, Harvard Medical School, Boston, MA, USA
- L Dahmani
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA
- Department of Radiology, Harvard Medical School, Boston, MA, USA
- K Lankinen
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA
- Department of Radiology, Harvard Medical School, Boston, MA, USA
- J Ren
- Division of Brain Sciences, Changping Laboratory, Beijing, China
- J Barbaro
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA
- A Blazejewska
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA
- Department of Radiology, Harvard Medical School, Boston, MA, USA
- W Cui
- Division of Brain Sciences, Changping Laboratory, Beijing, China
- P Kotlarz
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA
- M Li
- Division of Brain Sciences, Changping Laboratory, Beijing, China
- J R Polimeni
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA
- Department of Radiology, Harvard Medical School, Boston, MA, USA
- Harvard-MIT Program in Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA, USA
- T Turpin
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA
- I Uluç
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA
- Department of Radiology, Harvard Medical School, Boston, MA, USA
- D Wang
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA
- Department of Radiology, Harvard Medical School, Boston, MA, USA
- H Liu
- Division of Brain Sciences, Changping Laboratory, Beijing, China
- Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- J Ahveninen
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA
- Department of Radiology, Harvard Medical School, Boston, MA, USA
16
Kim SG, De Martino F, Overath T. Linguistic modulation of the neural encoding of phonemes. Cereb Cortex 2024;34:bhae155. PMID: 38687241; PMCID: PMC11059272. DOI: 10.1093/cercor/bhae155.
Abstract
Speech comprehension entails the neural mapping of the acoustic speech signal onto learned linguistic units. This acousto-linguistic transformation is bi-directional, whereby higher-level linguistic processes (e.g. semantics) modulate the acoustic analysis of individual linguistic units. Here, we investigated the cortical topography and linguistic modulation of the most fundamental linguistic unit, the phoneme. We presented natural speech and "phoneme quilts" (pseudo-randomly shuffled phonemes) in either a familiar (English) or unfamiliar (Korean) language to native English speakers while recording functional magnetic resonance imaging. This allowed us to dissociate the contribution of acoustic vs. linguistic processes toward phoneme analysis. We show (i) that the acoustic analysis of phonemes is modulated by linguistic analysis and (ii) that this modulation incorporates both acoustic and phonetic information. These results suggest that the linguistic modulation of cortical sensitivity to phoneme classes minimizes prediction error during natural speech perception, thereby aiding speech comprehension in challenging listening situations.
Affiliation(s)
- Seung-Goo Kim
- Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
- Federico De Martino
- Faculty of Psychology and Neuroscience, University of Maastricht, Universiteitssingel 40, 6229 ER Maastricht, Netherlands
- Tobias Overath
- Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Duke Institute for Brain Sciences, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Center for Cognitive Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
17
Karimi-Rouzbahani H. Evidence for Multiscale Multiplexed Representation of Visual Features in EEG. Neural Comput 2024;36:412-436. PMID: 38363657. DOI: 10.1162/neco_a_01649.
Abstract
Distinct neural processes such as sensory and memory processes are often encoded over distinct timescales of neural activations. Animal studies have shown that this multiscale coding strategy is also implemented for individual components of a single process, such as individual features of a multifeature stimulus in sensory coding. However, the generalizability of this encoding strategy to the human brain has remained unclear. We asked whether individual features of visual stimuli are encoded over distinct timescales. We applied a multiscale time-resolved decoding method to electroencephalography (EEG) collected from human subjects presented with grating visual stimuli to estimate the timescale of individual stimulus features. We observed that the orientation and color of the stimuli were encoded in shorter timescales, whereas spatial frequency and the contrast of the same stimuli were encoded in longer timescales. The stimulus features appeared in temporally overlapping windows along the trial, supporting a multiplexed coding strategy. These results provide evidence for a multiplexed, multiscale coding strategy in the human visual system.
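Multiscale time-resolved decoding can be sketched by decoding the same stimulus feature from analysis windows of different lengths and comparing the resulting accuracy time courses. The example below is a simplified illustration, not the author's pipeline; the classifier, window lengths, and non-overlapping windowing are assumptions.

```python
# Simplified multiscale time-resolved decoding: decode a stimulus feature from EEG epochs
# using analysis windows of different lengths. Classifier, window lengths, and the use of
# non-overlapping windows are illustrative assumptions.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def decode_across_timescales(epochs, labels, fs, window_lengths_s=(0.025, 0.05, 0.1, 0.2)):
    """epochs: (n_trials, n_channels, n_times); labels: (n_trials,) class per trial."""
    n_times = epochs.shape[2]
    accuracies = {}
    for w in window_lengths_s:
        n = int(w * fs)
        accs = []
        for start in range(0, n_times - n + 1, n):          # non-overlapping windows
            X = epochs[:, :, start:start + n].mean(axis=2)  # average within the window
            accs.append(cross_val_score(LinearDiscriminantAnalysis(), X, labels, cv=5).mean())
        accuracies[w] = np.array(accs)  # decoding time course at this timescale
    return accuracies
```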
Affiliation(s)
- Hamid Karimi-Rouzbahani
- Neurosciences Centre, Mater Hospital, Brisbane 4101, Australia
- Queensland Brain Institute, University of Queensland, Brisbane 4067, Australia
- Mater Research Institute, University of Queensland, Brisbane 4101, Australia
18
Young MJ, Fecchio M, Bodien YG, Edlow BL. Covert cortical processing: a diagnosis in search of a definition. Neurosci Conscious 2024;2024:niad026. PMID: 38327828; PMCID: PMC10849751. DOI: 10.1093/nc/niad026.
Abstract
Historically, clinical evaluation of unresponsive patients following brain injury has relied principally on serial behavioral examination to search for emerging signs of consciousness and track recovery. Advances in neuroimaging and electrophysiologic techniques now enable clinicians to peer into residual brain functions even in the absence of overt behavioral signs. These advances have expanded clinicians' ability to sub-stratify behaviorally unresponsive and seemingly unaware patients following brain injury by querying and classifying covert brain activity made evident through active or passive neuroimaging or electrophysiologic techniques, including functional MRI, electroencephalography (EEG), transcranial magnetic stimulation-EEG, and positron emission tomography. Clinical research has thus reciprocally influenced clinical practice, giving rise to new diagnostic categories including cognitive-motor dissociation (i.e. 'covert consciousness') and covert cortical processing (CCP). While covert consciousness has received extensive attention and study, CCP is relatively less understood. We describe CCP as an emerging and clinically relevant state of consciousness marked by the presence of intact association cortex responses to environmental stimuli in the absence of behavioral evidence of stimulus processing. CCP is not a monotonic state but rather encapsulates a spectrum of possible association cortex responses, from rudimentary to complex, to a range of possible stimuli. In constructing a roadmap for this evolving field, we emphasize that efforts to inform clinicians, philosophers, and researchers of this condition are crucial. Along with strategies to sensitize diagnostic criteria and disorders of consciousness nosology to these vital discoveries, democratizing access to the resources necessary for clinical identification of CCP is an emerging clinical and ethical imperative.
Affiliation(s)
- Michael J Young
- Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, 101 Merrimac Street, Suite 310, Boston, MA 02114, USA
- Matteo Fecchio
- Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, 101 Merrimac Street, Suite 310, Boston, MA 02114, USA
- Yelena G Bodien
- Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, 101 Merrimac Street, Suite 310, Boston, MA 02114, USA
- Department of Physical Medicine and Rehabilitation, Spaulding Rehabilitation Hospital, Harvard Medical School, 300 1st Ave, Charlestown, Boston, MA 02129, USA
- Brian L Edlow
- Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, 101 Merrimac Street, Suite 310, Boston, MA 02114, USA
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital and Harvard Medical School, 149 13th St, Charlestown, MA 02129, USA
19
Tuckute G, Feather J, Boebinger D, McDermott JH. Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions. PLoS Biol 2023;21:e3002366. PMID: 38091351; PMCID: PMC10718467. DOI: 10.1371/journal.pbio.3002366.
Abstract
Models that predict brain responses to stimuli provide one measure of understanding of a sensory system and have many potential applications in science and engineering. Deep artificial neural networks have emerged as the leading such predictive models of the visual system but are less explored in audition. Prior work provided examples of audio-trained neural networks that produced good predictions of auditory cortical fMRI responses and exhibited correspondence between model stages and brain regions, but left it unclear whether these results generalize to other neural network models and, thus, how to further improve models in this domain. We evaluated model-brain correspondence for publicly available audio neural network models along with in-house models trained on 4 different tasks. Most tested models outpredicted standard spectrotemporal filter-bank models of auditory cortex and exhibited systematic model-brain correspondence: Middle stages best predicted primary auditory cortex, while deep stages best predicted non-primary cortex. However, some state-of-the-art models produced substantially worse brain predictions. Models trained to recognize speech in background noise produced better brain predictions than models trained to recognize speech in quiet, potentially because hearing in noise imposes constraints on biological auditory representations. The training task influenced the prediction quality for specific cortical tuning properties, with best overall predictions resulting from models trained on multiple tasks. The results generally support the promise of deep neural networks as models of audition, though they also indicate that current models do not explain auditory cortical responses in their entirety.
Collapse
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
| | - Jenelle Feather
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
| | - Dana Boebinger
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, United States of America
- University of Rochester Medical Center, Rochester, New York, United States of America
| | - Josh H. McDermott
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, United States of America
| |
Collapse
|
20
|
Skrill D, Norman-Haignere SV. Large language models transition from integrating across position-yoked, exponential windows to structure-yoked, power-law windows. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 2023; 36:638-654. [PMID: 38434255 PMCID: PMC10907028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 03/05/2024]
Abstract
Modern language models excel at integrating across long temporal scales needed to encode linguistic meaning and show non-trivial similarities to biological neural systems. Prior work suggests that human brain responses to language exhibit hierarchically organized "integration windows" that substantially constrain the overall influence of an input token (e.g., a word) on the neural response. However, little prior work has attempted to use integration windows to characterize computations in large language models (LLMs). We developed a simple word-swap procedure for estimating integration windows from black-box language models that does not depend on access to gradients or knowledge of the model architecture (e.g., attention weights). Using this method, we show that trained LLMs exhibit stereotyped integration windows that are well-fit by a convex combination of an exponential and a power-law function, with a partial transition from exponential to power-law dynamics across network layers. We then introduce a metric for quantifying the extent to which these integration windows vary with structural boundaries (e.g., sentence boundaries), and using this metric, we show that integration windows become increasingly yoked to structure at later network layers. None of these findings were observed in an untrained model, which as expected integrated uniformly across its input. These results suggest that LLMs learn to integrate information in natural language using a stereotyped pattern: integrating across position-yoked, exponential windows at early layers, followed by structure-yoked, power-law windows at later layers. The methods we describe in this paper provide a general-purpose toolkit for understanding temporal integration in language models, facilitating cross-disciplinary research at the intersection of biological and artificial intelligence.
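The word-swap idea in this abstract can be sketched very compactly: perturb one context word at increasing distances from a target position and ask how much the model's output at that position changes. The sketch below is a hedged illustration; `score_final_token` stands in for any black-box scoring function, and the toy scorer at the end is invented solely so the example runs.

```python
import numpy as np

def integration_profile(tokens, alt_tokens, score_final_token, max_dist=20):
    """For each distance d, swap the word d positions before the final token and
    record how much the model's score for the final token changes."""
    base = score_final_token(tokens)
    effects = []
    for d in range(1, min(max_dist, len(tokens) - 1) + 1):
        swapped = list(tokens)
        swapped[-1 - d] = alt_tokens[-1 - d]      # perturb exactly one context word
        effects.append(abs(score_final_token(swapped) - base))
    return np.array(effects)

# Toy stand-in scorer so the sketch runs end to end: this "model" only looks at the
# last 3 tokens, so swap effects should vanish for d >= 3.
toy_model = lambda toks: float(hash(tuple(toks[-3:])) % 1000) / 1000.0

tokens = ["the", "cat", "sat", "on", "the", "mat"]
alts   = ["a",   "dog", "lay", "in", "a",   "rug"]
print(integration_profile(tokens, alts, toy_model, max_dist=5))
```

A window estimate could then be the smallest distance beyond which the effects stay near zero; averaging such profiles over many sentences and layers would mirror the exponential-versus-power-law comparison the abstract describes.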
Collapse
Affiliation(s)
- David Skrill
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY 14642
| | - Sam V Norman-Haignere
- Depts. of Biostatistics and Computational Biology, Neuroscience, University of Rochester Medical Center, Rochester, NY 14642
- Depts. of Brain and Cognitive Sciences, Biomedical Engineering, University of Rochester, Rochester, NY 14642
| |
Collapse
|
21
|
Kothinti SR, Elhilali M. Are acoustics enough? Semantic effects on auditory salience in natural scenes. Front Psychol 2023; 14:1276237. [PMID: 38098516 PMCID: PMC10720592 DOI: 10.3389/fpsyg.2023.1276237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2023] [Accepted: 11/10/2023] [Indexed: 12/17/2023] Open
Abstract
Auditory salience is a fundamental property of a sound that allows it to grab a listener's attention regardless of their attentional state or behavioral goals. While previous research has shed light on acoustic factors influencing auditory salience, the semantic dimensions of this phenomenon have remained relatively unexplored, owing both to the complexity of measuring salience in audition and to the limited focus on complex natural scenes. In this study, we examine the relationship between acoustic, contextual, and semantic attributes and their impact on the auditory salience of natural audio scenes using a dichotic listening paradigm. The experiments present acoustic scenes in forward and backward directions; the latter diminishes semantic effects, providing a counterpoint to the effects observed in forward scenes. The behavioral data collected from a crowd-sourced platform reveal a striking convergence in temporal salience maps for certain sound events, while marked disparities emerge in others. Our main hypothesis posits that differences in the perceptual salience of events are predominantly driven by semantic and contextual cues, particularly evident in those cases displaying substantial disparities between forward and backward presentations. Conversely, events exhibiting a high degree of alignment can largely be attributed to low-level acoustic attributes. To evaluate this hypothesis, we employ analytical techniques that combine rich low-level mappings from acoustic profiles with high-level embeddings extracted from a deep neural network. This integrated approach captures both acoustic and semantic attributes of acoustic scenes along with their temporal trajectories. The results demonstrate that perceptual salience reflects a careful interplay between low-level and high-level attributes that shapes which moments stand out in a natural soundscape. Furthermore, our findings underscore the important role of longer-term context as a critical component of auditory salience, enabling us to discern and adapt to temporal regularities within an acoustic scene. The experimental and model-based validation of semantic factors of salience paves the way for a complete understanding of auditory salience. Ultimately, the empirical and computational analyses have implications for developing large-scale models for auditory salience and audio analytics.
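A minimal sketch of the forward/backward comparison described above: per sound event, correlate the temporal salience map from forward playback against the one from reversed playback (assumed here to be already re-aligned to scene time). High agreement points to low-level acoustic drive; low agreement leaves room for semantic or contextual effects. All data, event names, and the 0.7 cutoff below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
t = np.linspace(0, 10, 200)          # 10 s scene; salience maps sampled at 20 Hz
# Hypothetical, time-aligned salience maps (forward, backward) for two events.
events = {
    "siren":  (np.exp(-(t - 4) ** 2), np.exp(-(t - 4) ** 2)),         # acoustics dominate
    "speech": (np.exp(-(t - 6) ** 2), 0.3 * np.exp(-(t - 2) ** 2)),   # context-dependent
}
for name, (fwd, bwd) in events.items():
    fwd = fwd + 0.05 * rng.standard_normal(t.size)   # behavioral measurement noise
    bwd = bwd + 0.05 * rng.standard_normal(t.size)
    r = np.corrcoef(fwd, bwd)[0, 1]
    label = "mostly acoustic" if r > 0.7 else "semantic/contextual candidate"
    print(f"{name}: forward-backward r = {r:.2f} -> {label}")
```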
Collapse
Affiliation(s)
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD, United States
| |
Collapse
|
22
|
López Espejo M, David SV. A sparse code for natural sound context in auditory cortex. CURRENT RESEARCH IN NEUROBIOLOGY 2023; 6:100118. [PMID: 38152461 PMCID: PMC10749876 DOI: 10.1016/j.crneur.2023.100118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2023] [Revised: 10/27/2023] [Accepted: 11/14/2023] [Indexed: 12/29/2023] Open
Abstract
Accurate sound perception can require integrating information over hundreds of milliseconds or even seconds. Spectro-temporal models of sound coding by single neurons in auditory cortex indicate that the majority of sound-evoked activity can be attributed to stimuli occurring within the preceding few tens of milliseconds. It remains uncertain how the auditory system integrates information about sensory context on a longer timescale. Here we characterized long-lasting contextual effects in auditory cortex (AC) using a diverse set of natural sound stimuli. We measured context effects as the difference in a neuron's response to a single probe sound following two different context sounds. Many AC neurons showed context effects lasting longer than the temporal window of a traditional spectro-temporal receptive field. The duration and magnitude of context effects varied substantially across neurons and stimuli. This diversity of context effects formed a sparse code across the neural population that encoded a wider range of contexts than any constituent neuron. Encoding model analysis indicates that context effects can be explained by activity in the local neural population, suggesting that recurrent local circuits support a long-lasting representation of sensory context in auditory cortex.
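The probe-after-context measurement described above reduces to a simple comparison: the same probe's response after context A versus after context B. The sketch below illustrates that comparison on synthetic firing rates; the decay constant, noise level, and threshold rule are assumptions, not the authors' analysis.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(0, 1.0, 0.01)                     # 1 s of probe response, 10 ms bins
resp_after_A = 5 + 2 * np.exp(-t / 0.3) + rng.normal(0, 0.2, t.size)   # spikes/s
resp_after_B = 5 + rng.normal(0, 0.2, t.size)

diff = resp_after_A - resp_after_B              # the "context effect" time course
effect_magnitude = np.abs(diff).mean()
# One simple duration estimate: last time bin where the context difference still
# exceeds a noise-based threshold (here, 2 x the late-response standard deviation).
threshold = 2 * resp_after_B[-20:].std()
significant = np.abs(diff) > threshold
effect_duration = t[significant][-1] if significant.any() else 0.0
print(f"context effect ~{effect_magnitude:.2f} spk/s, lasting ~{effect_duration:.2f} s")
```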
Collapse
Affiliation(s)
- Mateo López Espejo
- Neuroscience Graduate Program, Oregon Health & Science University, Portland, OR, USA
| | - Stephen V. David
- Otolaryngology, Oregon Health & Science University, Portland, OR, USA
| |
Collapse
|
23
|
Harris I, Niven EC, Griffin A, Scott SK. Is song processing distinct and special in the auditory cortex? Nat Rev Neurosci 2023; 24:711-722. [PMID: 37783820 DOI: 10.1038/s41583-023-00743-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 08/30/2023] [Indexed: 10/04/2023]
Abstract
Is the singing voice processed distinctively in the human brain? In this Perspective, we discuss what might distinguish song processing from speech processing in light of recent work suggesting that some cortical neuronal populations respond selectively to song, and we outline the implications for our understanding of auditory processing. We review the literature regarding the neural and physiological mechanisms of song production and perception and show that this provides evidence for key differences between song and speech processing. We conclude by discussing the significance of the notion that song processing is special in terms of how this might contribute to theories of the neurobiological origins of vocal communication and to our understanding of the neural circuitry underlying sound processing in the human cortex.
Collapse
Affiliation(s)
- Ilana Harris
- Institute of Cognitive Neuroscience, University College London, London, UK
| | - Efe C Niven
- Institute of Cognitive Neuroscience, University College London, London, UK
| | - Alex Griffin
- Department of Psychology, University of Cambridge, Cambridge, UK
| | - Sophie K Scott
- Institute of Cognitive Neuroscience, University College London, London, UK.
| |
Collapse
|
24
|
Purandare C, Mehta M. Mega-scale movie-fields in the mouse visuo-hippocampal network. eLife 2023; 12:RP85069. [PMID: 37910428 PMCID: PMC10619982 DOI: 10.7554/elife.85069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2023] Open
Abstract
Natural visual experience involves a continuous series of related images while the subject is immobile. How does the cortico-hippocampal circuit process a visual episode? The hippocampus is crucial for episodic memory, but most rodent single unit studies require spatial exploration or active engagement. Hence, we investigated neural responses to a silent movie (Allen Brain Observatory) in head-fixed mice without any task or locomotion demands, or rewards. Surprisingly, a third (33%, 3,379/10,263) of hippocampal (dentate gyrus, CA3, CA1, and subiculum) neurons showed movie-selectivity, with elevated firing in specific movie sub-segments, termed movie-fields, similar to the vast majority of thalamo-cortical (LGN, V1, AM-PM) neurons (97%, 6,554/6,785). Movie-tuning remained intact in immobile or spontaneously running mice. Visual neurons had >5 movie-fields per cell, but only ~2 in hippocampus. The movie-field durations in all brain regions spanned an unprecedented 1,000-fold range: from 0.02 s to 20 s, termed mega-scale coding. Yet, the total duration of all the movie-fields of a cell was comparable across neurons and brain regions. The hippocampal responses thus showed greater continuous-sequence encoding than visual areas, as evidenced by their fewer and broader movie-fields. Consistently, repeated presentation of the movie images in a fixed, but scrambled sequence virtually abolished hippocampal but not visual-cortical selectivity. The preference for continuous, compared to scrambled, sequences was eight-fold greater in hippocampal than visual areas, further supporting episodic-sequence encoding. Movies could thus provide a unified way to probe neural mechanisms of episodic information processing and memory, even in immobile subjects, across brain regions, and species.
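One way to picture the "movie-field" concept above is to bin a unit's firing by movie frame and treat contiguous runs of frames with firing well above the cell's mean as fields. The sketch below does this on synthetic Poisson spike counts; the smoothing width and z-score threshold are illustrative assumptions, not the authors' detection criteria.

```python
import numpy as np

rng = np.random.default_rng(7)
n_frames = 900                                     # ~30 s of movie at 30 frames/s
rate = rng.poisson(2.0, n_frames).astype(float)    # baseline firing per frame
rate[200:260] += rng.poisson(6.0, 60)              # a broad, embedded movie-field
rate[700:709] += rng.poisson(12.0, 9)              # a brief, strong movie-field

smooth = np.convolve(rate, np.ones(5) / 5, mode="same")   # light smoothing
z = (smooth - smooth.mean()) / smooth.std()
above = (z > 2.0).astype(np.int8)
# Group contiguous supra-threshold frames into fields and report their durations.
edges = np.flatnonzero(np.diff(np.concatenate(([0], above, [0]))))
for start, end in zip(edges[::2], edges[1::2]):
    print(f"movie-field: frames {start}-{end - 1} (~{(end - start) / 30:.2f} s)")
```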
Collapse
Affiliation(s)
- Chinmay Purandare
- Department of Bioengineering, University of California, Los Angeles, Los Angeles, United States
- W.M. Keck Center for Neurophysics, Department of Physics and Astronomy, University of California, Los Angeles, Los Angeles, United States
- Department of Neurology, University of California, Los Angeles, Los Angeles, United States
| | - Mayank Mehta
- W.M. Keck Center for Neurophysics, Department of Physics and Astronomy, University of California, Los Angeles, Los Angeles, United States
- Department of Neurology, University of California, Los Angeles, Los Angeles, United States
- Department of Electrical and Computer Engineering, University of California, Los Angeles, Los Angeles, United States
| |
Collapse
|
25
|
Nocon JC, Gritton HJ, James NM, Mount RA, Qu Z, Han X, Sen K. Parvalbumin neurons enhance temporal coding and reduce cortical noise in complex auditory scenes. Commun Biol 2023; 6:751. [PMID: 37468561 PMCID: PMC10356822 DOI: 10.1038/s42003-023-05126-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2022] [Accepted: 07/10/2023] [Indexed: 07/21/2023] Open
Abstract
Cortical representations supporting many cognitive abilities emerge from underlying circuits composed of several different cell types. However, cell type-specific contributions to rate- and timing-based cortical coding are not well understood. Here, we investigated the role of parvalbumin neurons in cortical complex scene analysis. Many complex scenes contain sensory stimuli which are highly dynamic in time and compete with stimuli at other spatial locations. Parvalbumin neurons play a fundamental role in balancing excitation and inhibition in cortex and sculpting cortical temporal dynamics; yet their specific role in encoding complex scenes via timing-based coding, and the robustness of temporal representations to spatial competition, has not been investigated. Here, we address these questions in auditory cortex of mice using a cocktail party-like paradigm, integrating electrophysiology, optogenetic manipulations, and a family of spike-distance metrics, to dissect parvalbumin neurons' contributions toward rate- and timing-based coding. We find that suppressing parvalbumin neurons degrades cortical discrimination of dynamic sounds in a cocktail party-like setting via changes in rapid temporal modulations in rate and spike timing, and over a wide range of timescales. Our findings suggest that parvalbumin neurons play a critical role in enhancing cortical temporal coding and reducing cortical noise, thereby improving representations of dynamic stimuli in complex scenes.
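To make the "family of spike-distance metrics" mentioned above concrete, the sketch below implements one standard member of that family, a van Rossum-style distance: spike trains are convolved with a causal exponential kernel and compared in the resulting continuous space. This is a generic illustration rather than the metric used in the paper, and the spike times and kernel widths are made up.

```python
import numpy as np

def van_rossum_distance(spikes_a, spikes_b, tau=0.01, dt=0.001, t_max=1.0):
    """Distance between two spike trains (spike times in s) with timescale tau."""
    t = np.arange(0.0, t_max, dt)
    def filtered(spikes):
        trace = np.zeros_like(t)
        for s in spikes:                      # causal exponential kernel per spike
            trace += (t >= s) * np.exp(-(t - s) / tau)
        return trace
    d = filtered(spikes_a) - filtered(spikes_b)
    return np.sqrt(np.sum(d ** 2) * dt / tau)

# Small tau emphasizes precise spike timing; large tau approaches a rate-based
# comparison, which is how such metrics help separate timing- from rate-based coding.
print(van_rossum_distance([0.10, 0.30, 0.55], [0.11, 0.32, 0.70], tau=0.005))
print(van_rossum_distance([0.10, 0.30, 0.55], [0.11, 0.32, 0.70], tau=0.200))
```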
Collapse
Affiliation(s)
- Jian Carlo Nocon
- Neurophotonics Center, Boston University, Boston, 02215, MA, USA
- Center for Systems Neuroscience, Boston University, Boston, 02215, MA, USA
- Hearing Research Center, Boston University, Boston, 02215, MA, USA
- Department of Biomedical Engineering, Boston University, Boston, 02215, MA, USA
| | - Howard J Gritton
- Department of Comparative Biosciences, University of Illinois, Urbana, 61820, IL, USA
- Department of Bioengineering, University of Illinois, Urbana, 61820, IL, USA
| | - Nicholas M James
- Neurophotonics Center, Boston University, Boston, 02215, MA, USA
- Center for Systems Neuroscience, Boston University, Boston, 02215, MA, USA
- Hearing Research Center, Boston University, Boston, 02215, MA, USA
- Department of Biomedical Engineering, Boston University, Boston, 02215, MA, USA
| | - Rebecca A Mount
- Neurophotonics Center, Boston University, Boston, 02215, MA, USA
- Center for Systems Neuroscience, Boston University, Boston, 02215, MA, USA
- Hearing Research Center, Boston University, Boston, 02215, MA, USA
- Department of Biomedical Engineering, Boston University, Boston, 02215, MA, USA
| | - Zhili Qu
- Department of Comparative Biosciences, University of Illinois, Urbana, 61820, IL, USA
- Department of Bioengineering, University of Illinois, Urbana, 61820, IL, USA
| | - Xue Han
- Neurophotonics Center, Boston University, Boston, 02215, MA, USA
- Center for Systems Neuroscience, Boston University, Boston, 02215, MA, USA
- Hearing Research Center, Boston University, Boston, 02215, MA, USA
- Department of Biomedical Engineering, Boston University, Boston, 02215, MA, USA
| | - Kamal Sen
- Neurophotonics Center, Boston University, Boston, 02215, MA, USA.
- Center for Systems Neuroscience, Boston University, Boston, 02215, MA, USA.
- Hearing Research Center, Boston University, Boston, 02215, MA, USA.
- Department of Biomedical Engineering, Boston University, Boston, 02215, MA, USA.
| |
Collapse
|
26
|
Thoret E, Ystad S, Kronland-Martinet R. Hearing as adaptive cascaded envelope interpolation. Commun Biol 2023; 6:671. [PMID: 37355702 PMCID: PMC10290642 DOI: 10.1038/s42003-023-05040-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Accepted: 06/12/2023] [Indexed: 06/26/2023] Open
Abstract
The human auditory system is designed to capture and encode sounds from our surroundings and conspecifics. However, the precise mechanisms by which it adaptively extracts the most important spectro-temporal information from sounds are still not fully understood. Previous auditory models have explained sound encoding at the cochlear level using static filter banks, but this vision is incompatible with the nonlinear and adaptive properties of the auditory system. Here we propose an approach that considers the cochlear processes as envelope interpolations inspired by cochlear physiology. It unifies linear and nonlinear adaptive behaviors into a single comprehensive framework that provides a data-driven understanding of auditory coding. It allows simulating a broad range of psychophysical phenomena, from virtual pitches and combination tones to consonance and dissonance of harmonic sounds. It further predicts the properties of the cochlear filters such as frequency selectivity. Here we propose a possible link between the parameters of the model and the density of hair cells on the basilar membrane. Cascaded Envelope Interpolation may lead to improvements in sound processing for hearing aids by providing a non-linear, data-driven way of preprocessing acoustic signals that is consistent with peripheral processes.
Collapse
Affiliation(s)
- Etienne Thoret
- Aix Marseille Univ, CNRS, UMR7061 PRISM, UMR7020 LIS, Marseille, France.
- Institute of Language, Communication, and the Brain (ILCB), Marseille, France.
| | - Sølvi Ystad
- CNRS, Aix Marseille Univ, UMR 7061 PRISM, Marseille, France
| | | |
Collapse
|
27
|
Lestang JH, Cai H, Averbeck BB, Cohen YE. Functional network properties of the auditory cortex. Hear Res 2023; 433:108768. [PMID: 37075536 PMCID: PMC10205700 DOI: 10.1016/j.heares.2023.108768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/28/2022] [Revised: 03/27/2023] [Accepted: 04/11/2023] [Indexed: 04/21/2023]
Abstract
The auditory system transforms auditory stimuli from the external environment into perceptual auditory objects. Recent studies have focused on the contribution of the auditory cortex to this transformation. Other studies have yielded important insights into the contributions of neural activity in the auditory cortex to cognition and decision-making. However, despite this important work, the relationship between auditory-cortex activity and behavior/perception has not been fully elucidated. Two of the more important gaps in our understanding are (1) the specific and differential contributions of different fields of the auditory cortex to auditory perception and behavior and (2) the way networks of auditory neurons impact and facilitate auditory information processing. Here, we focus on recent work from non-human-primate models of hearing and review work related to these gaps and put forth challenges to further our understanding of how single-unit activity and network activity in different cortical fields contribute to behavior and perception.
Collapse
Affiliation(s)
- Jean-Hugues Lestang
- Departments of Otorhinolaryngology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Huaizhen Cai
- Departments of Otorhinolaryngology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Bruno B Averbeck
- Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892, USA.
| | - Yale E Cohen
- Departments of Otorhinolaryngology, University of Pennsylvania, Philadelphia, PA 19104, USA; Neuroscience, University of Pennsylvania, Philadelphia, PA 19104, USA; Bioengineering, University of Pennsylvania, Philadelphia, PA 19104, USA
| |
Collapse
|
28
|
Cusinato R, Alnes SL, van Maren E, Boccalaro I, Ledergerber D, Adamantidis A, Imbach LL, Schindler K, Baud MO, Tzovara A. Intrinsic Neural Timescales in the Temporal Lobe Support an Auditory Processing Hierarchy. J Neurosci 2023; 43:3696-3707. [PMID: 37045604 PMCID: PMC10198454 DOI: 10.1523/jneurosci.1941-22.2023] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2022] [Revised: 02/21/2023] [Accepted: 03/02/2023] [Indexed: 04/14/2023] Open
Abstract
During rest, intrinsic neural dynamics manifest at multiple timescales, which progressively increase along visual and somatosensory hierarchies. Theoretically, intrinsic timescales are thought to facilitate processing of external stimuli at multiple stages. However, direct links between timescales at rest and sensory processing, as well as translation to the auditory system, are lacking. Here, we measured intracranial EEG in 11 human patients with epilepsy (4 women), while listening to pure tones. We show that, in the auditory network, intrinsic neural timescales progressively increase, while the spectral exponent flattens, from temporal to entorhinal cortex, hippocampus, and amygdala. Within the neocortex, intrinsic timescales exhibit spatial gradients that follow the temporal lobe anatomy. Crucially, intrinsic timescales at baseline can explain the latency of auditory responses: as intrinsic timescales increase, so do the single-electrode response onset and peak latencies. Our results suggest that the human auditory network exhibits a repertoire of intrinsic neural dynamics, which manifest in cortical gradients with millimeter resolution and may provide a variety of temporal windows to support auditory processing. SIGNIFICANCE STATEMENT: Endogenous neural dynamics are often characterized by their intrinsic timescales. These are thought to facilitate processing of external stimuli. However, a direct link between intrinsic timing at rest and sensory processing is missing. Here, with intracranial EEG, we show that intrinsic timescales progressively increase from temporal to entorhinal cortex, hippocampus, and amygdala. Intrinsic timescales at baseline can explain the variability in the timing of intracranial EEG responses to sounds: cortical electrodes with fast timescales also show fast and short-lasting responses to auditory stimuli, which progressively increase in the hippocampus and amygdala. Our results suggest that a hierarchy of neural dynamics in the temporal lobe manifests across cortical and limbic structures and can explain the temporal richness of auditory responses.
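Intrinsic neural timescales of the kind discussed above are commonly estimated by fitting an exponential decay to the autocorrelation of baseline activity. The sketch below does this on a synthetic AR(1) signal with a known ~50 ms timescale; it is a generic illustration, not the authors' exact pipeline, and the sampling rate, lag range, and fit settings are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)
fs, tau_true, n = 1000, 0.05, 20000              # Hz, s, samples
sig = np.zeros(n)
alpha = np.exp(-1 / (fs * tau_true))             # AR(1) coefficient with target timescale
noise = rng.standard_normal(n)
for i in range(1, n):
    sig[i] = alpha * sig[i - 1] + noise[i]

lags = np.arange(1, 200)                         # 1-199 ms
acf = np.array([np.corrcoef(sig[:-l], sig[l:])[0, 1] for l in lags])
fit = lambda l, tau, a: a * np.exp(-l / (fs * tau))
(tau_hat, _), _ = curve_fit(fit, lags, acf, p0=[0.02, 1.0])
print(f"estimated intrinsic timescale ~{tau_hat * 1000:.0f} ms (true 50 ms)")
```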
Collapse
Affiliation(s)
- Riccardo Cusinato
- Institute of Computer Science, University of Bern, Bern 3012, Switzerland
- Center for Experimental Neurology, Sleep Wake Epilepsy Center, NeuroTec, Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland
| | - Sigurd L Alnes
- Institute of Computer Science, University of Bern, Bern 3012, Switzerland
- Center for Experimental Neurology, Sleep Wake Epilepsy Center, NeuroTec, Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland
| | - Ellen van Maren
- Center for Experimental Neurology, Sleep Wake Epilepsy Center, NeuroTec, Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland
| | - Ida Boccalaro
- Center for Experimental Neurology, Sleep Wake Epilepsy Center, NeuroTec, Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland
| | | | - Antoine Adamantidis
- Center for Experimental Neurology, Sleep Wake Epilepsy Center, NeuroTec, Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland
| | - Lukas L Imbach
- Swiss Epilepsy Center, Klinik Lengg, Zurich 8008, Switzerland
| | - Kaspar Schindler
- Center for Experimental Neurology, Sleep Wake Epilepsy Center, NeuroTec, Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland
| | - Maxime O Baud
- Center for Experimental Neurology, Sleep Wake Epilepsy Center, NeuroTec, Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland
| | - Athina Tzovara
- Institute of Computer Science, University of Bern, Bern 3012, Switzerland
- Center for Experimental Neurology, Sleep Wake Epilepsy Center, NeuroTec, Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland
- Helen Wills Neuroscience Institute, University of California-Berkeley, Berkeley 94720, California
| |
Collapse
|
29
|
Kitazawa Y, Sonoda M, Sakakura K, Mitsuhashi T, Firestone E, Ueda R, Kambara T, Iwaki H, Luat AF, Marupudi NI, Sood S, Asano E. Intra- and inter-hemispheric network dynamics supporting object recognition and speech production. Neuroimage 2023; 270:119954. [PMID: 36828156 PMCID: PMC10112006 DOI: 10.1016/j.neuroimage.2023.119954] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Revised: 02/14/2023] [Accepted: 02/17/2023] [Indexed: 02/25/2023] Open
Abstract
We built normative brain atlases that animate millisecond-scale intra- and inter-hemispheric white matter-level connectivity dynamics supporting object recognition and speech production. We quantified electrocorticographic modulations during three naming tasks using event-related high-gamma activity from 1,114 nonepileptogenic intracranial electrodes (i.e., non-lesional areas unaffected by epileptiform discharges). Using this electrocorticography data, we visualized functional connectivity modulations defined as significant naming-related high-gamma modulations occurring simultaneously at two sites connected by direct white matter streamlines on diffusion-weighted imaging tractography. Immediately after stimulus onset, intra- and inter-hemispheric functional connectivity enhancements were confined mainly across modality-specific perceptual regions. During response preparation, left intra-hemispheric connectivity enhancements propagated in a posterior-to-anterior direction, involving the left precentral and prefrontal areas. After overt response onset, inter- and intra-hemispheric connectivity enhancements mainly encompassed precentral, postcentral, and superior-temporal (STG) gyri. We found task-specific connectivity enhancements during response preparation as follows. Picture naming enhanced activity along the left arcuate fasciculus between the inferior-temporal and precentral/posterior inferior-frontal (pIFG) gyri. Nonspeech environmental sound naming augmented functional connectivity via the left inferior longitudinal and fronto-occipital fasciculi between the medial-occipital and STG/pIFG. Auditory descriptive naming task enhanced usage of the left frontal U-fibers, involving the middle-frontal gyrus. Taken together, the commonly observed network enhancements include inter-hemispheric connectivity optimizing perceptual processing exerted in each hemisphere, left intra-hemispheric connectivity supporting semantic and lexical processing, and inter-hemispheric connectivity for symmetric oral movements during overt speech. Our atlases improve the currently available models of object recognition and speech production by adding neural dynamics via direct intra- and inter-hemispheric white matter tracts.
Collapse
Affiliation(s)
- Yu Kitazawa
- Department of Pediatrics, Children's Hospital of Michigan, Wayne State University, Detroit, 48201, USA; Department of Neurology and Stroke Medicine, Yokohama City University, Yokohama, 2360004, Japan
| | - Masaki Sonoda
- Department of Pediatrics, Children's Hospital of Michigan, Wayne State University, Detroit, 48201, USA; Department of Neurosurgery, Yokohama City University, Yokohama, 2360004, Japan
| | - Kazuki Sakakura
- Department of Pediatrics, Children's Hospital of Michigan, Wayne State University, Detroit, 48201, USA; Department of Neurosurgery, University of Tsukuba, Tsukuba, 3058575, Japan
| | - Takumi Mitsuhashi
- Department of Pediatrics, Children's Hospital of Michigan, Wayne State University, Detroit, 48201, USA; Department of Neurosurgery, Juntendo University, Tokyo, 1138421, Japan
| | - Ethan Firestone
- Department of Pediatrics, Children's Hospital of Michigan, Wayne State University, Detroit, 48201, USA; Department of Physiology, Wayne State University, Detroit, 48201, USA
| | - Riyo Ueda
- Department of Pediatrics, Children's Hospital of Michigan, Wayne State University, Detroit, 48201, USA
| | - Toshimune Kambara
- Department of Pediatrics, Children's Hospital of Michigan, Wayne State University, Detroit, 48201, USA; Department of Psychology, Hiroshima University, Hiroshima, 7398524, Japan
| | - Hirotaka Iwaki
- Department of Pediatrics, Children's Hospital of Michigan, Wayne State University, Detroit, 48201, USA; Department of Psychiatry, Hachinohe City Hospital, Hachinohe, 0318555, Japan
| | - Aimee F Luat
- Department of Pediatrics, Children's Hospital of Michigan, Wayne State University, Detroit, 48201, USA; Department of Neurology, Children's Hospital of Michigan, Wayne State University, Detroit, 48201, USA; Department of Pediatrics, Central Michigan University, Mount Pleasant, 48858, USA
| | - Neena I Marupudi
- Department of Neurosurgery, Children's Hospital of Michigan, Wayne State University, Detroit, 48201, USA
| | - Sandeep Sood
- Department of Neurosurgery, Children's Hospital of Michigan, Wayne State University, Detroit, 48201, USA
| | - Eishi Asano
- Department of Pediatrics, Children's Hospital of Michigan, Wayne State University, Detroit, 48201, USA; Department of Neurology, Children's Hospital of Michigan, Wayne State University, Detroit, 48201, USA.
| |
Collapse
|
30
|
The importance of temporal-fine structure to perceive time-compressed speech with and without the restoration of the syllabic rhythm. Sci Rep 2023; 13:2874. [PMID: 36806145 PMCID: PMC9938863 DOI: 10.1038/s41598-023-29755-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2022] [Accepted: 02/09/2023] [Indexed: 02/20/2023] Open
Abstract
Intelligibility of time-compressed (TC) speech decreases with increasing speech rate. However, intelligibility can be restored by 'repackaging' the TC speech by inserting silences between the syllables so that the original 'rhythm' is restored. Although restoration of the speech rhythm affects solely the temporal envelope, it is unclear to what extent repackaging also affects the perception of the temporal-fine structure (TFS). Here we investigate to what extent TFS contributes to the perception of TC and repackaged TC speech in quiet. Intelligibility of TC sentences with a speech rate of 15.6 syllables per second (sps), and of the repackaged sentences created by adding 100 ms of silence between the syllables of the TC speech (i.e., a speech rate of 6.1 sps), was assessed for three TFS conditions: the original TFS and the TFS conveyed by an 8- and 16-channel noise vocoder. An overall positive effect on intelligibility of both the repackaging process and of the amount of TFS available to the listener was observed. Furthermore, the benefit associated with repackaging TC speech depended on the amount of TFS available. The results show that TFS contributes significantly to the perception of fast speech even when the overall rhythm/envelope of TC speech is restored.
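The 'repackaging' manipulation described above amounts to inserting fixed silences between syllables of the time-compressed waveform so that the syllabic rate slows while the within-syllable signal (and hence its fine structure) is untouched. A minimal sketch follows, assuming syllable boundaries are already known; the waveform, boundaries, and sampling rate are hypothetical.

```python
import numpy as np

def repackage(signal, syllable_bounds, fs, gap_s=0.100):
    """signal: 1-D waveform; syllable_bounds: list of (start_s, end_s) per syllable."""
    gap = np.zeros(int(round(gap_s * fs)))
    pieces = []
    for start, end in syllable_bounds:
        pieces.append(signal[int(start * fs):int(end * fs)])
        pieces.append(gap)
    return np.concatenate(pieces[:-1])           # drop the trailing gap

fs = 16000
tc_speech = np.random.randn(int(0.64 * fs))      # stand-in for a 10-syllable TC sentence
bounds = [(i * 0.064, (i + 1) * 0.064) for i in range(10)]   # 64 ms syllables ~ 15.6 sps
slowed = repackage(tc_speech, bounds, fs)        # each syllable now spans ~164 ms, ~6.1 sps
print(len(tc_speech) / fs, len(slowed) / fs)
```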
Collapse
|
31
|
Mischler G, Keshishian M, Bickel S, Mehta AD, Mesgarani N. Deep neural networks effectively model neural adaptation to changing background noise and suggest nonlinear noise filtering methods in auditory cortex. Neuroimage 2023; 266:119819. [PMID: 36529203 PMCID: PMC10510744 DOI: 10.1016/j.neuroimage.2022.119819] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2022] [Revised: 11/28/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022] Open
Abstract
The human auditory system displays a robust capacity to adapt to sudden changes in background noise, allowing for continuous speech comprehension despite changes in background environments. However, despite comprehensive studies characterizing this ability, the computations that underlie this process are not well understood. The first step towards understanding a complex system is to propose a suitable model, but the classical and easily interpreted model for the auditory system, the spectro-temporal receptive field (STRF), cannot match the nonlinear neural dynamics involved in noise adaptation. Here, we utilize a deep neural network (DNN) to model neural adaptation to noise, illustrating its effectiveness at reproducing the complex dynamics at the levels of both individual electrodes and the cortical population. By closely inspecting the model's STRF-like computations over time, we find that the model alters both the gain and shape of its receptive field when adapting to a sudden noise change. We show that the DNN model's gain changes allow it to perform adaptive gain control, while the spectro-temporal change creates noise filtering by altering the inhibitory region of the model's receptive field. Further, we find that models of electrodes in nonprimary auditory cortex also exhibit noise filtering changes in their excitatory regions, suggesting differences in noise filtering mechanisms along the cortical hierarchy. These findings demonstrate the capability of deep neural networks to model complex neural adaptation and offer new hypotheses about the computations the auditory cortex performs to enable noise-robust speech perception in real-world, dynamic environments.
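The classical baseline named above, the spectro-temporal receptive field, is simply a fixed linear map from lagged spectrogram frames to the neural response; the DNN replaces it with a filter whose effective gain and shape can change with the background. The sketch below fits an STRF by ridge regression on synthetic data as a point of reference; the dimensions, regularization, and synthetic filter are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(5)
n_t, n_freq, n_lag = 3000, 32, 20
spec = rng.standard_normal((n_t, n_freq))                     # log-spectrogram frames
true_strf = rng.standard_normal((n_lag, n_freq)) * np.exp(-np.arange(n_lag))[:, None]

# Lagged design matrix: each row stacks the previous n_lag spectrogram frames.
X = np.stack([spec[i - n_lag:i].ravel() for i in range(n_lag, n_t)])
y = X @ true_strf.ravel() + 0.5 * rng.standard_normal(n_t - n_lag)

strf_hat = Ridge(alpha=10.0).fit(X, y).coef_.reshape(n_lag, n_freq)
print("correlation with true filter:",
      np.corrcoef(strf_hat.ravel(), true_strf.ravel())[0, 1])
```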
Collapse
Affiliation(s)
- Gavin Mischler
- Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States
| | - Menoua Keshishian
- Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States
| | - Stephan Bickel
- Hofstra Northwell School of Medicine, Manhasset, New York, United States
| | - Ashesh D Mehta
- Hofstra Northwell School of Medicine, Manhasset, New York, United States
| | - Nima Mesgarani
- Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States.
| |
Collapse
|
32
|
Basiński K, Quiroga-Martinez DR, Vuust P. Temporal hierarchies in the predictive processing of melody - From pure tones to songs. Neurosci Biobehav Rev 2023; 145:105007. [PMID: 36535375 DOI: 10.1016/j.neubiorev.2022.105007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2022] [Revised: 11/30/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022]
Abstract
Listening to musical melodies is a complex task that engages perceptual and memory-related processes. The processes underlying melody cognition happen simultaneously on different timescales, ranging from milliseconds to minutes. Although attempts have been made, research on melody perception is yet to produce a unified framework of how melody processing is achieved in the brain. This may in part be due to the difficulty of integrating concepts such as perception, attention and memory, which pertain to different temporal scales. Recent theories on brain processing, which hold prediction as a fundamental principle, offer potential solutions to this problem and may provide a unifying framework for explaining the neural processes that enable melody perception on multiple temporal levels. In this article, we review empirical evidence for predictive coding on the levels of pitch formation, basic pitch-related auditory patterns, more complex regularity processing extracted from basic patterns, and long-term expectations related to musical syntax. We also identify areas that would benefit from further inquiry and suggest future directions in research on musical melody perception.
Collapse
Affiliation(s)
- Krzysztof Basiński
- Division of Quality of Life Research, Medical University of Gdańsk, Poland
| | - David Ricardo Quiroga-Martinez
- Helen Wills Neuroscience Institute & Department of Psychology, University of California Berkeley, USA; Center for Music in the Brain, Aarhus University & The Royal Academy of Music, Denmark
| | - Peter Vuust
- Center for Music in the Brain, Aarhus University & The Royal Academy of Music, Denmark
| |
Collapse
|
33
|
Mansuri J, Aleem H, Grzywacz NM. Systematic errors in the perception of rhythm. Front Hum Neurosci 2022; 16:1009219. [DOI: 10.3389/fnhum.2022.1009219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Accepted: 10/14/2022] [Indexed: 11/11/2022] Open
Abstract
One hypothesis for why humans enjoy musical rhythms relates to their prediction of when each beat should occur. The ability to predict the timing of an event is important from an evolutionary perspective. Therefore, our brains have evolved internal mechanisms for processing the progression of time. However, due to inherent noise in neural signals, this prediction is not always accurate. Theoretical considerations of optimal estimates suggest the occurrence of certain systematic errors made by the brain when estimating the timing of beats in rhythms. Here, we tested psychophysically whether these systematic errors exist and if so, how they depend on stimulus parameters. Our experimental data revealed two main types of systematic errors. First, observers perceived the time of the last beat of a rhythmic pattern as happening earlier than actual when the inter-beat interval was short. Second, the perceived time of the last beat was later than the actual when the inter-beat interval was long. The magnitude of these systematic errors fell as the number of beats increased. However, with many beats, the errors due to long inter-beat intervals became more apparent. We propose a Bayesian model for these systematic errors. The model fits these data well, allowing us to offer possible explanations for how these errors occurred. For instance, neural processes possibly contributing to the errors include noisy and temporally asymmetric impulse responses, priors preferring certain time intervals, and better-early-than-late loss functions. We finish this article with brief discussions of both the implications of systematic errors for the appreciation of rhythm and the possible compensation by the brain’s motor system during a musical performance.
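The abstract mentions priors preferring certain time intervals as one possible source of systematic timing errors. The toy sketch below illustrates the generic central-tendency bias such a prior produces when combined with a noisy interval measurement under a Gaussian-Gaussian model; it is not the authors' model or a fit to their data, and all parameter values are invented.

```python
import numpy as np

def posterior_interval(measured, sigma_meas=0.04, prior_mean=0.6, prior_sd=0.15):
    """Gaussian likelihood x Gaussian prior -> posterior mean of the interval (s)."""
    w = prior_sd ** 2 / (prior_sd ** 2 + sigma_meas ** 2)   # weight on the measurement
    return w * measured + (1 - w) * prior_mean

for true_interval in [0.3, 0.6, 1.2]:
    est = posterior_interval(true_interval)
    print(f"interval {true_interval:.1f} s -> estimate {est:.3f} s "
          f"(bias {est - true_interval:+.3f} s)")
# In this toy, short intervals are estimated as longer and long intervals as shorter
# than actual, and the bias shrinks as measurement noise falls (e.g., with more beats
# observed), which is the kind of systematic, beat-number-dependent error the prior-
# based account predicts.
```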
Collapse
|
34
|
Rajendran T, Summa-Chadwick M. The scope and potential of music therapy in stroke rehabilitation. JOURNAL OF INTEGRATIVE MEDICINE 2022; 20:284-287. [PMID: 35534380 DOI: 10.1016/j.joim.2022.04.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/25/2022] [Accepted: 04/24/2022] [Indexed: 06/14/2023]
Abstract
There is a growing interest in the use of music therapy in neurological rehabilitation. Of all the major neurological illnesses, stroke rehabilitation has been observed to have some of the strongest potential for music therapy's beneficial effect. The current burden of stroke has raised the need to embrace novel, cost-effective rehabilitation designs that will enhance the existing physical, occupational, and speech therapies. Music therapy addresses a broad spectrum of motor, speech, and cognitive deficits, as well as behavioral and emotional issues. Several music therapy designs have focused on gait, cognitive, and speech rehabilitation, but most of the existing randomized controlled trials based on these interventions have a high risk of bias and are statistically insignificant. More randomized controlled trials with greater numbers of participants are required to strengthen the current data. Fostering an open and informed dialogue between patients, healthcare providers, and music therapists may help increase quality of life, dispel fallacies, and guide patients to specific musical interventions.
Collapse
Affiliation(s)
- Tara Rajendran
- Department of Music, Faculty of Fine Arts, Annamalai University, Annamalai Nagar, Chidambaram, Tamil Nadu 608002, India.
| | - Martha Summa-Chadwick
- Music Therapy Gateway in Communications, Signal Mountain, Tennessee 37377, United States
| |
Collapse
|
35
|
Norman-Haignere SV, Feather J, Boebinger D, Brunner P, Ritaccio A, McDermott JH, Schalk G, Kanwisher N. A neural population selective for song in human auditory cortex. Curr Biol 2022; 32:1470-1484.e12. [PMID: 35196507 PMCID: PMC9092957 DOI: 10.1016/j.cub.2022.01.069] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2021] [Revised: 10/26/2021] [Accepted: 01/24/2022] [Indexed: 12/18/2022]
Abstract
How is music represented in the brain? While neuroimaging has revealed some spatial segregation between responses to music versus other sounds, little is known about the neural code for music itself. To address this question, we developed a method to infer canonical response components of human auditory cortex using intracranial responses to natural sounds, and further used the superior coverage of fMRI to map their spatial distribution. The inferred components replicated many prior findings, including distinct neural selectivity for speech and music, but also revealed a novel component that responded nearly exclusively to music with singing. Song selectivity was not explainable by standard acoustic features, was located near speech- and music-selective responses, and was also evident in individual electrodes. These results suggest that representations of music are fractionated into subpopulations selective for different types of music, one of which is specialized for the analysis of song.
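The component-inference step described above can be pictured as a low-rank decomposition of an electrode-by-sound response matrix into a few shared response components, each with electrode weights and a sound-response profile. The sketch below uses sklearn's NMF as a generic stand-in for the authors' own decomposition method, on synthetic data with invented dimensions.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(4)
n_electrodes, n_sounds, n_components = 80, 165, 5
true_weights = rng.random((n_electrodes, n_components))
true_profiles = rng.random((n_components, n_sounds))
responses = true_weights @ true_profiles + 0.01 * rng.random((n_electrodes, n_sounds))

model = NMF(n_components=n_components, init="nndsvda", max_iter=500, random_state=0)
weights = model.fit_transform(responses)        # electrode weights per component
profiles = model.components_                    # component response to each sound
print(weights.shape, profiles.shape)
# A "song-selective" component would show high profile values only for sung music;
# the electrode weights then map where that component is expressed across cortex.
```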
Collapse
Affiliation(s)
- Sam V Norman-Haignere
- Zuckerman Institute, Columbia University, New York, NY, USA; HHMI Fellow of the Life Sciences Research Foundation, Chevy Chase, MD, USA; Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, ENS, PSL University, CNRS, Paris, France; Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY, USA; Department of Neuroscience, University of Rochester Medical Center, Rochester, NY, USA; Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
| | - Jenelle Feather
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Center for Brains, Minds and Machines, Cambridge, MA, USA
| | - Dana Boebinger
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Program in Speech and Hearing Biosciences and Technology, Harvard University, Cambridge, MA, USA
| | - Peter Brunner
- Department of Neurology, Albany Medical College, Albany, NY, USA; National Center for Adaptive Neurotechnologies, Albany, NY, USA; Department of Neurosurgery, Washington University School of Medicine, St. Louis, MO, USA
| | - Anthony Ritaccio
- Department of Neurology, Albany Medical College, Albany, NY, USA; Department of Neurology, Mayo Clinic, Jacksonville, FL, USA
| | - Josh H McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Center for Brains, Minds and Machines, Cambridge, MA, USA; Program in Speech and Hearing Biosciences and Technology, Harvard University, Cambridge, MA, USA
| | - Gerwin Schalk
- Department of Neurology, Albany Medical College, Albany, NY, USA
| | - Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Center for Brains, Minds and Machines, Cambridge, MA, USA
| |
Collapse
|
36
|
Schmitt LM, Obleser J. What auditory cortex is waiting for. Nat Hum Behav 2022; 6:324-325. [PMID: 35145279 DOI: 10.1038/s41562-021-01262-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Affiliation(s)
- Lea-Maria Schmitt
- Department of Psychology, University of Lübeck, Lübeck, Germany; Center of Brain, Behavior and Metabolism, University of Lübeck, Lübeck, Germany
| | - Jonas Obleser
- Department of Psychology, University of Lübeck, Lübeck, Germany; Center of Brain, Behavior and Metabolism, University of Lübeck, Lübeck, Germany.
| |
Collapse
|
37
|
Keshishian M, Norman-Haignere SV, Mesgarani N. Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 2021; 34:24455-24467. [PMID: 38737583 PMCID: PMC11087060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 05/14/2024]
Abstract
Natural signals such as speech are hierarchically structured across many different timescales, spanning tens (e.g., phonemes) to hundreds (e.g., words) of milliseconds, each of which is highly variable and context-dependent. While deep neural networks (DNNs) excel at recognizing complex patterns from natural signals, relatively little is known about how DNNs flexibly integrate across multiple timescales. Here, we show how a recently developed method for studying temporal integration in biological neural systems - the temporal context invariance (TCI) paradigm - can be used to understand temporal integration in DNNs. The method is simple: we measure responses to a large number of stimulus segments presented in two different contexts and estimate the smallest segment duration needed to achieve a context invariant response. We applied our method to understand how the popular DeepSpeech2 model learns to integrate across time in speech. We find that nearly all of the model units, even in recurrent layers, have a compact integration window within which stimuli substantially alter the response and outside of which stimuli have little effect. We show that training causes these integration windows to shrink at early layers and expand at higher layers, creating a hierarchy of integration windows across the network. Moreover, by measuring integration windows for time-stretched/compressed speech, we reveal a transition point, midway through the trained network, where integration windows become yoked to the duration of stimulus structures (e.g., phonemes or words) rather than absolute time. Similar phenomena were observed in a purely recurrent and purely convolutional network although structure-yoked integration was more prominent in the recurrent network. These findings suggest that deep speech recognition systems use a common motif to encode the hierarchical structure of speech: integrating across short, time-yoked windows at early layers and long, structure-yoked windows at later layers. Our method provides a straightforward and general-purpose toolkit for understanding temporal integration in black-box machine learning models.
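The core TCI comparison described above can be sketched on a toy "unit" whose response depends on a fixed-length window of its input: present the same segments embedded in two different contexts and ask at which segment duration the responses become context invariant. The toy unit, window length, and correlation readout below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

def unit_response(signal, window=30):
    """Toy unit: response at each sample averages the previous `window` samples."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="full")[:len(signal)]

def tci_correlation(seg_len, n_segments=200, window=30):
    segs = rng.standard_normal((n_segments, seg_len))
    ctx_a = rng.standard_normal((n_segments, seg_len))
    ctx_b = rng.standard_normal((n_segments, seg_len))
    # Response at each shared segment's final sample, embedded in context A vs. B.
    ra = [unit_response(np.concatenate([c, s]), window)[-1] for c, s in zip(ctx_a, segs)]
    rb = [unit_response(np.concatenate([c, s]), window)[-1] for c, s in zip(ctx_b, segs)]
    return np.corrcoef(ra, rb)[0, 1]

for seg_len in [10, 20, 40, 80]:
    print(seg_len, round(tci_correlation(seg_len), 2))
# Cross-context correlation approaches 1 once the segment is longer than the unit's
# integration window (30 samples here), which is the logic used to estimate
# integration windows for network units.
```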
Collapse
Affiliation(s)
- Menoua Keshishian
- Department of Electrical Engineering, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027
| | - Sam V Norman-Haignere
- Department of Electrical Engineering, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027
| | - Nima Mesgarani
- Department of Electrical Engineering, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027
| |
Collapse
|